Entropy - learning decision trees:
Putting together a decision tree is all a matter of choosing which attribute to test at each node in the tree. We shall define a measure called information gain, which will be used to decide which attribute to test at each node. Information gain is itself calculated using a measure called entropy, which we first define for the case of a binary decision problem and then define for the general case.
Given a binary categorisation, C, and a set of examples, S, for which the proportion of examples categorised as positive by C is p+ and the proportion of examples categorised as negative by C is p-, then the entropy of S is:

Entropy(S) = -p+ log2(p+) - p- log2(p-)
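As a sketch of the definition above, the binary entropy of a set can be computed directly from the proportion of positive examples (the helper name binary_entropy is illustrative, not from the text; 0 * log2(0) is taken to be 0, as is conventional):

```python
import math

def binary_entropy(p_pos):
    """Entropy of a set S where p_pos of the examples are positive.

    Implements Entropy(S) = -p+ log2(p+) - p- log2(p-),
    treating 0 * log2(0) as 0 so pure sets do not raise an error.
    """
    p_neg = 1.0 - p_pos
    entropy = 0.0
    for p in (p_pos, p_neg):
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy

# A 50/50 split is maximally impure; a pure set has entropy 0.
print(binary_entropy(0.5))  # 1.0
print(binary_entropy(1.0))  # 0.0
```

Note that entropy is highest (1 bit) when the set is evenly split between positive and negative examples, and falls to 0 when every example has the same categorisation.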
We have defined entropy first for a binary decision problem because it is easier to get an impression of what it is trying to calculate. Tom Mitchell puts this quite well:
"In order to define an information gain precisely so we begin by defining a measure commonly utilising in information theory that is called entropy in which characterizes the (im)purity of as an arbitrary collection of examples."