Information Gain:
We now return to the problem of determining the best attribute to choose for a particular node in a tree. The following measure calculates a numerical value for a given attribute, A, with respect to a set of examples, S. Note that the values of attribute A range over a set of possibilities that we call Values(A), and that for a particular value v from that set, we write S_{v} for the set of examples that have value v for attribute A.
The information gain of attribute A, relative to a collection of examples S, is calculated as:

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_{v}|}{|S|} Entropy(S_{v})
The information gain of an attribute can thus be seen as the expected reduction in entropy caused by knowing the value of attribute A.
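
As a rough illustration, the calculation can be sketched in Python: take the entropy of S and subtract, for each value v in Values(A), the entropy of S_{v} weighted by |S_{v}| / |S|. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes entropy is the usual Shannon entropy in bits, that each example is a dictionary, and the names `entropy`, `information_gain`, `examples`, `attribute`, and `target`, as well as the toy data, are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels (assumed definition)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(examples, attribute, target):
    """Entropy of S minus the weighted entropy of each subset S_v."""
    total = len(examples)
    # Group the target labels by the value v each example has for the attribute.
    subsets = {}
    for example in examples:
        subsets.setdefault(example[attribute], []).append(example[target])
    # Expected entropy after the split: each S_v is weighted by |S_v| / |S|.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy([example[target] for example in examples]) - remainder


# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" not at all.
S = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
]

gain_outlook = information_gain(S, "outlook", "play")
gain_windy = information_gain(S, "windy", "play")
```

In this toy set, splitting on "outlook" leaves every subset with a single class, so it removes all of the entropy in S, while splitting on "windy" leaves each subset as mixed as S itself and removes none. An attribute with a higher gain is therefore the better candidate for the node.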