Pruning and Sorting:
Because we can test whether each hypothesis explains (entails) a particular example, we can associate with each hypothesis the set of positive examples that it explains and a similar set of negative examples. There is also an analogy with the general and specific hypotheses described above: if a hypothesis G is more general than a hypothesis S, then the examples explained by S will be a subset of those explained by G.
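The cover sets and the subset relation between a general and a specific hypothesis can be illustrated with a small sketch. It is only a toy under stated assumptions: examples are integers, a hypothesis is a Python predicate rather than a logic program, and the names `coverage`, `specific`, `general`, `POS` and `NEG` are hypothetical. A real ILP system would instead check whether the background knowledge plus the hypothesis entails each example.

```python
from typing import Callable, FrozenSet, Tuple

# Hypothetical toy setting: examples are integers and a hypothesis is a
# Python predicate. A real ILP system would instead check whether the
# background knowledge plus the hypothesis logically entails each example.
Hypothesis = Callable[[int], bool]


def coverage(h: Hypothesis, pos: FrozenSet[int],
             neg: FrozenSet[int]) -> Tuple[FrozenSet[int], FrozenSet[int]]:
    """Return the (positive, negative) examples explained by h."""
    return (frozenset(e for e in pos if h(e)),
            frozenset(e for e in neg if h(e)))


def specific(x: int) -> bool:  # S
    return x > 6


def general(x: int) -> bool:   # G: accepts everything S accepts, and more
    return x > 3


POS = frozenset({6, 7, 9})
NEG = frozenset({1, 3, 4})

pos_s, neg_s = coverage(specific, POS, NEG)
pos_g, neg_g = coverage(general, POS, NEG)

print(sorted(pos_s), sorted(neg_s))  # [7, 9] []
print(sorted(pos_g), sorted(neg_g))  # [6, 7, 9] [4]
# Because G is more general than S, S's cover sets are subsets of G's.
print(pos_s <= pos_g and neg_s <= neg_g)  # True
```

Note that the more general hypothesis G already explains the negative example 4, which is exactly the situation the pruning argument below is concerned with.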
We will assume the following generic search strategy for an ILP system: (i) a set of current hypotheses, QH, is maintained; (ii) at each step in the search, a hypothesis H is taken from QH and some inference rules are applied to it in order to generate new hypotheses, which are then added to the set (we say that H has been expanded); (iii) this continues until a termination criterion is met.
This leaves many questions unanswered. Looking first at the question of which hypothesis to expand at a particular stage, ILP systems associate a label with each hypothesis generated, expressing the probability of the hypothesis holding given that the background knowledge and examples are true. Hypotheses with a higher probability are then expanded before those with a lower probability, and hypotheses with zero probability are pruned from QH entirely. This probability calculation is derived using Bayesian mathematics, and we do not go into the derivation here; however, we hint at two aspects of the calculation in the paragraphs below.
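The generic strategy, with expansion ordered by the probability label and zero-probability hypotheses pruned, can be sketched as a best-first search in which a priority queue stands in for QH. This is a minimal sketch under stated assumptions: the Bayesian label is not given here, so `toy_score` is a hypothetical stand-in that returns zero for any hypothesis explaining a negative example and otherwise grows with the number of positives explained. Hypotheses are encoded as integer thresholds t meaning "x >= t", and `search`, `toy_expand`, `toy_done` and the data are illustrative names, not part of any real ILP system.

```python
import heapq
from typing import Callable, Iterable, List, Optional, Set, Tuple

Hypothesis = int  # hypothetical encoding: t stands for the rule "x >= t"


def search(start: Hypothesis,
           expand: Callable[[Hypothesis], Iterable[Hypothesis]],
           score: Callable[[Hypothesis], float],
           is_done: Callable[[Hypothesis], bool],
           max_steps: int = 100) -> Optional[Hypothesis]:
    """Best-first search over hypotheses kept in the set QH."""
    # heapq is a min-heap, so scores are negated to pop the highest first;
    # the counter breaks ties so hypotheses themselves are never compared.
    qh: List[Tuple[float, int, Hypothesis]] = []
    seen: Set[Hypothesis] = {start}
    counter = 0
    if score(start) > 0:
        heapq.heappush(qh, (-score(start), counter, start))
    for _ in range(max_steps):          # step limit as a fallback termination criterion
        if not qh:
            return None
        _, _, h = heapq.heappop(qh)     # take the most probable hypothesis H
        if is_done(h):
            return h
        for new_h in expand(h):         # H has now been expanded
            if new_h in seen:
                continue
            seen.add(new_h)
            p = score(new_h)
            if p == 0:                  # zero-probability hypotheses are pruned
                continue
            counter += 1
            heapq.heappush(qh, (-p, counter, new_h))
    return None


# Toy problem (hypothetical data): integer examples.
POS = {5, 7, 9}
NEG = {1, 2}


def toy_score(t: Hypothesis) -> float:
    # Stand-in for the Bayesian label: zero if any negative is explained,
    # otherwise higher the more positives are explained.
    if any(e >= t for e in NEG):
        return 0.0
    covered = sum(1 for e in POS if e >= t)
    return (1 + covered) / (1 + len(POS))


def toy_expand(t: Hypothesis) -> List[Hypothesis]:
    # Generalizing operator: lowering the threshold explains more examples.
    return [t - 1, t - 2]


def toy_done(t: Hypothesis) -> bool:
    # Terminate once every positive example is explained.
    return all(e >= t for e in POS)


print(search(10, toy_expand, toy_score, toy_done))  # 5
```

Using a heap keeps QH ordered by label so the most probable hypothesis is always expanded next. The `seen` set only avoids re-expanding the same hypothesis twice; the generic strategy above does not require it.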
In specific to general ILP systems, the inference rules are inductive, so each operator takes a hypothesis and generalizes it. As mentioned above, this means that the hypothesis generated will explain all the examples that the original hypothesis explained, and typically more. As the search gradually makes hypotheses more general, there will come a stage where a newly formed hypothesis H is general enough to explain a negative example, e-. H should therefore score zero in the probability calculation, because it cannot possibly hold given that the background knowledge and examples are true. Furthermore, because the operators only generalize, there is no way H can be fixed so that it no longer explains e-; pruning it from QH on account of its zero probability score is therefore a good decision.
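The argument that this pruning is safe rests on coverage growing monotonically under generalization. A minimal sketch, assuming the same hypothetical threshold encoding (t stands for "x >= t") and a hypothetical `generalize` operator that simply lowers t by one: once a hypothesis in a chain of generalizations explains the negative example, every later hypothesis in that chain explains it too, so in this toy nothing reachable from H could score above zero.

```python
from typing import FrozenSet, List

# Hypothetical encoding: t stands for the rule "x >= t"; generalizing lowers t.
POS: FrozenSet[int] = frozenset({5, 7, 9})
NEG: FrozenSet[int] = frozenset({3})


def covers(t: int, examples: FrozenSet[int]) -> FrozenSet[int]:
    """Examples explained by the hypothesis "x >= t"."""
    return frozenset(e for e in examples if e >= t)


def generalize(t: int) -> int:
    """Hypothetical generalization operator."""
    return t - 1


# Build a chain of successive generalizations: [10, 9, ..., 1].
chain: List[int] = [10]
while len(chain) < 10:
    chain.append(generalize(chain[-1]))

# Coverage never shrinks along the chain...
for a, b in zip(chain, chain[1:]):
    assert covers(a, POS | NEG) <= covers(b, POS | NEG)

# ...so once H explains a negative example e-, every later hypothesis does too.
first_bad = next(t for t in chain if covers(t, NEG))
after = chain[chain.index(first_bad):]
print(first_bad, all(covers(t, NEG) for t in after))  # 3 True
```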