In the figure above, the vertical deviations of the individual points from the line are shown as short vertical lines joining the points to the least squares line. These deviations are denoted by the symbol e. The value of e varies from one point to another: in some cases it is positive, in others it is negative. If the line drawn happens to be the least squares line, then the value of ∑e² is the least possible. It is because of this feature that the method is known as the least squares method.
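The minimizing property described above can be checked numerically. The following is a minimal sketch, not a definitive implementation: the five data points are invented for illustration, the function names are hypothetical, and the slope and intercept come from the standard closed-form formulas for a straight-line fit, which this passage itself does not state.

```python
# Hypothetical sample data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def least_squares_line(xs, ys):
    """Return (a, b) for the line Y = a + b*X that minimizes the sum of e squared."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Standard closed-form slope: sum of cross-deviations over sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def deviations(xs, ys, a, b):
    """Vertical deviations e = actual Y minus the Y estimated by the line."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_of_squares(xs, ys, a, b):
    """Sum of squared deviations for the line Y = a + b*X."""
    return sum(e ** 2 for e in deviations(xs, ys, a, b))


a, b = least_squares_line(xs, ys)
e = deviations(xs, ys, a, b)
best = sum_of_squares(xs, ys, a, b)

print("intercept a:", a, "slope b:", b)
print("deviations e:", e)  # a mix of positive and negative values
print("sum of e squared for the least squares line:", best)

# Any other line, obtained here by shifting the intercept or slope,
# gives a larger sum of squared deviations.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.2), (0.0, -0.2)]:
    other = sum_of_squares(xs, ys, a + da, b + db)
    print("shifted line", (a + da, b + db), "gives", other)
```

For these invented points some deviations are positive and others negative, and each shifted line yields a larger ∑e² than the fitted line, which is the sense in which the least squares line is "least".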
Why we insist on minimizing the sum of squared deviations is a question that needs explanation. If we denote the deviation of the actual value Y from the estimated value Ŷ by (Y − Ŷ), or e, it is logical that we want ∑(Y − Ŷ), or ∑e, to be as small as possible. However, merely examining ∑(Y − Ŷ) or ∑e is inappropriate, since any e can be positive or negative, and large positive values and large negative values could cancel one another out. But large values of e, regardless of their sign, indicate a poor prediction. Even if we ignore the signs while working out ∑e, the difficulties may continue to be there. Hence the standard procedure is to eliminate the effect of signs by squaring each deviation. Squaring each term accomplishes two purposes, viz.