Assumptions in Regression
To understand the properties underlying the regression line, let us go back to the example of the model exam and the main exam. We can now estimate a student's main exam points if we also know his or her points on the model exam. As we have stated, a student with a score of 85 on the model exam should receive main exam points in the vicinity of 75 to 95.
If we knew the model exam scores of all students along with their main exam scores, we would have the population of values. The mean and the variance of the population of model exam scores would be μ_{x} and σ_{x}^{2}, respectively. The corresponding parameters for the main exam points are μ_{y} and σ_{y}^{2}.
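As a small illustration of these population parameters, the sketch below computes the mean and variance of a tiny, made-up "population" of model exam scores. The scores are hypothetical and chosen only to keep the arithmetic easy to check by hand; the variable names are assumptions as well.

```python
import statistics

# Hypothetical population of model exam scores (not from the text).
model_exam_scores = [70, 80, 85, 90, 75]

# Population mean, playing the role of mu_x.
mu_x = statistics.mean(model_exam_scores)

# Population variance, playing the role of sigma_x^2.
# pvariance divides by N (the whole population), not by N - 1.
var_x = statistics.pvariance(model_exam_scores)

print(mu_x, var_x)
```

The same two calls applied to the main exam scores would give the counterparts μ_{y} and σ_{y}^{2}.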
The assumptions in regression are:
The relationship between X and Y is linear, which implies that the mean of Y at any given value X = x follows the formula E(Y|X = x) = A + Bx.
At each X = x, the distribution of Y_{x} is normal, and the variances of these distributions (written σ_{x}^{2} here, meaning the variance of Y at X = x, not the variance of X) are equal. This implies that the error terms E_{x} all have the same variance, σ^{2}.
The Y-values are independent of each other.
No assumption is made regarding the distribution of X.
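One way to get a feel for these assumptions is to simulate data that satisfies them. The sketch below is only an illustration: the values of `A`, `B`, and `SIGMA` are hypothetical, picked so that a model exam score of 85 gives main exam scores mostly between 75 and 95, and the function name `simulate_main_exam` is an assumption, not something defined in the text.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
A = 8.5       # intercept of the true regression line
B = 0.9       # slope of the true regression line
SIGMA = 5.0   # common standard deviation of Y at every x

def simulate_main_exam(x, rng):
    """Draw one main exam score for a student whose model exam score is x."""
    mean_y = A + B * x                # linearity: E(Y | X = x) = A + Bx
    return rng.gauss(mean_y, SIGMA)   # normal, with the same spread at every x

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Each draw is made separately, so the Y-values are independent.
scores_at_85 = [simulate_main_exam(85, rng) for _ in range(10_000)]
scores_at_60 = [simulate_main_exam(60, rng) for _ in range(10_000)]

print(statistics.fmean(scores_at_85))  # close to A + B * 85
print(statistics.stdev(scores_at_85))  # close to SIGMA
print(statistics.stdev(scores_at_60))  # also close to SIGMA
```

Note that the x-values 85 and 60 are simply chosen by hand here, which is consistent with making no assumption about the distribution of X.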
Since we do not have the model exam points and main exam points of all students, we must estimate the regression line E(Y|X = x) = A + Bx from a sample.
The figure shows a line that has been constructed on the scatter diagram. Note that the line seems to be drawn through the collective mid-point of the plotted points. The height of the line at any particular X = x (commonly written ŷ) is the estimate of the true mean of the Y's at that x.
Figure 8
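The text does not say here how the line in the figure was obtained. A common choice is ordinary least squares, and the sketch below assumes that method. The sample data and the names `fit_line`, `a`, and `b` (the sample estimates of A and B) are hypothetical and serve only as an illustration.

```python
def fit_line(xs, ys):
    """Return the least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # estimated slope
    a = y_bar - b * x_bar    # estimated intercept
    return a, b

# Hypothetical sample of (model exam, main exam) scores.
model_exam = [60, 70, 80, 85, 90]
main_exam = [62, 75, 78, 88, 92]

a, b = fit_line(model_exam, main_exam)

# Estimated mean main exam score for a student who scored 85 on the model exam.
y_hat_85 = a + b * 85
print(a, b, y_hat_85)
```

Because the intercept is computed as `y_bar - b * x_bar`, the fitted line always passes through the point of means, which matches the observation that the line runs through the collective mid-point of the plotted points.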