Hypothesis Testing About The Difference Between Two Proportions
Hypothesis testing about the difference between two proportions is used to test the difference between the proportions of a described attribute found in two random samples.
The null hypothesis is that there is no difference between the population proportions. In other words, the two samples are assumed to come from the same population as far as the attribute is concerned.
Hence
H0 : π_{1} = π_{2}
The best estimate of the standard error of the difference between P_{1} and P_{2} is obtained by pooling the two samples and finding the pooled sample proportion (p), thus
p = (P_{1}n_{1} + P_{2}n_{2})/(n_{1} + n_{2}), with q = 1 - p
Standard error of the difference between proportions
S(P_{1} - P_{2}) = √{(pq/n_{1}) + (pq/n_{2})}
And Z = |(P_{1} - P_{2})/S(P_{1} - P_{2})|
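The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name two_proportion_z and the input figures in the usage lines are hypothetical and chosen only to show the call.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Return (pooled p, standard error, |Z|) for H0: pi_1 = pi_2.

    p1, p2 are the sample proportions; n1, n2 are the sample sizes.
    """
    # Pooled proportion: weighted average of the two sample proportions.
    p = (p1 * n1 + p2 * n2) / (n1 + n2)
    q = 1 - p
    # Standard error of the difference under the null hypothesis.
    se = math.sqrt(p * q / n1 + p * q / n2)
    # Absolute value of the test statistic.
    z = abs((p1 - p2) / se)
    return p, se, z


# Hypothetical figures, for illustration only.
p, se, z = two_proportion_z(0.40, 50, 0.30, 80)
print(round(p, 4), round(se, 4), round(z, 2))
```

The computed |Z| is then compared with the critical value for the chosen level of significance, for example 1.96 for a two-sided test at the 5 percent level.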
Illustration
In a random sample of 100 persons obtained from village A, 60 are found to be consuming tea. In another sample of 200 persons obtained from village B, 100 persons are found to be consuming tea. Do the data reveal a significant difference between the two villages so far as the habit of taking tea is concerned?
Solution
Let us take the hypothesis that there is no significant difference between the two villages so far as the habit of taking tea is concerned, that is: π_{1} = π_{2}
We are given
P_{1} = 0.6; n_{1} = 100
P_{2} = 0.5; n_{2} = 200
The pooled proportion to be used here is given by:
p = (P_{1}n_{1} + P_{2}n_{2})/(n_{1} + n_{2})
= {(0.6)(100) + (0.5)(200)}/(100 + 200)
= 160/300
= 0.53
q = 1 - 0.53
= 0.47
S(P_{1} - P_{2}) = √{(pq/n_{1}) + (pq/n_{2})}
= √{((0.53)(0.47)/100) + ((0.53)(0.47)/200)}
= 0.0611
Z = |(0.6 - 0.5)/0.0611|
= 1.64
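The arithmetic of this illustration can be checked numerically. The sketch below uses only the Python standard library and the counts given in the problem; the variable names are illustrative, and the two-sided p-value is an addition that the worked solution does not itself compute. The pooled proportion is kept unrounded, so intermediate values may differ slightly in the last decimal place from the hand calculation.

```python
import math
from statistics import NormalDist

# Data from the illustration: tea drinkers / sample size in each village.
x1, n1 = 60, 100    # village A
x2, n2 = 100, 200   # village B

p1, p2 = x1 / n1, x2 / n2

# Pooled proportion and its complement.
p = (x1 + x2) / (n1 + n2)
q = 1 - p

# Standard error of the difference and the test statistic.
se = math.sqrt(p * q / n1 + p * q / n2)
z = abs(p1 - p2) / se

# Two-sided critical value and p-value at the 5 percent level.
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(z))

reject = z > z_crit
print(f"p = {p:.4f}, SE = {se:.4f}, Z = {z:.2f}")
print(f"critical Z = {z_crit:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
```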
Because the computed value of Z (1.64) is less than the critical value of Z = 1.96 at the 5 percent level of significance, we do not reject the hypothesis and conclude that there is no significant difference in the habit of taking tea between villages A and B.
t-Distribution (Student's t Distribution): Tests Of Hypothesis For Small Samples (n < 30)
For small samples (n < 30), the method used in hypothesis testing is the same as the one for large samples, except that t values from the t distribution at a specified number of degrees of freedom v are used instead of the Z score. The standard error statistic (Se) used is also different.
Note that v = n - 1 for a single sample and v = n_{1} + n_{2} - 2 where two samples are involved.