Example:
H0:
All samples have the same frequency distribution.
Assumptions:
None really, except that the observations must be independent.
Scale:
Nominal
Procedure:
Calculate the expected number of observations, Eij, under H0:
Eij = Ni * Oj / N, in which Oj is the total number of
observations in category j (j from 1 to J, i.e., the column totals),
Ni is the size of sample i (i from 1 to I, i.e., the row
totals), and N is the total number of observations.
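A minimal Python sketch of this step, assuming the data are given as a list of rows (one row per sample i, one column per category j). The function name expected_counts and the counts in the example table are made up for illustration.

```python
def expected_counts(observed):
    """Expected counts Eij = Ni * Oj / N under H0.

    observed: list of rows, one row per sample i, one column per category j.
    """
    row_totals = [sum(row) for row in observed]        # Ni, sample sizes
    col_totals = [sum(col) for col in zip(*observed)]  # Oj, category totals
    n = sum(row_totals)                                # N, total observations
    return [[n_i * o_j / n for o_j in col_totals] for n_i in row_totals]


# Two samples (rows) over three categories (columns); the counts are made up.
table = [[10, 20, 30],
         [20, 20, 20]]
print(expected_counts(table))  # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
```

Both rows get the same expected counts here because both samples have the same size (60).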
The test parameter is X^2 = Sum over all cells ( Oij - Eij )^2 / Eij,
where Oij is the observed number of observations of category j in
sample i. It approximately follows a Chi-square distribution with
(J-1)*(I-1) Degrees of Freedom.
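The uncorrected test parameter and its Degrees of Freedom can be sketched the same way. chi_square_statistic is a hypothetical helper written for illustration; it recomputes the totals so that it stands on its own, and the example table is the same made-up one as before.

```python
def chi_square_statistic(observed):
    """Return (X^2, DoF) with X^2 = Sum over all cells (Oij - Eij)^2 / Eij."""
    row_totals = [sum(row) for row in observed]        # Ni
    col_totals = [sum(col) for col in zip(*observed)]  # Oj
    n = sum(row_totals)                                # N
    x2 = 0.0
    for n_i, row in zip(row_totals, observed):
        for o_j, o_ij in zip(col_totals, row):
            e_ij = n_i * o_j / n                       # expected count under H0
            x2 += (o_ij - e_ij) ** 2 / e_ij
    dof = (len(col_totals) - 1) * (len(row_totals) - 1)  # (J-1)*(I-1)
    return x2, dof


x2, dof = chi_square_statistic([[10, 20, 30],
                                [20, 20, 20]])
print(x2, dof)  # X^2 is 16/3 (about 5.33), DoF is 2
```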
Although the above procedure is the one generally found in
textbooks, it is not the best one. It omits the continuity
correction that is needed because a discrete (multinomial)
distribution is approximated by a continuous (Chi-square) one.
A better test parameter is:
X^2 = Sum over all cells ( |Oij - Eij| - 0.5 )^2 / Eij
(|a-b| indicates the absolute value of the difference).
This is the approach actually used to calculate the X^2
value in this example.
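The corrected test parameter differs from the uncorrected one only in the per-cell term. The sketch below (hypothetical name chi_square_corrected, same made-up table) applies the formula exactly as written, subtracting 0.5 from every |Oij - Eij|.

```python
def chi_square_corrected(observed):
    """Return (X^2, DoF) with X^2 = Sum over all cells (|Oij - Eij| - 0.5)^2 / Eij."""
    row_totals = [sum(row) for row in observed]        # Ni
    col_totals = [sum(col) for col in zip(*observed)]  # Oj
    n = sum(row_totals)                                # N
    x2 = 0.0
    for n_i, row in zip(row_totals, observed):
        for o_j, o_ij in zip(col_totals, row):
            e_ij = n_i * o_j / n
            # Continuity correction applied literally: a cell with
            # |Oij - Eij| < 0.5 still contributes a small positive term.
            x2 += (abs(o_ij - e_ij) - 0.5) ** 2 / e_ij
    dof = (len(col_totals) - 1) * (len(row_totals) - 1)
    return x2, dof


x2, dof = chi_square_corrected([[10, 20, 30],
                                [20, 20, 20]])
print(x2, dof)  # X^2 is 4.345, DoF is 2 (uncorrected: about 5.33)
```

In this table the two cells with Oij = Eij = 20 each add 0.0125 because of the literal correction. For comparison, SciPy's scipy.stats.chi2_contingency applies its continuity correction only when there is one Degree of Freedom and never moves an observed count past its expected value.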
Level of Significance:
Use a table to look up the level of significance associated with
X^2 and the Degrees of Freedom.
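Instead of a printed table, the upper-tail probability can also be computed directly. This sketch assumes an integer number of Degrees of Freedom and uses the standard closed-form expressions for the Chi-square upper tail (one for even, one for odd DoF), relying only on Python's math module. The name chi2_sf is illustrative.

```python
import math


def chi2_sf(x2, dof):
    """Upper-tail probability P(Chi-square(dof) > x2) for an integer dof >= 1."""
    if dof < 1 or int(dof) != dof:
        raise ValueError("dof must be a positive integer")
    dof = int(dof)
    if x2 <= 0:
        return 1.0
    half = x2 / 2
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2))
    #   + sqrt(2/pi) * exp(-x/2) * sum_{i=1}^{(dof-1)/2} x^(i-1/2) / (1*3*...*(2i-1))
    term, total = math.sqrt(x2), 0.0
    for i in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= x2 / (2 * i + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


# Example: the corrected X^2 = 4.345 with 2 Degrees of Freedom.
print(f"level of significance ~ {chi2_sf(4.345, 2):.4f}")
```

The terms are built up by multiplication rather than with explicit powers and factorials, which avoids overflow for larger DoF.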
Approximation:
If the Degrees of Freedom > 30, the distribution of
z = {(X^2/DoF)^(1/3) - (1 - 2/(9*DoF))}/SQRT(2/(9*DoF))
can be approximated by a Standard Normal Distribution.
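The approximation translates directly into code. This sketch uses hypothetical names and only the standard library; it computes z and converts it into an approximate upper-tail probability with the complementary error function, P(Z > z) = erfc(z/SQRT(2))/2.

```python
import math


def wilson_hilferty_z(x2, dof):
    """z = {(X^2/DoF)^(1/3) - (1 - 2/(9*DoF))} / SQRT(2/(9*DoF)); assumes x2 >= 0."""
    c = 2 / (9 * dof)
    return ((x2 / dof) ** (1 / 3) - (1 - c)) / math.sqrt(c)


def approx_upper_tail(x2, dof):
    """Approximate level of significance: upper tail of the standard normal at z."""
    z = wilson_hilferty_z(x2, dof)
    return 0.5 * math.erfc(z / math.sqrt(2))


# Example with DoF > 30, as the approximation requires.
print(approx_upper_tail(55.76, 40))
```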
Remarks:
This approach is an approximation, even with the continuity correction.
The Chi-square distribution can only be used if all expected values, i.e.,
all Eij, are larger than five. If this does not hold, combine
the rarer categories with larger ones.
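The check on the expected values, and one possible way of combining categories, can be sketched as follows. The merging rule here (repeatedly fold the category with the smallest column total into the next smallest, stopping at two categories) is a hypothetical choice for illustration; which categories can sensibly be combined depends on what they mean, so in practice this is a judgment call. All names and counts are made up.

```python
def expected_counts(observed):
    """Eij = Ni * Oj / N for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[n_i * o_j / n for o_j in col_totals] for n_i in row_totals]


def merge_rare_categories(observed, min_expected=5):
    """Merge categories (columns) until every Eij > min_expected or two remain.

    Hypothetical rule: the rarest category is added to the next rarest one.
    """
    table = [list(row) for row in observed]
    while len(table[0]) > 2 and min(min(r) for r in expected_counts(table)) <= min_expected:
        col_totals = [sum(col) for col in zip(*table)]
        order = sorted(range(len(col_totals)), key=col_totals.__getitem__)
        rare, partner = order[0], order[1]
        for row in table:
            row[partner] += row[rare]  # fold the rare category into its partner
            del row[rare]
    return table


print(merge_rare_categories([[3, 4, 20, 20],
                             [3, 4, 20, 20]]))
```

The result should still be checked: if only two categories remain and some Eij is still five or less, merging alone cannot rescue the approximation.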