# Chi-square test for equality of distributions

Characteristics:
This is the most widely used test on nominal data. Although the observations (i.e., the counts) follow a binomial or multinomial distribution, it is impractical to calculate the levels of significance directly. A binomial distribution can be approximated by a normal distribution if the expected number of observations is large enough. This approximation is used to calculate the "variance" of the observed distribution, i.e., the summed squared deviations of the observed from the expected counts, each scaled by its expected count. Under H0 this "variance" approximately follows a Chi-square distribution.

H0:
All samples have the same frequency distribution.

Assumptions:
None really, except that the observations must be independent.

Scale:
Nominal

Procedure:
Calculate the expected number of observations, Eij, under H0: Eij = Ni * Oj / N, in which Oj is the total number of observations in category j (j from 1 to J, i.e., the column totals), Ni is the size of sample i (i from 1 to I, i.e., the row totals), and N is the total number of observations in all samples combined.
The test statistic is X^2 = Sum over all cells ( Oij - Eij )^2 / Eij, in which Oij is the observed count for sample i in category j. By approximation, X^2 follows a Chi-square distribution with (J-1)*(I-1) Degrees of Freedom.
Although the above procedure is the one generally found in textbooks, it is not the best one. It omits the continuity correction (known as Yates' correction) that is needed because a discrete (multinomial) distribution is approximated with a continuous (Chi-square) one. A better test statistic is:
X^2 = Sum over all cells ( |Oij - Eij| - 0.5 )^2 / Eij
(|a-b| indicates the absolute value of the difference). This corrected form is the one actually used to calculate the X^2 value on this page.
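
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: the function name `chi_square`, the list-of-rows table layout, and the example counts are assumptions made for the sketch. The correction is applied literally as written in the formula.

```python
def chi_square(table, correction=True):
    """Return (X^2, DoF) for a table given as a list of rows.

    Each row is one sample, each column one category.
    Totals must NOT be included in the table.
    """
    row_totals = [sum(row) for row in table]         # Ni, sample sizes
    col_totals = [sum(col) for col in zip(*table)]   # Oj, category totals
    n = sum(row_totals)                              # N, grand total

    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # Eij under H0
            diff = abs(observed - expected)                # |Oij - Eij|
            if correction:
                diff -= 0.5                                # continuity correction
            x2 += diff ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)    # (I-1)*(J-1)
    return x2, dof


# Invented counts, for illustration only: two samples, two categories.
table = [[10, 20],
         [20, 10]]
x2_corrected, dof = chi_square(table)                 # about 5.4 with 1 DoF
x2_plain, _ = chi_square(table, correction=False)     # about 6.67
print(x2_corrected, x2_plain, dof)
```

In this example every expected count is 15, so the correction lowers each cell's contribution from 5^2/15 to 4.5^2/15, which makes the test slightly more conservative.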

Level of Significance:
Use a table to look up the level of significance associated with X^2 and the Degrees of Freedom.
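
Where no printed table is at hand, the upper-tail probability can also be computed directly. The sketch below assumes a hypothetical helper named `chi2_sf` and uses only the Python standard library. It relies on the recurrence Q(x | k+2) = Q(x | k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1), starting from Q(x | 0) = 0 for even and Q(x | 1) = erfc(sqrt(x/2)) for odd Degrees of Freedom.

```python
import math


def chi2_sf(x2, dof):
    """Upper-tail probability P(Chi-square with `dof` DoF >= x2)."""
    if x2 <= 0:
        return 1.0
    half = x2 / 2.0
    if dof % 2 == 0:
        q, k = 0.0, 0                              # Q(x | 0) = 0
    else:
        q, k = math.erfc(math.sqrt(half)), 1       # Q(x | 1)
    # Step up two Degrees of Freedom at a time until `dof` is reached.
    while k < dof:
        # The term is evaluated in log space to avoid overflow for large k.
        q += math.exp(-half + (k / 2.0) * math.log(half)
                      - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


# The familiar 5% critical value for 1 DoF is about 3.84.
print(chi2_sf(3.841458820694124, 1))
```

The returned value is the level of significance p for the observed X^2, i.e., the number that would otherwise be read from the table.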

Approximation:
If the Degrees of Freedom > 30, the distribution of

z = {(X^2/DoF)^(1/3) - (1 - 2/(9*DoF))}/SQRT(2/(9*DoF))

can be approximated by a Standard Normal Distribution.
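
This transformation (commonly known as the Wilson-Hilferty approximation) is easy to evaluate. The following sketch assumes two hypothetical helpers, `chi2_to_z` and `normal_upper_tail`, and uses only the standard library. The input values are chosen for illustration only.

```python
import math


def chi2_to_z(x2, dof):
    """Transform X^2 with `dof` Degrees of Freedom into an approximate z."""
    term = 2.0 / (9.0 * dof)
    return ((x2 / dof) ** (1.0 / 3.0) - (1.0 - term)) / math.sqrt(term)


def normal_upper_tail(z):
    """One-sided probability P(Z >= z) for a Standard Normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Illustrative input: X^2 = 55.758 with 40 DoF (near the 5% critical value).
z = chi2_to_z(55.758, 40)
print(z, normal_upper_tail(z))
```

For this input z comes out close to 1.645, the one-sided 5% point of the Standard Normal Distribution, which gives a feel for how well the approximation works at 40 Degrees of Freedom.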

Remarks:
This approach is an approximation, even with the continuity correction. The Chi-square distribution can only be used if all expected values, i.e., all Eij, are larger than five. If this does not hold, combine the rarer categories with larger ones.
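
The condition on the expected values can be checked before running the test. The sketch below is one possible way to do this. The helper names `small_expected_cells` and `merge_categories`, the list-of-rows layout, and the counts are assumptions made for illustration.

```python
def small_expected_cells(table, minimum=5.0):
    """List (row, column, Eij) for every cell whose Eij is not above `minimum`."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    cells = []
    for i, ni in enumerate(row_totals):
        for j, oj in enumerate(col_totals):
            expected = ni * oj / n
            if expected <= minimum:
                cells.append((i, j, expected))
    return cells


def merge_categories(table, keep, drop):
    """Return a new table with column `drop` added into column `keep`."""
    merged_table = []
    for row in table:
        merged = [v for j, v in enumerate(row) if j != drop]
        # Removing `drop` shifts later columns one position to the left.
        target = keep if keep < drop else keep - 1
        merged[target] += row[drop]
        merged_table.append(merged)
    return merged_table


# Invented counts: the first category is rare in both samples.
table = [[3, 12, 20],
         [4, 31, 25]]
print(small_expected_cells(table))          # cells in column 0 are too small
print(merge_categories(table, keep=1, drop=0))
```

Which categories to combine is a judgment about the data: merging only makes sense when the combined category is still meaningful.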