
Chi-square test for known distributions


Characteristics:
This test checks whether an observed distribution differs from an expected distribution. Although the observed counts are binomially or multinomially distributed, it is impractical to calculate the levels of significance directly from these distributions. A binomial distribution can be approximated by a normal distribution if the expected number of observations is large enough. This approximation is used to calculate a "variance" of the observed distribution: the sum of the squared deviations of the observed from the expected frequencies, each divided by its expected frequency. Under H0 this "variance" approximately follows a Chi-square distribution.
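The link between the normal approximation and the Chi-square distribution is easiest to see with only two categories (the binomial case). There, the uncorrected X^2 equals the square of the standardized normal deviate z of the first count, and the square of a standard normal variable has a Chi-square distribution with one Degree of Freedom. A minimal Python sketch with made-up numbers; pearson_x2 and binomial_z are hypothetical helper names:

```python
import math

def pearson_x2(observed, expected):
    """Sum of (O - E)^2 / E over all categories (no continuity correction)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def binomial_z(successes, n, p):
    """Normal approximation of a binomial count: (O - np) / sqrt(np(1 - p))."""
    return (successes - n * p) / math.sqrt(n * p * (1 - p))

# Hypothetical data: n = 100 observations, expected share p = 0.3 in category 1.
n, p, o1 = 100, 0.3, 38
observed = [o1, n - o1]
expected = [n * p, n * (1 - p)]

x2 = pearson_x2(observed, expected)
z = binomial_z(o1, n, p)
print(math.isclose(x2, z ** 2))  # True: with two categories, X^2 = z^2
```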

H0:
The sample has the expected frequency distribution.

Assumptions:
None really, except that the observations must be independent.

Scale:
Nominal

Procedure:
Calculate the test parameter X^2 = Sum over all columns ( Oi - Ei )^2 / Ei, where Oi and Ei are the observed and expected frequencies in column (category) i. Under H0, X^2 approximately follows a Chi-square distribution with (I-1) Degrees of Freedom, with I the number of columns.
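The textbook procedure translates directly into a few lines of code. A minimal Python sketch, assuming the observed and expected frequencies are given as two lists in the same category order; chi_square_statistic is a hypothetical name and the die data are invented for illustration:

```python
def chi_square_statistic(observed, expected):
    """Uncorrected X^2 and its Degrees of Freedom (I - 1)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per category")
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1
    return x2, dof

# Hypothetical example: 120 throws of a die, expected 20 per face.
observed = [17, 22, 16, 25, 19, 21]
expected = [20] * 6
x2, dof = chi_square_statistic(observed, expected)
print(f"X^2 = {x2:.3f}, DoF = {dof}")  # X^2 = 2.800, DoF = 5
```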
Although the above procedure is the one generally found in textbooks, it is not the best one. It omits the continuity correction that is needed because a discrete (multinomial) distribution is approximated by a continuous (Chi-square) one. A better test parameter is:
X^2 = Sum over all columns ( |Oi - Ei| - 0.5 )^2 / Ei
(|a-b| denotes the absolute value of the difference.) This corrected version is the one recommended and used here.
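The continuity-corrected statistic differs from the uncorrected one only in the term inside the square. A minimal Python sketch that implements the corrected formula literally as written above; chi_square_corrected is a hypothetical name and the die data are invented for illustration:

```python
def chi_square_corrected(observed, expected):
    """X^2 with continuity correction: sum of (|O - E| - 0.5)^2 / E, and DoF = I - 1."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per category")
    x2 = sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))
    return x2, len(observed) - 1

# Hypothetical example: 120 throws of a die, expected 20 per face.
observed = [17, 22, 16, 25, 19, 21]
expected = [20] * 6
x2, dof = chi_square_corrected(observed, expected)
print(f"X^2 = {x2:.3f}, DoF = {dof}")  # X^2 = 2.075, DoF = 5
```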

Level of Significance:
Use a table to look up the level of significance associated with X^2 and the Degrees of Freedom.
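Instead of a printed table, the upper-tail probability can also be computed directly. A minimal Python sketch, assuming integer Degrees of Freedom (always the case here, since DoF = I-1), that uses the closed-form series for the Chi-square survival function with only the standard library; chi2_sf is a hypothetical helper name, not a library function. Where SciPy is available, scipy.stats.chi2.sf(x, dof) returns the same probability.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability P(Chi-square(dof) >= x) for integer dof >= 1."""
    if not isinstance(dof, int) or dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{j=0}^{dof/2 - 1} (x/2)^j / j!
        term = math.exp(-half)
        total = term
        for j in range(1, dof // 2):
            term *= half / j
            total += term
    else:
        # Odd dof: start from the 1-DoF tail, erfc(sqrt(x/2)),
        # then add one term for every extra pair of Degrees of Freedom.
        total = math.erfc(math.sqrt(half))
        term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
        for j in range(1, (dof - 1) // 2 + 1):
            if j > 1:
                term *= x / (2 * j - 1)
            total += term
    return min(total, 1.0)

# Hypothetical value: X^2 = 2.075 with 5 Degrees of Freedom.
print(f"p = {chi2_sf(2.075, 5):.3f}")
```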

Approximation:
If the Degrees of Freedom > 30, the distribution of

z = {(X^2/DoF)^(1/3) - (1 - 2/(9*DoF))}/SQRT(2/(9*DoF))

can be approximated by a Standard Normal Distribution.
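The formula above is the Wilson-Hilferty transformation. A minimal Python sketch that converts X^2 to z and then to an upper-tail probability with the complementary error function from the standard library; wilson_hilferty_z and upper_tail_normal are hypothetical helper names, and the X^2 value is invented for illustration:

```python
import math

def wilson_hilferty_z(x2, dof):
    """Transform X^2 with dof Degrees of Freedom into an approximately standard normal z."""
    c = 2.0 / (9.0 * dof)
    return ((x2 / dof) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)

def upper_tail_normal(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical value: X^2 = 67.5 with 50 Degrees of Freedom.
z = wilson_hilferty_z(67.5, 50)
print(f"z = {z:.3f}, p = {upper_tail_normal(z):.4f}")
```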

Remarks:
This approach is an approximation, even with the continuity correction. The Chi-square approximation should only be used if all expected frequencies, i.e., all Ei, are larger than five. If this does not hold, combine the rarer categories with larger ones.
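Which categories to combine is not prescribed here; for nominal data that choice is best made on content grounds. As one mechanical possibility, the Python sketch below assumes a simple strategy: repeatedly merge the category with the smallest expected frequency into the one with the next-smallest, until every Ei is larger than five. merge_rare and the example data are hypothetical.

```python
def merge_rare(labels, observed, expected, min_expected=5.0):
    """Merge categories until every expected frequency is larger than min_expected.

    Assumed strategy: repeatedly combine the category with the smallest
    expected frequency with the next-smallest one. Stops at two categories;
    if an expected value is still too small then, the Chi-square
    approximation remains unreliable.
    """
    cats = list(zip(labels, observed, expected))
    while len(cats) > 2 and min(e for _, _, e in cats) <= min_expected:
        cats.sort(key=lambda c: c[2])  # smallest expected frequency first
        (l1, o1, e1), (l2, o2, e2) = cats[0], cats[1]
        cats = [(f"{l1}+{l2}", o1 + o2, e1 + e2)] + cats[2:]
    return cats

# Hypothetical data with two rare categories, A and B.
merged = merge_rare(["A", "B", "C", "D"], [3, 2, 10, 20], [2, 4, 12, 17])
print(merged)  # [('A+B', 5, 6), ('C', 10, 12), ('D', 20, 17)]
```

After merging, the Degrees of Freedom are the number of remaining categories minus one.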

