*Example:*

**
X^2 = ,
DoF = ,
p <=
**

This test checks whether an observed distribution differs from an expected distribution. Although the observations (i.e., the numbers on the first row) follow a binomial or multinomial distribution, it is impractical to calculate the levels of significance directly. A binomial distribution can be approximated by a normal distribution if the expected number of observations is large enough. This is used to calculate the "variance" of the observed distribution. Under

*H0:*

The sample has the expected frequency distribution.

*Assumptions:*

None really, except that the observations must be independent.

*Scale:*

Nominal

*Procedure:*

Calculate the test parameter *X^2 = Sum over all columns
( Oi - Ei )^2 / Ei*, where *Oi* is the observed and *Ei* the
expected count in column *i*. *X^2* approximately follows a
Chi-square distribution with *(I-1)* Degrees of Freedom (with *I*
the number of columns).
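
The textbook statistic can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square_statistic` and the example counts are made up for this sketch and are not part of the original page.

```python
def chi_square_statistic(observed, expected):
    """Return (X^2, DoF) for the uncorrected goodness-of-fit statistic.

    `observed` and `expected` are equally long sequences of counts,
    one entry per column. Both names are assumptions of this sketch.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # X^2 = Sum over all columns (Oi - Ei)^2 / Ei
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Degrees of Freedom = number of columns - 1
    dof = len(observed) - 1
    return x2, dof


# Hypothetical counts: three columns, 60 observations in total
x2, dof = chi_square_statistic([18, 22, 20], [20, 20, 20])
print(x2, dof)  # X^2 is about 0.4 with 2 Degrees of Freedom
```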

Although the above procedure is the one generally found in
textbooks, it is not the best one. It omits the continuity
correction that is needed because a discrete (multinomial)
distribution is approximated with a continuous (X^2) one.
A *better* test parameter is:

*X^2 = Sum over all columns ( |Oi - Ei| - 0.5 )^2 / Ei*

(|a-b| indicates the *absolute value* of the difference).
This is the approach actually used to calculate the *X^2*
value in this example.
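
A minimal sketch of the continuity-corrected statistic, assuming the formula is applied literally to every column. The function name `chi_square_corrected` and the counts are illustrative assumptions, not taken from the original page.

```python
def chi_square_corrected(observed, expected):
    """Return (X^2, DoF) using the continuity-corrected formula.

    X^2 = Sum over all columns (|Oi - Ei| - 0.5)^2 / Ei
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    x2 = sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1
    return x2, dof


# Same hypothetical counts as one might use for the uncorrected statistic
x2, dof = chi_square_corrected([18, 22, 20], [20, 20, 20])
print(x2, dof)  # X^2 is about 0.2375 with 2 Degrees of Freedom
```

Note that, taken literally, a column with *Oi = Ei* still contributes *0.25 / Ei* to the sum. Some implementations set such terms to zero; this sketch does not.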

*Level of Significance:*

Use a table to look up the level of significance associated with
*X^2* and the *Degrees of Freedom*.
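
Where no table is at hand, the level of significance can also be computed numerically. The sketch below is one possible way, using only the Python standard library and a series expansion of the regularized incomplete gamma function. The name `chi2_upper_tail` is an assumption of this sketch, and the series is meant for the moderate *X^2* values typical of such tests, not for extreme tails.

```python
import math


def chi2_upper_tail(x2, dof):
    """Approximate p = P(Chi-square with `dof` Degrees of Freedom >= x2)."""
    if x2 <= 0:
        return 1.0
    a = dof / 2.0
    x = x2 / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x):
    # P(a, x) = exp(-x) * x^a / Gamma(a) * Sum_n x^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    # The upper tail is the complement of the lower tail
    return max(0.0, 1.0 - lower)


# 5.991 is the usual tabulated 5% critical value for 2 Degrees of Freedom
print(chi2_upper_tail(5.991, 2))
```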

*Approximation:*

If the *Degrees of Freedom* > 30, the distribution of

*z = {(X^2/DoF)^(1/3) - (1 - 2/(9*DoF))}/SQRT(2/(9*DoF))*

can be approximated by a Standard Normal Distribution.
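
The approximation can be sketched as follows. The function names are hypothetical, and the one-sided upper-tail probability is computed with `math.erfc` from the standard library.

```python
import math


def chi2_to_z(x2, dof):
    """Transform X^2 into an approximately standard normal z value.

    z = {(X^2/DoF)^(1/3) - (1 - 2/(9*DoF))} / SQRT(2/(9*DoF))
    """
    k = 2.0 / (9.0 * dof)
    return ((x2 / dof) ** (1.0 / 3.0) - (1.0 - k)) / math.sqrt(k)


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# 55.76 is close to the tabulated 5% critical value for 40 Degrees of Freedom
z = chi2_to_z(55.76, 40)
print(z, normal_upper_tail(z))  # z is close to 1.645, p close to 0.05
```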

*Remarks:*

This approach is an approximation, even with the continuity correction.
The Chi-square distribution can only be used if all expected values,
i.e., all *Ei*, are larger than **five**. If this does not hold,
combine the rarer categories with larger ones.
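
How categories are combined is left open above. The sketch below shows one possible strategy, chosen here purely for illustration: the rarest categories are pooled, smallest first, until the pooled expected value is larger than five. The function name `pool_rare_categories` and the counts are assumptions of this sketch. Whether a given pooling is meaningful depends on the data.

```python
def pool_rare_categories(observed, expected, minimum=5.0):
    """Pool categories until every expected value is larger than `minimum`.

    Returns (new_observed, new_expected). The pooled category is placed
    last, and the kept categories are ordered by expected value.
    """
    # Walk through the categories from the rarest to the most common
    pairs = sorted(zip(expected, observed))
    pooled_e, pooled_o = 0.0, 0.0
    kept = []
    for e, o in pairs:
        # Pool a category if it is too rare itself, or if the pool
        # collected so far is still not larger than `minimum`
        if e <= minimum or 0 < pooled_e <= minimum:
            pooled_e += e
            pooled_o += o
        else:
            kept.append((e, o))
    if pooled_e > 0:
        kept.append((pooled_e, pooled_o))
    new_expected = [e for e, _ in kept]
    new_observed = [o for _, o in kept]
    return new_observed, new_expected


# Hypothetical counts: the two rarest columns have expected values 2 and 3
print(pool_rare_categories([1, 5, 22, 22], [2, 3, 20, 25]))
```

If the total of all expected values is not larger than `minimum`, everything ends up in a single category and no test is possible.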
