*Example:*

**
X^2 = ,
DoF = ,
p <=
**

This is the most widely used test on nominal data. Although the observations (i.e., the counts) follow a binomial or multinomial distribution, it is impractical to calculate the levels of significance directly. A binomial distribution can be approximated by a normal distribution if the expected number of observations is large enough. This approximation is used to calculate the "variance" of the observed distribution.

*H0:*

All samples have the same frequency distribution.

*Assumptions:*

None really, except that the observations must be independent.

*Scale:*

Nominal

*Procedure:*

Calculate the expected number of observations, *Eij*, under *H0*:
*Eij = Ni * Oj / N*, in which *Oj* is the total number of
observations in category *j* (j from 1 to J, i.e., the column totals),
*Ni* is the size of sample *i* (i from 1 to I, i.e., the row
totals), and *N* is the total number of observations in the table.

The test parameter is *X^2 = Sum over all cells ( Oij - Eij )^2 / Eij*,
in which *Oij* is the observed count in cell (i, j). It approximately
follows a Chi-square distribution with *(J-1)*(I-1)* Degrees of Freedom.
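
The procedure above can be sketched in a few lines of Python. This is only an illustration: the function names and the example table are made up for this sketch and do not come from the text.

```python
def expected_counts(observed):
    """Expected counts Eij = Ni * Oj / N for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]        # Ni, one per sample
    col_totals = [sum(col) for col in zip(*observed)]  # Oj, one per category
    n = sum(row_totals)                                # N, the grand total
    return [[ni * oj / n for oj in col_totals] for ni in row_totals]


def chi_square(observed):
    """Uncorrected X^2 and its Degrees of Freedom."""
    expected = expected_counts(observed)
    x2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, dof


# Hypothetical data: two samples (rows), three categories (columns).
table = [[12, 18, 30], [18, 12, 30]]
x2, dof = chi_square(table)
print(x2, dof)  # X^2 is about 2.4 with 2 Degrees of Freedom
```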

Although the above procedure is the one generally found in
textbooks, it is not the best one. It omits the continuity
correction that is needed because a discrete (multinomial)
distribution is approximated with a continuous (X^2) one.
A *better* test parameter is:

*X^2 = Sum over all cells ( |Oij - Eij| - 0.5 )^2 / Eij*

(|a-b| indicates the *absolute value* of the difference).
This is the approach actually used to calculate the *X^2*
value in this example.
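
A minimal sketch of the corrected test parameter, applying the formula exactly as written. The function name and the 2 x 2 example table are assumptions made for illustration.

```python
def chi_square_corrected(observed):
    """X^2 with the continuity correction: sum of (|Oij - Eij| - 0.5)^2 / Eij."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for ni, row in zip(row_totals, observed):
        for oj, o in zip(col_totals, row):
            e = ni * oj / n                    # expected count Eij
            x2 += (abs(o - e) - 0.5) ** 2 / e  # formula taken literally
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, dof


# Hypothetical 2 x 2 table: every Eij is 15 and every |Oij - Eij| is 5.
x2c, dof = chi_square_corrected([[10, 20], [20, 10]])
print(x2c, dof)  # about 5.4 with 1 Degree of Freedom (uncorrected: about 6.67)
```

The correction always pulls the value down when every cell differs from its expectation by more than 0.5, which makes the test more conservative.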

*Level of Significance:*

Use a table to look up the level of significance associated with
*X^2* and the *Degrees of Freedom*.
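
Where no table is at hand, the level of significance can also be computed. The sketch below uses only the Python standard library and relies on the recurrence Q(a+1, x) = Q(a, x) + x^a e^(-x) / Gamma(a+1) for the upper tail of the gamma distribution, with a = DoF/2 and x = X^2/2. The function name `chi2_sf` is chosen here for illustration.

```python
import math


def chi2_sf(x2, dof):
    """Upper-tail probability p of a Chi-square value for an integer dof >= 1."""
    if x2 <= 0:
        return 1.0
    half = x2 / 2.0
    if dof % 2 == 0:
        a = 1.0
        q = math.exp(-half)              # exact tail for 2 Degrees of Freedom
    else:
        a = 0.5
        q = math.erfc(math.sqrt(half))   # exact tail for 1 Degree of Freedom
    # Each pass of the loop adds two Degrees of Freedom.
    while a < dof / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


# The familiar 5% critical values for 1 and 2 Degrees of Freedom.
print(chi2_sf(3.841, 1), chi2_sf(5.991, 2))  # both close to 0.05
```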

*Approximation:*

If the *Degrees of Freedom* > 30, the distribution of

*z = {(X^2/DoF)^(1/3) - (1 - 2/(9*DoF))}/SQRT(2/(9*DoF))*

can be approximated by a Standard Normal Distribution.
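
This transformation is commonly known as the Wilson-Hilferty approximation. A small sketch, with function names chosen for illustration, shows how *z* and the corresponding one-sided level of significance could be computed:

```python
import math


def wilson_hilferty_z(x2, dof):
    """z = {(X^2/DoF)^(1/3) - (1 - 2/(9*DoF))} / SQRT(2/(9*DoF))."""
    c = 2.0 / (9.0 * dof)
    return ((x2 / dof) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)


def upper_tail_p(z):
    """Upper-tail probability of the Standard Normal Distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# 55.758 is the tabulated 5% critical value for 40 Degrees of Freedom.
z = wilson_hilferty_z(55.758, 40)
p = upper_tail_p(z)
print(z, p)  # z is close to 1.645 and p is close to 0.05
```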

*Remarks:*

This approach is an approximation, even with the continuity correction.
The Chi-square distribution can only be used if all expected values, i.e.,
all *Eij*, are larger than **five**. If this does not hold, combine
the rarer categories with larger ones.
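
The check on the expected values, and the pooling of a rare category, might look as follows. The helper names and the example table are hypothetical; which categories can sensibly be combined depends on the data.

```python
def small_expected_cells(observed, minimum=5.0):
    """List (i, j, Eij) for every cell whose expected count is not above minimum."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    cells = []
    for i, ni in enumerate(row_totals):
        for j, oj in enumerate(col_totals):
            e = ni * oj / n
            if e <= minimum:
                cells.append((i, j, e))
    return cells


def merge_columns(observed, keep, drop):
    """Pool category `drop` into category `keep` and return the new table."""
    merged = []
    for row in observed:
        new_row = [v for j, v in enumerate(row) if j != drop]
        new_index = keep if keep < drop else keep - 1  # index shifts after removal
        new_row[new_index] += row[drop]
        merged.append(new_row)
    return merged


# Hypothetical table in which the third category is too rare.
table = [[20, 15, 2], [25, 10, 3]]
too_small = small_expected_cells(table)    # both cells of the third column
pooled = merge_columns(table, keep=1, drop=2)
print(too_small, pooled, small_expected_cells(pooled))
```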
