
The Binomial distribution

Example:


(Interactive calculator: enter the probability p, the observed count x, and the sample size N.)


Characteristics:
This is not a test, but a distribution. The probabilities above are one-sided. The binomial distribution is used mostly with nominal data. When two outcomes are possible, with and without a certain characteristic, each with a probability of 1/2, the two-sided variant of this "test" is identical to the two-sided Sign-test.

H0:
The probability of observing a certain characteristic is equal to p.

Assumptions:
The observations are binomially distributed.

Scale:
Nominal

Procedure:
Count the number of observations with the characteristic, x, in a sample of total size N.

Level of Significance:
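The one-tailed level of significance is calculated as (if x < p*N):
p <= Sum (i=0 to x) {N!/(i!*(N-i)!)*p**i*(1-p)**(N-i)}
(where k! = k*(k-1)*(k-2)*...*1 is the factorial of k and 0! = 1; the p on the left-hand side is the level of significance, the p on the right-hand side is the probability under H0)
If x > p*N, sum from i = x to N instead.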
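
The sum above can be sketched in a few lines of Python. This is only an illustration: the function name binomial_one_tailed is hypothetical, and because the text does not say which tail to use when x equals p*N exactly, the sketch assumes the upper tail in that case.

```python
from math import comb

def binomial_one_tailed(x, n, p):
    """One-tailed level of significance for x observations out of n,
    given the probability p under H0 (hypothetical helper)."""
    if not 0 <= x <= n:
        raise ValueError("x must lie between 0 and n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    # Lower tail (i = 0..x) if x is below the expected count p*n,
    # otherwise upper tail (i = x..n). Assumption: x == p*n uses the upper tail.
    if x < p * n:
        terms = range(0, x + 1)
    else:
        terms = range(x, n + 1)
    # comb(n, i) is n! / (i! * (n - i)!)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in terms)

print(binomial_one_tailed(2, 10, 0.5))  # 0.0546875 (= 56/1024)
```

With p = 1/2 the two tails are symmetric, so 8 out of 10 gives the same value as 2 out of 10.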

Approximation:
If N*p > 5 and N*(1-p) > 5, the distribution of:
Z = ( | x - N*p | - 0.5)/sqrt( N * p * (1-p) )
can be approximated with a Standard Normal distribution.
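
As a rough sketch of this approximation, the following Python computes Z with the 0.5 continuity correction and the corresponding one-tailed probability of the Standard Normal distribution. The function name binomial_normal_approx is hypothetical, and refusing to run when the rule of thumb N*p > 5 and N*(1-p) > 5 fails is a design choice of this sketch, not something the text prescribes.

```python
from math import erfc, sqrt

def binomial_normal_approx(x, n, p):
    """Return (Z, one-tailed probability) for the Normal approximation
    to the binomial distribution (hypothetical helper)."""
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("approximation requires N*p > 5 and N*(1-p) > 5")
    # Z = (|x - N*p| - 0.5) / sqrt(N*p*(1-p)), with continuity correction 0.5
    z = (abs(x - n * p) - 0.5) / sqrt(n * p * (1 - p))
    # Upper-tail probability of the Standard Normal: P(Z' >= z) = erfc(z / sqrt(2)) / 2
    tail = 0.5 * erfc(z / sqrt(2))
    return z, tail

z, tail = binomial_normal_approx(60, 100, 0.5)
print(z, tail)  # z is 1.9 for this input
```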

Remarks:
As p is only rarely known, this test is of limited use. However, there is one application that can be very handy. Assume that a number of tests are applied to a group of data-sets that for some reason cannot be pooled readily (e.g., vowel formant measurements from a limited number of speakers), and each test individually only reaches, e.g., a significance level of p <= 0.1, which is not convincing. If H0 is true, a "positive" test result at p <= 0.1 is expected to be observed with a probability of at most 0.1. It can now be tested whether the number of positive test results is large enough to reject H0 at a significance level of 0.05 or smaller.
This procedure has only exploratory value. If it shows that H0 must be rejected, there should be a better test that will prove it on the data-sets themselves. Furthermore, this kind of statistics is not publishable.
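
To illustrate the exploratory procedure, the sketch below computes the probability of seeing at least k "positive" results among m tests when each has a probability of 0.1 of being positive under H0. The function name prob_at_least and the numbers (10 tests, 3 or 4 positives) are made up for illustration and do not come from real data; the sketch also assumes the tests are independent.

```python
from math import comb

def prob_at_least(k, m, alpha):
    """Probability of at least k positive results among m independent tests,
    each positive with probability alpha under H0 (hypothetical helper)."""
    # Upper tail of the binomial distribution: sum over i = k..m
    return sum(comb(m, i) * alpha**i * (1 - alpha)**(m - i)
               for i in range(k, m + 1))

# Hypothetical example: 10 tests, each judged at a level of 0.1
for k in (3, 4):
    level = prob_at_least(k, 10, 0.1)
    print(k, round(level, 4), "reject H0" if level <= 0.05 else "do not reject H0")
```

In this made-up case, 4 positive results out of 10 fall below the 0.05 level, whereas 3 out of 10 do not.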

In this example we calculate the exact probabilities up to N = 100.
Note that we give the one-tailed levels of significance.

