Example:
R = 0.7096, p <= 0.03635 (t = 2.665, DF = 7)
y = 0.7919 * x + 132.4
H0:
The values of the members of the pairs are uncorrelated, i.e., there are no
linear dependencies.
Assumptions:
The values of both members of the pairs follow a bivariate Normal distribution.
Scale:
Interval
Procedure:
The correlation coefficient R of the pairs (x, y) is calculated as:
R = { Sum(x * y) - Sum(x) * Sum(y) / N } /
    sqrt( { Sum(x**2) - Sum(x)**2 / N } * { Sum(y**2) - Sum(y)**2 / N } )
The regression line y = a * x + b is calculated as:
a = { Sum(x * y) - Sum(x) * Sum(y) / N } / { Sum(x**2) - Sum(x)**2 / N }
b = Sum(y) / N - a * Sum(x) / N
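The sums in the formulas for R, a and b can be accumulated in a single pass over the data. The following is a minimal sketch in Python, not a reference implementation; the function name `correlation_and_line` and its argument names are illustrative assumptions.

```python
import math

def correlation_and_line(xs, ys):
    """Return (R, a, b) for paired samples, using the sum formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)

    # Numerator shared by R and the slope a.
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_xx - sum_x ** 2 / n
    s_yy = sum_yy - sum_y ** 2 / n
    if s_xx == 0 or s_yy == 0:
        raise ValueError("R is undefined when x or y is constant")

    r = s_xy / math.sqrt(s_xx * s_yy)
    a = s_xy / s_xx                  # slope of the regression line
    b = sum_y / n - a * sum_x / n    # intercept of the regression line
    return r, a, b
```

Note that these sum formulas can lose precision when the values are large compared to their spread; subtracting the means first is a common remedy.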
Level of Significance:
Under H0, the value of t = R * sqrt( (N - 2) / (1 - R**2) ) has a Student-t
distribution with Degrees of Freedom = N - 2.
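As a sketch of how the level of significance might be computed, the code below evaluates t and then obtains a two-sided p by numerically integrating the Student-t density with Simpson's rule. The function names and the choice of numerical integration are assumptions made for illustration; this is not necessarily how the value quoted in the example was obtained.

```python
import math

def t_statistic(r, n):
    """t = R * sqrt((N - 2) / (1 - R**2)), with N - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if abs(r) >= 1.0:
        raise ValueError("t is undefined for |R| >= 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def student_t_pdf(x, df):
    """Density of the Student-t distribution, computed via log-gamma."""
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2.0 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p = 1 - 2 * integral of the density from 0 to |t|."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = student_t_pdf(0.0, df) + student_t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

# Values from the example: R = 0.7096 with N = 9 pairs (DF = 7).
t = t_statistic(0.7096, 9)   # close to the t = 2.665 quoted in the example
p = two_sided_p(t, 9 - 2)
```

The p computed this way need not agree exactly with a value obtained from an approximation or a table.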
Approximation:
If the Degrees of Freedom > 30, the distribution of t can be approximated by a
Standard Normal Distribution.
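With the Standard Normal approximation, a two-sided level of significance follows directly from the complementary error function. A minimal sketch, assuming a two-sided test; the function name is illustrative:

```python
import math

def normal_two_sided_p(t):
    """Two-sided p for a Standard Normal variate: P(|Z| > |t|)."""
    return math.erfc(abs(t) / math.sqrt(2.0))
```

For instance, `normal_two_sided_p(1.96)` is close to 0.05, the familiar two-sided 5% point of the Standard Normal Distribution.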
Remarks:
This could be called the most misused of statistical procedures.
It can show that two variables are connected. It cannot show that the
variables are not connected. If one variable depends on another, i.e., there
is a causal relation, then it is always possible to find some kind of
correlation between the two variables. However, if both variables depend on a
third, they can show a sizable correlation without any causal dependency
between them. A famous example is the fact that the positions of the hands of
all clocks are correlated, without one clock being the cause of the positions
of the others. Another example is the significant correlation between human
birth rates and stork population sizes.
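The effect of a shared third variable can be illustrated with simulated data. In the sketch below, x and y are each generated as a common variable z plus independent noise, so neither causes the other. All names, the sample size and the noise levels are arbitrary assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Correlation coefficient R from the sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# z is the hidden third variable; x and y depend only on z plus noise.
z = [rng.uniform(0.0, 10.0) for _ in range(500)]
x = [zi + rng.gauss(0.0, 1.0) for zi in z]
y = [zi + rng.gauss(0.0, 1.0) for zi in z]

# x and y are strongly correlated although neither influences the other.
r_xy = pearson_r(x, y)

# Once the contribution of z is subtracted, only the independent noise is left.
r_resid = pearson_r([xi - zi for xi, zi in zip(x, z)],
                    [yi - zi for yi, zi in zip(y, z)])
```

Here `r_xy` comes out large, while `r_resid` stays near zero.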
WARNING: the level of significance given here is only an approximation.
Take care when using it, and use a table if necessary.