Canonical correlation analysis

This tutorial will show you how to perform canonical correlation analysis with Praat.

1. Objective of canonical correlation analysis

In canonical correlation analysis we try to find the correlations between two data sets. One data set is called the dependent set, the other the independent set. In Praat these two sets must reside in one TableOfReal object: the lower-numbered columns of this table are interpreted as the dependent part and the remaining columns as the independent part. The dimension of (i.e. the number of columns in) the dependent part may not exceed the dimension of the independent part.
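As an illustration outside Praat, the column convention can be sketched in Python. The function name `split_dependent_independent` and the list-of-rows representation are assumptions of this sketch, not part of Praat. The sketch only mirrors two rules: the first columns form the dependent part, and the dependent dimension may not exceed the independent one.

```python
def split_dependent_independent(rows, n_dependent):
    """Split every row into its dependent and independent parts.

    Hypothetical helper: the first n_dependent columns form the dependent
    part and the remaining columns the independent part; the dependent
    part may not be wider than the independent part.
    """
    n_independent = len(rows[0]) - n_dependent
    if not 1 <= n_dependent <= n_independent:
        raise ValueError("the dependent dimension must be at least 1 and "
                         "may not exceed the independent dimension")
    dependent = [row[:n_dependent] for row in rows]
    independent = [row[n_dependent:] for row in rows]
    return dependent, independent


# Hypothetical row with columns F1, F2, F3, L1, L2, L3
rows = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
dep, ind = split_dependent_independent(rows, 3)
print(dep, ind)  # → [[1.0, 2.0, 3.0]] [[4.0, 5.0, 6.0]]
```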

As an example, we will use the dataset from Pols et al. (1973). It contains the frequencies and levels of the first three formants of the 12 Dutch monophthongal vowels, as spoken in a /h_t/ context by 50 male speakers. We will try to find the canonical correlation between the formant frequencies (the dependent part) and the formant levels (the independent part). Both groups of variates have dimension 3. The introduction of the discriminant analysis tutorial explains how to get these data, how to take the logarithm of the formant frequency values and how to standardize them. The following script summarizes:

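   # "yes": include the formant levels L1, L2 and L3
   pols50m = Create TableOfReal (Pols 1973): "yes"
   # Take the logarithm of the formant frequencies (columns 1 to 3 only)
   Formula: ~ if col < 4 then log10 (self) else self endif
   # Give every column zero mean and unit standard deviation
   Standardize columns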
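The same preprocessing can be reproduced outside Praat with a short Python sketch. It assumes the data are a list of rows with the three formant frequencies in the first three columns. It also assumes that standardizing divides by the sample standard deviation (n − 1 in the denominator); Praat's Standardize columns may differ in that detail. The function `preprocess` and the example values are hypothetical.

```python
import math
import statistics


def preprocess(rows, n_frequency_columns=3):
    """Take log10 of the formant-frequency columns, then standardize all columns.

    Assumptions: frequencies occupy the first n_frequency_columns columns
    (Praat's `col < 4` with 1-based columns), and standardizing divides by
    the sample standard deviation (n - 1 denominator).
    """
    logged = [[math.log10(v) if j < n_frequency_columns else v
               for j, v in enumerate(row)]
              for row in rows]
    columns = list(zip(*logged))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    # Subtract each column's mean and divide by its standard deviation
    return [[(v - m) / s for v, m, s in zip(row, means, sds)]
            for row in logged]


# Hypothetical values: F1, F2, F3 in Hz, then L1, L2, L3
rows = [[300.0, 2000.0, 2500.0, 60.0, 50.0, 40.0],
        [500.0, 1500.0, 2600.0, 62.0, 45.0, 38.0],
        [700.0, 1200.0, 2400.0, 58.0, 48.0, 35.0]]
z = preprocess(rows)
```

After this step every column of `z` has mean 0 and standard deviation 1, up to floating-point rounding.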

Before starting the canonical correlation analysis, we first look at the Pearson correlations in this table by calculating its Correlation matrix:

        F1      F2      F3      L1      L2      L3
   F1   1      -0.338   0.191   0.384  -0.505  -0.014
   F2  -0.338   1       0.190  -0.106   0.526  -0.568
   F3   0.191   0.190   1       0.113  -0.038   0.019
   L1   0.384  -0.106   0.113   1      -0.038   0.085
   L2  -0.505   0.526  -0.038  -0.038   1       0.128
   L3  -0.014  -0.568   0.019   0.085   0.128   1

The following script summarizes:

   selectObject: pols50m
   To Correlation
   # Draw the correlation matrix as numbers with 3 decimal digits
   Draw as numbers: 1, 0, "decimal", 3

The correlation matrix shows that some formant frequencies correlate substantially with some levels. For example, the correlation coefficient between F2 and L2 is 0.526, that between F2 and L3 is -0.568, and that between F1 and L2 is -0.505.
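The coefficients in the table are ordinary Pearson correlations. Each one is the sum of products of two centred columns divided by the product of their norms. Here is a minimal Python sketch of that computation, assuming a list of rows as input; the function name is hypothetical. Correlation is unchanged when a column is shifted or rescaled, so the sketch gives the same matrix before and after column standardization.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation between every pair of columns in rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(c) / n for c in columns]
    # Centre each column on its mean
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j])
             for j in range(k)]
            for i in range(k)]


# Column 2 is twice column 1 and column 3 is its negative
rows = [[1.0, 2.0, -1.0], [2.0, 4.0, -2.0], [4.0, 8.0, -4.0]]
r = correlation_matrix(rows)
# r is [[1, 1, -1], [1, 1, -1], [-1, -1, 1]] up to floating-point rounding
```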

In a canonical correlation analysis of the dataset above, we try to find the linear combination u₁ of F₁, F₂ and F₃ that correlates maximally with the linear combination v₁ of L₁, L₂ and L₃. When we have found u₁ and v₁, we next look for a new combination u₂ of the formant frequencies and a new combination v₂ of the levels that are maximally correlated with each other, subject to u₂ and v₂ being uncorrelated with both u₁ and v₁. Expressed as formulas:

u₁ = y₁₁F₁ + y₁₂F₂ + y₁₃F₃

v₁ = x₁₁L₁ + x₁₂L₂ + x₁₃L₃

ρ(u₁, v₁) = maximum, ρ(u₂, v₂) = submaximum,

ρ(u₂, u₁) = ρ(u₂, v₁) = ρ(v₂, v₁) = ρ(v₂, u₁) = 0,

where ρ(uᵢ, vᵢ) is the correlation between the canonical variates uᵢ and vᵢ, and the yᵢⱼ and xᵢⱼ are the canonical coefficients for the dependent and the independent variates, respectively.
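One textbook way to obtain the canonical correlations works directly on the correlation matrix. Partition it into three blocks:

- R_yy, the correlations among F₁, F₂ and F₃;
- R_xx, the correlations among L₁, L₂ and L₃;
- R_yx, the correlations between the two sets, with R_xy its transpose.

The squared canonical correlations ρ(uᵢ, vᵢ)² are then the eigenvalues of R_yy⁻¹ R_yx R_xx⁻¹ R_xy.

The Python sketch below follows this route using only the standard library. It turns the problem into a symmetric one with a Cholesky factor of R_yy and finds the eigenvalues with Jacobi rotations. This is an illustration under these assumptions, not Praat's implementation, which may use a numerically different method. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of matrices a (n x m) and b (m x k), stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def symmetric_eigenvalues(s, max_sweeps=100, tol=1e-24):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in s]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                sn = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - sn * akq, sn * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - sn * aqk, sn * apk + c * aqk
    return [a[i][i] for i in range(n)]


def canonical_correlations(r, p):
    """Canonical correlations from a correlation matrix r whose first p
    variables form the dependent set and the rest the independent set."""
    ryy = [row[:p] for row in r[:p]]
    ryx = [row[p:] for row in r[:p]]
    rxx = [row[p:] for row in r[p:]]
    m = matmul(matmul(ryx, inverse(rxx)), transpose(ryx))  # R_yx R_xx^-1 R_xy
    l_inv = inverse(cholesky(ryy))
    s = matmul(matmul(l_inv, m), transpose(l_inv))  # similar to R_yy^-1 m
    s = [[(s[i][j] + s[j][i]) / 2.0 for j in range(p)] for i in range(p)]
    eigenvalues = symmetric_eigenvalues(s)
    return sorted((math.sqrt(max(e, 0.0)) for e in eigenvalues), reverse=True)
```

Applied to the 6 × 6 correlation table above with p = 3, the function returns estimates of the three canonical correlations. Because that table is rounded to three decimals, the results need not match Praat's values exactly.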

2. How to perform a canonical correlation analysis

Select the TableOfReal and choose To CCA... from the dynamic menu; this command is available under the "Multivariate statistics" action button. Fill out the form and supply 3 for Dimension of dependent variate. The resulting CCA object will bear the same name as the TableOfReal object. The following script summarizes:

   selectObject: pols50m
   # The first 3 columns (F1, F2, F3) form the dependent part
   cca = To CCA: 3

3. How to get the canonical correlation coefficients

You can get the canonical correlation coefficients by querying the CCA object. You will find that the three canonical correlation coefficients, ρ(u₁, v₁), ρ(u₂, v₂) and ρ(u₃, v₃), are approximately 0.86, 0.53 and 0.07, respectively. The following script summarizes:

   # Make sure the CCA object is selected before querying it
   selectObject: cca
   cc1 = Get correlation: 1
   cc2 = Get correlation: 2
   cc3 = Get correlation: 3
   writeInfoLine: "cc1 = ", cc1, ", cc2 = ", cc2, ", cc3 = ", cc3

4. How to obtain canonical scores

Canonical scores, also called canonical variates, are the linear combinations:

uᵢ = yᵢ₁F₁ + yᵢ₂F₂ + yᵢ₃F₃, and

vᵢ = xᵢ₁L₁ + xᵢ₂L₂ + xᵢ₃L₃,

where the index i runs from 1 to the number of canonical correlation coefficients.
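Given the coefficients, each score is a weighted sum over one row of the data. The following minimal Python sketch assumes the coefficients are available as a list `y` (one list of yᵢⱼ per uᵢ) and a list `x` (one list of xᵢⱼ per vᵢ). The function name and the coefficient values in the example are made up. In the analysis above, the coefficients would be the canonical coefficients found by To CCA....

```python
def canonical_scores(dependent_rows, independent_rows, y, x):
    """Canonical scores as weighted sums of each row.

    y[i] holds the coefficients y_i1, y_i2, ... for u_i and x[i] the
    coefficients x_i1, x_i2, ... for v_i. Returns (u, v), one list of
    scores per input row.
    """
    u = [[sum(c * f for c, f in zip(coefs, row)) for coefs in y]
         for row in dependent_rows]
    v = [[sum(c * val for c, val in zip(coefs, row)) for coefs in x]
         for row in independent_rows]
    return u, v


# Made-up coefficients and one made-up row of data
y = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
x = [[0.5, 0.5, 0.0]]
u, v = canonical_scores([[1.0, 2.0, 3.0]], [[2.0, 4.0, 6.0]], y, x)
print(u, v)  # → [[1.0, 5.0]] [[3.0]]
```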

You can get the canonical scores by selecting a CCA object together with the TableOfReal object and choosing To TableOfReal (scores)....

When we now calculate the Correlation matrix of these canonical variates, we get the following table:

        u1      u2      u3      v1      v2      v3
   u1   1       .       .       0.860   .       .
   u2   .       1       .       .       0.531   .
   u3   .       .       1       .       .       0.070
   v1   0.860   .       .       1       .       .
   v2   .       0.531   .       .       1       .
   v3   .       .       0.070   .       .       1

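Entries shown as a dot are zero to within numerical precision. Apart from the ones on the diagonal, the only correlations in this table that differ from zero are the canonical correlations. The following script summarizes:

   selectObject: cca, pols50m
   # Canonical scores for all 3 pairs of canonical variates
   To TableOfReal (scores): 3
   To Correlation
   # Show only the entries whose absolute value exceeds 1e-14
   Draw as numbers if: 1, 0, "decimal", 2, ~ abs(self) > 1e-14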

5. How to predict one dataset from the other

See CCA & TableOfReal: Predict....

Additional information can be found in Weenink (2003).
