Canonical correlation analysis

This tutorial will show you how to perform canonical correlation analysis with Praat.

### 1. Objective of canonical correlation analysis

In canonical correlation analysis we try to find the correlations between two data sets. One data set is called the dependent set, the other the independent set. In Praat these two sets must reside in one TableOfReal object. The lower numbered columns of this table will then be interpreted as the dependent part, the rest of the columns as the independent part. The dimension of (i.e. the number of columns in) the dependent part may not exceed the dimension of the independent part.

As an example, we will use the dataset from Pols et al. (1973) with the frequencies and levels of the first three formants from the 12 Dutch monophthongal vowels as spoken in a /h_t/ context by 50 male speakers. We will try to find the canonical correlation between formant frequencies (the dependent part) and levels (the independent part). The dimension of both groups of variates is 3. In the introduction of the discriminant analysis tutorial you can find how to get these data, how to take the logarithm of the formant frequency values and how to standardize them. The following script summarizes:

```
    pols50m = Create TableOfReal (Pols 1973): "yes"
    Formula: ~ if col < 4 then log10 (self) else self endif
    Standardize columns
```

Before we start with the canonical correlation analysis we will first have a look at the Pearson correlations of this table and calculate the Correlation matrix. It is given by:

```
              F1      F2      F3      L1      L2      L3
    F1       1  -0.338   0.191   0.384  -0.505  -0.014
    F2  -0.338       1   0.190  -0.106   0.526  -0.568
    F3   0.191   0.190       1   0.113  -0.038   0.019
    L1   0.384  -0.106   0.113       1  -0.038   0.085
    L2  -0.505   0.526  -0.038  -0.038       1   0.128
    L3  -0.014  -0.568   0.019   0.085   0.128       1
```

The following script summarizes:

```
    selectObject: pols50m
    To Correlation
    Draw as numbers: 1, 0, "decimal", 3
```

The correlation matrix shows that high correlations exist between some formant frequencies and some levels. For example, the correlation coefficient between F2 and L2 equals 0.526.

In a canonical correlation analysis of the dataset above, we try to find the linear combination u1 of F1, F2 and F3 that correlates maximally with the linear combination v1 of L1, L2 and L3. When we have found these u1 and v1 we next try to find a new combination u2 of the formant frequencies and a new combination v2 of the levels that have maximum correlation. These u2 and v2 should be uncorrelated with u1 and v1. When we express the above with formulas we have:

 u1 = y11 F1 + y12 F2 + y13 F3
 v1 = x11 L1 + x12 L2 + x13 L3
 ρ(u1, v1) = maximum, ρ(u2, v2) = submaximum,
 ρ(u2, u1) = ρ(u2, v1) = ρ(v2, v1) = ρ(v2, u1) = 0,

where the ρ(ui, vi) are the correlations between the canonical variates ui and vi and the yij's and xij's are the canonical coefficients for the dependent and the independent variates, respectively.
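
The objective above can also be written compactly in matrix form. The following is the standard textbook formulation, not something this tutorial states. The symbols are notation assumed here for illustration: S_FF and S_LL are the covariance matrices within the frequency set and within the level set, S_FL is the covariance matrix between the two sets, and y_1 = (y11, y12, y13) and x_1 = (x11, x12, x13) are the coefficient vectors.

```latex
\rho(u_1, v_1) \;=\; \max_{\mathbf{y}_1,\,\mathbf{x}_1}
\frac{\mathbf{y}_1^{\top} S_{FL}\, \mathbf{x}_1}
     {\sqrt{\mathbf{y}_1^{\top} S_{FF}\, \mathbf{y}_1}\;
      \sqrt{\mathbf{x}_1^{\top} S_{LL}\, \mathbf{x}_1}},
\qquad
u_1 = \mathbf{y}_1^{\top} (F_1, F_2, F_3)^{\top},
\quad
v_1 = \mathbf{x}_1^{\top} (L_1, L_2, L_3)^{\top}
```

In this formulation the squared canonical correlations ρ²(ui, vi) are the eigenvalues of the matrix S_FF⁻¹ S_FL S_LL⁻¹ S_LF, where S_LF is the transpose of S_FL. Because the columns were standardized, the covariance matrices here coincide with the corresponding blocks of the correlation matrix shown earlier.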

### 2. How to perform a canonical correlation analysis

Select the TableOfReal object and choose To CCA... from the dynamic menu; the command is located under the "Multivariate statistics" action button. In the form, supply 3 for Dimension of dependent variate. The resulting CCA object will bear the same name as the TableOfReal object. The following script summarizes:

```
    selectObject: pols50m
    cca = To CCA: 3
```
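
Since the dimension of the dependent part may not exceed that of the independent part, a script can check this before calling To CCA. The following is a minimal sketch, assuming the `pols50m` table from the earlier script; the variable names `dependentDimension` and `independentDimension` are chosen here for illustration only.

```
    selectObject: pols50m
    numberOfColumns = Get number of columns
    # Assumption: the first 3 columns (F1, F2, F3) form the dependent part.
    dependentDimension = 3
    independentDimension = numberOfColumns - dependentDimension
    if dependentDimension > independentDimension
        exitScript: "The dependent part has more columns than the independent part."
    endif
    cca = To CCA: dependentDimension
```

For the Pols et al. data both parts have 3 columns, so the check passes and the script continues to the analysis.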

### 3. How to get the canonical correlation coefficients

You can get the canonical correlation coefficients by querying the CCA object. You will find that the three canonical correlation coefficients, ρ(u1, v1), ρ(u2, v2) and ρ(u3, v3) are approximately 0.86, 0.53 and 0.07, respectively. The following script summarizes:

```
    cc1 = Get correlation: 1
    cc2 = Get correlation: 2
    cc3 = Get correlation: 3
    writeInfoLine: "cc1 = ", cc1, ", cc2 = ", cc2, ", cc3 = ", cc3
```
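
The three queries can also be written as a loop, which is convenient when the number of correlations differs between analyses. This is a sketch under the assumption that the `cca` variable from the previous section still refers to the CCA object. The value 3 is hard-coded here because it equals the dimension of the dependent part in this example.

```
    selectObject: cca
    # Assumption: 3 correlations, one per dependent dimension in this example.
    numberOfCorrelations = 3
    writeInfoLine: "Canonical correlations:"
    for i from 1 to numberOfCorrelations
        cc = Get correlation: i
        # Report each coefficient rounded to three decimals.
        appendInfoLine: "rho(u", i, ", v", i, ") = ", fixed$ (cc, 3)
    endfor
```

For the data used here, the reported values should be close to the 0.86, 0.53 and 0.07 quoted above.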

### 4. How to obtain canonical scores

Canonical scores, also named canonical variates, are the linear combinations:

 ui = yi1 F1 + yi2 F2 + yi3 F3, and
 vi = xi1 L1 + xi2 L2 + xi3 L3,

where the index i runs from 1 to the number of correlation coefficients.

You can get the canonical scores by selecting a CCA object together with the TableOfReal object and choosing To TableOfReal (scores)...

When we now calculate the Correlation matrix of these canonical variates we get the following table:

```
              u1      u2      u3      v1      v2      v3
    u1       1       .       .   0.860       .       .
    u2       .       1       .       .   0.531       .
    u3       .       .       1       .       .   0.070
    v1   0.860       .       .       1       .       .
    v2       .   0.531       .       .       1       .
    v3       .       .   0.070       .       .       1
```

The entries shown as a dot are zero to within numerical precision. In this table the only off-diagonal correlations that differ from zero are the canonical correlations. The following script summarizes:

```
    selectObject: cca, pols50m
    To TableOfReal (scores): 3
    To Correlation
    Draw as numbers if: 1, 0, "decimal", 2, ~ abs(self) > 1e-14
```
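
As a numerical cross-check, the correlation between the score columns u1 and v1 can be compared with the first canonical correlation stored in the CCA object. The sketch below assumes that the Correlation object answers the TableOfReal query `Get value` and that the score columns are ordered u1, u2, u3, v1, v2, v3 as in the table above. The variable names `scores`, `correlation` and `r14` are illustrative.

```
    selectObject: cca
    cc1 = Get correlation: 1
    selectObject: cca, pols50m
    scores = To TableOfReal (scores): 3
    correlation = To Correlation
    # Row 1 is u1, column 4 is v1 (assumed column order, see the table above).
    r14 = Get value: 1, 4
    writeInfoLine: "rho(u1, v1) from the CCA object: ", fixed$ (cc1, 3)
    appendInfoLine: "corr(u1, v1) from the scores:   ", fixed$ (r14, 3)
```

If both assumptions hold, the two lines should report the same value, about 0.860.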

### 5. How to predict one dataset from the other

Additional information can be found in Weenink (2003).