- You worked with `lm`, `glm`, `lmer`, or `glmer`.
- You used a formula with at least two fixed factors (between- or within-subjects doesn’t matter).
- You were interested in the significance of at least two degrees of interaction of those factors, e.g. in the case of two factors you were interested both in the main effect(s) and in the interaction effect.

(result: 12 out of 48 people)

Did you ever use the `contrasts` command? (result: three or four)

A test especially for those who worked with this before.

```
score <- c(34, 45, 33, 54, 45, 23, 66, 35, 24, 50,
67, 50, 64, 43, 47, 50, 68, 56, 60, 39)
region <- c("D", "D", "D", "D", "D", "D", "D", "D", "D", "D",
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F")
geslacht = c("M", "V", "M", "M", "V", "M", "V", "V", "M", "V",
"M", "M", "M", "M", "M", "V", "V", "V", "V", "V")
table <- data.frame (region, geslacht, score)
table
```

```
## region geslacht score
## 1 D M 34
## 2 D V 45
## 3 D M 33
## 4 D M 54
## 5 D V 45
## 6 D M 23
## 7 D V 66
## 8 D V 35
## 9 D M 24
## 10 D V 50
## 11 F M 67
## 12 F M 50
## 13 F M 64
## 14 F M 43
## 15 F M 47
## 16 F V 50
## 17 F V 68
## 18 F V 56
## 19 F V 60
## 20 F V 39
```

(geslacht means gender)

We run a linear model:

```
model <- lm (score ~ region * geslacht, table)
summary (model)
```

```
##
## Call:
## lm(formula = score ~ region * geslacht, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.6 -7.8 -1.9 6.5 20.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.60 5.08 6.61 6e-06 ***
## regionF 20.60 7.19 2.87 0.011 *
## geslachtV 14.60 7.19 2.03 0.059 .
## regionF:geslachtV -14.20 10.16 -1.40 0.181
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.4 on 16 degrees of freedom
## Multiple R-squared: 0.411, Adjusted R-squared: 0.301
## F-statistic: 3.73 on 3 and 16 DF, p-value: 0.0331
```

So far, so good. We now decide to code gender not in Dutch but in English, i.e. by changing “V” to “F”.

- nothing
- the sign of the estimate for gender
- the sign of the estimates for gender and the interaction
- b plus the *p*-value of the intercept
- c plus the *p*-value of the intercept
- b plus more than one *p*-value
- c plus more than one *p*-value

(result: 4 out of 48 people answered g, and 1 of the 12 “experts” answered g)

If you answered “yes” to the first question and either “no” to the second question or something different from “g” to the last question, you may be in trouble.

Many experimental designs can be captured in a linear model. Consider the Dutch–Flemish case again:

```
dutch = c(34, 45, 33, 54, 45, 23, 66, 35, 24, 50)
flemish = c(67, 50, 64, 43, 47, 50, 68, 56, 60, 39)
t.test (dutch, flemish, var.equal = TRUE)
```

```
##
## Two Sample t-test
##
## data: dutch and flemish
## t = -2.512, df = 18, p-value = 0.02176
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -24.791 -2.209
## sample estimates:
## mean of x mean of y
## 40.9 54.4
```

We can achieve the same result by regarding this design as a linear model with one independent variable (factor) and one dependent variable.

- the German-language proficiency score
- the region
- don’t know

The score is the dependent variable, and region is the factor. We will now put these data into a table.

- 10
- 20
- don’t know

The data are not paired. Dutch subject 3 has nothing to do with Flemish subject 3. In fact, the numbers of subjects didn’t have to be exactly the same in both groups. The *t*-test will work fine with 9 Dutch and 10 Flemish subjects. The data therefore don’t go in separate columns, and the table will have 20 rows:

```
score <- c(dutch, flemish)
region <- c(rep ("D", 10), rep ("F", 10))
table <- data.frame (region, score)
table
```

```
## region score
## 1 D 34
## 2 D 45
## 3 D 33
## 4 D 54
## 5 D 45
## 6 D 23
## 7 D 66
## 8 D 35
## 9 D 24
## 10 D 50
## 11 F 67
## 12 F 50
## 13 F 64
## 14 F 43
## 15 F 47
## 16 F 50
## 17 F 68
## 18 F 56
## 19 F 60
## 20 F 39
```
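As an aside, the claim that the *t*-test works fine with unequal group sizes is easy to check; the sketch below drops the tenth Dutch subject (using fresh variable names so the vectors above stay intact):

```r
# The t-test still works when group sizes differ: 9 Dutch vs. 10 Flemish
dutch9 <- c(34, 45, 33, 54, 45, 23, 66, 35, 24)   # subject 10 dropped
flemish10 <- c(67, 50, 64, 43, 47, 50, 68, 56, 60, 39)
t.test (dutch9, flemish10, var.equal = TRUE) $ parameter   # df = 9 + 10 - 2 = 17
```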

We can now run the command `lm`, which means ‘linear model’:

```
model <- lm (score ~ region, table)
summary (model)
```

```
##
## Call:
## lm(formula = score ~ region, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.90 -7.53 -1.40 9.22 25.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 40.90 3.80 10.76 2.8e-09 ***
## regionF 13.50 5.37 2.51 0.022 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12 on 18 degrees of freedom
## Multiple R-squared: 0.26, Adjusted R-squared: 0.218
## F-statistic: 6.31 on 1 and 18 DF, p-value: 0.0218
```

The resulting model makes the following approximation:

\[ score \approxeq 40.90 + 13.50 \times region \]

where *region* is 0 for the Dutch and 1 for the Flemish.

The *p*-value of 0.022 for “regionF” is the same as the *p*-value of the *t*-test, which is good. The *t*-value is also the same as the -2.512 seen earlier, although the sign has changed. Whereas the *t*-test computed the *t* for the Dutch minus the Flemish, the *t*-value here is computed on the basis of going from the reference category (“regionD”) to the other category (“regionF”), which is up in score.

The intercept is 40.90, which is the estimate for the score of the reference category, “regionD”, i.e. Dutch (*region* = 0); this value of 40.90 is significantly greater than 0 (*p* = 2.8·10^{-9}). The coefficient for “regionF” is 13.50, because the estimated mean score goes up by 13.50 if you go from Dutch to Flemish (it becomes 54.40).
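As a quick sanity check, the two coefficients can be recomputed directly from the group means (same data as above):

```r
# The intercept is the Dutch mean; the "regionF" coefficient is the
# difference between the Flemish and Dutch means
dutch <- c(34, 45, 33, 54, 45, 23, 66, 35, 24, 50)
flemish <- c(67, 50, 64, 43, 47, 50, 68, 56, 60, 39)
mean (dutch)                   # 40.9, the intercept
mean (flemish) - mean (dutch)  # 13.5, the slope from "D" to "F"
```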

To summarize the summary, you can use the `anova` command:

`anova (model)`

```
## Analysis of Variance Table
##
## Response: score
## Df Sum Sq Mean Sq F value Pr(>F)
## region 1 911 911 6.31 0.022 *
## Residuals 18 2599 144
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

This yields the same *p*-value as we saw in the summary of the model. You report this as:

The effect of region is significant (*F*[1,18] = 6.31, *p* = 0.022), indicating that Flemish people are better at our German-language proficiency task than Dutch people.

You may think now that everything is under control, but there is a worrisome detail. Consider the summary again:

`summary (model)`

```
##
## Call:
## lm(formula = score ~ region, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.90 -7.53 -1.40 9.22 25.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 40.90 3.80 10.76 2.8e-09 ***
## regionF 13.50 5.37 2.51 0.022 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12 on 18 degrees of freedom
## Multiple R-squared: 0.26, Adjusted R-squared: 0.218
## F-statistic: 6.31 on 1 and 18 DF, p-value: 0.0218
```

- It comes before “F” in the alphabet.
- It comes earlier in the table than “F”.
- R chooses randomly.
- Don’t know.

By default (i.e. if you don’t specify otherwise), linear model computation in R will use alphabetical order to interpret the levels of a factor: for the factor “region”, the level “D” will be equated to 0 in the computations, and the level “F” will be equated to 1. That is, you would have obtained the exact same results if you had put “0” in the table instead of “D”, and “1” instead of “F”.
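A minimal illustration of this default, together with `relevel`, the standard way to pick a different reference level (new variable names, so nothing in the table above is touched):

```r
# Factor levels are sorted alphabetically, whatever the order in the data
f <- factor (c("F", "F", "D", "D"))
levels (f)                    # "D" "F": "D" becomes the reference level
# relevel() chooses the reference level explicitly
f2 <- relevel (f, ref = "F")
levels (f2)                   # "F" "D"
```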

How bad is this dependence on the alphabet? We’ll investigate.

- nothing
- the estimate of the intercept
- b plus the sign of the estimate for the influence of the factor region
- c plus the *p*-value of the intercept
- don’t know

To figure this out, we add an additional factor called “region2”:

```
table $ region2 <- ifelse (table $ region == "D", "N", "F")
table
```

```
## region score region2
## 1 D 34 N
## 2 D 45 N
## 3 D 33 N
## 4 D 54 N
## 5 D 45 N
## 6 D 23 N
## 7 D 66 N
## 8 D 35 N
## 9 D 24 N
## 10 D 50 N
## 11 F 67 F
## 12 F 50 F
## 13 F 64 F
## 14 F 43 F
## 15 F 47 F
## 16 F 50 F
## 17 F 68 F
## 18 F 56 F
## 19 F 60 F
## 20 F 39 F
```

We can now compare the two summaries:

`summary (lm (score ~ region, table))`

```
##
## Call:
## lm(formula = score ~ region, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.90 -7.53 -1.40 9.22 25.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 40.90 3.80 10.76 2.8e-09 ***
## regionF 13.50 5.37 2.51 0.022 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12 on 18 degrees of freedom
## Multiple R-squared: 0.26, Adjusted R-squared: 0.218
## F-statistic: 6.31 on 1 and 18 DF, p-value: 0.0218
```

`summary (lm (score ~ region2, table))`

```
##
## Call:
## lm(formula = score ~ region2, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.90 -7.53 -1.40 9.22 25.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.40 3.80 14.32 2.8e-11 ***
## region2N -13.50 5.37 -2.51 0.022 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12 on 18 degrees of freedom
## Multiple R-squared: 0.26, Adjusted R-squared: 0.218
## F-statistic: 6.31 on 1 and 18 DF, p-value: 0.0218
```

As you can see from the word “region2N”, the reference category (“level”) for the region has changed to “F” (Flanders), because that comes before “N” (Netherlands) in the alphabet. The intercept has changed to 54.40, which is the observed mean of the Flemish (i.e. when the value of region2 is 0). Its *p*-value (comparison to 0) has become much lower, because the Flemish value is much farther from 0 than the Dutch value of 40.90 was. The sign of the estimate of the factor “region2” is negative, because you go down in score as you go from Flanders to the Netherlands. Fortunately, the *p*-value has not changed.

Something feels silly, though. Why isn’t the intercept the estimate of the mean of the Dutch and Flemish populations together? Well, we can accomplish that, namely by specifying explicit “contrasts”:

- In our first try, Dutch (“D”) was coded as 0 and Flemish (“F”) was coded as 1, making use of R’s default strategy of using alphabetical order.
- In our second try, the Netherlands (“N”) was coded as 1 and Flanders (“F”) as 0, again making use of R’s alphabetical default.
- In our third try, we will be coding Dutch as -0.5 and Flemish as +0.5.

```
table $ regionCentered <- table $ region
contrasts (table $ regionCentered) <- c(-0.5, +0.5)
table
```

```
## region score region2 regionCentered
## 1 D 34 N D
## 2 D 45 N D
## 3 D 33 N D
## 4 D 54 N D
## 5 D 45 N D
## 6 D 23 N D
## 7 D 66 N D
## 8 D 35 N D
## 9 D 24 N D
## 10 D 50 N D
## 11 F 67 F F
## 12 F 50 F F
## 13 F 64 F F
## 14 F 43 F F
## 15 F 47 F F
## 16 F 50 F F
## 17 F 68 F F
## 18 F 56 F F
## 19 F 60 F F
## 20 F 39 F F
```

You don’t see the difference between columns “region” and “regionCentered” in the table view, but you can see it by asking:

`table $ region`

```
## [1] D D D D D D D D D D F F F F F F F F F F
## Levels: D F
```

`table $ regionCentered`

```
## [1] D D D D D D D D D D F F F F F F F F F F
## attr(,"contrasts")
## [,1]
## D -0.5
## F 0.5
## Levels: D F
```

By the way, R assigned the first number (-0.5) to the level that comes first in the alphabet (“D”)…

Now compare the three summaries:

`summary (lm (score ~ region, table))`

```
##
## Call:
## lm(formula = score ~ region, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.90 -7.53 -1.40 9.22 25.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 40.90 3.80 10.76 2.8e-09 ***
## regionF 13.50 5.37 2.51 0.022 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12 on 18 degrees of freedom
## Multiple R-squared: 0.26, Adjusted R-squared: 0.218
## F-statistic: 6.31 on 1 and 18 DF, p-value: 0.0218
```

`summary (lm (score ~ regionCentered, table))`

```
##
## Call:
## lm(formula = score ~ regionCentered, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.90 -7.53 -1.40 9.22 25.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.65 2.69 17.73 7.6e-13 ***
## regionCentered1 13.50 5.37 2.51 0.022 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12 on 18 degrees of freedom
## Multiple R-squared: 0.26, Adjusted R-squared: 0.218
## F-statistic: 6.31 on 1 and 18 DF, p-value: 0.0218
```

`summary (lm (score ~ region2, table))`

```
##
## Call:
## lm(formula = score ~ region2, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.90 -7.53 -1.40 9.22 25.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.40 3.80 14.32 2.8e-11 ***
## region2N -13.50 5.37 -2.51 0.022 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12 on 18 degrees of freedom
## Multiple R-squared: 0.26, Adjusted R-squared: 0.218
## F-statistic: 6.31 on 1 and 18 DF, p-value: 0.0218
```

The sign of the estimate for “regionCentered” is positive, because we are going from Dutch to Flemish when our number goes up (from -0.5 to +0.5). The *p*-value is, reassuringly, again 0.022.

The intercept has now become 47.65, i.e. the mean of the two estimated means. The *p*-value is again lower.

- This mean is even farther from zero.
- This *p*-value is based on more participants.
- Don’t know.

The *p*-value is better because it is now based on 20 instead of 10 participants.
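Indeed, the new intercept is just the mean of the two group means seen earlier:

```r
# Grand mean of the two estimated group means (Dutch 40.9, Flemish 54.4)
mean (c(40.9, 54.4))   # 47.65, the intercept under centered contrasts
```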

Short form: center your binary variables!

Long form:

Although it does not seem very important, because you are often not interested in the intercept, please make a habit of always specifying explicit contrasts, so as not to have to rely on R’s silly alphabetic default strategy. If you do want to use Dutch as the reference, use `c(0, 1)` as the contrast (the reference level is the one coded 0), and if you want to use Flemish as the reference, use `c(1, 0)`; these are “treatment contrasts”, because in medical experiments you want “no treatment” to be the reference. Usually, however, you will want symmetric contrasts whose elements sum to zero, such as `c(-0.5, 0.5)`, because that gives you some additional information (in the one-factor case):

- the estimated intercept will be the estimated mean of means
- the *p*-value, based on all data, will tell you whether that estimated mean of means differs reliably from zero
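For comparison, R’s built-in contrast generators produce related codings. Note two differences from the hand-specified `c(-0.5, 0.5)`: `contr.sum` assigns +1 to the *first* level (so the coefficient’s sign flips), and with ±1 coding the coefficient is only half the difference between the group means.

```r
contr.treatment (2)   # column (0, 1): the first level is the reference
contr.sum (2)         # column (1, -1): symmetric, sums to zero
# Halving contr.sum gives a +/-0.5 coding, though with the signs
# reversed relative to the c(-0.5, +0.5) used in the text
contr.sum (2) / 2
```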

Now, finally, everything is under control. Even the analysis of variance behaves well:

`anova (lm (score ~ regionCentered, table))`

```
## Analysis of Variance Table
##
## Response: score
## Df Sum Sq Mean Sq F value Pr(>F)
## regionCentered 1 911 911 6.31 0.022 *
## Residuals 18 2599 144
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Same result again. As I said, everything is under control, at least for the one-factor case.

Many interesting datasets involve more than one factor. Let’s add gender (“geslacht”) in:

```
table $ geslacht = c("M", "V", "M", "M", "V", "M", "V", "V", "M", "V",
"M", "M", "M", "M", "M", "V", "V", "V", "V", "V")
table
```

```
## region score region2 regionCentered geslacht
## 1 D 34 N D M
## 2 D 45 N D V
## 3 D 33 N D M
## 4 D 54 N D M
## 5 D 45 N D V
## 6 D 23 N D M
## 7 D 66 N D V
## 8 D 35 N D V
## 9 D 24 N D M
## 10 D 50 N D V
## 11 F 67 F F M
## 12 F 50 F F M
## 13 F 64 F F M
## 14 F 43 F F M
## 15 F 47 F F M
## 16 F 50 F F V
## 17 F 68 F F V
## 18 F 56 F F V
## 19 F 60 F F V
## 20 F 39 F F V
```

```
model <- lm (score ~ region * geslacht, table)
summary (model)
```

```
##
## Call:
## lm(formula = score ~ region * geslacht, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.6 -7.8 -1.9 6.5 20.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.60 5.08 6.61 6e-06 ***
## regionF 20.60 7.19 2.87 0.011 *
## geslachtV 14.60 7.19 2.03 0.059 .
## regionF:geslachtV -14.20 10.16 -1.40 0.181
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.4 on 16 degrees of freedom
## Multiple R-squared: 0.411, Adjusted R-squared: 0.301
## F-statistic: 3.73 on 3 and 16 DF, p-value: 0.0331
```

The resulting model makes the following approximation:

\[ score \approxeq 33.60 + 20.60 \times region + 14.60 \times geslacht - 14.20 \times region \times geslacht \]

where *region* is 0 for the Dutch and 1 for the Flemish, and *geslacht* is 0 for males and 1 for females.

The estimated intercept of 33.60 is now the estimate for the mean score of the male Dutch population. The positive values for “regionF” and “geslachtV” mean that your score goes up by 20.60 if you’re Flemish (*p* = 0.011) and by 14.60 if you’re a woman (almost significant: *p* = 0.059). The interaction estimate of -14.20 means that those benefits don’t add up if you are both Flemish (region = 1) and a woman (geslacht = 1): you have to subtract 14.20, although this number doesn’t differ significantly from zero (*p* = 0.181). That tells us that the four groups have the following estimated means (please also try substituting 0’s and 1’s in the above equation):

- male Dutch: 33.60
- male Flemish: 33.60 + 20.60 = 54.20
- female Dutch: 33.60 + 14.60 = 48.20
- female Flemish: 33.60 + 20.60 + 14.60 - 14.20 = 54.60
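The same substitution of 0’s and 1’s can be done in a few lines of R, using the coefficients from the summary above:

```r
# Substitute 0's and 1's into the fitted equation
b0 <- 33.60; bRegion <- 20.60; bGender <- 14.60; bInteraction <- -14.20
maleDutch     <- b0
maleFlemish   <- b0 + bRegion
femaleDutch   <- b0 + bGender
femaleFlemish <- b0 + bRegion + bGender + bInteraction   # region = geslacht = 1
c(maleDutch, maleFlemish, femaleDutch, femaleFlemish)
# 33.6 54.2 48.2 54.6
```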

If the interaction had come out significant, we would have been able to conclude that the female advantage is greater for Dutch than for Flemish people, or, equivalently, that the Flemish advantage is greater for males than for females. Or perhaps we can report something that suggests to a lesser extent that there are indeed advantages everywhere (such as a female advantage for the Flemish), for instance “the difference score of females minus males is greater for Dutch people than for Flemish people.”

These averages can be checked with the `aggregate` command:

`aggregate (score ~ region + geslacht, data = table, FUN = mean)`

```
## region geslacht score
## 1 D M 33.6
## 2 F M 54.2
## 3 D V 48.2
## 4 F V 54.6
```

So this table can be understood. As in the one-factor case, the analysis of variance yields a summary of the summary, but…

`anova (model)`

```
## Analysis of Variance Table
##
## Response: score
## Df Sum Sq Mean Sq F value Pr(>F)
## region 1 911 911 7.06 0.017 *
## geslacht 1 281 281 2.18 0.159
## region:geslacht 1 252 252 1.95 0.181
## Residuals 16 2066 129
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The *p*-values are different. Suddenly, gender is no longer nearly significant. That’s a problem that didn’t seem to occur in the one-factor case.

Then, let’s translate the letters in the coding of gender to English:

```
table $ gender = factor (ifelse (table $ geslacht == "V", "F", "M"))
table
```

```
## region score region2 regionCentered geslacht gender
## 1 D 34 N D M M
## 2 D 45 N D V F
## 3 D 33 N D M M
## 4 D 54 N D M M
## 5 D 45 N D V F
## 6 D 23 N D M M
## 7 D 66 N D V F
## 8 D 35 N D V F
## 9 D 24 N D M M
## 10 D 50 N D V F
## 11 F 67 F F M M
## 12 F 50 F F M M
## 13 F 64 F F M M
## 14 F 43 F F M M
## 15 F 47 F F M M
## 16 F 50 F F V F
## 17 F 68 F F V F
## 18 F 56 F F V F
## 19 F 60 F F V F
## 20 F 39 F F V F
```

- nothing
- the sign of the estimate for gender
- the sign of the estimates for gender and the interaction
- b plus the *p*-value of the intercept
- c plus the *p*-value of the intercept
- b plus more than one *p*-value
- c plus more than one *p*-value

Let’s investigate the influence of this trivial change:

`summary (lm (score ~ region * geslacht, table))`

```
##
## Call:
## lm(formula = score ~ region * geslacht, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.6 -7.8 -1.9 6.5 20.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.60 5.08 6.61 6e-06 ***
## regionF 20.60 7.19 2.87 0.011 *
## geslachtV 14.60 7.19 2.03 0.059 .
## regionF:geslachtV -14.20 10.16 -1.40 0.181
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.4 on 16 degrees of freedom
## Multiple R-squared: 0.411, Adjusted R-squared: 0.301
## F-statistic: 3.73 on 3 and 16 DF, p-value: 0.0331
```

`summary (lm (score ~ region * gender, table))`

```
##
## Call:
## lm(formula = score ~ region * gender, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.6 -7.8 -1.9 6.5 20.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.20 5.08 9.48 5.7e-08 ***
## regionF 6.40 7.19 0.89 0.386
## genderM -14.60 7.19 -2.03 0.059 .
## regionF:genderM 14.20 10.16 1.40 0.181
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.4 on 16 degrees of freedom
## Multiple R-squared: 0.411, Adjusted R-squared: 0.301
## F-statistic: 3.73 on 3 and 16 DF, p-value: 0.0331
```

Apart from the estimate and *p*-value of the intercept, three things have changed:

- the sign of the estimate of the influence of gender
- the sign of the estimated interaction of gender and region
- the *p*-value of the influence of region!!!

This last change occurs because in the earlier test the influence of region was evaluated only for the males (the reference category “M”), and now it is evaluated only for the females (the new reference category “F”). Sort of. This is analogous to what happened to the intercept in the one-factor case. This change of *p*-values in the main effects only happens if the interaction effect is present in the model (that’s why I said “sort of”).

All of you who have been throwing two factors into a linear model without setting explicit contrasts now have to think OOPS and tell your advisors that your analyses have to be redone. This goes not only for `lm`, but for `glm`, `lmer` and `glmer` as well! Judging from lots of presentations on research where these commands have been used, the alphabetical order is the researchers’ default!

- the V/M values
- the F/M values
- neither

None of those *p*-values are correct if you just want to evaluate the difference between the regions or between the genders. The solution should be clear now:

```
table $ genderCentered <- table $ gender
contrasts (table $ genderCentered) <- c(-0.5, 0.5)
model = lm (score ~ regionCentered * genderCentered, table)
summary (model)
```

```
##
## Call:
## lm(formula = score ~ regionCentered * genderCentered, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.6 -7.8 -1.9 6.5 20.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.65 2.54 18.75 2.6e-12 ***
## regionCentered1 13.50 5.08 2.66 0.017 *
## genderCentered1 -7.50 5.08 -1.48 0.159
## regionCentered1:genderCentered1 14.20 10.16 1.40 0.181
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.4 on 16 degrees of freedom
## Multiple R-squared: 0.411, Adjusted R-squared: 0.301
## F-statistic: 3.73 on 3 and 16 DF, p-value: 0.0331
```

The *p*-values are the same as they were above in the analysis of variance, which we can also try now:

`anova (model)`

```
## Analysis of Variance Table
##
## Response: score
## Df Sum Sq Mean Sq F value Pr(>F)
## regionCentered 1 911 911 7.06 0.017 *
## genderCentered 1 281 281 2.18 0.159
## regionCentered:genderCentered 1 252 252 1.95 0.181
## Residuals 16 2066 129
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Would the practical advice be to always use `anova`, or to always use explicit symmetric contrasts? The use of an `anova` that gives you all the effects is not always possible, especially if you want to add and/or drop single terms in an exploration of a big model.

Besides, `anova` does *not* always give you the same values as `summary`. We can see that after removing one Flemish female from the data:

```
table = table [1 : 19, ]
model = lm (score ~ regionCentered * genderCentered, table)
summary (model)
```

```
##
## Call:
## lm(formula = score ~ regionCentered * genderCentered, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.20 -7.85 -2.50 5.65 20.40
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.62 2.50 19.47 4.7e-12 ***
## regionCentered1 15.45 5.00 3.09 0.0074 **
## genderCentered1 -9.45 5.00 -1.89 0.0780 .
## regionCentered1:genderCentered1 10.30 9.99 1.03 0.3189
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.8 on 15 degrees of freedom
## Multiple R-squared: 0.487, Adjusted R-squared: 0.384
## F-statistic: 4.74 on 3 and 15 DF, p-value: 0.0161
```

`anova (model)`

```
## Analysis of Variance Table
##
## Response: score
## Df Sum Sq Mean Sq F value Pr(>F)
## regionCentered 1 1096 1096 9.33 0.008 **
## genderCentered 1 449 449 3.82 0.069 .
## regionCentered:genderCentered 1 125 125 1.06 0.319
## Residuals 15 1762 117
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The *p*-values for the main effects are different in `summary` than in `anova`. This is because in `summary` the *p*-value for a term is based on comparing the full model with the model without that term (“type-III sums of squares”), whereas in `anova` the terms are entered one by one starting from the null model (“type-I sums of squares”). The null model, by the way, is the model with only an intercept; you get it with the formula `score ~ 1`.

So now for another silly change, namely a change in the order of the factors.

- Nothing.
- The *p*-values change in `summary`, but not in `anova`.
- The *p*-values change in `anova`, but not in `summary`.

`summary (lm (score ~ regionCentered * genderCentered, table))`

```
##
## Call:
## lm(formula = score ~ regionCentered * genderCentered, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.20 -7.85 -2.50 5.65 20.40
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.62 2.50 19.47 4.7e-12 ***
## regionCentered1 15.45 5.00 3.09 0.0074 **
## genderCentered1 -9.45 5.00 -1.89 0.0780 .
## regionCentered1:genderCentered1 10.30 9.99 1.03 0.3189
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.8 on 15 degrees of freedom
## Multiple R-squared: 0.487, Adjusted R-squared: 0.384
## F-statistic: 4.74 on 3 and 15 DF, p-value: 0.0161
```

`summary (lm (score ~ genderCentered * regionCentered, table))`

```
##
## Call:
## lm(formula = score ~ genderCentered * regionCentered, data = table)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.20 -7.85 -2.50 5.65 20.40
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.62 2.50 19.47 4.7e-12 ***
## genderCentered1 -9.45 5.00 -1.89 0.0780 .
## regionCentered1 15.45 5.00 3.09 0.0074 **
## genderCentered1:regionCentered1 10.30 9.99 1.03 0.3189
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.8 on 15 degrees of freedom
## Multiple R-squared: 0.487, Adjusted R-squared: 0.384
## F-statistic: 4.74 on 3 and 15 DF, p-value: 0.0161
```

Since in `summary` a *p*-value is based on comparing the full and the full-minus-one model, there is no difference. Both `regionCentered * genderCentered` and `genderCentered * regionCentered` specify the full model.

`anova (lm (score ~ regionCentered * genderCentered, table))`

```
## Analysis of Variance Table
##
## Response: score
## Df Sum Sq Mean Sq F value Pr(>F)
## regionCentered 1 1096 1096 9.33 0.008 **
## genderCentered 1 449 449 3.82 0.069 .
## regionCentered:genderCentered 1 125 125 1.06 0.319
## Residuals 15 1762 117
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

`anova (lm (score ~ genderCentered * regionCentered, table))`

```
## Analysis of Variance Table
##
## Response: score
## Df Sum Sq Mean Sq F value Pr(>F)
## genderCentered 1 373 373 3.18 0.0949 .
## regionCentered 1 1172 1172 9.98 0.0065 **
## genderCentered:regionCentered 1 125 125 1.06 0.3189
## Residuals 15 1762 117
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Since in `anova` terms are entered one by one, the first `anova` gives the *p*-value for comparing the null model with the model with only “regionCentered” as a factor, whereas the second `anova` gives the *p*-value for comparing the model with only “genderCentered” as a factor with the model with both “genderCentered” and “regionCentered” as factors. If the number of participants is the same in all groups, there is no difference, but if the groups have different sizes, the *p*-values will differ.
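This order dependence can be reproduced in a self-contained way by rebuilding the unbalanced 19-row data set under fresh names (suffixed with “19”, so the columns used above are not clobbered) and comparing the sequential sums of squares for region:

```r
# The 19-row (unbalanced) data: one Flemish female removed
score19 <- c(34, 45, 33, 54, 45, 23, 66, 35, 24, 50,
             67, 50, 64, 43, 47, 50, 68, 56, 60)
region19 <- factor (rep (c("D", "F"), c(10, 9)))
gender19 <- factor (c("M", "V", "M", "M", "V", "M", "V", "V", "M", "V",
                      "M", "M", "M", "M", "M", "V", "V", "V", "V"))
contrasts (region19) <- c(-0.5, 0.5)
contrasts (gender19) <- c(-0.5, 0.5)
a1 <- anova (lm (score19 ~ region19 * gender19))   # region entered first
a2 <- anova (lm (score19 ~ gender19 * region19))   # gender entered first
# With unequal group sizes the sequential (type-I) sums of squares for
# region differ between the two orders (1096 vs. 1172, as in the tables above)
a1 ["region19", "Sum Sq"]
a2 ["region19", "Sum Sq"]
```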

If you try all of these options, then please mention in your paper that you have tried all of them, as on pages 9–10 of Wanrooij et al. (2014b).

Always use explicit contrasts. Period.

Because the authors of `lme4` deemed them unreliable, owing to a problem in determining the number of degrees of freedom. You can still get a *p*-value by comparing the full model with the model minus that one term:

`anova (model, modelMinusThatOneTerm, test = "Chisq")`

The *p*-value from a \(\chi^2\)-test on the difference between a mixed-effects model and the same model with one term added or dropped is just as unreliable as the *p*-values that the authors of `lme4` decided to no longer give:

If you have very large samples an asymptotic approach (using Wald z or chi-square statistics) is probably just fine. However, the further you depart from conventional repeated measures ANOVA assumptions the harder it is to know how large a sample news [sic] to be before the asymptotics kick in. (Thom Baguley on his blog)

Practical advice: add terms if they contribute with *p* < 0.01 or *p* < 0.05 perhaps, but trust their significance only if their *p*-values are below 0.0001 or 0.001 perhaps.