For my thesis, I have data from a perception experiment using an ABX task. The research question is whether Korean listeners perceive an artificially created creaky lenis token as belonging to the ‘fortis’ category or to the ‘lenis’ category. The data structure is shown in the data frame “table” below.
Import libraries:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(minqa)
options(width=200)
Import the full data sheet. In the fxlChoice column, “fortis” responses are coded as 1 and “lenis” responses are coded as 0:
table <- read.delim("anonymizedData.tsv", stringsAsFactors = TRUE)
head(table)
## Participant Token_number Type A X B Reverse Correct Decision Age Gender Nationality Region SeoulYears Mandarin fxlChoice Catch_trial SeoulRegion Word Order
## 1 P_01 1 Filler n_taju_l_01 n_taju_l_02 n_panu_a_03 1 1 z 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi <NA> <NA>
## 2 P_01 2 Filler n_taju_fc_01 n_taju_fc_02 m_kata_t_01 0 1 z 37 M Korean Gyeonggi Yes Yes 1 Trial Gyeonggi <NA> <NA>
## 3 P_01 3 Filler m_kata_t_01 m_kata_t_03 n_taju_l_02 0 0 m 37 M Korean Gyeonggi Yes Yes 0 Trial Gyeonggi <NA> <NA>
## 4 P_01 4 Filler n_taju_l_01 n_taju_l_02 n_panu_a_03 1 1 z 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi <NA> <NA>
## 5 P_01 5 Filler n_kaju_l_01 n_kasha_l_01 n_kasha_l_02 0 1 m 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi <NA> <NA>
## 6 P_01 6 Filler n_kaju_l_03 n_tasha_l_02 n_tasha_l_01 0 1 m 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi <NA> <NA>
## Creak
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 <NA>
## 5 <NA>
## 6 <NA>
I have a lot of fillers, which do not tell me anything about the RQ, so I filter them out:
exp_tokens <- filter(table, Type == "Exp")
head(exp_tokens)
## Participant Token_number Type A X B Reverse Correct Decision Age Gender Nationality Region SeoulYears Mandarin fxlChoice Catch_trial SeoulRegion Word Order
## 1 P_01 11 Exp i_taju_f_02 i_taju_lc_02 i_taju_l_06 0 0 m 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi taju ABX
## 2 P_01 15 Exp i_taju_f_03 i_taju_lc_03 i_taju_l_07 0 0 m 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi taju ABX
## 3 P_01 17 Exp i_tasha_l_02 i_tasha_lc_05 i_tasha_fc_01 1 1 z 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi tasha BAX
## 4 P_01 18 Exp i_tasha_l_04 i_tasha_lc_03 i_tasha_f_03 1 1 z 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi tasha BAX
## 5 P_01 36 Exp n_tasha_fc_01 i_tasha_lc_05 i_tasha_l_04 0 0 m 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi tasha ABX
## 6 P_01 38 Exp i_taju_l_01 i_taju_lc_05 i_taju_fc_01 1 1 m 37 M Korean Gyeonggi Yes Yes 1 NoTrial Gyeonggi taju BAX
## Creak
## 1 N
## 2 N
## 3 Y
## 4 N
## 5 Y
## 6 Y
The key variable here is the ‘fxlChoice’ column. A value of 1 means the listener judged sound X (from the ABX set) to be closer to the ‘fortis’ sound (always sound A). Analogously, a value of 0 means that the listener judged sound X to be closer to the ‘lenis’ sound (always sound B). The column ‘Creak’ indicates whether natural creakiness was present in the A sound. The column ‘Word’ identifies which of the two words used in the experimental condition (tajut or tasha) the trial contains.
Each ABX set was played four times in two orders: ABX and BAX. This is signified in the ‘Order’ column.
The demographic variables are Age (continuous), Gender (M or F), Region (6 levels), SeoulYears (whether the participant spent 2 or more years in Seoul; Yes or No), and Mandarin (knowledge of Mandarin; Yes or No). An important point about Region: we expected listeners from the Seoul region to respond differently from listeners from the other regions. However, the “Gyeonggi” region can still be considered part of greater Seoul, and its dialect is not known to differ much from the Seoul dialect. We therefore grouped Region into three levels in the column SeoulRegion: Seoul, Gyeonggi, and Other (all other regions). The remaining columns are not relevant at this point.
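The grouping described above could be derived from Region along the following lines. This is only a sketch: the SeoulRegion column already exists in the data sheet, and the toy `region` vector below is a made-up stand-in, not the experiment's data.

```r
# Toy stand-in for the Region column (made-up values, not the real data)
region <- factor(c("Seoul", "Gyeonggi", "Jeolla", "Gangwon", "Gyeongsang", "Seoul"))

# Keep Seoul and Gyeonggi as they are; collapse every other region into "Other"
seoul_region <- factor(ifelse(region %in% c("Seoul", "Gyeonggi"),
                              as.character(region),
                              "Other"))

# Levels are sorted alphabetically: Gyeonggi, Other, Seoul
levels(seoul_region)

# Cross-tabulation to check that every region ended up in the intended group
table(region, seoul_region)
```

The alphabetical level order matters because the rows of any contrast matrix assigned to the factor have to follow it.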
#Gender
contrast <- cbind (c(-1/2, +1/2))
colnames (contrast) <- c("-F+M")
contrasts (exp_tokens$Gender) <- contrast
contrasts (exp_tokens$Gender)
## -F+M
## F -0.5
## M 0.5
#SeoulYears
contrast <- cbind (c(-1/2, +1/2))
colnames (contrast) <- c("-N+Y")
contrasts (exp_tokens$SeoulYears) <- contrast
contrasts (exp_tokens$SeoulYears)
## -N+Y
## No -0.5
## Yes 0.5
#Mandarin
contrast <- cbind (c(-1/2, +1/2))
colnames (contrast) <- c("-N+Y")
contrasts (exp_tokens$Mandarin) <- contrast
contrasts (exp_tokens$Mandarin)
## -N+Y
## No -0.5
## Yes 0.5
#Creak
contrast <- cbind (c(-1/2, +1/2))
colnames (contrast) <- c("-N+Y")
contrasts (exp_tokens$Creak) <- contrast
contrasts (exp_tokens$Creak)
## -N+Y
## N -0.5
## Y 0.5
#Order
contrast <- cbind (c(-1/2, +1/2))
colnames (contrast) <- c("-NORMAL+REVERSED")
contrasts (exp_tokens$Order) <- contrast
contrasts (exp_tokens$Order)
## -NORMAL+REVERSED
## ABX -0.5
## BAX 0.5
#Word
contrast <- cbind (c(-1/2, +1/2))
colnames (contrast) <- c("-taju+tasha")
contrasts (exp_tokens$Word) <- contrast
contrasts (exp_tokens$Word)
## -taju+tasha
## taju -0.5
## tasha 0.5
#SeoulRegion
levels(exp_tokens$SeoulRegion)
## [1] "Gyeonggi" "Other" "Seoul"
contrast <- cbind (c(0, +0.5, -0.5), c(+2/3, -1/3, -1/3))
colnames (contrast) <- c("-Seoul+Other", "-SeoulRest+Gyeonggi")
contrasts (exp_tokens$SeoulRegion) <- contrast
contrasts (exp_tokens$SeoulRegion)
## -Seoul+Other -SeoulRest+Gyeonggi
## Gyeonggi 0.0 0.6666667
## Other 0.5 -0.3333333
## Seoul -0.5 -0.3333333
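Since the same -0.5/+0.5 coding is applied to six two-level factors above, the repeated steps could be wrapped in a small function. The helper name `half_contrast` and the toy factor are assumptions made for illustration; they are not part of the original analysis.

```r
# Hypothetical helper: assign a -0.5/+0.5 contrast to a two-level factor
half_contrast <- function(f, label) {
  stopifnot(is.factor(f), nlevels(f) == 2)
  m <- cbind(c(-1/2, +1/2))   # first level gets -0.5, second level gets +0.5
  colnames(m) <- label        # label shows up in the coefficient names
  contrasts(f) <- m
  f
}

# Toy factor (made-up values); levels are sorted alphabetically: F, M
gender <- factor(c("F", "M", "M", "F"))
gender <- half_contrast(gender, "-F+M")
contrasts(gender)
```

With this coding, a factor's coefficient estimates the difference between its two levels on the log-odds scale, and the intercept refers to the midpoint between the levels rather than to a reference level.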
A closer look at the data showed that Age and Region are strongly confounded: several regions are represented in only one or two age groups, as the contingency table below shows. I therefore omitted Age from the final model, as I was more interested in regional differences than in age differences.
#Set age groups
age_breaks <- c(20, 30, 40, 50, 60)
age_labels <- c("21-30", "31-40", "41-50", "51-60")
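# New column with age groups
# Note: right = FALSE makes the bins left-closed: [20,30), [30,40), [40,50), [50,60).
# An age of exactly 30 is therefore labelled "31-40"; the labels would match the bins exactly with right = TRUE.
exp_tokens$AgeGroup <- cut(exp_tokens$Age,
                           breaks = age_breaks,
                           labels = age_labels,
                           right = FALSE)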
head(exp_tokens)
## Participant Token_number Type A X B Reverse Correct Decision Age Gender Nationality Region SeoulYears Mandarin fxlChoice Catch_trial SeoulRegion Word Order
## 1 P_01 11 Exp i_taju_f_02 i_taju_lc_02 i_taju_l_06 0 0 m 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi taju ABX
## 2 P_01 15 Exp i_taju_f_03 i_taju_lc_03 i_taju_l_07 0 0 m 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi taju ABX
## 3 P_01 17 Exp i_tasha_l_02 i_tasha_lc_05 i_tasha_fc_01 1 1 z 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi tasha BAX
## 4 P_01 18 Exp i_tasha_l_04 i_tasha_lc_03 i_tasha_f_03 1 1 z 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi tasha BAX
## 5 P_01 36 Exp n_tasha_fc_01 i_tasha_lc_05 i_tasha_l_04 0 0 m 37 M Korean Gyeonggi Yes Yes 0 NoTrial Gyeonggi tasha ABX
## 6 P_01 38 Exp i_taju_l_01 i_taju_lc_05 i_taju_fc_01 1 1 m 37 M Korean Gyeonggi Yes Yes 1 NoTrial Gyeonggi taju BAX
## Creak AgeGroup
## 1 N 31-40
## 2 N 31-40
## 3 Y 31-40
## 4 N 31-40
## 5 Y 31-40
## 6 Y 31-40
#Contingency table
contingency_table <- table(exp_tokens$Region, exp_tokens$AgeGroup)
contingency_table
##
## 21-30 31-40 41-50 51-60
## Chungcheong 64 192 0 0
## Gangwon 64 0 0 0
## Gyeonggi 192 174 0 64
## Gyeongsang 192 192 64 64
## Jeolla 0 64 0 0
## Seoul 512 192 0 0
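The cell counts above are numbers of tokens, and every participant contributes many tokens, so a single participant can fill a whole cell. A participant-level version of the table can be sketched as follows. The data frame `toy` is a made-up stand-in; on the real data the same two steps would be applied to `exp_tokens`.

```r
# Toy stand-in data (made-up): 3 participants, several tokens each
toy <- data.frame(
  Participant = c("P_01", "P_01", "P_01", "P_02", "P_02", "P_03"),
  Region      = c("Seoul", "Seoul", "Seoul", "Jeolla", "Jeolla", "Seoul"),
  AgeGroup    = c("21-30", "21-30", "21-30", "31-40", "31-40", "21-30")
)

# Token-level table: counts rows
table(toy$Region, toy$AgeGroup)

# Participant-level table: keep one row per participant before tabulating
per_participant <- unique(toy[, c("Participant", "Region", "AgeGroup")])
participant_table <- table(per_participant$Region, per_participant$AgeGroup)
participant_table
```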
Setting “bobyqa” as the optimizer to help with model convergence issues (as per Titia’s advice):
library(lme4)
## Loading required package: Matrix
ctrl <- glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5))
model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Creak * (Order + Word) | Participant), control = ctrl, family=binomial, data=exp_tokens)
## boundary (singular) fit: see help('isSingular')
summary (model_glm)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
## Family: binomial ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Creak * (Order + Word) | Participant)
## Data: exp_tokens
## Control: ctrl
##
## AIC BIC logLik deviance df.resid
## 1277.6 1474.1 -603.8 1207.6 1995
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.6990 -0.3681 -0.2026 -0.1317 7.3799
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Participant (Intercept) 0.36863 0.6072
## Creak-N+Y 0.17502 0.4184 -0.27
## Order-NORMAL+REVERSED 1.28222 1.1324 0.13 0.87
## Word-taju+tasha 0.05247 0.2291 -0.88 -0.14 -0.41
## Creak-N+Y:Order-NORMAL+REVERSED 0.35229 0.5935 0.77 0.40 0.69 -0.94
## Creak-N+Y:Word-taju+tasha 0.47723 0.6908 -0.19 0.32 -0.05 -0.17 0.07
## Number of obs: 2030, groups: Participant, 32
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.67147 0.20833 -12.823 < 2e-16 ***
## Creak-N+Y 0.11725 0.31057 0.378 0.706
## Order-NORMAL+REVERSED 1.94479 0.33878 5.741 9.44e-09 ***
## SeoulRegion-Seoul+Other 0.03779 0.35077 0.108 0.914
## SeoulRegion-SeoulRest+Gyeonggi 0.02875 0.37116 0.077 0.938
## Mandarin-N+Y -0.07725 0.39696 -0.195 0.846
## Gender-F+M 0.26142 0.30512 0.857 0.392
## Word-taju+tasha 0.01322 0.18011 0.073 0.941
## Creak-N+Y:Order-NORMAL+REVERSED -0.56646 0.54232 -1.045 0.296
## Creak-N+Y:SeoulRegion-Seoul+Other -0.09784 0.41531 -0.236 0.814
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi 0.72514 0.46478 1.560 0.119
## Creak-N+Y:Mandarin-N+Y -0.14509 0.49385 -0.294 0.769
## Creak-N+Y:Gender-F+M 0.24772 0.36943 0.671 0.503
## Creak-N+Y:Word-taju+tasha 0.45398 0.37237 1.219 0.223
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
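The message `boundary (singular) fit` means that the variance of one or more linear combinations of the random effects is estimated as (close to) zero, which typically shows up as a variance near zero or a correlation near ±1. A minimal sketch of how such a fit can be inspected with lme4's `isSingular()`, `VarCorr()` and `rePCA()` follows. It runs on simulated stand-in data, not on the experiment's data; the data frame `d`, the response `y` and all simulation settings are assumptions made for illustration.

```r
library(lme4)

set.seed(1)
# Simulated stand-in data (NOT the experiment's data): 20 participants x 40 trials
d <- data.frame(
  Participant = factor(rep(1:20, each = 40)),
  Order       = factor(rep(c("ABX", "BAX"), times = 400))
)
contrasts(d$Order) <- cbind(c(-1/2, +1/2))

# Participants differ in their intercepts; the Order effect is the same for everyone
u <- rnorm(20, sd = 0.7)[d$Participant]
d$y <- rbinom(nrow(d), 1, plogis(-1 + 1.5 * (d$Order == "BAX") + u))

fit <- glmer(y ~ Order + (Order | Participant), family = binomial, data = d,
             control = glmerControl(optimizer = "bobyqa"))

isSingular(fit)      # TRUE if the random-effects covariance estimate is degenerate
VarCorr(fit)         # variances near 0 or correlations near +/-1 point to the culprit
summary(rePCA(fit))  # principal components of the random-effects covariance matrix
```

Whether this particular simulated fit turns out singular depends on the random draw; the point is only which functions can be used to look at the random-effects estimates.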
model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Creak * Order | Participant), control = ctrl, family=binomial, data=exp_tokens)
## boundary (singular) fit: see help('isSingular')
summary (model_glm)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
## Family: binomial ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Creak * Order | Participant)
## Data: exp_tokens
## Control: ctrl
##
## AIC BIC logLik deviance df.resid
## 1257.6 1392.4 -604.8 1209.6 2006
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4791 -0.3706 -0.2006 -0.1349 6.6562
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Participant (Intercept) 0.3649 0.6040
## Creak-N+Y 0.1741 0.4173 -0.33
## Order-NORMAL+REVERSED 1.2987 1.1396 0.12 0.90
## Creak-N+Y:Order-NORMAL+REVERSED 0.3492 0.5909 0.80 0.31 0.69
## Number of obs: 2030, groups: Participant, 32
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.67100 0.20899 -12.781 < 2e-16 ***
## Creak-N+Y 0.11863 0.30828 0.385 0.700
## Order-NORMAL+REVERSED 1.94018 0.33957 5.714 1.11e-08 ***
## SeoulRegion-Seoul+Other -0.01077 0.33588 -0.032 0.974
## SeoulRegion-SeoulRest+Gyeonggi -0.05029 0.37153 -0.135 0.892
## Mandarin-N+Y -0.12245 0.39916 -0.307 0.759
## Gender-F+M 0.29590 0.30708 0.964 0.335
## Word-taju+tasha -0.05232 0.15442 -0.339 0.735
## Creak-N+Y:Order-NORMAL+REVERSED -0.58264 0.54401 -1.071 0.284
## Creak-N+Y:SeoulRegion-Seoul+Other -0.04161 0.39448 -0.105 0.916
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi 0.73406 0.45919 1.599 0.110
## Creak-N+Y:Mandarin-N+Y -0.16102 0.48878 -0.329 0.742
## Creak-N+Y:Gender-F+M 0.26063 0.36235 0.719 0.472
## Creak-N+Y:Word-taju+tasha 0.39284 0.30885 1.272 0.203
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Order | Participant), control = ctrl, family=binomial, data=exp_tokens)
summary (model_glm)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
## Family: binomial ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Order | Participant)
## Data: exp_tokens
## Control: ctrl
##
## AIC BIC logLik deviance df.resid
## 1250.5 1345.9 -608.2 1216.5 2013
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4165 -0.3596 -0.2092 -0.1421 6.6407
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Participant (Intercept) 0.3414 0.5843
## Order-NORMAL+REVERSED 1.0551 1.0272 0.17
## Number of obs: 2030, groups: Participant, 32
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.614097 0.202062 -12.937 < 2e-16 ***
## Creak-N+Y 0.094804 0.233712 0.406 0.685
## Order-NORMAL+REVERSED 1.915412 0.318385 6.016 1.79e-09 ***
## SeoulRegion-Seoul+Other 0.091298 0.328667 0.278 0.781
## SeoulRegion-SeoulRest+Gyeonggi -0.009027 0.374191 -0.024 0.981
## Mandarin-N+Y -0.136229 0.401920 -0.339 0.735
## Gender-F+M 0.344661 0.308187 1.118 0.263
## Word-taju+tasha -0.059921 0.153566 -0.390 0.696
## Creak-N+Y:Order-NORMAL+REVERSED -0.105162 0.389209 -0.270 0.787
## Creak-N+Y:SeoulRegion-Seoul+Other -0.005892 0.359685 -0.016 0.987
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi 0.598319 0.428176 1.397 0.162
## Creak-N+Y:Mandarin-N+Y -0.036254 0.448530 -0.081 0.936
## Creak-N+Y:Gender-F+M 0.195459 0.333332 0.586 0.558
## Creak-N+Y:Word-taju+tasha 0.392763 0.307254 1.278 0.201
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Creak | Participant), control = ctrl, family=binomial, data=exp_tokens)
## boundary (singular) fit: see help('isSingular')
summary (model_glm)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
## Family: binomial ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Creak | Participant)
## Data: exp_tokens
## Control: ctrl
##
## AIC BIC logLik deviance df.resid
## 1262.8 1358.3 -614.4 1228.8 2013
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.2196 -0.3935 -0.2113 -0.1375 9.1057
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Participant (Intercept) 0.4900 0.7000
## Creak-N+Y 0.0248 0.1575 1.00
## Number of obs: 2030, groups: Participant, 32
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.585368 0.204090 -12.668 <2e-16 ***
## Creak-N+Y 0.018389 0.250310 0.073 0.941
## Order-NORMAL+REVERSED 1.982434 0.193143 10.264 <2e-16 ***
## SeoulRegion-Seoul+Other 0.094535 0.341826 0.277 0.782
## SeoulRegion-SeoulRest+Gyeonggi -0.135268 0.387045 -0.349 0.727
## Mandarin-N+Y -0.050833 0.413225 -0.123 0.902
## Gender-F+M 0.318546 0.324493 0.982 0.326
## Word-taju+tasha -0.055230 0.149797 -0.369 0.712
## Creak-N+Y:Order-NORMAL+REVERSED -0.076380 0.386177 -0.198 0.843
## Creak-N+Y:SeoulRegion-Seoul+Other 0.006624 0.357732 0.019 0.985
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi 0.614136 0.427902 1.435 0.151
## Creak-N+Y:Mandarin-N+Y -0.043359 0.448536 -0.097 0.923
## Creak-N+Y:Gender-F+M 0.190275 0.331548 0.574 0.566
## Creak-N+Y:Word-taju+tasha 0.377713 0.299584 1.261 0.207
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Order + Word | Participant), control = ctrl, family=binomial, data=exp_tokens)
## boundary (singular) fit: see help('isSingular')
summary (model_glm)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
## Family: binomial ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Order + Word | Participant)
## Data: exp_tokens
## Control: ctrl
##
## AIC BIC logLik deviance df.resid
## 1255.0 1367.3 -607.5 1215.0 2010
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.6203 -0.3672 -0.2090 -0.1408 6.9544
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Participant (Intercept) 0.34830 0.5902
## Order-NORMAL+REVERSED 1.04299 1.0213 0.17
## Word-taju+tasha 0.05231 0.2287 -0.99 -0.27
## Number of obs: 2030, groups: Participant, 32
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.596821 0.199742 -13.001 < 2e-16 ***
## Creak-N+Y 0.092368 0.233643 0.395 0.693
## Order-NORMAL+REVERSED 1.922351 0.316799 6.068 1.29e-09 ***
## SeoulRegion-Seoul+Other 0.099444 0.323877 0.307 0.759
## SeoulRegion-SeoulRest+Gyeonggi 0.077159 0.373241 0.207 0.836
## Mandarin-N+Y -0.071561 0.397722 -0.180 0.857
## Gender-F+M 0.298033 0.305558 0.975 0.329
## Word-taju+tasha 0.024335 0.173578 0.140 0.889
## Creak-N+Y:Order-NORMAL+REVERSED -0.111181 0.389807 -0.285 0.775
## Creak-N+Y:SeoulRegion-Seoul+Other -0.003415 0.360397 -0.009 0.992
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi 0.591090 0.426670 1.385 0.166
## Creak-N+Y:Mandarin-N+Y -0.027687 0.447194 -0.062 0.951
## Creak-N+Y:Gender-F+M 0.189613 0.334582 0.567 0.571
## Creak-N+Y:Word-taju+tasha 0.393227 0.307721 1.278 0.201
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Word | Participant), control = ctrl, family=binomial, data=exp_tokens)
## boundary (singular) fit: see help('isSingular')
summary (model_glm)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
## Family: binomial ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Word | Participant)
## Data: exp_tokens
## Control: ctrl
##
## AIC BIC logLik deviance df.resid
## 1262.2 1357.7 -614.1 1228.2 2013
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.3398 -0.3922 -0.2115 -0.1387 9.8789
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Participant (Intercept) 0.48323 0.6951
## Word-taju+tasha 0.04122 0.2030 -1.00
## Number of obs: 2030, groups: Participant, 32
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.5617676 0.2006783 -12.766 <2e-16 ***
## Creak-N+Y 0.1007410 0.2294104 0.439 0.661
## Order-NORMAL+REVERSED 1.9871108 0.1935953 10.264 <2e-16 ***
## SeoulRegion-Seoul+Other 0.1037814 0.3323512 0.312 0.755
## SeoulRegion-SeoulRest+Gyeonggi -0.0505015 0.3832609 -0.132 0.895
## Mandarin-N+Y 0.0233660 0.4031123 0.058 0.954
## Gender-F+M 0.2624863 0.3166577 0.829 0.407
## Word-taju+tasha 0.0301997 0.1704175 0.177 0.859
## Creak-N+Y:Order-NORMAL+REVERSED -0.1320869 0.3833344 -0.345 0.730
## Creak-N+Y:SeoulRegion-Seoul+Other -0.0006549 0.3505191 -0.002 0.999
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi 0.5944037 0.4202891 1.414 0.157
## Creak-N+Y:Mandarin-N+Y -0.0404828 0.4385247 -0.092 0.926
## Creak-N+Y:Gender-F+M 0.1958236 0.3254239 0.602 0.547
## Creak-N+Y:Word-taju+tasha 0.3803895 0.2997879 1.269 0.204
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
#final model
model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (1 | Participant), control = ctrl, family=binomial, data=exp_tokens)
summary (model_glm)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
## Family: binomial ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (1 | Participant)
## Data: exp_tokens
## Control: ctrl
##
## AIC BIC logLik deviance df.resid
## 1259.6 1343.8 -614.8 1229.6 2015
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.2040 -0.3920 -0.2120 -0.1388 9.4003
##
## Random effects:
## Groups Name Variance Std.Dev.
## Participant (Intercept) 0.4806 0.6933
## Number of obs: 2030, groups: Participant, 32
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.578154 0.202331 -12.742 <2e-16 ***
## Creak-N+Y 0.101474 0.229450 0.442 0.658
## Order-NORMAL+REVERSED 1.982179 0.193212 10.259 <2e-16 ***
## SeoulRegion-Seoul+Other 0.101135 0.338997 0.298 0.765
## SeoulRegion-SeoulRest+Gyeonggi -0.140570 0.384328 -0.366 0.715
## Mandarin-N+Y -0.034131 0.409217 -0.083 0.934
## Gender-F+M 0.306561 0.321774 0.953 0.341
## Word-taju+tasha -0.058101 0.149743 -0.388 0.698
## Creak-N+Y:Order-NORMAL+REVERSED -0.123136 0.382859 -0.322 0.748
## Creak-N+Y:SeoulRegion-Seoul+Other -0.002103 0.350104 -0.006 0.995
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi 0.600505 0.421986 1.423 0.155
## Creak-N+Y:Mandarin-N+Y -0.046529 0.440034 -0.106 0.916
## Creak-N+Y:Gender-F+M 0.199968 0.324582 0.616 0.538
## Creak-N+Y:Word-taju+tasha 0.379098 0.299553 1.266 0.206
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
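One way to compare two of the random-effects structures above is a likelihood-ratio test computed from the printed deviances. The sketch below compares the random-intercept model with the (Order | Participant) model, the only random-slope model above that did not produce a singular-fit message. It uses the rounded values from the summaries, so the result is approximate. The variable names are assumptions made for illustration; if the two fits were stored under separate names instead of reusing `model_glm`, `anova()` would give the same comparison directly.

```r
# Values copied from the summaries above
dev_intercept_only <- 1229.6          # deviance, (1 | Participant)
dev_order_slope    <- 1216.5          # deviance, (Order | Participant)
npar_intercept_only <- 2030 - 2015    # observations minus residual df
npar_order_slope    <- 2030 - 2013

# Likelihood-ratio statistic and its degrees of freedom
lr_stat <- dev_intercept_only - dev_order_slope
lr_df   <- npar_order_slope - npar_intercept_only

# Upper-tail chi-square probability
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p = p_value)
```

Because the null hypothesis places a variance on the boundary of its parameter space, a p-value obtained this way tends to be conservative.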
Mean and SD calculation of fortis/lenis tokens:
VOT:
fortis_tasha <- c(8.4, 12.5, 12.6, 22.4, 14.1, 14.2, 6.6, 6.4)
lenis_tasha <- c(75.8, 70.4, 74.7, 67.6, 60, 66)
fortis_tajut <- c(9, 9.7, 9.7, 10.8, 8.7, 10.6, 8, 8.2)
lenis_tajut <- c(63.4, 64, 64.2, 44.3, 58.9, 53, 64, 60)
mean(lenis_tasha)
## [1] 69.08333
sd(lenis_tasha)
## [1] 5.875514
mean(lenis_tajut)
## [1] 58.975
sd(lenis_tajut)
## [1] 7.065965
F0:
fortis_f0_tajut <- c(225, 235, 234, 218, 231, 233, 228, 237)
lenis_f0_tajut <- c(244, 248, 250, 236, 239, 249, 238, 245)
fortis_f0_tasha <- c(213, 219, 222, 215, 227, 229, 232, 232)
lenis_f0_tasha <- c(238, 220, 224, 230, 224, 220)
mean(lenis_f0_tasha)
## [1] 226
sd(lenis_f0_tasha)
## [1] 6.928203
mean(lenis_f0_tajut)
## [1] 243.625
sd(lenis_f0_tajut)
## [1] 5.370222
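The means and SDs above are computed one vector at a time and only for the lenis tokens. A compact sketch that summarises all eight vectors in one step is given below; the list name `tokens`, the element names and the result name `token_summary` are assumptions made for illustration, while the values are copied from the vectors above.

```r
# Values copied from the vectors defined above
tokens <- list(
  vot_fortis_tasha = c(8.4, 12.5, 12.6, 22.4, 14.1, 14.2, 6.6, 6.4),
  vot_lenis_tasha  = c(75.8, 70.4, 74.7, 67.6, 60, 66),
  vot_fortis_tajut = c(9, 9.7, 9.7, 10.8, 8.7, 10.6, 8, 8.2),
  vot_lenis_tajut  = c(63.4, 64, 64.2, 44.3, 58.9, 53, 64, 60),
  f0_fortis_tajut  = c(225, 235, 234, 218, 231, 233, 228, 237),
  f0_lenis_tajut   = c(244, 248, 250, 236, 239, 249, 238, 245),
  f0_fortis_tasha  = c(213, 219, 222, 215, 227, 229, 232, 232),
  f0_lenis_tasha   = c(238, 220, 224, 230, 224, 220)
)

# One row per vector: number of tokens, mean and standard deviation
token_summary <- t(sapply(tokens, function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x))
}))
round(token_summary, 2)
```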
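Transforming the log-odds estimate for Order from the final model (1.98218, SE 0.19321) to an odds ratio, with the lower and upper limits of an approximate 95% confidence interval (estimate ± 2 SE):
odds <- exp(1.98218)
upper_ci <- exp(1.98218 + 2 * 0.19321)
lower_ci <- exp(1.98218 - 2 * 0.19321)
odds
## [1] 7.258549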
lower_ci
## [1] 4.932076
upper_ci
## [1] 10.68243
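The same transformation can be wrapped in a small helper that returns the odds ratio together with its Wald interval. The function name `odds_ratio_ci` and its arguments are assumptions made for illustration. By default it uses the normal quantile `qnorm(0.975)` (about 1.96) rather than the rounded multiplier 2.

```r
# Hypothetical helper: odds ratio and Wald interval from a log-odds estimate and its SE
odds_ratio_ci <- function(estimate, se, z = qnorm(0.975)) {
  c(odds  = exp(estimate),
    lower = exp(estimate - z * se),
    upper = exp(estimate + z * se))
}

# Order effect from the final model, with the default multiplier (about 1.96)
odds_ratio_ci(1.98218, 0.19321)

# Same call with z = 2, the multiplier used in the calculation above
odds_ratio_ci(1.98218, 0.19321, z = 2)

# Intercept of the final model on the probability scale (roughly 0.07):
# probability of a 'fortis' response when all contrast-coded predictors are at 0
plogis(-2.578154)
```

For a fitted model object, lme4 also provides Wald intervals on the log-odds scale through `confint(model_glm, parm = "beta_", method = "Wald")`, which can then be exponentiated.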