rMA Thesis analysis scripts

For my thesis, I have data from a perception experiment using an ABX task. The RQ-answering variable I was interested in is whether Korean listeners perceive an artificially created creaky lenis token as belonging to the ‘fortis’ category or to the ‘lenis’ category. My data structure is as shown in “table” below.

Import libraries:

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(minqa)
options(width=200)

Import the full datasheet. The name of the .txt file corresponds to the “fortis” responses being coded as 1 and “lenis” responses being coded 0:

table <- read.delim("anonymizedData.tsv", stringsAsFactors = TRUE)
head(table)

##   Participant Token_number   Type            A            X            B Reverse Correct Decision Age Gender Nationality   Region SeoulYears Mandarin fxlChoice Catch_trial SeoulRegion Word Order
## 1        P_01            1 Filler  n_taju_l_01  n_taju_l_02  n_panu_a_03       1       1        z  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi <NA>  <NA>
## 2        P_01            2 Filler n_taju_fc_01 n_taju_fc_02  m_kata_t_01       0       1        z  37      M      Korean Gyeonggi        Yes      Yes         1       Trial    Gyeonggi <NA>  <NA>
## 3        P_01            3 Filler  m_kata_t_01  m_kata_t_03  n_taju_l_02       0       0        m  37      M      Korean Gyeonggi        Yes      Yes         0       Trial    Gyeonggi <NA>  <NA>
## 4        P_01            4 Filler  n_taju_l_01  n_taju_l_02  n_panu_a_03       1       1        z  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi <NA>  <NA>
## 5        P_01            5 Filler  n_kaju_l_01 n_kasha_l_01 n_kasha_l_02       0       1        m  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi <NA>  <NA>
## 6        P_01            6 Filler  n_kaju_l_03 n_tasha_l_02 n_tasha_l_01       0       1        m  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi <NA>  <NA>
##   Creak
## 1  <NA>
## 2  <NA>
## 3  <NA>
## 4  <NA>
## 5  <NA>
## 6  <NA>

I have a lot of fillers, which do not tell me anything about the RQ, so I filter them out:

exp_tokens <- filter(table, Type == "Exp")
head(exp_tokens)

##   Participant Token_number Type             A             X             B Reverse Correct Decision Age Gender Nationality   Region SeoulYears Mandarin fxlChoice Catch_trial SeoulRegion  Word Order
## 1        P_01           11  Exp   i_taju_f_02  i_taju_lc_02   i_taju_l_06       0       0        m  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi  taju   ABX
## 2        P_01           15  Exp   i_taju_f_03  i_taju_lc_03   i_taju_l_07       0       0        m  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi  taju   ABX
## 3        P_01           17  Exp  i_tasha_l_02 i_tasha_lc_05 i_tasha_fc_01       1       1        z  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi tasha   BAX
## 4        P_01           18  Exp  i_tasha_l_04 i_tasha_lc_03  i_tasha_f_03       1       1        z  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi tasha   BAX
## 5        P_01           36  Exp n_tasha_fc_01 i_tasha_lc_05  i_tasha_l_04       0       0        m  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi tasha   ABX
## 6        P_01           38  Exp   i_taju_l_01  i_taju_lc_05  i_taju_fc_01       1       1        m  37      M      Korean Gyeonggi        Yes      Yes         1     NoTrial    Gyeonggi  taju   BAX
##   Creak
## 1     N
## 2     N
## 3     Y
## 4     N
## 5     Y
## 6     Y

The key variable here is the ‘fxlChoice’ column. A value of ‘f’ means the listener said that sound X (from the ABX set) is closer to the sound that was a ‘fortis’ sound (always sound A). Analogously, a value of ‘l’ means that the listener said that sound X was closer to the ‘lenis’ sound (always sound B). The column ‘Creak’ signifies if natural creakiness was present in the A sound. The column ‘Word’ contains the two words which we used in the experimental condition (tajut or tasha).

Each ABX set was played four times in two orders: ABX and BAX. This is signified in the ‘Order’ column.

For the demographic questions, we have Age (continuous variable), Gender (M or F), Region (6 levels), SeoulYears (whether the participant spent 2 or more years in Seoul; Yes or No), and Mandarin (knowledge of Mandarin; Yes or No). The important thing about Region is: We expected the Seoul region to score differently than the other regions. BUT, the “Gyeonggi” region can still be considered to be a part of “The greater Seoul” and the dialect there is not known to differ very much from the Seoul dialect. Therefore, we decided to group Regions into three levels in the column SeoulRegion: Seoul, Gyeonggi, and Other (all other regions). Other columns are irrelevant at this point.

#Gender
contrast <- cbind (c(-1/2, +1/2))
colnames (contrast) <- c("-F+M")
contrasts (exp_tokens$Gender) <- contrast
contrasts (exp_tokens$Gender)

##   -F+M
## F -0.5
## M  0.5

#SeoulYears
contrast <- cbind (c(-1/2, +1/2))
colnames (contrast) <- c("-N+Y")
contrasts (exp_tokens$SeoulYears) <- contrast
contrasts (exp_tokens$SeoulYears)

##     -N+Y
## No  -0.5
## Yes  0.5

#Mandarin
contrast <- cbind (c(-1/2, +1/2))
colnames (contrast) <- c("-N+Y")
contrasts (exp_tokens$Mandarin) <- contrast
contrasts (exp_tokens$Mandarin)

##     -N+Y
## No  -0.5
## Yes  0.5

#Creak
contrast <- cbind (c(-1/2, +1/2))
colnames (contrast) <- c("-N+Y")
contrasts (exp_tokens$Creak) <- contrast
contrasts (exp_tokens$Creak)

##   -N+Y
## N -0.5
## Y  0.5

#Order
contrast <- cbind (c(-1/2, +1/2))
colnames (contrast) <- c("-NORMAL+REVERSED")
contrasts (exp_tokens$Order) <- contrast
contrasts (exp_tokens$Order)

##     -NORMAL+REVERSED
## ABX             -0.5
## BAX              0.5

#Word
contrast <- cbind (c(-1/2, +1/2))
colnames (contrast) <- c("-taju+tasha")
contrasts (exp_tokens$Word) <- contrast
contrasts (exp_tokens$Word)

##       -taju+tasha
## taju         -0.5
## tasha         0.5

#SeoulRegion
levels(exp_tokens$SeoulRegion)

## [1] "Gyeonggi" "Other"    "Seoul"

contrast <- cbind (c(0, +0.5, -0.5), c(+2/3, -1/3, -1/3))
colnames (contrast) <- c("-Seoul+Other", "-SeoulRest+Gyeonggi")
contrasts (exp_tokens$SeoulRegion) <- contrast
contrasts (exp_tokens$SeoulRegion)

##          -Seoul+Other -SeoulRest+Gyeonggi
## Gyeonggi          0.0           0.6666667
## Other             0.5          -0.3333333
## Seoul            -0.5          -0.3333333

I looked closer at my data and found that Age and Region are strongly correlated. Therefore, I chose to omit Age in the final model, as I was more interested in the regional differences than age differences.

#Set age groups
age_breaks <- c(20, 30, 40, 50, 60)
age_labels <- c("21-30", "31-40", "41-50", "51-60")

# New column with age groups
exp_tokens$AgeGroup <- cut(exp_tokens$Age, 
                           breaks = age_breaks, 
                           labels = age_labels, 
                           right = FALSE)

head(exp_tokens)

##   Participant Token_number Type             A             X             B Reverse Correct Decision Age Gender Nationality   Region SeoulYears Mandarin fxlChoice Catch_trial SeoulRegion  Word Order
## 1        P_01           11  Exp   i_taju_f_02  i_taju_lc_02   i_taju_l_06       0       0        m  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi  taju   ABX
## 2        P_01           15  Exp   i_taju_f_03  i_taju_lc_03   i_taju_l_07       0       0        m  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi  taju   ABX
## 3        P_01           17  Exp  i_tasha_l_02 i_tasha_lc_05 i_tasha_fc_01       1       1        z  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi tasha   BAX
## 4        P_01           18  Exp  i_tasha_l_04 i_tasha_lc_03  i_tasha_f_03       1       1        z  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi tasha   BAX
## 5        P_01           36  Exp n_tasha_fc_01 i_tasha_lc_05  i_tasha_l_04       0       0        m  37      M      Korean Gyeonggi        Yes      Yes         0     NoTrial    Gyeonggi tasha   ABX
## 6        P_01           38  Exp   i_taju_l_01  i_taju_lc_05  i_taju_fc_01       1       1        m  37      M      Korean Gyeonggi        Yes      Yes         1     NoTrial    Gyeonggi  taju   BAX
##   Creak AgeGroup
## 1     N    31-40
## 2     N    31-40
## 3     Y    31-40
## 4     N    31-40
## 5     Y    31-40
## 6     Y    31-40

#Contingency table
contingency_table <- table(exp_tokens$Region, exp_tokens$AgeGroup)
contingency_table

##              
##               21-30 31-40 41-50 51-60
##   Chungcheong    64   192     0     0
##   Gangwon        64     0     0     0
##   Gyeonggi      192   174     0    64
##   Gyeongsang    192   192    64    64
##   Jeolla          0    64     0     0
##   Seoul         512   192     0     0

Setting “bobyqa” as the optimizer to help issues with model convergence (as per Titia’s advice):

library(lme4)

## Loading required package: Matrix

ctrl <- glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5))

library (lme4)
model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Creak * (Order + Word) | Participant), control = ctrl, family=binomial, data=exp_tokens)

## boundary (singular) fit: see help('isSingular')

summary (model_glm)

## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender +      Word) + (Creak * (Order + Word) | Participant)
##    Data: exp_tokens
## Control: ctrl
## 
##      AIC      BIC   logLik deviance df.resid 
##   1277.6   1474.1   -603.8   1207.6     1995 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6990 -0.3681 -0.2026 -0.1317  7.3799 
## 
## Random effects:
##  Groups      Name                            Variance Std.Dev. Corr                         
##  Participant (Intercept)                     0.36863  0.6072                                
##              Creak-N+Y                       0.17502  0.4184   -0.27                        
##              Order-NORMAL+REVERSED           1.28222  1.1324    0.13  0.87                  
##              Word-taju+tasha                 0.05247  0.2291   -0.88 -0.14 -0.41            
##              Creak-N+Y:Order-NORMAL+REVERSED 0.35229  0.5935    0.77  0.40  0.69 -0.94      
##              Creak-N+Y:Word-taju+tasha       0.47723  0.6908   -0.19  0.32 -0.05 -0.17  0.07
## Number of obs: 2030, groups:  Participant, 32
## 
## Fixed effects:
##                                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                              -2.67147    0.20833 -12.823  < 2e-16 ***
## Creak-N+Y                                 0.11725    0.31057   0.378    0.706    
## Order-NORMAL+REVERSED                     1.94479    0.33878   5.741 9.44e-09 ***
## SeoulRegion-Seoul+Other                   0.03779    0.35077   0.108    0.914    
## SeoulRegion-SeoulRest+Gyeonggi            0.02875    0.37116   0.077    0.938    
## Mandarin-N+Y                             -0.07725    0.39696  -0.195    0.846    
## Gender-F+M                                0.26142    0.30512   0.857    0.392    
## Word-taju+tasha                           0.01322    0.18011   0.073    0.941    
## Creak-N+Y:Order-NORMAL+REVERSED          -0.56646    0.54232  -1.045    0.296    
## Creak-N+Y:SeoulRegion-Seoul+Other        -0.09784    0.41531  -0.236    0.814    
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi  0.72514    0.46478   1.560    0.119    
## Creak-N+Y:Mandarin-N+Y                   -0.14509    0.49385  -0.294    0.769    
## Creak-N+Y:Gender-F+M                      0.24772    0.36943   0.671    0.503    
## Creak-N+Y:Word-taju+tasha                 0.45398    0.37237   1.219    0.223    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Creak * Order | Participant), control = ctrl, family=binomial, data=exp_tokens)

## boundary (singular) fit: see help('isSingular')

summary (model_glm)

## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender +      Word) + (Creak * Order | Participant)
##    Data: exp_tokens
## Control: ctrl
## 
##      AIC      BIC   logLik deviance df.resid 
##   1257.6   1392.4   -604.8   1209.6     2006 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4791 -0.3706 -0.2006 -0.1349  6.6562 
## 
## Random effects:
##  Groups      Name                            Variance Std.Dev. Corr             
##  Participant (Intercept)                     0.3649   0.6040                    
##              Creak-N+Y                       0.1741   0.4173   -0.33            
##              Order-NORMAL+REVERSED           1.2987   1.1396    0.12  0.90      
##              Creak-N+Y:Order-NORMAL+REVERSED 0.3492   0.5909    0.80  0.31  0.69
## Number of obs: 2030, groups:  Participant, 32
## 
## Fixed effects:
##                                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                              -2.67100    0.20899 -12.781  < 2e-16 ***
## Creak-N+Y                                 0.11863    0.30828   0.385    0.700    
## Order-NORMAL+REVERSED                     1.94018    0.33957   5.714 1.11e-08 ***
## SeoulRegion-Seoul+Other                  -0.01077    0.33588  -0.032    0.974    
## SeoulRegion-SeoulRest+Gyeonggi           -0.05029    0.37153  -0.135    0.892    
## Mandarin-N+Y                             -0.12245    0.39916  -0.307    0.759    
## Gender-F+M                                0.29590    0.30708   0.964    0.335    
## Word-taju+tasha                          -0.05232    0.15442  -0.339    0.735    
## Creak-N+Y:Order-NORMAL+REVERSED          -0.58264    0.54401  -1.071    0.284    
## Creak-N+Y:SeoulRegion-Seoul+Other        -0.04161    0.39448  -0.105    0.916    
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi  0.73406    0.45919   1.599    0.110    
## Creak-N+Y:Mandarin-N+Y                   -0.16102    0.48878  -0.329    0.742    
## Creak-N+Y:Gender-F+M                      0.26063    0.36235   0.719    0.472    
## Creak-N+Y:Word-taju+tasha                 0.39284    0.30885   1.272    0.203    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Order | Participant), control = ctrl, family=binomial, data=exp_tokens)
summary (model_glm)

## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender +      Word) + (Order | Participant)
##    Data: exp_tokens
## Control: ctrl
## 
##      AIC      BIC   logLik deviance df.resid 
##   1250.5   1345.9   -608.2   1216.5     2013 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4165 -0.3596 -0.2092 -0.1421  6.6407 
## 
## Random effects:
##  Groups      Name                  Variance Std.Dev. Corr
##  Participant (Intercept)           0.3414   0.5843       
##              Order-NORMAL+REVERSED 1.0551   1.0272   0.17
## Number of obs: 2030, groups:  Participant, 32
## 
## Fixed effects:
##                                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                              -2.614097   0.202062 -12.937  < 2e-16 ***
## Creak-N+Y                                 0.094804   0.233712   0.406    0.685    
## Order-NORMAL+REVERSED                     1.915412   0.318385   6.016 1.79e-09 ***
## SeoulRegion-Seoul+Other                   0.091298   0.328667   0.278    0.781    
## SeoulRegion-SeoulRest+Gyeonggi           -0.009027   0.374191  -0.024    0.981    
## Mandarin-N+Y                             -0.136229   0.401920  -0.339    0.735    
## Gender-F+M                                0.344661   0.308187   1.118    0.263    
## Word-taju+tasha                          -0.059921   0.153566  -0.390    0.696    
## Creak-N+Y:Order-NORMAL+REVERSED          -0.105162   0.389209  -0.270    0.787    
## Creak-N+Y:SeoulRegion-Seoul+Other        -0.005892   0.359685  -0.016    0.987    
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi  0.598319   0.428176   1.397    0.162    
## Creak-N+Y:Mandarin-N+Y                   -0.036254   0.448530  -0.081    0.936    
## Creak-N+Y:Gender-F+M                      0.195459   0.333332   0.586    0.558    
## Creak-N+Y:Word-taju+tasha                 0.392763   0.307254   1.278    0.201    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Creak | Participant), control = ctrl, family=binomial, data=exp_tokens)

## boundary (singular) fit: see help('isSingular')

summary (model_glm)

## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender +      Word) + (Creak | Participant)
##    Data: exp_tokens
## Control: ctrl
## 
##      AIC      BIC   logLik deviance df.resid 
##   1262.8   1358.3   -614.4   1228.8     2013 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.2196 -0.3935 -0.2113 -0.1375  9.1057 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev. Corr
##  Participant (Intercept) 0.4900   0.7000       
##              Creak-N+Y   0.0248   0.1575   1.00
## Number of obs: 2030, groups:  Participant, 32
## 
## Fixed effects:
##                                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                              -2.585368   0.204090 -12.668   <2e-16 ***
## Creak-N+Y                                 0.018389   0.250310   0.073    0.941    
## Order-NORMAL+REVERSED                     1.982434   0.193143  10.264   <2e-16 ***
## SeoulRegion-Seoul+Other                   0.094535   0.341826   0.277    0.782    
## SeoulRegion-SeoulRest+Gyeonggi           -0.135268   0.387045  -0.349    0.727    
## Mandarin-N+Y                             -0.050833   0.413225  -0.123    0.902    
## Gender-F+M                                0.318546   0.324493   0.982    0.326    
## Word-taju+tasha                          -0.055230   0.149797  -0.369    0.712    
## Creak-N+Y:Order-NORMAL+REVERSED          -0.076380   0.386177  -0.198    0.843    
## Creak-N+Y:SeoulRegion-Seoul+Other         0.006624   0.357732   0.019    0.985    
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi  0.614136   0.427902   1.435    0.151    
## Creak-N+Y:Mandarin-N+Y                   -0.043359   0.448536  -0.097    0.923    
## Creak-N+Y:Gender-F+M                      0.190275   0.331548   0.574    0.566    
## Creak-N+Y:Word-taju+tasha                 0.377713   0.299584   1.261    0.207    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Order + Word | Participant), control = ctrl, family=binomial, data=exp_tokens)

## boundary (singular) fit: see help('isSingular')

summary (model_glm)

## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender +      Word) + (Order + Word | Participant)
##    Data: exp_tokens
## Control: ctrl
## 
##      AIC      BIC   logLik deviance df.resid 
##   1255.0   1367.3   -607.5   1215.0     2010 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6203 -0.3672 -0.2090 -0.1408  6.9544 
## 
## Random effects:
##  Groups      Name                  Variance Std.Dev. Corr       
##  Participant (Intercept)           0.34830  0.5902              
##              Order-NORMAL+REVERSED 1.04299  1.0213    0.17      
##              Word-taju+tasha       0.05231  0.2287   -0.99 -0.27
## Number of obs: 2030, groups:  Participant, 32
## 
## Fixed effects:
##                                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                              -2.596821   0.199742 -13.001  < 2e-16 ***
## Creak-N+Y                                 0.092368   0.233643   0.395    0.693    
## Order-NORMAL+REVERSED                     1.922351   0.316799   6.068 1.29e-09 ***
## SeoulRegion-Seoul+Other                   0.099444   0.323877   0.307    0.759    
## SeoulRegion-SeoulRest+Gyeonggi            0.077159   0.373241   0.207    0.836    
## Mandarin-N+Y                             -0.071561   0.397722  -0.180    0.857    
## Gender-F+M                                0.298033   0.305558   0.975    0.329    
## Word-taju+tasha                           0.024335   0.173578   0.140    0.889    
## Creak-N+Y:Order-NORMAL+REVERSED          -0.111181   0.389807  -0.285    0.775    
## Creak-N+Y:SeoulRegion-Seoul+Other        -0.003415   0.360397  -0.009    0.992    
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi  0.591090   0.426670   1.385    0.166    
## Creak-N+Y:Mandarin-N+Y                   -0.027687   0.447194  -0.062    0.951    
## Creak-N+Y:Gender-F+M                      0.189613   0.334582   0.567    0.571    
## Creak-N+Y:Word-taju+tasha                 0.393227   0.307721   1.278    0.201    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (Word | Participant), control = ctrl, family=binomial, data=exp_tokens)

## boundary (singular) fit: see help('isSingular')

summary (model_glm)

## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender +      Word) + (Word | Participant)
##    Data: exp_tokens
## Control: ctrl
## 
##      AIC      BIC   logLik deviance df.resid 
##   1262.2   1357.7   -614.1   1228.2     2013 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.3398 -0.3922 -0.2115 -0.1387  9.8789 
## 
## Random effects:
##  Groups      Name            Variance Std.Dev. Corr 
##  Participant (Intercept)     0.48323  0.6951        
##              Word-taju+tasha 0.04122  0.2030   -1.00
## Number of obs: 2030, groups:  Participant, 32
## 
## Fixed effects:
##                                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                              -2.5617676  0.2006783 -12.766   <2e-16 ***
## Creak-N+Y                                 0.1007410  0.2294104   0.439    0.661    
## Order-NORMAL+REVERSED                     1.9871108  0.1935953  10.264   <2e-16 ***
## SeoulRegion-Seoul+Other                   0.1037814  0.3323512   0.312    0.755    
## SeoulRegion-SeoulRest+Gyeonggi           -0.0505015  0.3832609  -0.132    0.895    
## Mandarin-N+Y                              0.0233660  0.4031123   0.058    0.954    
## Gender-F+M                                0.2624863  0.3166577   0.829    0.407    
## Word-taju+tasha                           0.0301997  0.1704175   0.177    0.859    
## Creak-N+Y:Order-NORMAL+REVERSED          -0.1320869  0.3833344  -0.345    0.730    
## Creak-N+Y:SeoulRegion-Seoul+Other        -0.0006549  0.3505191  -0.002    0.999    
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi  0.5944037  0.4202891   1.414    0.157    
## Creak-N+Y:Mandarin-N+Y                   -0.0404828  0.4385247  -0.092    0.926    
## Creak-N+Y:Gender-F+M                      0.1958236  0.3254239   0.602    0.547    
## Creak-N+Y:Word-taju+tasha                 0.3803895  0.2997879   1.269    0.204    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

#final model
model_glm <- lme4::glmer (fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender + Word) + (1 | Participant), control = ctrl, family=binomial, data=exp_tokens)
summary (model_glm)

## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: fxlChoice ~ Creak * (Order + SeoulRegion + Mandarin + Gender +      Word) + (1 | Participant)
##    Data: exp_tokens
## Control: ctrl
## 
##      AIC      BIC   logLik deviance df.resid 
##   1259.6   1343.8   -614.8   1229.6     2015 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.2040 -0.3920 -0.2120 -0.1388  9.4003 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  Participant (Intercept) 0.4806   0.6933  
## Number of obs: 2030, groups:  Participant, 32
## 
## Fixed effects:
##                                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                              -2.578154   0.202331 -12.742   <2e-16 ***
## Creak-N+Y                                 0.101474   0.229450   0.442    0.658    
## Order-NORMAL+REVERSED                     1.982179   0.193212  10.259   <2e-16 ***
## SeoulRegion-Seoul+Other                   0.101135   0.338997   0.298    0.765    
## SeoulRegion-SeoulRest+Gyeonggi           -0.140570   0.384328  -0.366    0.715    
## Mandarin-N+Y                             -0.034131   0.409217  -0.083    0.934    
## Gender-F+M                                0.306561   0.321774   0.953    0.341    
## Word-taju+tasha                          -0.058101   0.149743  -0.388    0.698    
## Creak-N+Y:Order-NORMAL+REVERSED          -0.123136   0.382859  -0.322    0.748    
## Creak-N+Y:SeoulRegion-Seoul+Other        -0.002103   0.350104  -0.006    0.995    
## Creak-N+Y:SeoulRegion-SeoulRest+Gyeonggi  0.600505   0.421986   1.423    0.155    
## Creak-N+Y:Mandarin-N+Y                   -0.046529   0.440034  -0.106    0.916    
## Creak-N+Y:Gender-F+M                      0.199968   0.324582   0.616    0.538    
## Creak-N+Y:Word-taju+tasha                 0.379098   0.299553   1.266    0.206    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

Mean and SD calculation of fortis/lenis tokens:

VOT:

fortis_tasha <- c(8.4, 12.5, 12.6, 22.4, 14.1, 14.2, 6.6, 6.4)
lenis_tasha <- c(75.8, 70.4, 74.7, 67.6, 60, 66)
fortis_tajut <- c(9, 9.7, 9.7, 10.8, 8.7, 10.6, 8, 8.2)
lenis_tajut <- c(63.4, 64, 64.2, 44.3, 58.9, 53, 64, 60)


mean(lenis_tasha)

## [1] 69.08333

sd(lenis_tasha)

## [1] 5.875514

mean(lenis_tajut)

## [1] 58.975

sd(lenis_tajut)

## [1] 7.065965

F0:

fortis_f0_tajut <- c(225, 235, 234, 218, 231, 233, 228, 237)
lenis_f0_tajut <- c(244, 248, 250, 236, 239, 249, 238, 245)

fortis_f0_tasha <- c(213, 219, 222, 215, 227, 229, 232, 232)
lenis_f0_tasha <- c(238, 220, 224, 230, 224, 220)

mean(lenis_f0_tasha)

## [1] 226

sd(lenis_f0_tasha)

## [1] 6.928203

mean(lenis_f0_tajut)

## [1] 243.625

sd(lenis_f0_tajut)

## [1] 5.370222

Transforming log-odds to odds + lower and upper 95% confidence intervals:

odds <- exp(2.58)
upper_ci <- exp(1.98218 + 2 * 0.19321)
lower_ci <- exp(1.98218 - 2 * 0.19321)

odds

## [1] 13.19714

lower_ci

## [1] 4.932076

upper_ci

## [1] 10.68243

rMA Thesis analysis scripts

Patrik Hrabánek

2024-05-23