Logistic regression


This page explains how to perform logistic regression with Praat. You start by saving a table in a text file (if it contains non-ASCII symbols such as æ or ɛ, save it in UTF-8 or UTF-16 format). The following example contains natural stimuli (female speaker) with measured F1 and duration values, together with the responses of one listener who was presented with each stimulus 10 times.

       F1    Dur   /æ/   /ɛ/
       764    87    2     8
       674   104    3     7
       574   126    0    10
       566    93    1     9
       618   118    1     9
      1025   147   10     0
       722   117    7     3
       696   169    9     1
      1024   124   10     0
       752    92    6     4

In this table we see 10 different stimuli, each characterized by a certain combination of the factors (independent variables) F1 (first formant in Hertz) and Dur (duration in milliseconds). The first row of the table means that there was a stimulus with an F1 of 764 Hz and a duration of 87 ms, and that the listener responded to this stimulus 2 times with the response category /æ/, and the remaining 8 times with the category /ɛ/.

A table like this can be typed into a text file, with the columns separated by spaces and/or tab stops. The file can be read into Praat with Read Table from table file..., after which the command To logistic regression... becomes available in the Statistics menu.
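A minimal Python sketch of how a whitespace-separated table like the one above can be split into named numeric columns. It assumes (this is not a description of Praat's own reader) that the first non-blank line holds the column names and every other line holds one number per column; the names `parse_table` and `TABLE_TEXT` are hypothetical.

```python
def parse_table(text):
    """Return a dict mapping each column name to its list of numeric values.

    Hypothetical helper: the first non-blank line is taken as the header,
    every following non-blank line as one row of numbers.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    for row in body:
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {row}")
    # Column names keep their (possibly non-ASCII) labels such as /æ/ and /ɛ/.
    return {name: [float(row[i]) for row in body] for i, name in enumerate(header)}


# The example table, with spaces and tab stops deliberately mixed.
TABLE_TEXT = """\
F1\tDur\t/æ/\t/ɛ/
764   87\t2   8
674  104   3\t7
574  126   0  10
566   93   1   9
618  118   1   9
1025 147  10   0
722  117   7   3
696  169   9   1
1024 124  10   0
752   92   6   4
"""

columns = parse_table(TABLE_TEXT)
print(columns["F1"])
print(columns["/ɛ/"])
```

Because `str.split()` with no argument splits on any run of spaces or tabs, mixed separators parse the same way. A file on disk would be read with an explicit encoding matching how it was saved, for example `open(path, encoding="utf-8").read()`.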

What does it do?

The logistic regression method finds the values of α, β_F1 and β_dur that best fit the model

α + β_F1 F1_k + β_dur Dur_k = ln (p_k(/ɛ/)/p_k(/æ/))

where k runs from 1 to 10, and p_k(/æ/) + p_k(/ɛ/) = 1.
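Solving the model equation for the probabilities gives p_k(/ɛ/) = 1 / (1 + exp(-(α + β_F1 F1_k + β_dur Dur_k))) and p_k(/æ/) = 1 - p_k(/ɛ/). The Python sketch below does this arithmetic for one stimulus; the function name `probabilities` and the coefficient values are hypothetical, chosen only for illustration, and are not values produced by Praat.

```python
import math


def probabilities(alpha, betas, factors):
    """Return (p(/æ/), p(/ɛ/)) for one stimulus under the model
    alpha + sum_i beta_i * x_i = ln(p(/ɛ/) / p(/æ/))."""
    eta = alpha + sum(b * x for b, x in zip(betas, factors))  # the log-odds
    # Invert the log-odds with the logistic function, written so that
    # math.exp never receives a large positive argument (avoids overflow).
    if eta >= 0:
        p_eh = 1.0 / (1.0 + math.exp(-eta))
    else:
        e = math.exp(eta)
        p_eh = e / (1.0 + e)
    return 1.0 - p_eh, p_eh


# Hypothetical coefficients (alpha, beta_F1, beta_dur) for the first stimulus.
p_ae, p_eh = probabilities(1.0, [0.002, -0.01], [764, 87])
print(p_ae, p_eh)
print(math.log(p_eh / p_ae))  # recovers the log-odds alpha + beta . x
```

For comparison, the observed log-odds of the first stimulus are ln(8/2) = ln 4 ≈ 1.39, whereas for the third stimulus (0 times /æ/, 10 times /ɛ/) they are infinite, so observed log-odds cannot serve directly as data points; the maximum-likelihood criterion works with the response counts themselves.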

The optimization criterion is maximum likelihood: α, β_F1 and β_dur are chosen such that the resulting probabilities p_k(/æ/) and p_k(/ɛ/) make the observed response counts in the table as likely as possible.
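This page does not state which numerical method Praat uses to maximize the likelihood. As an illustration only, here is a Python sketch that maximizes the binomial log-likelihood of the counts with Newton-Raphson iterations and step halving, one standard approach that is not necessarily Praat's algorithm. All function names (`fit`, `log_likelihood`, `solve`, `p_eh`, `softplus`) are hypothetical.

```python
import math

# Data from the table above: (F1 in Hz, Dur in ms, count /æ/, count /ɛ/).
DATA = [
    (764, 87, 2, 8), (674, 104, 3, 7), (574, 126, 0, 10), (566, 93, 1, 9),
    (618, 118, 1, 9), (1025, 147, 10, 0), (722, 117, 7, 3), (696, 169, 9, 1),
    (1024, 124, 10, 0), (752, 92, 6, 4),
]


def softplus(z):
    """Numerically stable ln(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def p_eh(theta, x):
    """p(/ɛ/) for design row x = (1, F1, Dur, ...), with log-odds theta . x."""
    eta = sum(t * xi for t, xi in zip(theta, x))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def log_likelihood(theta, rows):
    """Binomial log-likelihood of the counts, up to a constant independent of theta."""
    total = 0.0
    for x, n_ae, n_eh in rows:
        eta = sum(t * xi for t, xi in zip(theta, x))
        # ln p(/ɛ/) = -softplus(-eta) and ln p(/æ/) = -softplus(eta)
        total -= n_eh * softplus(-eta) + n_ae * softplus(eta)
    return total


def solve(a, b):
    """Solve the small linear system a y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (m[r][n] - sum(m[r][c] * y[c] for c in range(r + 1, n))) / m[r][r]
    return y


def fit(rows, max_iter=100, tol=1e-10):
    """Maximum-likelihood (alpha, beta_1, ...) via Newton-Raphson with step halving."""
    k = len(rows[0][0])
    theta = [0.0] * k
    ll = log_likelihood(theta, rows)
    for _ in range(max_iter):
        grad = [0.0] * k                      # score: sum of x * (n_eh - n * p)
        info = [[0.0] * k for _ in range(k)]  # information: sum of n p (1 - p) x x^T
        for x, n_ae, n_eh in rows:
            n = n_ae + n_eh
            p = p_eh(theta, x)
            w = n * p * (1.0 - p)
            for i in range(k):
                grad[i] += x[i] * (n_eh - n * p)
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, grad)
        scale = 1.0
        while scale > 1e-10:  # halve the step until the likelihood does not decrease
            trial = [t + scale * s for t, s in zip(theta, step)]
            trial_ll = log_likelihood(trial, rows)
            if trial_ll >= ll:
                break
            scale /= 2.0
        else:
            break  # no improving step left: theta is optimal to numerical precision
        theta, ll = trial, trial_ll
        if max(abs(scale * s) for s in step) < tol:
            break
    return theta, ll


# Each design row starts with 1.0 for the intercept alpha, followed by the factors.
rows = [((1.0, f1, dur), n_ae, n_eh) for f1, dur, n_ae, n_eh in DATA]
(alpha, beta_f1, beta_dur), ll = fit(rows)
print(f"alpha = {alpha:.4f}, beta_F1 = {beta_f1:.6f}, beta_dur = {beta_dur:.6f}")
print(f"log-likelihood = {ll:.4f}")
```

Because each design row is just (1, factor values...), the same `fit` function handles one factor or more; only the construction of `rows` changes.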

Praat creates an object of type LogisticRegression in the list of objects. When you then click the Info button, Praat writes the values of α (the intercept), β_F1 and β_dur to the Info window, along with other information.

The number of factors does not have to be 2; it can be 1 or more. The number of dependent categories is always 2.
