Paul Boersma's writings on categorization

1. The “warping” view

In the paper that presented the Gradual Learning Algorithm, I modelled perceptual categorization as a mapping from an infinite number of values along an auditory continuum (e.g. F1) to a finite number of values along that same continuum, with constraints like PERCEIVE, *CATEG and *WARP:

1997	How we learn variation, optionality, and probability. IFA Proceedings 21: 43–58. Additional material: Simulation script. Earlier version: Rutgers Optimality Archive 221, 1997/10/12 (incorrect!). Also appeared as: chapter 15 of Functional Phonology (1998).

2. Arbitrary categories

As pointed out by Robert Kirchner, the drawback of the “warping” approach was that it could not handle mappings from more than one continuum to a single phonological category. In work with Paola Escudero, who had investigated cue weighting in L2 acquisition, we therefore modelled perceptual categorization as a mapping from an auditory continuum to arbitrary abstract categories, with cue constraints like “an F1 of 360 Hz is not /i/”. The following papers model (in OT, with the GLA) the acquisition of the perception of the English /ɪ/–/i/ contrast by L1 and L2 learners on the basis of two auditory cues, namely F1 and duration:

2003

Paola Escudero & Paul Boersma:
Modelling the perceptual development of phonological contrasts with Optimality Theory and the Gradual Learning Algorithm.
In Sudha Arunachalam, Elsi Kaiser & Alexander Williams (eds.): Proceedings of the 25th Annual Penn Linguistics Colloquium. (Penn Working Papers in Linguistics 8.1), 71–85.
(Note: the editors messed up the pictures; read the preprint instead.)
Preprint: Rutgers Optimality Archive 439, 2001/04/26 (correctly printed)
Data and Praat scripts: the P2 web page.

2004

Paola Escudero & Paul Boersma:
Bridging the gap between L2 speech perception research and phonological theory.
Studies in Second Language Acquisition 26: 551–585.
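To illustrate perception with cue constraints, here is a minimal Python sketch of a strict-ranking evaluation over two cues, F1 and duration. The constraint set, the cue values, the ranking values and the function name `perceive` are hypothetical. They were chosen only so that F1 outranks duration as a cue and are not taken from the papers above. Each candidate category violates exactly one constraint per cue, and no constraint is violated by two candidates, so comparing the violated ranking values from highest to lowest amounts to OT's strict domination.

```python
from typing import Dict, Tuple

# Hypothetical cue constraints "a <cue> of <value> is not /<category>/",
# keyed as (cue, value, category) and mapped to a ranking value.
# The numbers are illustrative only: they make F1 the dominant cue.
RANKING: Dict[Tuple[str, int, str], float] = {
    ("F1", 350, "ɪ"): 102.0, ("F1", 350, "i"): 96.0,
    ("F1", 450, "i"): 101.0, ("F1", 450, "ɪ"): 97.0,
    ("dur", 60, "i"): 99.0, ("dur", 60, "ɪ"): 98.0,
    ("dur", 120, "ɪ"): 100.0, ("dur", 120, "i"): 95.0,
}


def perceive(cues: Dict[str, int],
             ranking: Dict[Tuple[str, int, str], float],
             categories: Tuple[str, ...]) -> str:
    """Return the optimal category for the given auditory cue values."""

    def violations(category: str):
        # The candidate /category/ violates one cue constraint per cue;
        # list those ranking values, most serious violation first.
        return sorted((ranking[(cue, value, category)]
                       for cue, value in cues.items()), reverse=True)

    # Lexicographic comparison: the candidate whose worst violation ranks
    # lowest wins; ties fall through to the next violation.
    return min(categories, key=violations)


print(perceive({"F1": 350, "dur": 60}, RANKING, ("i", "ɪ")))   # i
print(perceive({"F1": 450, "dur": 120}, RANKING, ("i", "ɪ")))  # ɪ
```

Lowering the value of `("F1", 350, "ɪ")` to, say, 97 makes the first token come out as /ɪ/, so that the short duration rather than the low F1 decides: a crude picture of a listener who relies more on duration.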

3. Why are cue constraints formulated negatively?

Wouldn't it be easier to have positively formulated constraints like “an F1 of 360 Hz is /i/” rather than negatively formulated constraints like “an F1 of 360 Hz is not /i/”?

There are two reasons for the negative formulation. The first reason is that with positively formulated constraints, the modelling of cue weighting fails spectacularly if the number of phonological categories is greater than 2. This is shown in the following paper, which models the acquisition of the perception of 9 Dutch vowels by L1 and L2 learners on the basis of two auditory cues, namely F1 and F2:

2008

Paul Boersma & Paola Escudero:
Learning to perceive a smaller L2 vowel inventory: an Optimality Theory account.
In Peter Avery, Elan Dresher & Keren Rice (eds.): Contrast in phonology: theory, perception, acquisition, 271–301. Berlin: Mouton de Gruyter.
Preprint: 2007/06/30, 26 pages.
Earlier version: Rutgers Optimality Archive 684, 2004/09/06.

The second reason is that their negative formulation allows the cue constraints to be used for modelling comprehension as well as production. For instance, the constraint “an F1 of 360 Hz is not /i/” logically means the same as “/i/ does not have an F1 of 360 Hz”, so that it can be used both for modelling perception and for modelling phonetic implementation. All my papers on Parallel Bidirectional Phonology and Phonetics (BiPhon) take advantage of this bidirectionality.
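The bidirectional use can be made concrete with a minimal Python sketch. One table of hypothetical ranking values for constraints of the form “an F1 of f Hz is not /c/” is evaluated in two directions: in perception the F1 value is given and the candidates are categories, while in phonetic implementation the category is given and the candidates are F1 values. The three vowels, the F1 values, the numbers and the function names are illustrative assumptions, and the sketch leaves out the other constraint families that a BiPhon grammar would contain.

```python
# Hypothetical ranking values for cue constraints "an F1 of f Hz is not /c/",
# keyed as (F1 in Hz, category).
RANKING = {
    (300, "i"): 90.0, (300, "e"): 104.0, (300, "a"): 110.0,
    (450, "i"): 102.0, (450, "e"): 92.0, (450, "a"): 103.0,
    (700, "i"): 111.0, (700, "e"): 101.0, (700, "a"): 93.0,
}
F1_VALUES = (300, 450, 700)
CATEGORIES = ("i", "e", "a")


def perceive(f1: int) -> str:
    # Comprehension: F1 is given; each candidate category violates one
    # constraint, and the candidate whose violation ranks lowest wins.
    return min(CATEGORIES, key=lambda cat: RANKING[(f1, cat)])


def implement(category: str) -> int:
    # Production: the category is given; the same constraints now
    # evaluate candidate F1 values.
    return min(F1_VALUES, key=lambda f1: RANKING[(f1, category)])


for f1 in F1_VALUES:
    print(f1, "->", perceive(f1))
for cat in CATEGORIES:
    print(cat, "->", implement(cat))
```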

4. Distributional learning

The acquisition algorithm in the above examples was lexicon-driven learning of perception: if you perceive auditory event [x] as the category /y/, but the lexicon subsequently (i.e. after recognition) tells you that the speaker's intended category was /z/, then the constraint rankings are modified in such a way that the next time you hear [x] you will be more likely to perceive it as /z/ than before.
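A minimal Python sketch of this error-driven update follows. It assumes Stochastic OT evaluation with Gaussian noise and the symmetric GLA update: when the perceived category is wrong, the constraint violated by the perceived category is promoted and the one violated by the intended category is demoted. The F1 grid, the two categories, the training tokens, the plasticity and the noise values are hypothetical, and the learner is handed the intended category directly instead of recovering it through a lexicon.

```python
import random


def perceive(f1, ranking, categories, noise, rng):
    # Stochastic OT: each candidate /cat/ violates only "[f1] is not /cat/";
    # add Gaussian evaluation noise to that ranking value and pick the
    # candidate whose violated constraint ends up lowest.
    return min(categories,
               key=lambda cat: ranking[(f1, cat)] + rng.gauss(0.0, noise))


def learn(f1, intended, ranking, categories, plasticity=1.0, noise=2.0,
          rng=random):
    """One lexicon-driven learning step; returns the category perceived."""
    perceived = perceive(f1, ranking, categories, noise, rng)
    if perceived != intended:
        # The winner was wrong: rank "[f1] is not /perceived/" higher and
        # "[f1] is not /intended/" lower, so that /intended/ gains ground.
        ranking[(f1, perceived)] += plasticity
        ranking[(f1, intended)] -= plasticity
    return perceived


# Hypothetical learner: all cue constraints start at the same ranking value.
rng = random.Random(1)
categories = ("i", "ɪ")
ranking = {(f1, cat): 100.0 for f1 in (300, 400, 500) for cat in categories}
# Hypothetical (F1, intended category) tokens; 400 Hz is ambiguous.
data = [(300, "i"), (400, "i"), (400, "ɪ"), (500, "ɪ")]
for _ in range(2000):
    f1, intended = rng.choice(data)
    learn(f1, intended, ranking, categories, rng=rng)

print(perceive(300, ranking, categories, 0.0, rng))  # i
print(perceive(500, ranking, categories, 0.0, rng))  # ɪ
```

At 400 Hz the two constraints keep being pushed in opposite directions by the two kinds of tokens, so the learner should end up perceiving that value about half of the time as each category.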

This acquisition algorithm relies on correct lexical representations. Infants do not have those; they can learn only from the distributions of auditory events. Rachel Hayes noted that the old *WARP family of constraints might be able to handle this:

2003/02/28

Paul Boersma, Paola Escudero & Rachel Hayes:
Learning abstract phonological from auditory phonetic categories: An integrated model for the acquisition of language-specific sound categories.
Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, 3–9 August 2003, pp. 1013–1016 (= Rutgers Optimality Archive 585).

5. Category emergence in neural networks

The Optimality-Theoretic simulations by Boersma, Escudero & Hayes (2003) suffered from the need for a “Category Creation Day”. In neural networks, by contrast, categories can emerge gradually.

Here this works along a single auditory continuum, using the inoutstar learning algorithm:

2020	Paul Boersma, Titia Benders & Klaas Seinhorst: Neural networks for phonology and phonetics. Journal of Language Modelling 8: 103–177.

And here it works along one or two auditory continua, using a Deep Boltzmann Machine learning algorithm:

2019	Paul Boersma: Simulated distributional learning in deep Boltzmann machines leads to the emergence of discrete categories. Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, 5–9 August 2019. 1520–1524.

Both of the above had only two levels of representation. Here is an example with three levels, in which things that look like phonological features emerge:

2022	Paul Boersma, Kateřina Chládková & Titia Benders: Phonological features emerge substance-freely from the phonetics and the morphology. Canadian Journal of Linguistics 67: 611–669 (a special issue on substance-free phonology).
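For the inoutstar learning algorithm mentioned above, here is a minimal Python sketch of a single weight update. It assumes that the inoutstar rule is the average of Grossberg's instar and outstar rules, and it leaves out how activity spreads through the network. It therefore illustrates the kind of Hebbian update involved rather than the exact procedure of Boersma, Benders & Seinhorst (2020); the function name `inoutstar_update` and all numbers are hypothetical.

```python
def inoutstar_update(weights, pre, post, eta):
    """Return new weights; weights[i][j] connects pre-node i to post-node j.

    Assumed rule, the average of
        instar:  dw = eta * post_j * (pre_i - w)
        outstar: dw = eta * pre_i * (post_j - w)
    which simplifies to
        dw = eta * (pre_i * post_j - 0.5 * (pre_i + post_j) * w).
    """
    return [[w + eta * (a * b - 0.5 * (a + b) * w)
             for w, b in zip(row, post)]
            for row, a in zip(weights, pre)]


# Hypothetical 2 x 2 network, presented many times with the same activities.
weights = [[0.0, 0.4], [0.3, 0.2]]
pre, post = [1.0, 0.0], [0.5, 0.0]
for _ in range(500):
    weights = inoutstar_update(weights, pre, post, eta=0.1)
print(weights)
```

Under this assumed rule, a weight between two active nodes with activities a and b settles at 2ab/(a+b) (here 2/3), a weight with exactly one active end decays towards zero, and a weight between two inactive nodes is left unchanged.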
