OT learning 7. Learning from overt forms

In order to be able to learn phonological production, both EDCD and GLA require pairs of underlying form and surface form. However, the language-learning child hears neither of these forms: she only hears overt forms, with less structural information than the underlying and surface forms contain.

Interpretive parsing

The language-learning child has to construct both the surface form and the underlying form from the overt form that she hears. Tesar & Smolensky (1998) proposed that the child computes a surface form from the overt form by using the same constraint ranking as in production. For instance, the overt form [σ σˈ σ], which is a sequence of three syllables with stress on the second syllable, will be interpreted as the surface form /(σ σˈ) σ/ in iambic left-aligning languages (IAMBIC >> TROCHAIC, and ALLFEETLEFT >> ALLFEETRIGHT), but as the surface form /σ (σˈ σ)/ in trochaic right-aligning languages (TROCHAIC >> IAMBIC, and ALLFEETRIGHT >> ALLFEETLEFT). Tesar & Smolensky call this procedure Robust Interpretive Parsing, because it works even if the listener's grammar would never produce such a form. For instance, if IAMBIC >> ALLFEETRIGHT >> TROCHAIC >> ALLFEETLEFT, the listener herself would produce the iambic right-aligned /σ (σ σˈ)/ for any trisyllabic underlying form, yet she will still interpret [σ σˈ σ] as /(σ σˈ) σ/, which is illegal in her own grammar. Hearing forms that are illegal in one's own grammar is of course a common situation for language-learning children.

In Tesar & Smolensky's view, the underlying form can be trivially computed from the surface form, since the surface form contains enough information. For instance, the surface form /(σ σˈ) σ/ must lead to the underlying form |σ σ σ| if all parentheses and stress marks are removed. Since McCarthy & Prince (1995), this containment view of surface representations has been abandoned. In Praat, therefore, the underlying form is not trivially computed from the surface form. Instead, all the tableaus are scanned for candidates whose surface form contains the given overt form, and among these the candidate that is optimal in the usual OT sense (the one whose violations are least serious under the current ranking) is chosen. For instance, if IAMBIC >> ALLFEETRIGHT >> TROCHAIC >> ALLFEETLEFT, the overt form [σ σˈ σ] occurs in two candidates, both in the tableau for the underlying form |σ σ σ|: the surface form /(σ σˈ) σ/ and the surface form /σ (σˈ σ)/. The better of the two is /(σ σˈ) σ/, because /σ (σˈ σ)/ violates the top-ranked IAMBIC. Hence, Praat's version of Robust Interpretive Parsing will map the overt form [σ σˈ σ] to the underlying form |σ σ σ| (the ‘winning tableau’) and to the surface form /(σ σˈ) σ/. This is the same result as in Tesar & Smolensky's version, but crucial differences between the two versions will appear when faithfulness constraints are involved.
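The scan over tableaus can be sketched in Python. This is a toy model under stated assumptions, not Praat's implementation: words consist of light syllables only (written in the "L1 L L" notation), every candidate has exactly one bisyllabic foot, the four constraints are given simplified definitions, and all function names are hypothetical.

```python
# Assumption: there is one tableau per word length, for 2 to 4 light syllables.
CANDIDATE_LENGTHS = (2, 3, 4)

def candidates(n):
    """All parses of n syllables with one bisyllabic foot, as (foot_start, head)."""
    return [(start, head) for start in range(n - 1) for head in (start, start + 1)]

def violations(n, cand):
    """Simplified violation counts (an assumption, not Praat's definitions)."""
    start, head = cand
    return {
        "IAMBIC": int(head == start),        # head is not foot-final
        "TROCHAIC": int(head == start + 1),  # head is not foot-initial
        "ALLFEETLEFT": start,                # syllables to the left of the foot
        "ALLFEETRIGHT": n - start - 2,       # syllables to the right of the foot
    }

def profile(n, cand, ranking):
    """Violations ordered from highest- to lowest-ranked constraint."""
    marks = violations(n, cand)
    return tuple(marks[c] for c in ranking)

def syllables(n, head):
    return ["L1" if i == head else "L" for i in range(n)]

def overt(n, cand):
    """Overt form: stress is audible, foot boundaries are not."""
    return "[" + " ".join(syllables(n, cand[1])) + "]"

def surface(n, cand):
    start, head = cand
    syl = syllables(n, head)
    syl[start] = "(" + syl[start]
    syl[start + 1] += ")"
    return "/" + " ".join(syl) + "/"

def underlying(n):
    return "|" + " ".join(["L"] * n) + "|"

def produce(n, ranking):
    """Production: the optimal candidate in the tableau for n syllables."""
    best = min(candidates(n), key=lambda c: profile(n, c, ranking))
    return surface(n, best)

def interpretive_parse(overt_form, ranking):
    """Scan all tableaus; keep only candidates that contain the overt form."""
    matches = [(profile(n, c, ranking), n, c)
               for n in CANDIDATE_LENGTHS for c in candidates(n)
               if overt(n, c) == overt_form]
    if not matches:
        raise ValueError("no candidate contains " + overt_form)
    _, n, cand = min(matches)  # lexicographic comparison = strict domination
    return underlying(n), surface(n, cand)

ranking = ("IAMBIC", "ALLFEETRIGHT", "TROCHAIC", "ALLFEETLEFT")
print(interpretive_parse("[L L1 L]", ranking))  # ('|L L L|', '/(L L1) L/')
print(produce(3, ranking))                      # /L (L L1)/
```

Comparing violation profiles as tuples, ordered from the highest-ranked constraint down, is one compact way to express strict domination. The parse and the learner's own production differ here, which is exactly the robustness property discussed above.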

In Praat, you can do interpretive parsing. For example, create a grammar with Create metrics grammar... from the New menu. Then choose Get interpretive parse... from the Query submenu and supply "[L1 L L]" for the overt form, which means a sequence of three light syllables with a main stress on the first. The Info window will show you the optimal underlying and surface forms, given the current constraint ranking.

Learning from partial forms

Now that the learning child can convert an overt form to an underlying-surface pair, she can compare this surface form to the surface form that she herself would have derived from this underlying form. For instance, if IAMBIC >> ALLFEETRIGHT >> TROCHAIC >> ALLFEETLEFT and the overt form is [σ σˈ σ], the winning tableau is the one for |σ σ σ|, and the perceived adult surface form is /(σ σˈ) σ/. But from the underlying form |σ σ σ|, the learner will derive /σ (σ σˈ)/ as her own surface form. Since the two surface forms differ, the learner can take action by reranking one or more constraints, perhaps with EDCD or GLA.
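The comparison and one possible reranking step can be sketched as follows. This is a minimal illustration of a GLA-style symmetric update, not Praat's code: the numeric ranking values, the plasticity of 1.0, the function name, and the simplified violation counts (one mark per syllable separating the foot from the relevant word edge) are all assumptions made for the example. Forms are written in the "L1 L L" notation.

```python
def gla_update(ranking_values, adult_marks, learner_marks, plasticity=1.0):
    """Return new ranking values after one GLA-style symmetric update.

    Constraints violated more by the learner's form are promoted;
    constraints violated more by the adult form are demoted.
    """
    updated = dict(ranking_values)
    for constraint in ranking_values:
        if learner_marks[constraint] > adult_marks[constraint]:
            updated[constraint] += plasticity
        elif adult_marks[constraint] > learner_marks[constraint]:
            updated[constraint] -= plasticity
    return updated

# Hypothetical ranking values for IAMBIC >> ALLFEETRIGHT >> TROCHAIC >> ALLFEETLEFT.
ranking_values = {"IAMBIC": 103.0, "ALLFEETRIGHT": 102.0,
                  "TROCHAIC": 101.0, "ALLFEETLEFT": 100.0}

adult_surface = "/(L L1) L/"    # result of interpretive parsing
learner_surface = "/L (L L1)/"  # the learner's own production

# Assumed violation counts for the two surface forms.
adult_marks = {"IAMBIC": 0, "ALLFEETRIGHT": 1, "TROCHAIC": 1, "ALLFEETLEFT": 0}
learner_marks = {"IAMBIC": 0, "ALLFEETRIGHT": 0, "TROCHAIC": 1, "ALLFEETLEFT": 1}

updated = ranking_values
if adult_surface != learner_surface:  # an error triggers learning
    updated = gla_update(ranking_values, adult_marks, learner_marks)
print(updated)
```

In this example only ALLFEETRIGHT and ALLFEETLEFT move, since the two forms incur the same marks on IAMBIC and TROCHAIC. An EDCD-style learner would instead demote constraints in the ranking hierarchy rather than adjust numeric values.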

In Praat, you can learn from partial forms. Select the metrics grammar and choose Learn from one partial output..., and supply "[L1 L L]". If you do this several times, you will see that the winner for the tableau "|L L L|" will become one of the two forms with overt part "[L1 L L]".

To run a whole simulation, you supply a Distributions object with one column, perhaps from a text file. The following text file shows the overt forms for Latin, with the bisyllabic forms occurring more often than the trisyllabic forms:

   "ooTextFile"
   "Distributions"
   1 column with numeric data
      "Latin"
   28 rows
   "[L1 L]" 25
   "[L1 H]" 25
   "[H1 L]" 25
   "[H1 H]" 25
   "[L1 L L]" 5
   "[H1 L L]" 5
   "[L H1 L]" 5
   "[H H1 L]" 5
   "[L1 L H]" 5
   "[H1 L H]" 5
   "[L H1 H]" 5
   "[H H1 H]" 5
   "[L L1 L L]" 1
   "[L H1 L L]" 1
   "[L L H1 L]" 1
   "[L H H1 L]" 1
   "[L L1 L H]" 1
   "[L H1 L H]" 1
   "[L L H1 H]" 1
   "[L H H1 H]" 1
   "[H L1 L L]" 1
   "[H H1 L L]" 1
   "[H L H1 L]" 1
   "[H H H1 L]" 1
   "[H L1 L H]" 1
   "[H H1 L H]" 1
   "[H L H1 H]" 1
   "[H H H1 H]" 1

Read this file into Praat with Read from file.... A Distributions object then appears in the object list. Click To Strings..., then OK. A Strings object containing 1000 strings, drawn randomly from the distribution, with relative frequencies as in the text file, will appear in the list. Click Inspect to check the contents.
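What this sampling step amounts to can be illustrated outside Praat. The sketch below is an assumption about the behaviour described here (independent draws, with probability proportional to the value in the "Latin" column), not a description of Praat's internals; the function name `to_strings` and the `seed` parameter are hypothetical.

```python
import random

# The 28 overt forms from the text file, grouped by their relative frequency.
BISYLLABIC = ["[L1 L]", "[L1 H]", "[H1 L]", "[H1 H]"]
TRISYLLABIC = ["[L1 L L]", "[H1 L L]", "[L H1 L]", "[H H1 L]",
               "[L1 L H]", "[H1 L H]", "[L H1 H]", "[H H1 H]"]
QUADRISYLLABIC = ["[L L1 L L]", "[L H1 L L]", "[L L H1 L]", "[L H H1 L]",
                  "[L L1 L H]", "[L H1 L H]", "[L L H1 H]", "[L H H1 H]",
                  "[H L1 L L]", "[H H1 L L]", "[H L H1 L]", "[H H H1 L]",
                  "[H L1 L H]", "[H H1 L H]", "[H L H1 H]", "[H H H1 H]"]

distribution = {}
distribution.update({form: 25 for form in BISYLLABIC})
distribution.update({form: 5 for form in TRISYLLABIC})
distribution.update({form: 1 for form in QUADRISYLLABIC})

def to_strings(distribution, n=1000, seed=None):
    """Draw n forms independently, with probability proportional to weight."""
    rng = random.Random(seed)
    forms = list(distribution)
    weights = [distribution[form] for form in forms]
    return rng.choices(forms, weights=weights, k=n)

strings = to_strings(distribution, n=1000, seed=1)
print(len(strings), strings[:5])
```

With these weights each bisyllabic form has probability 25/156, each trisyllabic form 5/156, and each quadrisyllabic form 1/156, so bisyllabic forms should make up roughly 64 percent of the drawn strings.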

You can now select the OTGrammar together with the Strings and choose Learn from partial outputs.... For each of the thousand strings, Praat will construct a surface form from the overt form by interpretive parsing, and also construct the underlying form in the same way, from which it will construct another surface form by evaluating the tableau. Whenever the two surface forms are not identical, some constraints will be reranked. In the current implementation, the disharmonies for interpretive parsing and for production are the same, i.e., if the evaluation noise is not zero, the disharmonies are randomly renewed before each interpretive parsing but stay the same for the subsequent virtual production.
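The point about shared disharmonies can be made concrete with a short sketch. It assumes that a disharmony is a ranking value plus Gaussian evaluation noise; `parse` and `produce` are hypothetical callables standing in for interpretive parsing and tableau evaluation, and none of the names below are Praat's.

```python
import random

def sample_ranking(ranking_values, noise, rng):
    """Add evaluation noise to each ranking value; return constraints from high to low."""
    disharmonies = {c: value + rng.gauss(0.0, noise)
                    for c, value in ranking_values.items()}
    return tuple(sorted(disharmonies, key=disharmonies.get, reverse=True))

def learning_step(overt_form, ranking_values, noise, rng, parse, produce):
    """Return True if the datum reveals an error, i.e. reranking is called for."""
    # Disharmonies are renewed once per incoming overt form ...
    ranking = sample_ranking(ranking_values, noise, rng)
    underlying, adult_surface = parse(overt_form, ranking)
    # ... and the same ranking is reused for the virtual production.
    learner_surface = produce(underlying, ranking)
    return adult_surface != learner_surface
```

A full simulation would call `learning_step` once for each string in the Strings object and apply a reranking step whenever it returns True.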
