OT learning 3.1. Data from a pair distribution

If the grammar contains faithfulness constraints, the learner needs pairs of underlying and adult surface forms. For our place assimilation example, she needs a lot of /at+ma/ - [atma] pairs, and four times as many /an+pa/ - [ampa] pairs as /an+pa/ - [anpa] pairs. We can specify this language-data distribution in a PairDistribution object, which we could simply save as a text file:

"ooTextFile"
"PairDistribution"
4 pairs
"at+ma" "atma" 100
"at+ma" "apma" 0
"an+pa" "anpa" 20
"an+pa" "ampa" 80

The values appear to represent percentages, but they could also have been 1.0, 0.0, 0.2, and 0.8, or any other values with the same proportions. We could also have left out the second pair and specified "3 pairs" instead of "4 pairs" in the third line.
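The claim that only the proportions matter can be checked with a small sketch. It is written in Python rather than Praat, and it assumes that pairs are drawn with probability proportional to their weight. The function name `normalize` and the three lists are hypothetical names introduced for this illustration only.

```python
# Sketch only: assumes a pair is drawn with probability weight / sum(weights).
# All names here are hypothetical; none of them belong to Praat.

def normalize(pairs):
    """Convert raw weights to probabilities that sum to 1."""
    total = sum(weight for _, _, weight in pairs)
    return [(inp, out, weight / total) for inp, out, weight in pairs]

# The distribution as written in the text file above.
percent = [
    ("at+ma", "atma", 100),
    ("at+ma", "apma", 0),
    ("an+pa", "anpa", 20),
    ("an+pa", "ampa", 80),
]

# The same distribution with the alternative values 1.0, 0.0, 0.2, 0.8.
fractions = [
    ("at+ma", "atma", 1.0),
    ("at+ma", "apma", 0.0),
    ("an+pa", "anpa", 0.2),
    ("an+pa", "ampa", 0.8),
]

# The "3 pairs" variant: the zero-weight pair is simply left out.
three_pairs = [pair for pair in percent if pair[2] > 0]

for name, pairs in [("percent", percent), ("fractions", fractions),
                    ("three_pairs", three_pairs)]:
    print(name, normalize(pairs))
```

Under that assumption, all three variants give the same probabilities to the three pairs that can actually occur, and the zero-weight pair contributes nothing.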

We can create this pair distribution with Create place assimilation distribution from the Optimality Theory submenu of the New menu in the Objects window. To see that it really contains the above data, you can draw it to the Picture window. To change the values, use Inspect (in which case you should remember to click Change after any change).

To generate input-output pairs from the above distribution, select the PairDistribution and click To Stringses.... If you then just click OK, two Strings objects will appear in the list, called "input" (underlying forms) and "output" (surface forms). Each contains 1000 strings. If you Inspect them both, you can see that the two series of strings are aligned: the 377th string in "input", for example, corresponds to the 377th string in "output". See also the example at PairDistribution: To Stringses....
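The alignment of the two lists can be illustrated with a short Python sketch. It is not Praat's implementation: it assumes weight-proportional sampling with replacement, and the function name `draw_aligned_pairs` and its parameters are hypothetical.

```python
import random

# Sketch only: mimics the idea of drawing aligned "input" and "output" lists.
# draw_aligned_pairs and its parameters are hypothetical names, not Praat's API.

def draw_aligned_pairs(pairs, n=1000, seed=None):
    """Draw n (input, output) pairs with probability proportional to weight.

    Returns two lists of equal length in which element i of the first list
    belongs with element i of the second.
    """
    rng = random.Random(seed)
    weights = [weight for _, _, weight in pairs]
    drawn = rng.choices(pairs, weights=weights, k=n)
    inputs = [inp for inp, _, _ in drawn]
    outputs = [out for _, out, _ in drawn]
    return inputs, outputs

distribution = [
    ("at+ma", "atma", 100),
    ("at+ma", "apma", 0),
    ("an+pa", "anpa", 20),
    ("an+pa", "ampa", 80),
]

inputs, outputs = draw_aligned_pairs(distribution, n=1000, seed=1)

# The 377th string of each list (index 376) forms one underlying-surface pair.
print(inputs[376], "->", outputs[376])
```

Because each draw picks a whole pair and then splits it, the two lists stay aligned by construction, and the zero-weight pair /at+ma/ - [apma] is never drawn.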

These two Strings objects are all that an OTGrammar needs in order to adjust its constraint rankings so that the output distributions it generates match the output distributions in the language data. See §5.

© ppgb, January 31, 2011