OT learning 6. Shortcut to grammar learning

Once you have mastered the tedious procedures of making Praat learn stochastic grammars, as described in the previous chapters of this tutorial, you can try a faster procedure, which simply involves selecting an OTGrammar object together with a PairDistribution object, and clicking Learn.... Once you click OK, Praat will feed the selected grammar with input/output pairs drawn from the selected distribution, and the grammar will be modified every time its output is different from the given output. Here is the meaning of the arguments:

Evaluation noise (standard value: 2.0)
the standard deviation of the noise added to the ranking of each constraint at evaluation time.
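As an illustration only (this is not Praat's own code), the effect of evaluation noise can be sketched in Python. The constraint names, the ranking values and the function names below are hypothetical; the only thing taken from the description above is that Gaussian noise with the given standard deviation is added to each ranking at evaluation time.

```python
import random

def noisy_rankings(rankings, evaluation_noise=2.0, rng=random):
    """Return each ranking plus independent Gaussian noise (hypothetical helper)."""
    return {name: value + rng.gauss(0.0, evaluation_noise)
            for name, value in rankings.items()}

def evaluation_order(rankings, evaluation_noise=2.0, rng=random):
    """Order the constraints from highest to lowest noisy ranking."""
    noisy = noisy_rankings(rankings, evaluation_noise, rng)
    return sorted(noisy, key=noisy.get, reverse=True)

# Hypothetical grammar with three constraints.
rankings = {"A": 100.0, "B": 98.0, "C": 90.0}

# With noise 2.0, A and B will sometimes swap places; C will rarely outrank them.
print(evaluation_order(rankings, evaluation_noise=2.0))
```

With an evaluation noise of 0 the order is always the same, so such a grammar shows no variation.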
Strategy (standard value: Symmetric all)
what to do when the learner's output is different from the given output. Possibilities:
    Demotion only: lower the ranking of every constraint that is violated more in the adult output than in the learner's output. This algorithm crashes if there is variation in the data, i.e. if some inputs can have more than one possible adult output.
    Symmetric one: lower the ranking of the highest-ranked constraint that is violated more in the adult output than in the learner's output, and raise the ranking of the highest-ranked constraint that is violated more in the learner's output than in the adult output. This is the "minimal" algorithm described and refuted in Boersma (1998), chapters 14-15.
    Symmetric all: lower the ranking of all constraints that are violated more in the adult output than in the learner's output, and raise the ranking of all constraints that are violated more in the learner's output than in the adult output. This is the algorithm described in Boersma & Hayes (2001).
    Weighted uncancelled: the same as "Symmetric all", but the size of the learning step is divided by the number of moving constraints. This makes sure that the average ranking of all the constraints is constant.
    Weighted all: the "Symmetric all" strategy can be reworded as follows: "lower the ranking of all constraints that are violated in the adult output, and raise the ranking of all constraints that are violated in the learner's output". Do that, but divide the size of the learning step by the number of moving constraints.
    EDCD: Error-Driven Constraint Demotion, the algorithm described by Tesar & Smolensky (1998). All constraints that prefer the adult form and are ranked above the highest-ranked constraint that prefers the learner's form, are demoted to the ranking of that last constraint minus 1.0.
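To make the "Symmetric all" strategy concrete, here is a minimal Python sketch of a single update after a learner error. It follows the wording above; the dictionaries of violation counts, the constraint names and the function name are assumptions made for the example, not part of Praat.

```python
def symmetric_all(rankings, adult_violations, learner_violations, plasticity=1.0):
    """Return the rankings after one "Symmetric all" update (hypothetical helper)."""
    updated = dict(rankings)
    for name in rankings:
        adult = adult_violations.get(name, 0)
        learner = learner_violations.get(name, 0)
        if adult > learner:
            # Violated more in the adult output: lower the ranking.
            updated[name] -= plasticity
        elif learner > adult:
            # Violated more in the learner's output: raise the ranking.
            updated[name] += plasticity
    return updated

# Hypothetical case: the adult output violates A once,
# the learner's output violates B twice, and neither violates C.
rankings = {"A": 100.0, "B": 100.0, "C": 100.0}
print(symmetric_all(rankings, {"A": 1}, {"B": 2}, plasticity=1.0))
# prints {'A': 99.0, 'B': 101.0, 'C': 100.0}
```

Constraints that are violated equally often in both outputs, such as C here, do not move.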
Initial plasticity (standard value: 1.0)
Replications per plasticity (standard value: 100000)
Plasticity decrement (standard value: 0.1)
Number of plasticities (standard value: 4)
these four arguments determine the learning scheme, i.e. the number of times the grammar will receive data at a certain plasticity. With the standard values, there will be 100000 data while the plasticity is 1.0 (the initial plasticity), 100000 data while the plasticity is 0.1, 100000 data while the plasticity is 0.01, and 100000 data while the plasticity is 0.001. If you want learning at a constant plasticity, set the number of plasticities to 1. Note that for the decision strategies HarmonicGrammar, LinearOT, PositiveHG and MaximumEntropy, the learning step for a constraint equals the plasticity multiplied by the difference between the numbers of violations of this constraint in the adult output and in the learner's output.
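The learning scheme can be written out as a short Python sketch. The function name and the generator form are assumptions for the example; the four parameters and their standard values are the ones listed above.

```python
def plasticity_schedule(initial_plasticity=1.0,
                        replications_per_plasticity=100000,
                        plasticity_decrement=0.1,
                        number_of_plasticities=4):
    """Yield (plasticity, number of data) for each stage of learning."""
    plasticity = initial_plasticity
    for _ in range(number_of_plasticities):
        yield plasticity, replications_per_plasticity
        # The next stage uses the current plasticity times the decrement.
        plasticity *= plasticity_decrement

for plasticity, count in plasticity_schedule():
    print(f"{count} data at plasticity {plasticity:.3f}")

total = sum(count for _, count in plasticity_schedule())
print(total)  # 400000 data in all with the standard values
```

Setting number_of_plasticities to 1 gives a single stage at the initial plasticity.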
Rel. plasticity spreading (standard value: 0.1)
if this is not 0, the size of the learning step will vary randomly. For instance, if the plasticity is set to 0.01, and the relative plasticity spreading is 0.1, the actual learning steps are drawn from a Gaussian distribution with mean 0.01 and standard deviation 0.001, so that nearly all of them will lie between 0.007 and 0.013 (three standard deviations on either side of the mean).
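A small Python sketch of this random variation, under the assumption (taken from the example above) that the standard deviation equals the plasticity times the relative spreading. The function name is hypothetical.

```python
import random
import statistics

def learning_step(plasticity, relative_spreading=0.1, rng=random):
    """Draw one actual learning step (hypothetical helper)."""
    if relative_spreading == 0:
        return plasticity  # no spreading: the step is always the plasticity itself
    return rng.gauss(plasticity, relative_spreading * plasticity)

rng = random.Random(1)  # seeded so that the run is repeatable
steps = [learning_step(0.01, 0.1, rng) for _ in range(10000)]
print(statistics.mean(steps))   # close to 0.01
print(statistics.stdev(steps))  # close to 0.001
```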
Honour local rankings (standard value: on)
if this is on, the fixed rankings that you supplied in the grammar will be maintained during learning: if a constraint falls below a constraint that is supposed to be universally lower-ranked, this second constraint will be demoted as well.
Number of chews (standard value: 1)
the number of times that each input-output pair is fed to the grammar. Setting this number to 20 will give a slightly different (perhaps more accurate) result than simply raising the plasticity by a factor of 20.
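Why chewing differs from raising the plasticity can be seen in a toy Python sketch. Everything in it is an assumption made for illustration: a grammar with two constraints A and B, two candidates that each violate one of them, no evaluation noise, and the "Symmetric all" update. The point is only that repeated chews stop changing the grammar as soon as the learner's output matches the adult output, whereas one large step does not.

```python
# Hypothetical candidates: cand1 violates A once, cand2 violates B once.
VIOLATIONS = {"cand1": {"A": 1, "B": 0}, "cand2": {"A": 0, "B": 1}}

def learner_output(rankings):
    """Pick the candidate that does not violate the highest-ranked constraint."""
    top = max(rankings, key=rankings.get)
    return min(VIOLATIONS, key=lambda cand: VIOLATIONS[cand][top])

def feed_pair(rankings, adult, plasticity, number_of_chews=1):
    """Feed one input-output pair to the toy grammar number_of_chews times."""
    rankings = dict(rankings)
    for _ in range(number_of_chews):
        learner = learner_output(rankings)
        if learner == adult:
            continue  # no error, so nothing is learned on this chew
        for name in rankings:
            difference = VIOLATIONS[adult][name] - VIOLATIONS[learner][name]
            if difference > 0:
                rankings[name] -= plasticity  # violated more in the adult output
            elif difference < 0:
                rankings[name] += plasticity  # violated more in the learner's output
    return rankings

start = {"A": 100.0, "B": 97.0}
print(feed_pair(start, "cand1", plasticity=1.0, number_of_chews=20))
# prints {'A': 98.0, 'B': 99.0}
print(feed_pair(start, "cand1", plasticity=20.0, number_of_chews=1))
# prints {'A': 80.0, 'B': 117.0}
```

In the first call only two of the twenty chews move the constraints; in the second call the single large step overshoots by a wide margin.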

© ppgb, May 23, 2007