OT learning 1. Kinds of grammars

This is chapter 1 of the OT learning tutorial.

According to Prince & Smolensky (1993), an Optimality-Theoretic (OT) grammar consists of a number of ranked constraints. For every possible input (usually an underlying form), GEN (the generator) generates a (possibly very large) number of output candidates, and the ranking order of the constraints determines the winning candidate, which becomes the single optimal output.

According to Prince & Smolensky (1993) and Smolensky & Legendre (2006), a Harmonic Grammar (HG) consists of a number of weighted constraints. The winning candidate, which becomes the single optimal output, is the one with the greatest harmony, which is a measure of goodness determined by the weights of the constraints violated by each candidate.

In OT, ranking is strict, i.e., if a constraint A is ranked higher than the constraints B, C, and D, a candidate that violates only constraint A will always be beaten by any candidate that respects A (and any higher constraints), even if it violates B, C, and D.
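Strict domination can be illustrated with a small sketch (not Praat's implementation): if each candidate is represented by its violation counts ordered from highest-ranked to lowest-ranked constraint, comparing these vectors lexicographically picks the OT winner.

```python
# Sketch of strict OT evaluation: violation vectors are ordered from
# highest-ranked to lowest-ranked constraint, and lexicographic
# comparison implements strict domination.

def ot_winner(candidates):
    """candidates: dict mapping candidate name to a list of violation
    counts, one per constraint, in ranking order (highest first)."""
    return min(candidates, key=lambda name: candidates[name])

# Constraints in ranking order: A >> B >> C >> D
candidates = {
    "cand1": [1, 0, 0, 0],  # violates only top-ranked A
    "cand2": [0, 1, 1, 1],  # violates B, C, and D
}
print(ot_winner(candidates))  # cand2: satisfying A outweighs B, C, and D
```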

In HG, weighting is additive, i.e., a candidate that violates only a constraint A with a weight of 100 has a harmony of -100 and will therefore beat a candidate that violates both a constraint B with a weight of 70 and a constraint C with a weight of 40, and therefore has a harmony of only -110. Also, two violations of constraint B (harmony 2 * -70 = -140) are worse than one violation of constraint A (harmony -100).
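The additive arithmetic above can be written out as a short sketch: harmony is the negated sum of weight times violation count, and the candidate with the greatest (least negative) harmony wins.

```python
# Sketch of HG harmony: harmony = -(sum over constraints of
# weight * number of violations). Weights follow the example in the text.

weights = {"A": 100, "B": 70, "C": 40}

def harmony(violations):
    """violations: dict mapping constraint name to violation count."""
    return -sum(weights[c] * n for c, n in violations.items())

print(harmony({"A": 1}))          # -100
print(harmony({"B": 1, "C": 1}))  # -110: loses to the A-violator
print(harmony({"B": 2}))          # -140: two B violations beat one A violation... no, they lose to it
```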

### 1. Ordinal OT grammars

Because only the ranking order of the constraints plays a role in evaluating the output candidates, Prince & Smolensky took an OT grammar to contain no absolute ranking values, i.e., they accepted only an ordinal relation between the constraint rankings. For such a grammar, Tesar & Smolensky (1998) devised an on-line learning algorithm (Error-Driven Constraint Demotion, EDCD) that changes the ranking order whenever the form produced by the learner is different from the adult form (a corrected version of the algorithm can be found in Boersma (2009b)). Such a learning step can sometimes lead to a large change in the behaviour of the grammar.
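The core demotion step of EDCD can be sketched as follows. This is a simplified stratal version for illustration only (it omits the correction discussed in Boersma (2009b)); the function and variable names are hypothetical. On an error, every constraint preferring the learner's (incorrect) form is demoted to the stratum just below the highest-ranked constraint preferring the adult form.

```python
# Simplified sketch of one EDCD learning step (stratal version).
# ranking: dict of constraint -> integer stratum (higher = ranked higher).
# adult_viol / learner_viol: violation counts of the adult form and of
# the learner's erroneous output.

def edcd_step(ranking, adult_viol, learner_viol):
    # Constraints preferring the adult form (the learner's form violates them more):
    adult_prefs = [c for c in ranking if learner_viol[c] > adult_viol[c]]
    # Constraints preferring the learner's form (the adult form violates them more):
    learner_prefs = [c for c in ranking if adult_viol[c] > learner_viol[c]]
    if not adult_prefs:
        return ranking  # no informative comparison
    pivot = max(ranking[c] for c in adult_prefs)
    # Demote each learner-preferring constraint to just below the pivot:
    for c in learner_prefs:
        if ranking[c] >= pivot:
            ranking[c] = pivot - 1
    return ranking

ranking = {"A": 1, "B": 2, "C": 3}
# The adult form violates B; the learner's form violates A instead,
# so B is demoted to just below A: a single step with a large effect.
edcd_step(ranking,
          adult_viol={"A": 0, "B": 1, "C": 0},
          learner_viol={"A": 1, "B": 0, "C": 0})
print(ranking)  # {'A': 1, 'B': 0, 'C': 3}
```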

### 2. Stochastic OT grammars

The EDCD algorithm is fast and convergent. As a model of language acquisition, however, its drawbacks are that it is extremely sensitive to errors in the learning data and that it does not show realistic gradual learning curves. For these reasons, Boersma (1997) proposed stochastic OT grammars in which every constraint has a ranking value along a continuous ranking scale, and a small amount of noise is added to this ranking value at evaluation time. The associated error-driven on-line learning algorithm (Gradual Learning Algorithm, GLA) effects small changes in the ranking values of the constraints with every learning step. An added virtue of the GLA is that it can learn languages with optionality and variation, which was something that EDCD could not do. For how this algorithm works on some traditional phonological problems, see Boersma & Hayes (2001).
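The two ingredients of this model, noisy evaluation and gradual reranking, can be sketched as follows (hypothetical names; Gaussian evaluation noise as in Boersma (1997), and a small fixed plasticity for the update step).

```python
# Sketch of stochastic OT: Gaussian noise is added to each ranking value
# at evaluation time, and a GLA step nudges ranking values by a small
# plasticity whenever the learner's form differs from the adult form.
import random

def evaluate(ranking_values, noise=2.0):
    """Return the constraints in ranking order after adding evaluation noise."""
    disharmonies = {c: v + random.gauss(0.0, noise)
                    for c, v in ranking_values.items()}
    return sorted(disharmonies, key=disharmonies.get, reverse=True)

def gla_step(ranking_values, adult_viol, learner_viol, plasticity=0.1):
    """On an error, move each constraint down if it prefers the learner's
    (incorrect) form and up if it prefers the adult form."""
    for c in ranking_values:
        if adult_viol[c] > learner_viol[c]:    # prefers the learner's form
            ranking_values[c] -= plasticity
        elif adult_viol[c] < learner_viol[c]:  # prefers the adult form
            ranking_values[c] += plasticity

values = {"A": 100.0, "B": 99.0}
gla_step(values, adult_viol={"A": 0, "B": 1}, learner_viol={"A": 1, "B": 0})
print(values)  # A rises a little (to 100.1), B falls a little (to 98.9)
```

Because the ranking values overlap within a few noise standard deviations, repeated evaluations of the same input can yield different winners, which is how the model handles optionality and variation.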

Ordinal OT grammars can be seen as a special case of the more general stochastic OT grammars: they have integer ranking values (strata) and zero evaluation noise. In Praat, therefore, every constraint is taken to have a ranking value, so that you can do stochastic as well as ordinal OT.

### 3. Categorical Harmonic Grammars

Jäger (2003) and Soderstrom, Mathis & Smolensky (2006) devised an on-line learning algorithm for Harmonic Grammars (stochastic gradient ascent). As proven by Fischer (2005), this algorithm is guaranteed to converge upon a correct grammar, if there exists one that handles the data.

### 4. Stochastic Harmonic Grammars

There are two kinds of stochastic models of HG, namely MaxEnt (= Maximum Entropy) grammars (Smolensky (1986), Jäger (2003)), in which the probability of a candidate winning depends on its harmony, and Noisy HG (Boersma & Escudero (2008), Boersma & Pater (2016)), in which noise is added to constraint weights at evaluation time, as in Stochastic OT.
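In a MaxEnt grammar, a candidate's winning probability is proportional to the exponential of its harmony. A minimal sketch (with made-up weights and candidate names):

```python
# Sketch of MaxEnt evaluation: P(candidate) = exp(harmony) / Z,
# where harmony = -(weighted violations) and Z normalizes over candidates.
import math

weights = {"A": 2.0, "B": 1.0}

def maxent_probs(candidates):
    """candidates: dict mapping candidate name to its violation counts."""
    harmonies = {name: -sum(weights[c] * n for c, n in viol.items())
                 for name, viol in candidates.items()}
    z = sum(math.exp(h) for h in harmonies.values())
    return {name: math.exp(h) / z for name, h in harmonies.items()}

probs = maxent_probs({"cand1": {"A": 1}, "cand2": {"B": 1}})
print(probs)  # cand2 is more probable: exp(-1) / (exp(-2) + exp(-1)) ≈ 0.73
```

Unlike categorical HG, where the highest-harmony candidate always wins, every candidate here has a nonzero probability, falling off exponentially with decreasing harmony.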

The algorithm by Jäger (2003) and Soderstrom, Mathis & Smolensky (2006) can learn languages with optionality and variation (Boersma & Pater (2016)).

### The OTGrammar object

An OT grammar is implemented as an OTGrammar object. In an OTGrammar object, you specify all the constraints, all the possible inputs and all their possible outputs.