
The Gradual Learning Algorithm (GLA) is a learning algorithm for Stochastic Optimality Theory.
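To make the idea concrete, here is a minimal sketch of Stochastic OT evaluation with the GLA's error-driven update. The constraint names, candidates, and violation profiles below are invented for illustration (they do not come from the papers listed on this page), and the plasticity and noise settings are arbitrary:

```python
import random

plasticity = 0.1   # learning step size (arbitrary illustrative value)
noise_sd = 2.0     # evaluation noise, standard deviation (arbitrary)

# Ranking values, one per constraint (all start equal).
ranking = {"ONSET": 100.0, "NOCODA": 100.0, "MAX": 100.0}

# Violation profiles: candidate -> {constraint: number of violations}.
candidates = {
    "ta":  {"ONSET": 0, "NOCODA": 0, "MAX": 1},
    "tat": {"ONSET": 0, "NOCODA": 1, "MAX": 0},
}

def evaluate(ranking):
    """Pick the optimal candidate under a noisy ranking (Stochastic OT)."""
    noisy = {c: v + random.gauss(0.0, noise_sd) for c, v in ranking.items()}
    order = sorted(noisy, key=noisy.get, reverse=True)  # highest-ranked first
    best = list(candidates)
    for c in order:
        fewest = min(candidates[cand][c] for cand in best)
        best = [cand for cand in best if candidates[cand][c] == fewest]
        if len(best) == 1:
            break
    return best[0]

def gla_step(ranking, adult_form):
    """One error-driven GLA step: on a mismatch, promote constraints
    violated more by the learner's erroneous form, and demote constraints
    violated more by the adult form."""
    learner_form = evaluate(ranking)
    if learner_form == adult_form:
        return
    for c in ranking:
        if candidates[learner_form][c] > candidates[adult_form][c]:
            ranking[c] += plasticity   # promote: penalizes the learner's error
        elif candidates[learner_form][c] < candidates[adult_form][c]:
            ranking[c] -= plasticity   # demote: the adult form violates it
```

Repeatedly feeding `gla_step` the adult form "ta" gradually pushes NOCODA above MAX, after which the learner's noisy evaluations produce "ta" nearly all the time.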
The following paper defines the GLA and applies it to lexicon-driven learning of perceptual categorization.
1997  How we learn variation, optionality, and probability. IFA Proceedings 21: 43–58. Additional material: Simulation script. Earlier version: Rutgers Optimality Archive 221, 1997/10/12 (incorrect!). Also appeared as: chapter 15 of Functional Phonology (1998). 
You can do your own GLA simulations with the Praat program. This also does Tesar & Smolensky’s EDCD as well as "learning from partial representations" (Robust Interpretive Parsing, both for EDCD and GLA). An old version of the manual is:
1999  Optimality-Theoretic learning in the Praat program. IFA Proceedings 23: 17–35 (= Rutgers Optimality Archive 380). 
A more recent version is available under Help in the Praat program.
The following paper tests the GLA on fully informed learning of production in Ilokano, Finnish, and English.
2001  Paul Boersma & Bruce Hayes: Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry 32: 45–86. [copyright] Earlier version: Rutgers Optimality Archive 348, 1999/09/29. 
This article contains several simulations. You can download data and simulation scripts for use with the Praat program, which allow you to replicate these simulations:
The following paper tests the GLA on the acquisition of basic syllable structure in Dutch.
2000  Paul Boersma & Clara Levelt: Gradual constraint-ranking learning algorithm predicts acquisition order. Proceedings of Child Language Research Forum 30, Stanford, California, pp. 229–237. Note: the editors messed up the phonetic symbols; the above is the correct version (Rutgers Optimality Archive 361, 1999/08/28). 
The GLA is also used in all my papers on categorization, recognition, metrical phonology (partly informed learning of production), grammaticality judgments, and Phonology and Phonetics in Parallel.
The GLA is not guaranteed to learn all languages that can be generated by a Stochastic OT grammar. It has been known from early on that some cases of variation with constraints with complementary violations do not converge, because the search space for the learning algorithm has fewer than N − 1 dimensions (where N is the number of constraints). In 2005, Joe Pater even found a case where the GLA does not converge on a categorical (non-variation) language that can be represented in OT. I designed the script GLAsuccessRate.praat to find out how likely such a situation is to occur. This script will show you the following facts about learning algorithms:
A relativizing remark: Boersma (2003) shows that these numbers may be rather different if there is a hidden representation between the input and the output. For Tesar & Smolensky’s example with 12 metrical constraints, GLA learns 70 percent of Tesar & Smolensky’s 124 languages correctly, whereas (corrected) EDCD only learns 60 percent; by the way, stochastic gradient ascent for Harmonic Grammar learns 80 percent. See 7 below.
An analogous script for variation cases is GLAsuccessRateVar.praat.
Some of the above conclusions are tested in EDCDsuccessRate.praat and described in:
2009  Some correct error-driven versions of the Constraint Demotion algorithm. Linguistic Inquiry 40: 667–686. [copyright] Earlier version: Rutgers Optimality Archive 980, 2008/07/01 (accepted revision, with proof). Earlier version: Rutgers Optimality Archive 953, 2008/03/07 (original submission, without proof). 
A correctness proof for a learning algorithm for Harmonic Grammar (identical to the one used by Soderstrom, Mathis & Smolensky 2006) is described in:
2016  Paul Boersma & Joe Pater: Convergence properties of a gradual learning algorithm for Harmonic Grammar. [preprint, 2013/03/13] In John McCarthy & Joe Pater (eds.): Harmonic Serialism and Harmonic Grammar, 389–434. Sheffield: Equinox. Earlier version: Rutgers Optimality Archive 970, 2008/05/21. 
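For comparison with the GLA sketch above, here is a minimal sketch of the perceptron-style stochastic-gradient update for Noisy Harmonic Grammar, where constraints carry numeric weights rather than ranking values. Again, the constraint names, candidates, and violation counts are invented for illustration, and the parameter values are arbitrary:

```python
import random

plasticity = 0.1   # learning step size (arbitrary illustrative value)
noise_sd = 2.0     # evaluation noise on each weight (arbitrary)

weights = {"NOCODA": 1.0, "MAX": 1.0}

# Violation profiles: candidate -> {constraint: number of violations}.
violations = {
    "ta":  {"NOCODA": 0, "MAX": 1},
    "tat": {"NOCODA": 1, "MAX": 0},
}

def harmony(cand, w):
    """Harmony is the negative weighted sum of constraint violations."""
    return -sum(w[c] * violations[cand][c] for c in w)

def noisy_winner(w):
    """Noisy HG: add Gaussian noise to each weight at evaluation time,
    then pick the candidate with the highest harmony."""
    noisy = {c: v + random.gauss(0.0, noise_sd) for c, v in w.items()}
    return max(violations, key=lambda cand: harmony(cand, noisy))

def hg_step(w, adult_form):
    """One error-driven update: move each weight by the difference between
    the learner's and the adult form's violation counts. (Some versions
    additionally restrict weights to be non-negative; this sketch does not.)"""
    learner_form = noisy_winner(w)
    if learner_form == adult_form:
        return
    for c in w:
        w[c] += plasticity * (violations[learner_form][c] - violations[adult_form][c])
```

Unlike the GLA, the size of each weight change here scales with the violation difference, which is the property the convergence proof in the paper above exploits.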