SpellingChecker

One of the types of objects in Praat, used for checking the spelling in texts and TextGrid objects.

1. How to create a SpellingChecker object

You normally read in a SpellingChecker with Read from file... from the Open menu.

2. How to check the spelling of a TextGrid

A SpellingChecker object is used for checking spelling. To check the spellings in a TextGrid object, you first view the TextGrid in an editor window by selecting the TextGrid together with the SpellingChecker object and clicking View & Edit. In most cases, you will also want to select a Sound or LongSound object before clicking View & Edit, so that a representation of the sound is also visible (and audible) in the editor. Thus, you typically select three objects and click View & Edit. The editor then allows you to check the spellings with the commands Check spelling in tier and Check spelling in interval from the Spell menu.

3. How to create a SpellingChecker object for the first time

If you are the maintainer of a word list for spelling checking, you will want to convert this list to a SpellingChecker object that you can distribute among the transcribers of your corpus.

The first step is to create a WordList object from your text file, as described on the WordList man page. Then you simply click To SpellingChecker. A button labelled Edit... appears. This command allows you to set the following attributes of the SpellingChecker object:

Allow all parenthesized
this flag determines whether text between parentheses is ignored in spelling checking. This would allow the transcriber to mark utterances in foreign languages, which cannot be found in the lexicon.
Separating characters
determines the set of characters (apart from the space character) that separate words. The standard is ".,;:()". If a string like "error-prone" should be considered two separate words, you will want to change this to ".,;:()-". For the Corpus of Spoken Dutch (CGN), the hyphen is not a separator, since words like "mee-eter" should be checked as a whole. If a string like "Mary's" should be considered two separate words, include the apostrophe.
Allow all names
determines whether all words that start with a capital are allowed. For the CGN, this is on, since the lexicon does not contain many names.
Name prefixes
a space-separated list that determines what small groups of characters can precede names. For the CGN, this is "'s- d' l'", since names like 's-Gravenhage, d'Ancona, and l'Hôpital should be ignored by the spelling checker.
Allow all words containing
a space-separated list of strings that make a word correct even if not in the lexicon. For the CGN, this is "* xxx", since words like keuje*d and verxxxing should be ignored by the spelling checker.
Allow all words starting with
a space-separated list of prefixes that make a word correct even if not in the lexicon. For the CGN, this is empty.
Allow all words ending in
a space-separated list of suffixes that make a word correct even if not in the lexicon. For the CGN, this is "-", since the first word in verzekerings- en bankwezen should be ignored by the spelling checker.
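
To make the interplay of these settings concrete, here is a minimal Python sketch of how they could combine when a line of transcription is checked. It is not Praat's implementation: the function names (split_words, is_allowed, check_text), the order of the checks, and the tiny lexicon are all hypothetical. It also assumes that a name prefix only counts when "Allow all names" is on and the prefix is directly followed by a capital letter.

```python
import re


def split_words(text, separators=".,;:()"):
    """Split text on spaces and on every separating character."""
    for ch in separators:
        text = text.replace(ch, " ")
    return text.split()


def is_allowed(word, lexicon, *, allow_all_names=False, name_prefixes="",
               containing="", starting_with="", ending_in=""):
    """Return True if the word is in the lexicon or exempted by a setting."""
    if word in lexicon:
        return True
    if allow_all_names:
        # Any word starting with a capital counts as a name.
        if word[:1].isupper():
            return True
        # Assumption: a name prefix must be followed by a capital letter.
        for prefix in name_prefixes.split():
            if word.startswith(prefix) and word[len(prefix):len(prefix) + 1].isupper():
                return True
    if any(s in word for s in containing.split()):
        return True
    if any(word.startswith(s) for s in starting_with.split()):
        return True
    if any(word.endswith(s) for s in ending_in.split()):
        return True
    return False


def check_text(text, lexicon, *, allow_all_parenthesized=False,
               separators=".,;:()", **settings):
    """Return the words in text that would be flagged, in order."""
    if allow_all_parenthesized:
        # Skip everything between a pair of parentheses.
        text = re.sub(r"\([^)]*\)", " ", text)
    return [w for w in split_words(text, separators)
            if not is_allowed(w, lexicon, **settings)]


# Settings modelled on the CGN values quoted above; the lexicon is a toy one.
CGN_SETTINGS = dict(
    allow_all_parenthesized=True,
    separators=".,;:()",
    allow_all_names=True,
    name_prefixes="'s- d' l'",
    containing="* xxx",
    starting_with="",
    ending_in="-",
)
TOY_LEXICON = {"de", "trein", "naar", "en", "bankwezen"}

line = "de trein naar 's-Gravenhage, verzekerings- en bankwezen (hello there) keuje*d fout"
flagged = check_text(line, TOY_LEXICON, **CGN_SETTINGS)
print(flagged)
```

With these settings only "fout" is flagged: 's-Gravenhage passes through its name prefix, verzekerings- through its final hyphen, keuje*d through the asterisk, and the parenthesized text is skipped entirely.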

© ppgb, January 28, 2011