WordList

One of the types of objects in Praat. An object of class WordList contains a sorted list of strings in a system-independent format. WordList objects can be used for spelling checking after conversion to a SpellingChecker object.

1. How to create a WordList object

You will normally create a WordList object by reading a binary WordList file. You'll use the generic Read from file... command from the Open menu.

See below under 3 for how to create such a file.

2. What you can do with a WordList object

The main functionality of a WordList is its ability to tell you whether it contains a certain string. If you select a WordList, you can query the existence of a specific word by using the Has word command. You supply the word and press OK. If the WordList does contain the word, the value "1" will be written to the Info window; otherwise, the value "0" will be written.
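
Outside Praat, the behaviour of this query can be imitated with a few lines of Python. This is only a sketch: the function name has_word is made up for the illustration, and the binary search is an assumption about how a sorted list can be queried efficiently, not a description of Praat's internals.

```python
from bisect import bisect_left

def has_word(sorted_words, word):
    """Return 1 if `word` occurs in the sorted list, otherwise 0."""
    # bisect_left finds the leftmost position where `word` could be inserted
    # while keeping the list sorted; the word is present only if it is
    # already sitting at that position.
    i = bisect_left(sorted_words, word)
    return 1 if i < len(sorted_words) and sorted_words[i] == word else 0

# The list must already be sorted by code point (capitals first).
words = ["Copenhagen", "Munich", "cook", "cooked", "cookie"]

print(has_word(words, "cookie"))   # a word that is in the list
print(has_word(words, "cookery"))  # a word that is not in the list
```

The 1 and 0 mirror the values that Praat writes to the Info window.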

3. How to create a binary WordList file

You can create a binary WordList file from a simple text file that contains a long list of words. Perhaps such a text file has been supplied by a lexicographic institution in your country; because of copyright issues, such word lists cannot be distributed with the Praat program. To convert the simple text file into a compressed WordList file, you basically take the following steps:

    Read Strings from raw text file: "lexicon.txt"
    Sort
    To WordList
    Save as binary file: "lexicon.WordList"

I'll explain these steps in detail. For instance, a simple text file may contain the following list of words:

    cook
    cooked
    cookie
    cookies
    cooking
    cooks
    Copenhagen
    København
    München
    Munich
    ångström

These are just 11 words, but the procedure works equally well for a million of them, provided that your computer has enough memory.

You can read the file into a Strings object with Read Strings from raw text file... from the Open menu in the Objects window. The resulting Strings object contains 11 strings in the above order, as you can verify by viewing them with Inspect.

If you select the Strings, you can click the To WordList button. However, you will get the following complaint:

    String "Copenhagen" not sorted. Please sort first.

This complaint means that the strings have not been sorted in Unicode sorting order. So you click Sort, and the Strings object becomes:

    Copenhagen
    København
    Munich
    München
    cook
    cooked
    cookie
    cookies
    cooking
    cooks
    ångström

The strings are now in Unicode sorting order, in which capitals come before lower-case letters, and accented characters such as å and ü come after the unaccented lower-case letters.
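
If you want to check a word list before loading it into Praat, the same ordering can be reproduced in Python, whose string comparison goes by code point. The sketch below is an illustration only: first_unsorted is a hypothetical helper that mimics the complaint shown earlier, and the normalization step assumes that the words are meant to be in precomposed (NFC) form.

```python
import unicodedata

def first_unsorted(strings):
    """Return the first string that is smaller than its predecessor, or None."""
    for previous, current in zip(strings, strings[1:]):
        if current < previous:  # Python compares str values by code point
            return current
    return None

raw = ["cook", "cooked", "cookie", "cookies", "cooking", "cooks",
       "Copenhagen", "København", "München", "Munich", "ångström"]

# Assumption: precomposed characters (NFC), so that "å" is one code point.
words = [unicodedata.normalize("NFC", w) for w in raw]

offender = first_unsorted(words)  # the string a sortedness check would reject
ordered = sorted(words)           # capitals first, accented letters last

print(offender)
for word in ordered:
    print(word)
```

With the eleven words above, the offending string is "Copenhagen", and the sorted result matches the Strings object shown after clicking Sort.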

Clicking To WordList now succeeds, and a WordList object appears in the list. If you save it to a text file (with the Save menu), you will get the following file:

    File type = "ooTextFile"
    Object class = "WordList"
   
    string = "Copenhagen
    København
    Munich
    München
    cook
    cooked
    cookie
    cookies
    cooking
    cooks
    ångström"

Note that any double quotes (") that appear inside the strings will be doubled, as is done everywhere inside strings in Praat text files.
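
The doubling rule is easy to apply by hand or in a script. Here is a minimal Python sketch of it; praat_quote and praat_unquote are hypothetical names chosen for this example, and the code covers only the quoting rule described above, not the rest of the text file format.

```python
def praat_quote(text: str) -> str:
    """Wrap `text` in double quotes, doubling any quotes inside it."""
    return '"' + text.replace('"', '""') + '"'

def praat_unquote(quoted: str) -> str:
    """Undo praat_quote: strip the outer quotes and halve the inner ones."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("expected a string enclosed in double quotes")
    return quoted[1:-1].replace('""', '"')

original = 'say "hi"'
encoded = praat_quote(original)

print(encoded)
print(praat_unquote(encoded) == original)
```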

After you have created a WordList text file, you can create a WordList object just by reading this file with Read from file... from the Open menu.

The WordList object has the advantage over the Strings object that it won't take up more memory than the original word list. This is because the WordList is stored as a single string: a contiguous list of strings, separated by new-line symbols.
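
To make this storage scheme concrete, here is a Python sketch that keeps a sorted word list as one new-line-separated string and looks words up directly in that string. It is an illustration under assumptions: the function has_word and the binary search over character offsets are my own choices for the example, and are not taken from the Praat source code.

```python
def has_word(blob: str, word: str) -> bool:
    """Binary search for `word` in a sorted, newline-separated string."""
    lo, hi = 0, len(blob)  # search window [lo, hi), in character offsets
    while lo < hi:
        mid = (lo + hi) // 2
        # Widen `mid` to the whole line (word) that contains it.
        start = blob.rfind("\n", 0, mid) + 1  # 0 if there is no earlier newline
        end = blob.find("\n", mid)
        if end == -1:
            end = len(blob)
        candidate = blob[start:end]
        if candidate == word:
            return True
        if candidate < word:   # code-point comparison, as in the sorted list
            lo = end + 1       # continue after this line
        else:
            hi = start         # continue before this line
    return False

words = ["cook", "cooked", "cookie", "cookies", "cooking", "cooks",
         "Copenhagen", "København", "München", "Munich", "ångström"]

# One contiguous string: sorted words separated by new-line symbols.
blob = "\n".join(sorted(words))

print(has_word(blob, "cooking"))
print(has_word(blob, "cookin"))
```

No separate list of word objects is needed: the string itself is the data structure, and the sorting is what makes the lookup fast.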


© ppgb 20190616