Unicode

Praat is becoming a fully international program: the texts in Praat's TextGrids, Tables, scripts, or Info window (and elsewhere) can contain many types of characters (see special symbols). For this reason, Praat by default saves its text files in one of two formats: ASCII or UTF-16.

ASCII text files

If your TextGrid (or Table, or script, or Info window...) contains only characters that can be encoded as ASCII, namely the characters !"#$%&'()*+,-./0123456789:;<=>?@ ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_` abcdefghijklmnopqrstuvwxyz{|}~, then when you choose Save as text file... or Save, Praat will write an ASCII text file, which is a text file in which every character is encoded in a single byte (8 bits). All programs that can read plain text files can read such files produced by Praat.
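
As a rough illustration, the ASCII test described above can be sketched in Python. This is not Praat's own code: the function names are hypothetical, and "ASCII" is taken to mean code points 0 to 127, which also includes the space, tab and newline characters.

```python
def is_ascii_text(text: str) -> bool:
    """Return True if every character fits in one ASCII byte (code points 0..127)."""
    # str.isascii() (Python 3.7+) tests exactly this range.
    return text.isascii()


def ascii_size(text: str) -> int:
    """Number of bytes the text occupies in an ASCII file (one per character)."""
    # encode() raises UnicodeEncodeError if a non-ASCII character is present.
    return len(text.encode("ascii"))


print(is_ascii_text("ta:l"), ascii_size("ta:l"))  # True 4
print(is_ascii_text("München"))                   # False
```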

UTF-16 text files

If your TextGrid (or Table, or script, or Info window...) contains one or more characters that cannot be encoded as ASCII, for instance West-European characters such as åçéöß¿, East-European characters such as čłőšůź, or Hebrew characters such as אבגםוֹוּ, then when you choose Save as text file... or Save, Praat will write a UTF-16 text file, which is a text file in which every character is encoded in two bytes (and some very rare characters in four bytes). Many programs can read such text files, for instance NotePad, WordPad, Microsoft Word, and TextWrangler.
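
The two-byte and four-byte cases can be checked with Python's utf-16-be codec, used here because it writes no byte-order mark, so the byte count reflects only the characters. Whether and how Praat marks the byte order in its own files is not covered by this sketch, and utf16_size is a hypothetical helper.

```python
def utf16_size(text: str) -> int:
    """Bytes needed for `text` in UTF-16, without a byte-order mark."""
    return len(text.encode("utf-16-be"))


print(utf16_size("abc"))         # 6: even ASCII characters take two bytes
print(utf16_size("åçéöß¿"))      # 12: six characters, two bytes each
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane,
# so UTF-16 stores it as a surrogate pair of two 16-bit units.
print(utf16_size("\U0001D11E"))  # 4
```

This also shows the cost of UTF-16 for mostly-ASCII text: every plain letter doubles in size compared with an ASCII file.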

What if my other programs cannot read UTF-16 text files?

If you want to export your Table to Microsoft Excel or to SPSS, or if you want your TextGrid file to be read by somebody else's Perl script, then there will be no problem if your Table contains only ASCII characters (see above). But if your Table contains any other (i.e. non-ASCII) characters, you may be in trouble, because Praat will write the Table as a UTF-16 text file, and not all of the programs just mentioned can read such files yet.

What you can do is go to Text writing preferences... in the Preferences submenu of the Praat menu and set the output encoding to UTF-8 there. From then on, Praat will save your text files in the UTF-8 format, which uses one byte for every ASCII character and 2 to 4 bytes for every non-ASCII character. Especially on Linux, many programs understand UTF-8 text and will display the correct characters. Programs such as SPSS do not understand UTF-8 but will still display ASCII characters correctly; for instance, the names München and Wałęsa may appear as MÃ¼nchen and WaÅ‚Ä™sa or similar.
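
To see where such garbled names come from, the Python sketch below encodes a name as UTF-8 and then decodes the bytes as Windows-1252, the Western "ANSI" codepage, assuming that is how the receiving program reads the file. Other programs or codepages may garble the text differently; misread_as_cp1252 is a hypothetical helper.

```python
def misread_as_cp1252(text: str) -> str:
    """Simulate a program that reads UTF-8 bytes one byte per character (Windows-1252)."""
    return text.encode("utf-8").decode("cp1252")


for name in ["München", "Wałęsa", "Munich"]:
    raw = name.encode("utf-8")
    # ASCII letters use one byte; ü, ł and ę use two bytes each in UTF-8.
    print(name, len(name), "characters,", len(raw), "bytes ->", misread_as_cp1252(name))
```

Each two-byte UTF-8 sequence turns into two wrong characters, while the ASCII letters come through unchanged, which is why the rest of each name stays readable.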

If you can get by with West-European characters (on Windows), you may choose "try ISO Latin-1, then UTF-16" as the output encoding. It is possible (but not guaranteed) that programs like SPSS will then display your West-European text correctly. This trick is of limited use, because it will not work if your operating system is set to a "codepage" different from ISO Latin-1 (or "ANSI"), if you need East-European or Hebrew characters, or if you want to share your text files with Macintosh users.
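
The name of that setting suggests a simple fallback rule, sketched below in Python under that assumption: write ISO Latin-1 (one byte per character) when every character fits, and UTF-16 otherwise. The function is hypothetical, and Python's utf-16 codec prepends a byte-order mark in the machine's native byte order, which may differ from what Praat writes.

```python
def encode_latin1_or_utf16(text: str) -> tuple:
    """Return (encoding name, bytes), preferring ISO Latin-1 when possible."""
    try:
        # ISO Latin-1 covers code points 0..255, i.e. ASCII plus West-European letters.
        return "latin-1", text.encode("latin-1")
    except UnicodeEncodeError:
        # Any character above U+00FF (e.g. ł, č, Hebrew letters) forces the fallback.
        return "utf-16", text.encode("utf-16")


print(encode_latin1_or_utf16("München")[0])  # latin-1
print(encode_latin1_or_utf16("Wałęsa")[0])   # utf-16
```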

If you already have some UTF-16 text files and you want to convert them to UTF-8 or ISO Latin-1 (the latter only if they contain no characters outside the West-European set), you can read them into Praat and save them again (with the appropriate output encoding setting). Other programs, such as NotePad and TextWrangler, can also do this conversion.
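
The same conversion can also be scripted. The Python sketch below assumes the input files are UTF-16 with a byte-order mark (Python's utf-16 codec uses the mark to pick the byte order and removes it); the function names are hypothetical. Converting to ISO Latin-1 stops with an error, rather than writing a damaged file, when the text contains a character that Latin-1 cannot represent.

```python
from pathlib import Path


def convert_utf16_bytes(data: bytes, target: str = "utf-8") -> bytes:
    """Re-encode UTF-16 bytes as `target` (e.g. "utf-8" or "latin-1")."""
    # The "utf-16" codec reads the byte-order mark to pick the byte order and drops it.
    text = data.decode("utf-16")
    # With target="latin-1" this raises UnicodeEncodeError for characters
    # outside ISO Latin-1 (e.g. č, ł, Hebrew letters).
    return text.encode(target)


def convert_utf16_file(src: str, dst: str, target: str = "utf-8") -> None:
    # Work on raw bytes so that line endings are copied unchanged;
    # the conversion runs before dst is opened, so a failure writes nothing.
    Path(dst).write_bytes(convert_utf16_bytes(Path(src).read_bytes(), target))
```

For example, convert_utf16_file("old.TextGrid", "new.TextGrid") would write a UTF-8 copy of a hypothetical file old.TextGrid.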

Finally, you can still make sure that all texts are ASCII, e.g. by typing the characters ß and ő as \ss and \o: respectively. See special symbols.
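
This substitution can be automated. The Python sketch below replaces non-ASCII characters with Praat backslash sequences, but its table holds only the two examples given above (ß as \ss, ő as \o:); a real converter would need the full list from the special symbols page. The function and table names are hypothetical, and the sketch does not escape backslashes that are already in the text.

```python
# Hypothetical table: only the two sequences mentioned in the text.
PRAAT_BACKSLASH_SEQUENCES = {
    "ß": "\\ss",
    "ő": "\\o:",
}


def to_praat_ascii(text: str) -> str:
    """Replace non-ASCII characters with Praat backslash sequences from the table."""
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)  # ASCII characters pass through unchanged
        elif ch in PRAAT_BACKSLASH_SEQUENCES:
            out.append(PRAAT_BACKSLASH_SEQUENCES[ch])
        else:
            raise ValueError(f"no backslash sequence known for {ch!r}")
    return "".join(out)


print(to_praat_ascii("Straße"))  # Stra\sse
```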

© ppgb, January 29, 2011