IFA Dutch Spoken Language Corpus

© 2001, De Nederlandse Taalunie
This corpus was made possible by grant 355-75-001 of
the Netherlands Organization for Scientific Research

This service allows you to query a (PostgreSQL) database of the IFA Dutch Spoken Language Corpus. For information, please contact: Rob van Son, R.J.J.H.vanSon@uva.nl.
Powered by CGIscriptor.

The copyrights to all materials presented here are owned by the Nederlandse Taalunie or R.J.J.H. van Son (where indicated) unless explicitly stated otherwise.
All materials are licensed under the GNU General Public License (GPL). For further details on licensing, see /License.txt or contact me at the email address above.

The corpus contains phonemically segmented and labelled speech from 8 speakers. For each speaker, a Fixed text was recorded in several "styles", along with a retold version of that text. Furthermore, each speaker told an Informal story face-to-face to an interviewer. This story was the basis of a speaker-specific variable text corpus, which was read and retold by each speaker individually. For more detailed information, see

Summary information:
Recorded and segmented speech of 8 speakers (net time in seconds)
Sex    Age  ID  Recorded sentences  Segmented sentences
F      20   N   3736                2674
F      28   G   4180                3964
F      40   L   3112                2466
F      60   E   4181                3230
M      15   R   2125                1430
M      40   K   2720                1891
M      56   H   2894                2368
M      66   O   3781                1436
Total  -    -   26733               19465
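The figures in the summary table can be turned into per-speaker segmentation ratios with a few lines of Python. This is only an illustrative sketch: the variable names are hypothetical, and the values are copied by hand from the table above. Note that the per-speaker values add up to a few seconds less than the stated totals (26733 and 19465), presumably because the individual values were rounded; that explanation is an assumption.

```python
# (sex, age, speaker ID, recorded seconds, segmented seconds),
# copied from the summary table; names here are illustrative only.
SPEAKERS = [
    ("F", 20, "N", 3736, 2674),
    ("F", 28, "G", 4180, 3964),
    ("F", 40, "L", 3112, 2466),
    ("F", 60, "E", 4181, 3230),
    ("M", 15, "R", 2125, 1430),
    ("M", 40, "K", 2720, 1891),
    ("M", 56, "H", 2894, 2368),
    ("M", 66, "O", 3781, 1436),
]

# Totals as stated in the table's last row.
STATED_RECORDED = 26733
STATED_SEGMENTED = 19465

# Sums of the per-speaker values.
rec_total = sum(row[3] for row in SPEAKERS)
seg_total = sum(row[4] for row in SPEAKERS)

# Share of each speaker's recorded time that has been segmented.
for sex, age, speaker_id, recorded, segmented in SPEAKERS:
    print(f"{speaker_id} ({sex}, {age}): {segmented / recorded:.0%} segmented")

print(f"Row sums: {rec_total} s recorded, {seg_total} s segmented")
print(f"Stated:   {STATED_RECORDED} s recorded, {STATED_SEGMENTED} s segmented")
print(f"Segmented speech: about {STATED_SEGMENTED / 3600:.1f} hours")
```

The last line converts the stated segmented total from seconds to hours, which is often the more convenient unit when comparing corpora.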

A more extended paper describing the corpus can be found here (pdf) and here (pdf). It was presented as a poster at the EUROSPEECH2001 conference. There is also a short "tour" available. This site runs on CGIscriptor Perl inline scripting. There is a special page containing available research papers.

You can find information on the label protocol, the naming conventions, and the phoneme labels here (in Dutch) (PDF) and here (some of it in English).

Speaking styles are:

Recordings were made on two separate channels, from a fixed microphone (_fm) and from a head-mounted microphone (_hm). Sound files are CD quality (directly recorded on Audio CD), i.e., 44.1 kHz and 16 bit. In addition to the sound files, some analysis results are available, e.g., the spectral Center of Gravity (CoG) files. The spectral Center of Gravity is a compact way to represent spectral changes, especially changes in speech source (noise and voicing) and changes related to articulatory movements. The CoG files are stored as Praat ASCII Sound files.
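To give an idea of what a spectral Center of Gravity value expresses, here is a minimal Python sketch. It assumes the common definition of the CoG as the power-weighted mean frequency of a frame's spectrum; the CoG files in the corpus may have been computed with different windowing or weighting. The function name `spectral_cog` and the test signals are hypothetical, and a naive DFT is used only to keep the example free of external libraries.

```python
import cmath
import math


def spectral_cog(samples, sample_rate):
    """Return the power-weighted mean frequency (Hz) of a real-valued frame."""
    n = len(samples)
    weighted = 0.0
    total = 0.0
    # Naive DFT over the non-negative frequency bins (0 Hz up to Nyquist).
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        power = abs(coeff) ** 2
        weighted += (k * sample_rate / n) * power
        total += power
    return weighted / total if total > 0 else 0.0


RATE = 44100  # Hz, the sampling rate stated for the corpus sound files
N = 441       # 10 ms frame, so the DFT bins are exactly 100 Hz apart

# Synthetic test signals (not corpus data): two pure tones and their sum.
low = [math.sin(2 * math.pi * 1000 * t / RATE) for t in range(N)]
high = [math.sin(2 * math.pi * 4000 * t / RATE) for t in range(N)]
mix = [a + b for a, b in zip(low, high)]

cog_low = spectral_cog(low, RATE)
cog_mix = spectral_cog(mix, RATE)
print(round(cog_low), round(cog_mix))
```

For the 1000 Hz tone the CoG lies at about 1000 Hz; adding an equally strong 4000 Hz component moves it to about 2500 Hz, halfway between the two. This illustrates why the measure reacts to shifts of energy toward higher frequencies, such as those caused by noise sources.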

Rob van Son

Access to data

Here is a manual of our Web Interface with examples.