IFA Dutch Spoken Language Corpus

© 2001, De Nederlandse Taalunie
This corpus was made possible by grant 355-75-001 of
the Netherlands Organization for Scientific Research

This service allows you to query a (PostgreSQL) database of the IFA Dutch Spoken Language Corpus. For information, please contact: Rob van Son (R.J.J.H.vanSon@uva.nl).
Powered by CGIscriptor.

The copyrights to all materials presented here are owned by the Nederlandse Taalunie or R.J.J.H. van Son (where indicated) unless explicitly stated otherwise.
All materials are licensed under the GNU General Public License (GPL). For further details on licensing, see here or contact me at the email address above.

The corpus contains phonemically segmented and labelled speech from 8 speakers. For each speaker, a Fixed text has been recorded in several "styles", together with a retold version of the fixed text. Furthermore, each speaker told an Informal story face-to-face to an interviewer. This story was the basis of a speaker-specific variable text corpus, which each speaker individually read and retold. For more detailed information, see

Summary information:
Recorded and segmented speech of 8 speakers (net time in seconds)
Sex	Age	ID	Recorded sentences (s)	Segmented sentences (s)
F	20	N	3736	2674
F	28	G	4180	3964
F	40	L	3112	2466
F	60	E	4181	3230
M	15	R	2125	1430
M	40	K	2720	1891
M	56	H	2894	2368
M	66	O	3781	1436
Total	-	-	26733	19465
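As a quick consistency check on the summary table, the sketch below recomputes the column totals and the fraction of each speaker's recorded speech that has been segmented. The values are copied from the table; the names `SPEAKERS` and `summarize` are illustrative only. Summed per speaker, the columns come to 26729 s and 19459 s, a few seconds below the printed totals of 26733 s and 19465 s. This presumably reflects rounding of the per-speaker values, although the page does not say so.

```python
# Per-speaker net speech time in seconds, copied from the summary table.
# Tuple layout: (sex, age, speaker ID, recorded seconds, segmented seconds)
SPEAKERS = [
    ("F", 20, "N", 3736, 2674),
    ("F", 28, "G", 4180, 3964),
    ("F", 40, "L", 3112, 2466),
    ("F", 60, "E", 4181, 3230),
    ("M", 15, "R", 2125, 1430),
    ("M", 40, "K", 2720, 1891),
    ("M", 56, "H", 2894, 2368),
    ("M", 66, "O", 3781, 1436),
]


def summarize(rows):
    """Return the recorded total, the segmented total and per-speaker fractions."""
    recorded = sum(r[3] for r in rows)
    segmented = sum(r[4] for r in rows)
    # Fraction of each speaker's recorded speech that has been segmented.
    fractions = {r[2]: r[4] / r[3] for r in rows}
    return recorded, segmented, fractions


if __name__ == "__main__":
    rec, seg, frac = summarize(SPEAKERS)
    print(f"recorded:  {rec} s ({rec / 3600:.2f} h)")
    print(f"segmented: {seg} s ({seg / 3600:.2f} h), {seg / rec:.1%} of recorded")
    for speaker_id, f in frac.items():
        print(f"{speaker_id}: {f:.0%} segmented")
```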

A more extended paper describing the corpus can be found here (pdf) and here (pdf). It was presented as a poster at the EUROSPEECH 2001 conference. There is also a short "tour" available. This site runs on CGIscriptor Perl inline scripting. There is a special page containing available research papers.

You can find information on the label protocol, the naming conventions, and the phoneme labels here (in Dutch, PDF) and here (partly in English).

Speaking styles are:

Informal: An elicited story about a vacation trip told to an interviewer (face to face)
Retold: A previously read story (a fixed fairy tale or the vacation trip) retold in an empty room
Read: A long text read from a cueing screen
Sentence: Isolated sentences read from a cueing screen
Pseudo Sentence: Non-sentences, constructed by stringing together randomly picked words, read from a cueing screen
Word: Word lists read from a cueing screen
Syllable: Syllable lists read from a cueing screen
Pronunciation: A list of pronunciation tests (see the following items)
ABC: The alphabet read from a cueing screen
NUM: The numbers from 0 to 12 read from a cueing screen
VOW: Isolated vowels read from a cueing screen
HVD: Vowels in H_D context read from a cueing screen
VCV: Isolated intervocalic consonants read from a cueing screen

Recordings were made on two separate channels: a fixed microphone (_fm) and a head-mounted microphone (_hm). In addition to the sound files, some analysis results are available, e.g., the spectral Center of Gravity (CoG) files. The spectral Center of Gravity, the weighted mean frequency of the spectrum, is a compact way to represent spectral changes, especially changes in the speech source (noise and voicing) and changes related to articulatory movements. The CoG files are stored as Praat ASCII Sound files. Sound files are CD quality (recorded directly onto Audio CD), i.e., 44.1 kHz and 16 bit.
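To make the Center of Gravity concrete, here is a minimal sketch that computes the CoG of a single frame as sum(f_k * |X_k|^p) / sum(|X_k|^p) over the non-negative DFT bins. As far as we understand it, Praat's "Get centre of gravity" command for a Spectrum uses a formula of this form, with a power parameter p that defaults to 2. The frame length, window, power and hop size used for the corpus's own CoG files are not stated here, so this is an illustration of the measure, not the procedure that produced those files. The function name `spectral_cog` is hypothetical, and the naive O(n²) DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def spectral_cog(frame, sample_rate=44100.0, power=2.0):
    """Spectral centre of gravity (Hz) of one frame of samples.

    Computes sum(f_k * |X_k|**power) / sum(|X_k|**power) over the
    non-negative frequency bins of a plain (unwindowed) DFT.
    Returns 0.0 for an all-zero frame.
    """
    n = len(frame)
    num = 0.0
    den = 0.0
    for k in range(n // 2 + 1):  # bins from 0 Hz up to (about) Nyquist
        # Naive DFT coefficient for bin k; O(n^2) overall, fine for a sketch.
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        weight = abs(x_k) ** power
        freq = k * sample_rate / n
        num += freq * weight
        den += weight
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    sr = 44100  # CD sample rate, as used for the corpus recordings
    n = 441     # 10 ms frame, giving a 100 Hz bin spacing
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(n)]
    print(f"CoG of a 1000 Hz tone: {spectral_cog(tone, sr):.1f} Hz")
```

A pure tone that fits an integer number of cycles in the frame puts all its energy in one bin, so its CoG equals the tone frequency. Noise-like frication shifts the CoG upward, and voiced, low-frequency energy pulls it down, which is why a CoG track summarizes source changes compactly.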

Rob van Son

Access to data

Here is a manual of our Web Interface with examples.

Descriptive statistics on segmentation (compiled data)
Sentences (raw speech)
Audio fragments (labeled speech)
Texts (with speech)
Single database records (direct access to database records)
IMDI access (use the IMDI XML interface)