IFA Dutch Spoken Language Corpus: Raw Texts

You can query our database of raw text fragments. For fragments whose speech has been "processed", links to the compressed recordings (Ogg Vorbis format) are available.

The texts are mostly the reading texts presented to the speakers, with only a few transliterations of the retold stories (Retold style, not present for all speakers). Only the retold stories come close to a real transcript of the speech, but they too differ from the "real" transliterations.

Most texts recur across speakers. Only the Variable Informal style and its transliterations, Variable Text and Variable Sentence, are unique to each speaker. Note that these three "styles" contain the same sentences in the text files (Variable Informal and Variable Text are identical).

To select the relevant texts, choose one or more speakers, text types (Fixed or Variable) and one or more speaking styles. You can narrow down the selection by entering regular expressions that are matched against the ID code of the fragment and the contents of the Text. You can also select fragments based on the number of words and characters.
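The narrowing step described above can be sketched in Python. This is only an illustration of the idea, not the code behind this page: the `Fragment` record, its field names, the function names and the sample IDs and sentences are all hypothetical. It also assumes that characters are counted including spaces and that words are separated by whitespace, which may differ from how the database counts them.

```python
import re
from dataclasses import dataclass


@dataclass
class Fragment:
    # Hypothetical record; these field names are not taken from the database.
    frag_id: str
    text: str


def matches(pattern, value, negate=False):
    """Mimic 'does (~)' and 'does NOT (!~)' match; an empty pattern accepts all."""
    if not pattern:
        return True
    found = re.search(pattern, value) is not None
    return found != negate


def in_range(n, lowest=None, highest=None):
    """True when lowest <= n <= highest; a bound of None is not checked."""
    return (lowest is None or lowest <= n) and (highest is None or n <= highest)


def select_fragments(fragments, id_regexp="", id_negate=False,
                     text_regexp="", text_negate=False,
                     min_chars=None, max_chars=None,
                     min_words=None, max_words=None):
    """Return the fragments that pass every filter that was filled in."""
    selected = []
    for frag in fragments:
        n_chars = len(frag.text)          # assumption: spaces count as characters
        n_words = len(frag.text.split())  # assumption: whitespace-separated words
        if (matches(id_regexp, frag.frag_id, id_negate)
                and matches(text_regexp, frag.text, text_negate)
                and in_range(n_chars, min_chars, max_chars)
                and in_range(n_words, min_words, max_words)):
            selected.append(frag)
    return selected


# Made-up fragments, for illustration only.
demo = [
    Fragment("demo-A-1", "een korte zin"),
    Fragment("demo-B-1", "dit is een iets langere zin met meer woorden"),
    Fragment("demo-A-2", "nog een zin"),
]
for frag in select_fragments(demo, id_regexp="^demo-A", max_words=3):
    print(frag.frag_id, frag.text)
```

All filters are combined with a logical AND, so leaving a field empty simply skips that filter.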

Select the speech material (select Speaker: F/M - female/male, Age, ID A-Z; Text Type: Fixed or Variable; and Speaking Styles):

Speaker
Text material
Speaking Style
Narrow down the selection of files:
ID does (~) or does NOT (!~) match regexp
<= Number of Characters <=
<= Number of Words <=
Text does (~) or does NOT (!~) match regexp

Selected Files
(look at the bottom of the page for collected TAR archives)


Sizes are uncompressed