IFA Dialog Video corpus

All materials are licensed under the GNU General Public License

© 2007, Nederlandse Taalunie
This corpus was made possible by grant 276-75-002 of
the Netherlands Organization for Scientific Research

Introduction

The IFA Dialog Video corpus is a collection of annotated video recordings of friendly Face-to-Face dialogs. It is modelled on the Face-to-Face dialogs in the Spoken Dutch Corpus (CGN). The procedures and design of the corpus were adapted to make this corpus useful for other researchers of Dutch speech. For this corpus, 20 dialogs of 15 minutes each were recorded and annotated, 5 hours of speech in total. To stay close to the very useful Face-to-Face dialogs in the CGN, pairs of well-acquainted participants were selected: good friends, relatives, or long-time colleagues. The participants were allowed to talk about any topic they wanted.

In total, 20 recordings were annotated to the same, or updated, standards as the original CGN. Only the initial orthographic transcription was done by hand; the other CGN-format annotations were generated automatically. Two further manual annotations were added: a functional annotation of dialog utterances and an annotation of gaze direction.

See also the LREC paper The IFADV corpus: A free dialog video corpus
(van Son, R., Wesseling, W., Sanders, E., and van den Heuvel, H. (2008). LREC'08, Marrakech)

Recordings

Recordings were made with two gen-locked JVC TK-C1480B analog color video cameras.
Specification:

Image pickup: 1/2 type IT CCD 752 (H) x 582 (V)
Synchronization: Internal Line Lock, Full Genlock
Scanning frequency: (H) 15.625kHz x (V) 50Hz
Resolution: 480 TV lines (H)
A: Ernitec GA4V10NA-1/2 lens (4-10mm)
B: Panasonic WV-LZ80/2 lens (6-12mm)

Gen-lock ensures synchronization of all frames of the two cameras. Recordings were digitized using two Canopus ADVC110 digital video converters. Recordings were stored unprocessed on disk, i.e., in DV format with 48 kHz 16-bit PCM sound.

Each camera was positioned to the left of one speaker and focussed on the face of the other. Subjects wore a Samson QV head-set microphone.

Subjects first spoke some scripted sentences. Then they were instructed to speak freely while preferably avoiding sensitive material or identifying people by name. All subjects signed an informed consent form and transferred all copyrights to the Dutch Language Union (Nederlandse Taalunie).

Point your IMDI browser to: http://www.fon.hum.uva.nl/IFA-SpokenLanguageCorpora/IFADVcorpus/Annotations/IMDI/IFADVcorpus.imdi

Materials

Release note:
The original recordings contained dropped frames, which caused the two recordings of each dialog to go out of sync. This has been corrected by duplicating frames; the procedure is described in the SMILoverlay files. Only the corrected recordings are made available here. The original recordings, with the missing frames, are available on request. Recordings are limited to 900 seconds (15 min), and the video frames and audio of both recordings of a dialog are synchronized.
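
As a rough sanity check on a downloaded recording, the expected number of video frames and audio samples follows from the 900-second limit. The sketch below assumes PAL DV at 25 frames per second (inferred from the 50 Hz vertical scanning frequency of the cameras; the frame rate itself is not stated in this description) and the 48 kHz sampling rate of the DV sound. The variable names are illustrative only.

```shell
#!/bin/sh
# Assumption: PAL DV at 25 frames/s (not stated explicitly in the corpus
# description; inferred from the 50 Hz vertical scanning frequency).
duration_s=900        # maximum recording length given in the release note
fps=25                # assumed PAL frame rate
sample_rate=48000     # sampling rate of the DV sound (48 kHz)

# Upper bounds for a full 900-second recording.
expected_frames=$((duration_s * fps))
expected_samples=$((duration_s * sample_rate))

echo "video frames per recording: $expected_frames"
echo "audio samples per channel:  $expected_samples"
```

Under these assumptions a full-length recording has 22500 video frames and 43200000 audio samples per channel; shorter recordings scale accordingly.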

Compressed video, with automatically normalized brightness and contrast levels
- AVI: MPEG-4 Part 2/DivX3, encoded video recordings (~287 MB). Also normalized for sound volume.
  
  mencoder -quiet -af volnorm=2:0.25 -vf pp=autolevels:fullyrange -of avi -ovc lavc -lavcopts vcodec=msmpeg4:vbitrate=2400000:vhq:keyint=50 -oac mp3lame -o outfile.avi infile.dv;
- OGV: Ogg Theora encoded video recordings (~200 MB)
  
  ffmpeg2theora --format dv --videoquality 4 --sharpness 1 --pp autolevels:fullyrange --license GPLv2 -o outfile.ogv infile.dv

Speech files, extracted from the recordings.
- WAV: RIFF/WAV (uncompressed)
- SPX: SPEEX (low bitrate)
- OGG: Ogg Vorbis (compressed)
- _lb_OGG: Ogg Vorbis (low bitrate)
- FLAC: Free Lossless Audio Codec (compressed)
- _lb.MP3: MP3 (low bitrate)

Cropped Compressed and cropped (50% width) recordings and associated original sound files (WAV). Compression is identical to the AVI file in the Compressed video directory.
Filenames ending in "_pcm" contain the original PCM sound (48 kHz, 16 bit); those ending in "_mp3lame" contain MP3 sound. We prefer that you download the MP3 files and the WAV files separately instead of the larger PCM files. (Note that the .wav files in this directory should be identical to those in the Speech files directory. They are kept for historical reasons.)

Annotations of the stereo sound of the A recording (i.e., subject A on the left channel)
Labeling was done by Anita van Boxtel at SPEX under supervision of Eric Sanders and Henk van den Heuvel
- ort: Orthographic transcription (Praat TextGrid)
- awd: Automatic word and phoneme alignment of ort (Praat TextGrid)
- pos: Automatic Part-of-Speech labeling (CGN format)
- EAF: Elan annotations for gaze direction
- dbl: Normalized combination of other annotations (Praat TextGrid)
- scripts: Maintenance scripts for annotations
- Transcripts: Readable dialog transcripts
  Corresponding audio files can be found in the Speech files directory.
- Summaries of the dialogs
  Compiled by Stephanie Wagenaar
- IMDI files of the recordings
  Compiled by Maaike van Naerssen
(note that not all recordings are annotated)
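
Because the annotations refer to the stereo sound with subject A on the left channel, it can be convenient to split a speech file into one mono file per speaker before analysis. The following is a minimal sketch, assuming SoX is installed and that the other subject is on the right channel (an inference from the stereo layout, not stated above). The function name and all file names are hypothetical.

```shell
#!/bin/sh
# Sketch, assuming SoX is available on the PATH.
# split_speakers IN OUT_LEFT OUT_RIGHT
#   IN        stereo WAV file (subject A on the left channel)
#   OUT_LEFT  mono file with the left channel (subject A)
#   OUT_RIGHT mono file with the right channel (assumed: the other subject)
split_speakers() {
    sox "$1" "$2" remix 1 &&   # "remix 1" keeps only input channel 1 (left)
    sox "$1" "$3" remix 2      # "remix 2" keeps only input channel 2 (right)
}

# Example call with placeholder file names:
# split_speakers dialog.wav subject_A.wav subject_B.wav
```

The original stereo file is left untouched, so the time alignment of the TextGrid annotations still applies to both mono files.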

DialogCorpus Corrected DV recordings.
VERY LARGE. Downloads will be truncated. If you need these recordings, please contact us.

Original Recordings Original raw DV recordings.
VERY LARGE. Downloads will be truncated. If you need these recordings, please contact us.

MD5sums for the recordings in DialogCorpus
(these are the same as can be found in the DialogCorpus directory)
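
Since large downloads may be truncated, checking a received file against the published checksums is a simple way to detect an incomplete transfer. The sketch below assumes the checksum list is in the format produced by GNU md5sum (one "hash, two spaces, file name" entry per line), which is not confirmed here. It uses stand-in files in a temporary directory, so every file name is a placeholder.

```shell
#!/bin/sh
# Sketch: check a file against a checksum list in GNU md5sum format
# ("<md5 hash>  <file name>" per line). All file names are placeholders.
workdir=$(mktemp -d)

# Stand-ins for a downloaded recording and the published checksum list.
printf 'example payload\n' > "$workdir/example.dv"
(cd "$workdir" && md5sum example.dv > MD5SUMS.txt)

# -c recomputes the checksum of every listed file and reports OK or FAILED;
# the exit status is non-zero if any file does not match.
if (cd "$workdir" && md5sum -c MD5SUMS.txt); then
    verify_status=ok
else
    verify_status=failed
fi
echo "verification: $verify_status"
rm -rf "$workdir"
```

With real data, the two stand-in steps are replaced by the downloaded recording and the checksum file from the corpus; a truncated download would then be reported as FAILED.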

scripts
Scripts used to record and process the dialogs

SMILoverlays
SMIL xml files to correct the dropped frames in the recordings in DialogCorpus

tables
Metadata on the recordings and the speakers.

Documents
Forms and published papers.

Annotation instructions (HTML)
Annotation instructions (Praat Manual)
Annotation instructions are in Dutch

IFA Dialog Video Corpus.
Copyright © 2007 Nederlandse Taalunie
This corpus was made possible by grant 276-75-002 of the Netherlands Organization for Scientific Research
(Created by R.J.J.H. van Son and Wieneke Wesseling of the ACLC. Annotations were performed by SPEX)

Please note that these materials are distributed under the GPLv2 license. This license only covers the copyright protection of the corpus. Publishing or broadcasting materials from this corpus might be covered by other laws, e.g., laws protecting the privacy and "good name" of the subjects. This is especially relevant if the materials are used outside an educational or R&D context. Please read the forms in the Documents directory for more information (in Dutch).

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.