File identification codes
The naming convention for speech fragments and files. All fragment names have the form "F 28 G 1 FT 1 A ..." (spaces added for readability). These identify the speaker (sex, age, and ID), the recording session (1 or 2), the speaking style or text type, and the item within the recording (slide number, sentence, word, etc.), down to the individual phoneme.
The coding scheme
The coding scheme was designed to give a unique and descriptive
identification of any part of any recording. Most importantly, it must
allow descriptions to be inserted and deleted with minimal effect on
other parts. Furthermore, the codes should be 'decodable' by humans so
they can verify their validity: using a simple list, it should be possible
to recognise the item that belongs to a code. These aims were achieved with
a hierarchical code that alternates alphabetical and numerical fields.
From any code, it is easy to obtain the code of the item that contains
it (e.g., the sentence that contains a word, or the word that contains a
phoneme). Furthermore, an item can be removed or inserted without
changing other codes. The price of the hierarchical nature of the code
(which makes it understandable) is that reassigning an item (e.g., moving
a phoneme from one syllable to another) becomes a deletion plus an
insertion if the codes are to remain legible (which is not strictly
necessary). Even in these cases, however, changes have only a very
localized effect.
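The containment property can be illustrated with a minimal Python sketch. It assumes a bare code (no '_' descriptions and no special '-', '+' or subdivision characters) and is meant only for the segment levels below the style code; the function names `parent_code` and `is_within` are hypothetical. Because letters and digits alternate, the last field is simply the final run of letters or of digits. Syllable and position letters are written together (e.g., 'SC'), so this sketch treats them as a single run.

```python
import re

# Codes alternate letter runs and digit runs, so the final field is
# always the last maximal run of letters or of digits.
_LAST_RUN = re.compile(r"(?:[A-Z]+|[0-9]+)$")


def parent_code(code: str) -> str:
    """Return the code of the containing item by dropping the last run."""
    return _LAST_RUN.sub("", code, count=1)


def is_within(child: str, ancestor: str) -> bool:
    """True if `child` equals `ancestor` or lies inside it.

    A plain prefix test is not enough: word 1 ('...C1') is a prefix of
    word 12 ('...C12'). The character right after the prefix must switch
    between letters and digits.
    """
    if child == ancestor:
        return True
    if not ancestor or not child.startswith(ancestor):
        return False
    return child[len(ancestor)].isdigit() != ancestor[-1].isdigit()


print(parent_code("F28G1VI3C3"))               # sentence C of chunk 3
print(is_within("F28G1VI3C12", "F28G1VI3C1"))  # word 12 is not in word 1
```

The boundary check in `is_within` is where the alternation pays off: without it, a prefix test would wrongly place word 12 inside word 1.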
Speech Style Identification Codes
Fixed content
- FR: Fixed text Retold
- FT: Fixed Text read aloud
- FS: Fixed Sentences
- PS: Pseudo (unpredictable) Sentences
- FW: Fixed Word list
- FY: Fixed sYllable list
- FPA: Fixed Pronunciation list A
- FPB: Fixed Pronunciation list B
- FPA1ABC: AlphaBet
- FPA1NUM: NUMerals
- FPA1VOW: VOWels
- FPA1HVD: Vowels in /hVd/ context
- FPA1VCV: Consonants in Vowel context (Vowel-Consonant-Vowel)
Variable content
- VI: Variable Informal story
- VR: Variable story Retold
- VT: Variable Text read aloud
- VS: Variable Sentences
- VW: Variable Word list
- VY: Variable sYllable list
Other Sounds
- G: Gauge signal ([12], Tone/Noise, e.g., G1N or G2T)
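When decoding a name, the style code has to be separated from the chunk number that follows it. Some of the listed codes contain digits (FPA1ABC etc.) and one is a prefix of others (FPA), so a longest-match lookup is one way to do this. The following Python sketch assumes that the codes listed above are the complete set; `STYLE_CODES` and `split_style` are hypothetical names.

```python
from typing import Tuple

# Style codes as listed above, with their mnemonics.
STYLE_CODES = {
    "FR": "Fixed text Retold",
    "FT": "Fixed Text read aloud",
    "FS": "Fixed Sentences",
    "PS": "Pseudo (unpredictable) Sentences",
    "FW": "Fixed Word list",
    "FY": "Fixed sYllable list",
    "FPA": "Fixed Pronunciation list A",
    "FPB": "Fixed Pronunciation list B",
    "FPA1ABC": "AlphaBet",
    "FPA1NUM": "NUMerals",
    "FPA1VOW": "VOWels",
    "FPA1HVD": "Vowels in /hVd/ context",
    "FPA1VCV": "Consonants in Vowel context",
    "VI": "Variable Informal story",
    "VR": "Variable story Retold",
    "VT": "Variable Text read aloud",
    "VS": "Variable Sentences",
    "VW": "Variable Word list",
    "VY": "Variable sYllable list",
    "G": "Gauge signal",
}


def split_style(rest: str) -> Tuple[str, str]:
    """Split e.g. 'FT3A4' into ('FT', '3A4'), trying the longest codes first."""
    for code in sorted(STYLE_CODES, key=len, reverse=True):
        if rest.startswith(code):
            return code, rest[len(code):]
    raise ValueError(f"unknown style code at the start of {rest!r}")


print(split_style("FPA1VOW2"))  # FPA1VOW wins over the shorter FPA
```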
File Identification Code
Identify a speech segment by speaker, session, task, slide, and position.
The fields below are described with regular-expression codes ([...] = exactly 1,
[...]+ = one or more, [...]? = 0 or 1); a code can be parsed with
[A-Z\-,]+ and [0-9\-,]+.
- Sex [MF]: Male / Female (Boy/Girl?). This is a descriptive mnemonic. It indicates fundamental characteristics of the speaker and can be extended, e.g., 'LF', 'DM', or 'XF' could indicate 'Laryngectomized Female', 'Deaf Male', or 'Sex Change Female' (transsexual Male -> Female), respectively. In multilingual databases it should be preceded by a language code, e.g., 'DUF' for a Dutch-speaking Female. Note: this code can change between recordings; it is not speaker specific.
The general expression is '[A-Z]+'.
- Age [0-9]+: in years. Use a '-' to indicate months, e.g.,
1-4 means 1 year and 4 months. This can change between recordings.
It is NOT intended to identify a specific speaker.
- Subject [A-Z]+: just an identification code. For data exchange, add a
unique corpus code (e.g., 'IFA') in front of the speaker ID code (e.g.,
'IFAN' or 'IFA-N'). This code should identify a particular speaker.
- Recording [12]: the number of the recording session (1 or 2, more
if needed)
- Speech style identification code (see the Speech Style Identification Codes above)
- Chunk/Slide [0-9]+: the number of the chunk of speech, based on the
slide number or a paragraph number
- Sentence [A-Z]+: the position of the sentence in the chunk, 'A',
'B', 'C', etc. Use 'A' if there is no sentence (as in word lists), and
'AA' ... 'ZZ' etc. if needed.
- Word [0-9]+: the number of the word in the sentence (or in the chunk
or list)
- Syllable [S-Z]+: the position of the syllable in the word, starting
from S.
- Position [OKCA]?: the position of a phoneme or cluster in the syllable:
Onset, Kernel, Coda, or Ambisyllabic (the last is optional).
- Phoneme [0-9]+: the number of the phoneme in the syllable
- Channel _[fh]m: the recording microphone channel, fixed ('_fm'),
head-mounted ('_hm'), or both ('_bm', optional), appended after the
name. This affix should be appended to every derived file.
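The fields above can be read back with regular expressions taken directly from the field descriptions. This Python sketch is an illustration under stated assumptions, not a reference parser: it ignores the special codes listed below, expects a name without '_' descriptions, and leaves the style code to the caller (for instance via a lookup of the listed style codes). `parse_speaker` and `parse_segment` are hypothetical names, and returning the age as a (years, months) pair is a choice made here.

```python
import re

# Sex [A-Z]+, age [0-9]+ (optionally '-' months), subject [A-Z]+, recording [0-9]+
_HEADER = re.compile(
    r"(?P<sex>[A-Z]+)(?P<age>[0-9]+(?:-[0-9]+)?)"
    r"(?P<subject>[A-Z]+)(?P<recording>[0-9]+)"
)

# Chunk [0-9]+, then optional sentence [A-Z]+, word [0-9]+,
# syllable [S-Z]+, position [OKCA] and phoneme [0-9]+
_SEGMENT = re.compile(
    r"(?P<chunk>[0-9]+)(?P<sentence>[A-Z]+)?(?P<word>[0-9]+)?"
    r"(?P<syllable>[S-Z]+)?(?P<position>[OKCA])?(?P<phoneme>[0-9]+)?"
)


def parse_speaker(code: str) -> dict:
    """Split off sex, age, subject and recording; return the rest unparsed."""
    m = _HEADER.match(code)
    if m is None:
        raise ValueError(f"no speaker/recording header in {code!r}")
    years, _, months = m["age"].partition("-")
    return {
        "sex": m["sex"],
        "age": (int(years), int(months) if months else 0),  # (years, months)
        "subject": m["subject"],
        "recording": int(m["recording"]),
        "rest": code[m.end():],  # style code followed by the segment part
    }


def parse_segment(tail: str) -> dict:
    """Parse the part after the style code, e.g. '3A4SC2'."""
    m = _SEGMENT.fullmatch(tail)
    if m is None:
        raise ValueError(f"cannot parse segment part {tail!r}")
    return {
        level: int(value) if value.isdigit() else value
        for level, value in m.groupdict().items()
        if value is not None  # levels below the coded item are absent
    }


print(parse_speaker("M56H1FT3A4SC2")["rest"])  # FT3A4SC2
print(parse_segment("3A4SC2"))
```

Since the syllable letters (S-Z) and the position letters (O, K, C, A) do not overlap, 'SC' splits unambiguously into syllable S and position C.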
Special codes
- Inserted_speech [\-]+: False starts, corrections, and errors are all
counted "in situ", but the "count" is preceded by a '-'; a repeated error
etc. is preceded by a double '--', and higher counts receive their ordinal
number between the two dashes. For example, the third restart of the sixth
sentence is coded as '-C-F'; for words it would be '-3-6' (note that the
repeat count is coded in the same way as the major count: alphabetic when
the major count is alphabetic, numeric when it is numeric). WARNING: the
order of these counts is not necessarily chronological.
- Collections [\+]+: If some items are combined, use {start}++{end}. For
instance, if words 3 to 6 of sentence F28G1VI3C are stored together,
use F28G1VI3C3++6. If the first two sentences are combined, use F28G1VI3A++B.
To indicate everything up to the end of the item, leave out the second
({end}) indicator, e.g., F28G1VI3C2T++ indicates all syllables of word 2
after the first (S).
Use a single + to combine non-identical multi-channel recordings, e.g.,
shadowing and multi-logues. For instance, if speaker F27B shadows speaker
F28G, we get F27B3SH1+F28G1FY for F27B, recording 3, SHadowing session 1,
of the recording F28G1FY.
- Subdivisions [.,]: If a major division must be broken down, e.g.,
sentences into phrases, a period or comma is inserted. For example,
the second phrase of the sixth sentence becomes 'F,B' or 'F.B'. Commas
are preferred, but not all systems allow them. In syllable parts, these
divisions are optional (e.g., 'TC' instead of 'T,C').
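The inserted-speech and collection rules can be sketched in Python as follows. Coding the first and second occurrences as '-F' and '--F' is one reading of the rules above, checked only against the examples given ('-C-F', '-3-6', 'F28G1VI3C3++6', 'F28G1VI3C2T++', 'F27B3SH1+F28G1FY'); all function names are hypothetical. An open-ended collection immediately followed by a single '+' (three '+' in a row) would be ambiguous and is not handled.

```python
import re
import string
from typing import List, Optional


def inserted_code(item: str, count: int) -> str:
    """Code the count-th inserted item (false start, correction, ...) at `item`.

    Assumed reading: count 1 -> '-F', count 2 -> '--F', count n >= 3 ->
    '-<n>-F', where <n> is a letter if `item` is alphabetic and a number
    if it is numeric.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return f"-{item}"
    if count == 2:
        return f"--{item}"
    if item.isdigit():
        ordinal = str(count)
    elif count <= 26:
        ordinal = string.ascii_uppercase[count - 1]
    else:
        raise ValueError("alphabetic ordinals beyond Z are not handled here")
    return f"-{ordinal}-{item}"


def collection(parent: str, start: str, end: Optional[str] = None) -> str:
    """Items start..end of `parent` stored together; no end = up to the end."""
    return f"{parent}{start}++{end if end is not None else ''}"


def combined(first: str, second: str) -> str:
    """Join two non-identical multi-channel recordings with a single '+'."""
    return f"{first}+{second}"


def split_combined(name: str) -> List[str]:
    """Split on single '+' only, leaving '++' collections intact."""
    return re.split(r"(?<!\+)\+(?!\+)", name)


print(inserted_code("F", 3))                   # -C-F
print(collection("F28G1VI3C", "3", "6"))       # F28G1VI3C3++6
print(split_combined("F27B3SH1+F28G1FY"))
```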
Content descriptions of the segment
- _Description: a separator and textual description of the underlying
sound, e.g., "_fm" for the Fixed Microphone channel of an audio file, or
"_LB" for the author (labeler) of a Label file. You can use whatever
description you want, even pure content descriptions like "_zong" for the
word or "_4E3SC3" for the corresponding segment in the reference text. You
can add as many descriptions as you like, e.g., _fm_WF_zong_4E3SC3.
- Note: The description gives information on the context and contents of the
sound segment and its relation to the other recordings. The descriptions
are optional. However, I suggest the following minimum set: the channel for
recordings and the author for any type of annotation.
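A name can be split into its code and its descriptions at the '_' separators. This minimal Python sketch assumes the name carries no file extension and that the code itself contains no '_'; `split_descriptions`, `channel_of` and `CHANNELS` are hypothetical names, and any description equal to 'fm', 'hm' or 'bm' is taken to be a channel.

```python
from typing import List, Optional, Tuple

# Channel descriptions from the Channel field above.
CHANNELS = {
    "fm": "fixed microphone",
    "hm": "head-mounted microphone",
    "bm": "both microphones",
}


def split_descriptions(name: str) -> Tuple[str, List[str]]:
    """'M56H1FT3A4SC2_fm_SM' -> ('M56H1FT3A4SC2', ['fm', 'SM'])."""
    code, *descriptions = name.split("_")
    return code, descriptions


def channel_of(name: str) -> Optional[str]:
    """Return the first channel description found, or None if there is none."""
    _, descriptions = split_descriptions(name)
    return next((d for d in descriptions if d in CHANNELS), None)


print(split_descriptions("M56H1FT3A4SC2_fm_WF_zong_4E3SC3"))
```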
Example:
M56H1FT3A4SC2_fm_SM
- Male, 56 years old, Subject H, Recording session 1, Fixed Text read aloud, slide 3, sentence A, Word 4, Syllable S (i.e., the first syllable), Coda, Phoneme 2, derived from the fixed microphone channel (_fm), author SM.
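Going the other way, the example name can be rebuilt by concatenating the fields in hierarchical order. In this Python sketch, `build_code` and its keyword names are hypothetical, fields left as None are skipped, and no validation is done, so the caller must not leave gaps (e.g., a word number without a sentence letter).

```python
from typing import Optional, Sequence, Union


def build_code(
    sex: str,
    age: Union[int, str],  # e.g. 56, or '1-4' for 1 year and 4 months
    subject: str,
    recording: int,
    style: str,
    chunk: Optional[int] = None,
    sentence: Optional[str] = None,
    word: Optional[int] = None,
    syllable: Optional[str] = None,
    position: Optional[str] = None,
    phoneme: Optional[int] = None,
    descriptions: Sequence[str] = (),
) -> str:
    """Concatenate the fields in hierarchical order; None fields are skipped."""
    fields = [sex, age, subject, recording, style,
              chunk, sentence, word, syllable, position, phoneme]
    code = "".join(str(f) for f in fields if f is not None)
    return code + "".join("_" + d for d in descriptions)


name = build_code("M", 56, "H", 1, "FT", chunk=3, sentence="A", word=4,
                  syllable="S", position="C", phoneme=2,
                  descriptions=["fm", "SM"])
print(name)  # M56H1FT3A4SC2_fm_SM
```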
© Rob van Son, October 16th, 2001