The Speech Chain is the title of a book by Peter Denes and Elliot Pinson. The
concept of a speech chain is a good place to start thinking about speech. The
chain consists of the following links, in which a thought is expressed in
different forms as it is born in a speaker's mind and eventually gives rise to
understanding in a listener's mind (see Figure 1 for two diagrams from Rabiner
and Juang which compare the links in the speech chain with their computer analogs):
- Intention
The speaker first decides to say something to another human being (or to a
machine). This event takes place in the higher centers of the mind/brain.
- Language
The desired thought passes through the language centers of the brain
where it is given expression in words which are assembled together in the
proper order and given final phonetic, intonational, and durational form.
- Motor Program and Muscle Movement
The results of the language-production centers of the brain may be considered
a speech motor program which executes over time by conveying firing
sequences to the lower neurological centers, which in turn impart motion to
all of the muscles responsible for speech production: the diaphragm, the
larynx, the tongue, the jaw, the lips, and so on. Much if not all of this
activity is subconscious, and involves constant corrective feedback.
- Airstream in the Vocal Tract
As a result of the muscle movements, a stream of air emerges from the
lungs, passes through the vocal cords where a phonation type (e.g.
normal voicing, whispering, aspiration, creaky voice, or no shaping
whatsoever) is developed, and receives its final shape in the vocal tract
before emerging from the mouth and the nose and through the tissues of the face.
- Sound Wave in Air
The vibrations caused by the vocal apparatus of the speaker radiate through
the air as a sound wave.
- Electronic Transduction
The sound wave may be converted to an analog or digital electrical signal for
storage or transmission, and in that form may be transported thousands of
miles to its destination, where the information in the signal is converted
back to sound. It is in the form of an electronic copy of the
original sound wave that automatic speech recognition by computer gains
access to speech data.
- Hearing
The sound wave, which may have passed through electronic coding and decoding,
eventually strikes the eardrums (tympanic membranes) of another human being,
where it is first converted to vibrations of the membrane surface, next to
mechanical motion via the ossicles of the middle ear, then to fluid pressure
waves in the medium bathing the basilar membrane of the inner ear, and finally
to firings in the roughly 30,000 neural fibers which combine to form the
auditory nerve.
- Auditory and Language Processing
The lower centers of the brainstem, the thalamus, the auditory cortex, and the
language centers of the brain all cooperate in the recognition of the phonemes
which convey meaning, the intonational and durational contours which provide
additional information, and the vocal quality which allows the listener to
recognize who is speaking and to gain insight into the speaker's health,
emotional state, and intention in speaking.
- Understanding
The higher centers of the brain, both conscious and subconscious,
bring to this incoming auditory and language data all the experience of
the listener in the form of previous memories and understanding of the
current context, allowing the listener to "manufacture" in his or her
mind a more or less faithful "replica" of the thought which was originally
formulated in the speaker's consciousness and to update the listener's
description of the current state of the world. The listener may in turn
become the speaker, and vice versa, and the speech chain will then operate in
reverse.
In this course we will have nothing to say about the higher processing levels
mentioned above: that is, counting the links in order from 1 (Intention) to
9 (Understanding), steps 1 and 2 in the speaker and 8 and 9 in the
listener. We will concentrate on the following two areas:
- The movements of the muscles in the vocal apparatus, how this movement
results in the sounds of speech, and the classification of the sounds of
speech into different broad categories and specific distinguishable phonemes.
This corresponds to steps 3 and 4 above, in which we consider speech within
the vocal tract.
- Speech can be viewed as a sound wave, and the concepts of acoustics,
the branch of physics dealing with sound, may be used to describe and analyze
it. It may also be viewed as a signal carrying information, in which case
mathematical techniques such as digital signal processing (DSP),
linear predictive coding (LPC), and hidden Markov modeling (HMM)
may be applied to it. These analysis techniques correspond to steps 5 and 6
above, in which we consider speech once it has left the vocal tract as a
sound wave susceptible to decoding and recognition outside the human nervous
system.
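To make the idea of speech as a digital signal (step 6 above) a little more concrete, here is a minimal sketch of sampling and quantization. It is an illustration only, not a method taken from this course: the function name `sample_and_quantize`, the 100 Hz test tone, the 8000 Hz sampling rate, and the 16-bit depth are all assumptions chosen for the example.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz, bits):
    """Sample a continuous-time signal (a function of time in seconds) and
    round each sample to a signed integer of the given bit depth."""
    max_level = 2 ** (bits - 1) - 1          # e.g. 32767 for 16 bits
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        value = signal(n / sample_rate_hz)   # amplitude at the n-th sampling instant
        value = max(-1.0, min(1.0, value))   # clip to the representable range
        samples.append(round(value * max_level))
    return samples

def tone(t):
    """Hypothetical stand-in for a sound wave: a 100 Hz sine of amplitude 1."""
    return math.sin(2 * math.pi * 100 * t)

# 10 ms of the tone at 8000 samples per second and 16 bits per sample.
pcm = sample_and_quantize(tone, duration_s=0.01, sample_rate_hz=8000, bits=16)
```

The resulting list of integers is the kind of "electronic copy" of the sound wave that a computer can store, transmit, or analyze.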
We will also have a little to say about step 7 above, in which the sound
wave is converted back into neuronal activity, but this time in the ear
of the listener rather than in the vocal tract of the speaker. The branch
of science which studies this form of the speech signal is called auditory
neurology.
This course is entitled The Structure of Spoken Language, but it bears
the subtitle Speech Spectrogram Reading. This is because the
spectrogram, which will be introduced below, can be viewed as a
central meeting point of all the aspects of speech science which we will
study in this course. It contains the imprint of the vocal tract of the
person who produced the utterance; it is derived by digital signal processing
from that utterance; and it is very likely that the human ear creates
something very much like a spectrogram as its first step in decoding the
utterance. In this course we will focus our attention on trying to decode
the message contained in the spectrogram by using our sense of vision rather
than our sense of hearing. That is why we call the course
"spectrogram reading."
Before we can understand what a spectrogram is, however, we need to say
something about sound, and about the sound waveform.
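As a preview of what "derived by digital signal processing" can mean, the sketch below computes a crude spectrogram by cutting a signal into overlapping frames and taking the magnitude of a discrete Fourier transform of each frame. This is one common approach, shown under stated assumptions rather than as the procedure used in this course: the function names, the frame length of 64 samples, the hop of 32 samples, the Hann window, and the 1000 Hz test tone are all choices made for the example, and the transform is a naive one written for clarity rather than speed.

```python
import cmath
import math

def hann(n):
    """Hann window of length n, tapering each frame to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..n/2 of one frame (naive O(n^2) transform)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags

def spectrogram(samples, frame_len=64, hop=32):
    """Rows are time frames; columns are frequency bins from 0 Hz up to
    half the sampling rate."""
    window = hann(frame_len)
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        rows.append(dft_magnitudes(frame))
    return rows

# Hypothetical test signal: a 1000 Hz tone sampled at 8000 Hz for 0.1 s.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]
spec = spectrogram(tone)

# Bin k corresponds to k * SAMPLE_RATE / frame_len Hz (125 Hz per bin here).
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

Plotting the rows of `spec` side by side, with time on the horizontal axis, frequency on the vertical axis, and magnitude as darkness, gives the familiar spectrogram picture; for this steady tone it would be a single horizontal band at `peak_hz`.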
Ed Kaiser
Sat Mar 15 00:01:27 PST 1997