A sound waveform is a representation of the changes in pressure caused by the temporal evolution of a sound wave.
  
 
There are three attributes which determine the exact appearance of a sinusoidal waveform representing a pure tone, corresponding to the three variables other than time t in the above equation:
Frequency (ω): This measures the number of times per second that the
waveform repeats its basic sinusoidal cycle, multiplied by 2π.  This
so-called angular frequency, measured in radians per second, is used in the
basic equation for SHM because it is equal to √(k/m), where k is the
elasticity constant mentioned above and m is the vibrating mass.  Frequency
can also be expressed in cycles per second (cps): f = ω/2π, and it is this
measure of frequency which is normally used to describe spectrogram
frequencies.
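
These relationships are easy to verify with a few lines of code.  The short
Python sketch below converts between angular frequency and cycles per second
and computes the natural frequency of a hypothetical mass-spring system; the
particular values of k and m are made up for the illustration and do not
come from the text.

    import math

    def angular_to_cps(omega):
        # Convert angular frequency (radians per second) to cycles per second (Hz).
        return omega / (2 * math.pi)

    def cps_to_angular(frequency_hz):
        # Convert cycles per second (Hz) to angular frequency (radians per second).
        return 2 * math.pi * frequency_hz

    def shm_angular_frequency(k, m):
        # Angular frequency of simple harmonic motion: omega = sqrt(k / m).
        return math.sqrt(k / m)

    # A hypothetical spring-mass system (illustrative values only):
    omega = shm_angular_frequency(k=200.0, m=0.05)        # about 63.2 rad/s
    print(angular_to_cps(omega), "cps")                   # about 10.1 cps
    print(cps_to_angular(100.0), "radians per second")    # about 628.3
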
Cycles per second are also called Hertz (Hz); for higher frequencies we use
the abbreviations kHz (1000 cps), MHz (1,000,000 cps), etc.  For example,
the tuning fork which produces E above middle C vibrates at a frequency of
2071.13 radians per second, or 329.63 Hz.  This means that in one second,
the tuning fork tines move from maximum displacement in one direction to maximum
displacement in the other direction and back again 329.63 times.  It is this
frequency which corresponds to our sensation of the pitch of E above middle C.
A doubling of frequency corresponds to one musical octave; thus the E one
octave higher has a frequency of 659.26 Hz.
Human hearing is sensitive to frequencies from 20 Hz to 15 kHz or greater.
Animals such as bats and whales can hear sounds with a frequency of up to 150
kHz.  Human hearing is limited both in intensity (corresponding to amplitude)
and in frequency.  See Figure 3 for a diagram of the region of audibility for
human beings, and for the sound level readings for some typical human auditory
environments.  See Figure 4 for a diagram of the frequencies associated with
the 88 keys of a piano, along with the range of other orchestral instruments
and human voices.  We will have much more to say about the different
frequencies which make up the sounds of speech; in a sense, spectrogram
reading is being able to recognize those frequencies which distinguish
the different phonemes.
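
The connection between frequency, musical pitch, and octaves can also be
made concrete in code.  The sketch below assumes the standard
equal-temperament tuning in which A above middle C (piano key 49) is 440 Hz,
a convention not stated in the text itself; under that assumption it
reproduces the figures quoted above for E above middle C and the E one
octave higher.

    import math

    A4_FREQUENCY = 440.0   # Hz; assumed standard tuning for A above middle C (key 49)

    def piano_key_frequency(key_number):
        # Equal-temperament frequency of piano keys 1..88; key 49 is A above middle C.
        return A4_FREQUENCY * 2 ** ((key_number - 49) / 12)

    e4 = piano_key_frequency(44)         # E above middle C: about 329.63 Hz
    e5 = piano_key_frequency(44 + 12)    # one octave higher: about 659.26 Hz
    omega_e4 = 2 * math.pi * e4          # angular frequency: about 2071 radians per second
    print(round(e4, 2), round(e5, 2), round(omega_e4, 1))
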
The notion of frequency is related to two other measures:
Wavelength (λ): The wavelength of a given sinusoidal wave is the spatial
length of one complete cycle of the wave; it is directly proportional to
the wave period, and inversely proportional to its frequency.  The wavelength
is also dependent upon the velocity of propagation of sound v in the given
medium of transmission.  For example, we have seen that the speed of sound
in air at 0°C is 331 m/s.  The wavelength of a sinusoidal tone at 100 Hz
will measure 3.31 m, whereas for a sinusoidal tone at 1000 Hz the wavelength
will be 0.331 m.  In water at 0°C, where the speed of sound is 4.3 times as
great as in air at the same temperature, the wavelength for a 100 Hz tone is
also greater at 14.33 m.
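
These wavelength figures follow from dividing the speed of sound by the
frequency, as the sketch below shows; the speed used for water (about
1433 m/s) is inferred from the 14.33 m figure quoted above rather than
taken directly from the text.

    def wavelength(speed_of_sound, frequency_hz):
        # Wavelength in metres = propagation speed (m/s) divided by frequency (Hz).
        return speed_of_sound / frequency_hz

    SPEED_IN_AIR = 331.0      # m/s in air at 0°C, as quoted above
    SPEED_IN_WATER = 1433.0   # m/s, roughly 4.3 times the speed in air (inferred)

    print(wavelength(SPEED_IN_AIR, 100))                # 3.31 m
    print(wavelength(SPEED_IN_AIR, 1000))               # 0.331 m
    print(round(wavelength(SPEED_IN_WATER, 100), 2))    # 14.33 m
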
Phase (φ): This measures the displacement in degrees from 0° at some
starting reference point in time.  At present it is considered that human
hearing is not particularly sensitive to phase shifts, and thus in this
course we shall not pay much attention to phase information.
Figure 5 contains three sets of sine waves differing in the three defining
attributes explained above: frequency, amplitude, and phase.
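
Although Figure 5 itself is not reproduced here, such sine waves are easy to
generate.  The Python sketch below computes sample values for three pairs of
sinusoids which differ only in amplitude, only in frequency, and only in
phase, respectively; the particular amplitudes, frequencies, and phase shift
are arbitrary choices for the illustration.

    import math

    def sine_wave(amplitude, frequency_hz, phase_deg, duration_s=0.01, sample_rate=8000):
        # Sample x(t) = A * sin(2*pi*f*t + phase) at the given sampling rate.
        n = int(duration_s * sample_rate)
        phase = math.radians(phase_deg)
        return [amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate + phase)
                for i in range(n)]

    # Three pairs, each differing in exactly one of the three attributes:
    pair_amplitude = (sine_wave(1.0, 200, 0), sine_wave(0.5, 200, 0))
    pair_frequency = (sine_wave(1.0, 200, 0), sine_wave(1.0, 400, 0))
    pair_phase     = (sine_wave(1.0, 200, 0), sine_wave(1.0, 200, 90))
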
In the analog-to-digital conversion process, each sample of the waveform is
assigned an integer value between -32,768 and 32,767, or between -128 and
127, making it possible to store each sample in two bytes or one byte.
Thus the original signal which varies continuously over time and in value is reduced to an array of 8000 or 16000 two-byte or one-byte integer values per second. The goal of this analog-to-digital conversion is to reduce the amount of data to a manageable level; even with this data reduction, each second of speech requires at least 8K bytes of storage.
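
As a rough sketch of this conversion, the code below quantizes waveform
samples (scaled to lie between -1.0 and 1.0) to 16-bit or 8-bit signed
integers and counts the resulting storage per second of speech; the rounding
and clamping scheme shown is just one simple possibility, not necessarily
the one used by any particular digitizer.

    def quantize(samples, n_bits=16):
        # Map samples in [-1.0, 1.0] to signed integers of the given size.
        max_value = 2 ** (n_bits - 1) - 1    # 32767 for 16 bits, 127 for 8 bits
        return [max(-max_value - 1, min(max_value, round(s * max_value)))
                for s in samples]

    SAMPLE_RATE = 8000       # samples per second (8000 or 16000 in the text)
    BYTES_PER_SAMPLE = 1     # one byte per sample when n_bits = 8
    print(SAMPLE_RATE * BYTES_PER_SAMPLE, "bytes per second of speech")   # 8000, i.e. about 8K
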