Sound: Deepen band modulation...

A command to enhance the fast spectral changes, like F₂ movements, in each selected Sound object.

Settings

Enhancement (dB): the maximum increase in the level within each critical band. The standard value is 20 dB.
From frequency (Hz): the lowest frequency that shall be manipulated, i.e. the bottom frequency of the first critical band that is to be enhanced. The standard value is 300 Hz.
To frequency (Hz): the highest frequency that shall be manipulated (the last critical band may be narrower than the others). The standard value is 8000 Hz.
Slow modulation (Hz): the frequency f_slow below which the intensity modulations in the bands should not be expanded. The standard value is 3 Hz.
Fast modulation (Hz): the frequency f_fast above which the intensity modulations in the bands should not be expanded. The standard value is 30 Hz.
Band smoothing (Hz): the degree of overlap of each band into its adjacent bands. Prevents ringing. The standard value is 100 Hz.

Algorithm

This algorithm was inspired by Nagarajan, Wang, Merzenich, Schreiner, Johnston, Jenkins, Miller & Tallal (1998), but is not identical to it. A description follows.

Suppose the settings have their standard values. The resulting sound will be composed of the unfiltered part of the original sound, plus all manipulated bands.

First, the resulting sound becomes the original sound, stop-band filtered between 300 and 8000 Hz: after a forward Fourier transform, all values in the Spectrum at frequencies between 0 and 200 Hz and between 8100 Hz and the Nyquist frequency of the sound are retained unchanged. The spectral values at frequencies between 400 and 7900 Hz are set to zero. Between 200 and 400 Hz and between 7900 and 8100 Hz, the values are multiplied by a raised sine, so as to give a smooth transition without ringing in the time domain (the raised sine also allows us to view the spectrum as a sum of spectral bands). Finally, a backward Fourier transform gives us the filtered sound.
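The stop-band weighting described above can be sketched numerically. This is a minimal Python illustration, not Praat's own code: the function name stop_band_weight is hypothetical, and the exact shape of the raised-sine flank (half a cosine period across each 200-Hz transition region) is an assumption that merely matches the edge frequencies given in the text.

```python
import math

def stop_band_weight(f, f_low=300.0, f_high=8000.0, smoothing=100.0):
    """Weight applied to the spectral value at frequency f (Hz).

    Assumed shape: 1 outside the stop band, 0 inside it, and a raised
    sine (half a cosine period) across each transition region of
    width 2 * smoothing centred on f_low and f_high.
    """
    lo_start, lo_end = f_low - smoothing, f_low + smoothing    # 200..400 Hz
    hi_start, hi_end = f_high - smoothing, f_high + smoothing  # 7900..8100 Hz
    if f <= lo_start or f >= hi_end:
        return 1.0  # retained unchanged
    if lo_end <= f <= hi_start:
        return 0.0  # set to zero
    if f < lo_end:
        # falling flank between 200 and 400 Hz
        return 0.5 + 0.5 * math.cos(math.pi * (f - lo_start) / (lo_end - lo_start))
    # rising flank between 7900 and 8100 Hz
    return 0.5 - 0.5 * math.cos(math.pi * (f - hi_start) / (hi_end - hi_start))
```

Under this assumption the weight is exactly one half at 300 and 8000 Hz, so a pass band with the complementary flank adds up with the stop band to a weight of 1 at every frequency.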

The remaining part of the spectrum is divided into critical bands, i.e. frequency bands one Bark wide. For instance, the first critical band runs from 300 to 406 Hz, the second from 406 to 520 Hz, and so on. Each critical band is converted to a pass-band filtered sound by means of the backward Fourier transform.
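The band edges can be reproduced with a short Python sketch. It assumes the Bark conversion bark = 7 ln (f/650 + √(1 + (f/650)²)), which yields the 406 and 520 Hz edges quoted above; the function names are hypothetical and chosen for illustration only.

```python
import math

def hertz_to_bark(f):
    # Assumed conversion: 7 * asinh(f / 650), the same expression as
    # 7 ln (f/650 + sqrt(1 + (f/650)^2)).
    return 7.0 * math.asinh(f / 650.0)

def bark_to_hertz(b):
    # Inverse of hertz_to_bark.
    return 650.0 * math.sinh(b / 7.0)

def critical_band_edges(f_low=300.0, f_high=8000.0):
    """Return band edges in Hz, stepping one Bark at a time from f_low.

    The last band is cut off at f_high, so it may be narrower than one Bark.
    """
    edges = [f_low]
    bark = hertz_to_bark(f_low)
    while True:
        bark += 1.0
        upper = bark_to_hertz(bark)
        if upper >= f_high:
            edges.append(f_high)
            break
        edges.append(upper)
    return edges

edges = critical_band_edges()
```

With the standard settings, edges starts at 300 Hz, continues with values close to 406 and 520 Hz, and ends at 8000 Hz.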

Each filtered sound will be manipulated, and the resulting manipulated sounds are added to the stop-band filtered sound we created earlier. If the manipulation is the identity transformation, the resulting sound will be equal to the original sound. But, of course, the manipulation does something different. Here are the steps.

First, we compute the local intensity of the filtered sound x (t):

intensity (t) = 10 log₁₀ (x² (t) + 10^-6)
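As a small numeric illustration of this formula, here is a Python sketch (the function name local_intensity is hypothetical). It shows the role of the 10^-6 term: it keeps the logarithm finite during silence, where the intensity bottoms out at -60 dB.

```python
import math

def local_intensity(x):
    """Local intensity in dB of a single sample value x."""
    return 10.0 * math.log10(x * x + 1e-6)

# A silent sample, a weak sample and a full-scale sample.
contour = [local_intensity(x) for x in (0.0, 0.01, 1.0)]
```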

This intensity is subjected to a forward Fourier transform. In the frequency domain, we apply a band-pass filter. We want to enhance the intensity modulation in the range between 3 and 30 Hz. We can achieve this by comparing the very smooth intensity contour, low-pass filtered at f_slow = 3 Hz, with the intensity contour that has enough temporal resolution to see the place-discriminating F₂ movements, which is low-pass filtered at f_fast = 30 Hz. In the frequency domain, the filter is

H (f) = exp (- (αf / f_fast)²) - exp (- (αf / f_slow)²)

where α equals √ln 2 ≈ 1 / 1.2011224, so that H (f) has its -6 dB points at f_slow and f_fast:

   alpha = sqrt (ln (2))
   filter = Create Sound from formula: "filter", 1, 0.0, 100.0, 10.0,
   ... ~ exp (- (alpha * x / 30.0) ^ 2) - exp (- (alpha * x / 3.0) ^ 2)
   Red
   Draw: 0, 0, 0.0, 1.0, "no", "curve"
   Black
   Draw inner box
   Text bottom: "yes", "Frequency %f (Hz)"
   Text left: "yes", "Amplitude filter %H (%f)"
   One mark left: 0.0, "yes", "yes", "no", ""
   One mark left: 0.5, "yes", "yes", "yes", ""
   One mark left: 1.0, "yes", "yes", "no", ""
   One mark right: 1.0, "no", "yes", "no", "0 dB"
   One mark right: 0.5, "no", "yes", "no", "\-m6 dB"
   One mark bottom: 3.0, "yes", "yes", "yes", ""
   One mark bottom: 30.0, "yes", "yes", "yes", ""
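The -6 dB points of H (f) can be checked numerically. The following Python sketch evaluates the filter formula with the standard values; the name modulation_filter is hypothetical.

```python
import math

ALPHA = math.sqrt(math.log(2.0))  # about 1 / 1.2011224

def modulation_filter(f, f_slow=3.0, f_fast=30.0):
    """H(f): difference of two Gaussian low-pass filters."""
    return (math.exp(-(ALPHA * f / f_fast) ** 2)
            - math.exp(-(ALPHA * f / f_slow) ** 2))

h_at_slow = modulation_filter(3.0)   # close to 0.5, i.e. about -6 dB
h_at_fast = modulation_filter(30.0)  # 0.5 to within rounding
```

At f_fast the value is 0.5 to within rounding error. At f_slow it is about 0.493, slightly below 0.5, because the 30-Hz Gaussian has already dropped a little below 1 at 3 Hz.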

Now, why do we use such a flat filter? Because a steep filter would show ringing effects in the time domain, dividing the sound into 30-ms chunks. If our filter is a sum of exponentials in the frequency domain, it will also be a sum of exponentials in the time domain. The backward Fourier transform of the frequency response H (f) is the impulse response h (t). It is given by

h (t) = 2π√π f_fast/α exp (-(πtf_fast/α)²) - 2π√π f_slow/α exp (-(πtf_slow/α)²)

This impulse response behaves well:

   impulseResponse = Create Sound from formula: "impulseResponse", 1,
   ... -0.2, 0.2, 2500, ~ 2 * pi * sqrt (pi) / alpha *
   ... (30.0 * exp (- (pi * 30.0 / alpha * x) ^ 2) -
   ...   3.0 * exp (- (pi * 3.0 / alpha * x) ^ 2))
   Red
   Draw: 0, 0, -100.0, 400.0, "no", "curve"
   Black
   Draw inner box
   Text bottom: "yes", "Time %t (s)"
   Text left: "yes", "Intensity impulse response %h (%t)"
   One mark bottom: -0.2, "yes", "yes", "no", ""
   One mark bottom: 0.0, "yes", "yes", "yes", ""
   One mark bottom: 0.2, "yes", "yes", "no", ""
   One mark left: 0.0, "yes", "yes", "yes", ""

We see that any short intensity peak will be enhanced, and that this enhancement will suppress the intensity around 30 milliseconds from the peak. Non-Gaussian frequency-domain filters would have given several maxima and minima in the impulse response, clearly an undesirable phenomenon.
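The shape of the impulse response can also be inspected numerically. This Python sketch evaluates h (t) as written on this page, including its scale factor, for the standard values; the helper names impulse_response and sign_changes are hypothetical.

```python
import math

ALPHA = math.sqrt(math.log(2.0))

def impulse_response(t, f_slow=3.0, f_fast=30.0):
    """h(t) as given in the text: a narrow positive Gaussian minus a wide one."""
    scale = 2.0 * math.pi * math.sqrt(math.pi) / ALPHA
    fast = f_fast * math.exp(-(math.pi * t * f_fast / ALPHA) ** 2)
    slow = f_slow * math.exp(-(math.pi * t * f_slow / ALPHA) ** 2)
    return scale * (fast - slow)

def sign_changes(values):
    """Count transitions between positive and non-positive values."""
    return sum(1 for a, b in zip(values, values[1:]) if (a > 0) != (b > 0))

# Sample the right half of the response, from 0 to 0.2 s in 0.1-ms steps.
times = [i / 10000.0 for i in range(2001)]
response = [impulse_response(t) for t in times]
t_min = times[response.index(min(response))]
```

On this grid the response is positive at t = 0, crosses zero exactly once, and has a single trough a few tens of milliseconds after the peak, after which it returns to zero from below without further oscillation.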

After the filtered band is subjected to a backward Fourier transform, we convert it into power again:

power (t) = 10^{filtered / 2}

The relative enhancement has a maximum that is smoothly related to the basilar place:

ceiling = 1 + (10^{enhancement / 20} - 1) · (1/2 - 1/2 cos (π f_midbark / 13))

where f_midbark is the mid frequency of the band. Clipping is implemented as

factor (t) = 1 / (1 / power (t) + 1 / ceiling)

Finally, the original filtered sound x (t), multiplied by this factor, is added to the output.
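The last three formulas can be put together in a short Python sketch. The function names band_power, ceiling and factor are hypothetical, and the code follows the formulas on this page rather than Praat's source.

```python
import math

def band_power(filtered):
    """Convert the band-filtered intensity contour value back, as in the text."""
    return 10.0 ** (filtered / 2.0)

def ceiling(f_midbark, enhancement_db=20.0):
    """Maximum relative enhancement for a band centred at f_midbark (Bark)."""
    peak = 10.0 ** (enhancement_db / 20.0) - 1.0
    return 1.0 + peak * (0.5 - 0.5 * math.cos(math.pi * f_midbark / 13.0))

def factor(power, ceil):
    """Soft clipping: close to power when power is small, below ceil always."""
    return 1.0 / (1.0 / power + 1.0 / ceil)

top = ceiling(13.0)        # 20 dB enhancement gives a ceiling of 10 at 13 Bark
half = factor(top, top)    # power equal to the ceiling gives half the ceiling
```

Because the two reciprocals are added, the factor never exceeds the ceiling however large the power becomes, and it approaches the power itself when the power is much smaller than the ceiling.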
