Using Praat to synthesize speech from Vocal Tract Area functions

The functionality described here was introduced in Praat 5.3.14 and is still under development. Future versions will allow the direct creation of Vocal Tract Tiers from LPC objects, and LPC filtering of (source) sounds whose sample frequencies and durations differ from those of the LPC analysis. This code has not been tested extensively, so expect bugs.

Links to Real Time MRI and articulatory synthesis

Seeing Speech, Collected Works on Real-Time Imaging of Speech Production by Erik Bresch, Department of Electrical Engineering, University of Southern California

Example 4 Nice video of articulator movements while speaking the Rainbow text, with colored region markings

VocalTractLab Towards high-quality articulatory speech synthesis. Contains very nice demonstration videos

Der Zug hat eine Stunde Verspätung.
Dona Nobis Pacem singing articulators in 2D (2007)
Salvete based on Canon in D by Pachelbel singing articulators in 3D (2010)

MULTIMODAL SPEECH SYNTHESIS, KTH Stockholm

THE 3D VOCAL TRACT PROJECT

Examples of manipulations using Vocal Tract Area functions in Praat

In Praat it is possible to calculate a vocal tract area function that is equivalent to a certain (vowel) sound. The sound can then be resynthesized using the calculated vocal tract area function as a filter. The vocal tract area functions can be manipulated and modified before resynthesis. The lists below give example vocal tract area functions of sustained /a/, /i/, and /y/, together with the resynthesized sounds.

Female speaker

/a/ speech female voice (original)
/a/ vocal tract of female voice (acoustic, 42 segments)
Resynthesized voice of /a/ vocal tract
/i/ speech female voice (original)
/i/ vocal tract of female voice (acoustic, 42 segments)
Resynthesized voice of /i/ vocal tract
/y/ speech female voice (original)
/y/ vocal tract of female voice (acoustic, 44 segments)
Resynthesized voice of /y/ vocal tract

Voice source used (2s)

Blend two vocal tracts: paste the lips of a /y/ onto the vocal tract of an /i/. That is, append the last four segments of the /y/ vocal tract function to the /i/ vocal tract function, adapt the length, etc.:

Male speaker

/a/ speech male voice (original)
/a/ vocal tract of male voice (acoustic, 42 segments)
Resynthesized voice of /a/ vocal tract
/i/ speech male voice (original)
/i/ vocal tract of male voice (acoustic, 42 segments)
Resynthesized voice of /i/ vocal tract
/y/ speech male voice (original)
/y/ vocal tract of male voice (acoustic, 44 segments)
Resynthesized voice of /y/ vocal tract

Voice source used (2s)

Blend two vocal tracts: paste the lips of a /y/ onto the vocal tract of an /i/. That is, append the last two segments of the /y/ vocal tract function to the /i/ vocal tract function, adapt the length, etc.:

Attaching measured areas to a Vocal Tract Area function in Praat

Take measured areas from MRI slices of the lips and attach them to an existing Vocal Tract Area function. Start with the recordings of /i/ and /y/ of the female speaker above. The areas of her lips were determined from an MRI image, starting at the teeth (X=0) and going outward. Only every third slice was used. Slice thickness was 1.4064 mm, and each area value is positioned at the slice midpoint. All values are converted to meters (positions) and square meters (areas).

X (m)	/i/ (m²)	/y/ (m²)	/a/ (m²)
0.0007032	0.00024051	0.00017821	0.00062801
0.0049224	0.000366	0.00012811	0.00043362
0.0091416	0.00035899	0.00008623	0.00037098
0.0133608	-	0.00001303	0.00039874
0.01758	-	-	0.00037381
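The X positions in the table can be reproduced from the slice thickness and the "every third slice, value at the midpoint" rule. The sketch below assumes that the first used slice starts at the teeth and has index 0. That indexing is inferred from the table values and is not stated in the text, and the function name is only illustrative.

```python
SLICE_THICKNESS_M = 1.4064e-3  # slice thickness from the text, converted to meters
SLICE_STEP = 3                 # only every third slice was used


def slice_midpoint(k):
    """Midpoint position (m) of the k-th used slice, counted outward from the teeth (X = 0).

    Assumption: the first used slice (k = 0) starts at X = 0.
    """
    return (SLICE_STEP * k + 0.5) * SLICE_THICKNESS_M


# Positions of the five table rows, rounded to the precision used in the table.
positions = [round(slice_midpoint(k), 7) for k in range(5)]
print(positions)
```

The distance between two consecutive table rows is then three slice thicknesses, 4.2192 mm.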

Start with the original recorded vowels /i/ and /y/ from the female voice. Convert them with LPC -> VocalTract, using order 30 and length 0.17 m for the /i/ VocalTract and order 32 and length 0.1756 m for the /y/ VocalTract. Replace the last three sections of the original /i/ VocalTract with the table values for /i/ and, separately, with those for /y/. For the /y/ table values, adapt the number of sections of the resulting VocalTract to 31 and the length to 0.1756. The same is done for the last four sections of the original /y/ VocalTract, but now the number of sections for the /i/ table values is reduced to 31 and the length to 0.17. The two original and four new VocalTracts can then be resynthesized as was done above.
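The section bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the counting, not the procedure used to make the example files: the function name is hypothetical, the table values are used directly as section areas without resampling, and the 30-section area list is a placeholder rather than measured data.

```python
# Lip areas (m²) from the MRI table, ordered from the teeth outward.
LIPS_I = [0.00024051, 0.000366, 0.00035899]
LIPS_Y = [0.00017821, 0.00012811, 0.00008623, 0.00001303]


def replace_lip_sections(areas, n_replace, lip_areas, new_length_m):
    """Replace the last n_replace sections (lip end) by lip_areas.

    Returns the new list of section areas and the resulting section
    length dx = new_length_m / number of sections.
    """
    if n_replace > len(areas):
        raise ValueError("cannot replace more sections than the tract has")
    new_areas = list(areas[:len(areas) - n_replace]) + list(lip_areas)
    dx = new_length_m / len(new_areas)
    return new_areas, dx


# Placeholder /i/ tract with 30 sections (NOT measured data).
i_tract = [1.0e-4] * 30

# /i/ with the lips of /y/: 30 - 3 + 4 = 31 sections, length 0.1756 m.
i_with_y_lips, dx = replace_lip_sections(i_tract, 3, LIPS_Y, 0.1756)
print(len(i_with_y_lips), dx)
```

The same call with a 32-section /y/ tract, `n_replace=4`, `LIPS_I`, and a length of 0.17 gives the 31-section "/y/ with lips of /i/" case.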

Starting with the original recorded /i/

/i/ VocalTract (order 30, length 0.17)
/i/ resynthesis (sound)
/i/ with lips of /i/ VocalTract (order 30, length 0.17)
/i/ with lips of /i/ resynthesis (sound)
/i/ with lips of /y/ VocalTract (order 31, length 0.1756)
/i/ with lips of /y/ resynthesis (sound)

Starting with the original recorded /y/

/y/ VocalTract (order 32, length 0.1756)
/y/ resynthesis (sound)
/y/ with lips of /i/ VocalTract (order 31, length 0.17)
/y/ with lips of /i/ resynthesis (sound)
/y/ with lips of /y/ VocalTract (order 32, length 0.1756)
/y/ with lips of /y/ resynthesis (sound)

From Vocal Tract area functions to speech

The following table explains how to get from a Vocal Tract to a synthetic sound. Synthesis needs a "Source" sound that drives the Vocal Tract filter. In normal speech, the source sound is produced by the vocal folds in the larynx (voice box). You can generate a source as specified below. Note that the sample frequency of the source sound, in kHz, has to be equal to the number of segments in the Vocal Tract. For instance, if you have 40 segments (tubes), you need a source sampled at 40 kHz. Use the Praat Resample... function to perform the resampling. The duration of the Vocal Tract Tier must be exactly the same as the duration of the Source sound. The examples below use a duration of 3 seconds. The audio example is 5 seconds long.
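The two constraints on the source sound can be written down as a small check. The sketch below is plain Python, not Praat script, and its function names are hypothetical. It encodes only the rule stated above: sample frequency in kHz equals the number of segments, and the durations must match.

```python
def required_source_rate_hz(n_segments):
    """Sample frequency (Hz) the source needs for a tract with n_segments tubes."""
    return n_segments * 1000


def check_source(n_segments, source_rate_hz, source_duration_s, tier_duration_s):
    """Return a list of problems; an empty list means the source fits the tier."""
    problems = []
    needed = required_source_rate_hz(n_segments)
    if source_rate_hz != needed:
        problems.append(
            "resample source from %g Hz to %g Hz" % (source_rate_hz, needed))
    if source_duration_s != tier_duration_s:
        problems.append(
            "source lasts %g s but the tier lasts %g s"
            % (source_duration_s, tier_duration_s))
    return problems


print(required_source_rate_hz(40))
print(check_source(40, 44100, 5.0, 3.0))
```

A 44-segment tract, as in the /y/ examples, would need a source sampled at 44 kHz under the same rule.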

Here is an example generated by determining the vocal tract area function at a point in a recorded /a/ and one at a corresponding point in a recorded /i/ from the same speaker. The voice source signal is entirely synthetic.

To test the synthesis, you can use the standard vocal tracts in Praat or create a Vocal Tract from recorded speech. The standard phone Vocal Tracts can be created in Praat from New->Articulatory synthesis->Create Vocal Tract from phone... . To create a Vocal Tract from recorded speech, simply read in the recording and convert it to LPC with the Formants & LPC -> LPC (autocorrelation)... options. Enter the number of segments you want in your Vocal Tract as the prediction order. Then use To VocalTract (slice)... to generate the Vocal Tract object. Save it with Save->Save as short text file... . Note that there is a rather convoluted relationship between the LPC prediction order, the sample frequency, the recorded sound and the quality of the resulting LPC model.

You can download Praat from www.praat.org

Action	Praat	Script

Synthesize sound
Read VocalTract file	Open->Read from file...	Read from file... a.VocalTract
Convert to Vocal Tract Tier	To VocalTractTier...	To VocalTractTier... 0 3 0.5
Convert Tier to LPC	To LPC...	To LPC... 0.005
Select both LPC and Source audio file	Option/Control select source audio Sound	plus Sound Source
Filter Source with LPC	Filter...	Filter... no
Resample to 10 kHz	Convert->Resample...	Resample... 10000 50

Generate Source sound
Create an empty PitchTier object	New->Tiers->Create PitchTier...	Create PitchTier... Source 0 3
Add a high starting point at 120 Hz	Modify->Add point...	Add point... 0 120
Add a low end point at 100 Hz	Modify->Add point...	Add point... duration 100
Convert it into a phonation sound	Synthesize->To Sound (phonation)...	To Sound (phonation)... 40000 1 0.05 0.7 0.03 3 4 no
Scale to a nice intensity	Modify->Scale intensity...	Scale intensity... 70

Create Vocal Tract
Read audio file	Open->Read from file...	Read from file... a.wav
Convert to LPC with prediction order 40 for 40 tube segments	Formants & LPC -> LPC (autocorrelation)...	To LPC (autocorrelation)... 40 0.025 0.005 50
Convert LPC to Vocal Tract, use slice at 2 seconds and a total vocal tract length of 20 cm	To VocalTract (slice)...	To VocalTract (slice)... 2 0.20
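The source pitch contour in the table is defined by only two points, 120 Hz at 0 s and 100 Hz at the end of the 3 s source. As a rough illustration, the Python sketch below computes the pitch between those points, assuming straight-line interpolation in Hz and constant values outside the points. That interpolation rule is an assumption of this sketch, and the function name is hypothetical.

```python
def pitch_at(t, points):
    """Pitch (Hz) at time t for a list of (time_s, pitch_hz) points.

    Assumption: linear interpolation between neighbouring points and
    constant extrapolation before the first and after the last point.
    """
    points = sorted(points)
    if t <= points[0][0]:
        return points[0][1]
    if t >= points[-1][0]:
        return points[-1][1]
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t0 <= t <= t1 and t1 > t0:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    raise ValueError("no interval found for t = %r" % t)


# The two points from the table, with a source duration of 3 seconds.
SOURCE_POINTS = [(0.0, 120.0), (3.0, 100.0)]

for t in (0.0, 1.5, 3.0):
    print(t, pitch_at(t, SOURCE_POINTS))
```

Under this assumption the pitch falls steadily by 20 Hz over the 3 seconds of the source.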

Example files /i/ /a/

VocalTractExample.praat: Synthesizer script
CreateVocalTracts.praat: Script to create example VocalTracts from example audio
a.wav: Example sustained /a/ recording (natural speech)
a.VocalTract: Example /a/ VocalTract
i.wav: Example sustained /i/ recording (natural speech)
i.VocalTract: Example /i/ VocalTract
a_i_synthesis.wav: Example /a/-/i/ synthesis

Example files /i/ /y/ (LPC order 44)

i2_y_i2_synthesis.praat: Example /i/-/y/-/i/ Praat script
i2.wav: Example sustained /i/ recording (natural speech)
i2.VocalTract: Example /i/ VocalTract (4.0 s)
y.wav: Example sustained /y/ recording (natural speech)
y.VocalTract: Example /y/ VocalTract (3.5 s)
i2_y_i2.VocalTractTier: Example /i/-/y/-/i/ Vocal tract tier
i2_y_i2_synthesis.wav: Example /i/-/y/-/i/ synthesis

Vocal Tract tube models

The Vocal Tract area functions model the human vocal tract as a set of connected tubes of variable width. Determining the tube (segment) areas with LPC is not very reliable. Shown below are tube models as determined with LPC (prediction order 40), next to the "theoretical" models given by Praat's New->Articulatory synthesis->Create Vocal Tract from phone... .

Vocal Tract tube model of /a/ (example) Vocal Tract tube model of /i/ (example)

Standard Vocal Tract tube model of /a/ Standard Vocal Tract tube model of /i/

Vocal Tract tube model of /y/ (example) Standard Vocal Tract tube model of /y/

VocalTract file format

The example VocalTract file below is created with Save as short text file... . There is a more descriptive (longer) format that is obtained with Save as text file... .

File type = "ooTextFile"		The line by which Praat can recognize your file
Object class = "VocalTract 2"		The line that tells Praat about the contents
		Empty line
0		xmin: First segment (Glottis, meter)
0.2		xmax: Last segment (Lips, meter)
40		nx: Number of segments
0.005		dx: Segment length (m)
0.0025		x1: Position of first segment
1		ymin: NA
1		ymax: NA
1		ny: NA
1		dy: NA
1		y1: NA
0.00010813061971705616		Area in m²
0.00010390570341053334		Area in m²
8.903563828398031e-05		Area in m²
0.00010876151465927323		Area in m²
....		Many more values
0.008175693406171154		Area in m²
0.0013459947563683344		Area in m²
0.04293933951717365		Area in m²
0.000489118171677886		Area in m²
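In the example header, dx equals (xmax - xmin) / nx (0.2 / 40 = 0.005) and x1 equals dx / 2 (0.0025). The Python sketch below writes and reads text in the layout of the annotated example, deriving dx and x1 from those two relations. It is a minimal sketch with hypothetical function names. It has not been checked against Praat's own reader, and it copies the value 1 for ymin, ymax, ny, dy, and y1 from the example.

```python
def vocal_tract_short_text(areas, length_m):
    """Return short-text VocalTract content for the given areas (m²) and length (m)."""
    nx = len(areas)
    dx = length_m / nx
    lines = [
        'File type = "ooTextFile"',
        'Object class = "VocalTract 2"',
        "",                 # empty line after the two header lines
        "0",                # xmin: glottis
        repr(length_m),     # xmax: lips
        str(nx),            # nx: number of segments
        repr(dx),           # dx: segment length
        repr(dx / 2),       # x1: midpoint of the first segment
        "1", "1", "1", "1", "1",  # ymin, ymax, ny, dy, y1 (as in the example)
    ]
    lines += [repr(a) for a in areas]
    return "\n".join(lines) + "\n"


def read_vocal_tract_short_text(text):
    """Parse text in the layout above; return (length_m, areas)."""
    lines = text.splitlines()
    if lines[0].strip() != 'File type = "ooTextFile"':
        raise ValueError("not a Praat text file")
    numbers = [float(v) for v in lines[3:] if v.strip()]
    xmin, xmax, nx = numbers[0], numbers[1], int(numbers[2])
    areas = numbers[10:]
    if len(areas) != nx:
        raise ValueError("expected %d areas, found %d" % (nx, len(areas)))
    return xmax - xmin, areas


# Placeholder tract: 40 equal segments over 0.2 m (NOT measured data).
text = vocal_tract_short_text([1.0e-4] * 40, 0.2)
print(text.splitlines()[:8])
```

Reading the text back with `read_vocal_tract_short_text(text)` returns the length and the 40 areas again.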

VocalTractExample.zip: all files

License

    Copyright © 2012  NKI-AVL, Amsterdam and R.J.J.H. van Son

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License along with this program.  
    If not, see http://www.gnu.org/licenses/.
