Tract
Overview
Tract implements a rudimentary physical model of a vocal
tract for use in articulatory synthesis. It is based
on the classic Kelly-Lochbaum model developed at Bell Labs
in the early 1960s. In recent years, such models have gotten
new breath (so to speak) in programs like Pink Trombone,
from which this particular implementation was derived.
In order to be both "physically based" and practically computable, the model crudely pretends that the vocal tract is a series of cylindrical tubes with varying diameters. Changing the diameters in various ways causes different phonemes to come out the other side. Almost by magic.
Which combinations of diameters produce which phonemes? There is no straightforward answer, and methods for finding ideal shapes are still an active area of research. In the early days, measurements were obtained from real vocal tracts!
The tract is a filter that expects some kind of external input source (in speech synthesis, this relationship is known as a "source-filter" model). The more closely that input resembles the kind of sound our glottis makes, the more vocal-like the output will sound. In programs like Pink Trombone, an analytical model of the glottis is used as the source signal for its vocal tract filter.
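As a rough illustration of the source side of a source-filter setup, the sketch below generates a naive (non-band-limited) sawtooth, which is harmonically rich enough to stand in for a glottal source in quick experiments. It is not part of Tract or Pink Trombone, and the names saw_source, saw_init, and saw_tick are made up for this example.

```c
#include <assert.h>

/* Hypothetical stand-in for a glottal source: a naive sawtooth
 * oscillator. Not part of tract.c. */
typedef struct {
    float phase; /* normalized phase in [0, 1) */
    float inc;   /* phase increment per sample: freq / sample rate */
} saw_source;

static void saw_init(saw_source *s, float freq, float sr)
{
    assert(sr > 0);
    s->phase = 0;
    s->inc = freq / sr;
}

/* Returns one sample in [-1, 1) and advances the phase. */
static float saw_tick(saw_source *s)
{
    float out = 2.0f * s->phase - 1.0f;
    s->phase += s->inc;
    if (s->phase >= 1.0f) s->phase -= 1.0f;
    return out;
}
```

Each sample returned by saw_tick would then be handed to the tract filter as its input signal.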
Compared to Pink Trombone (PT), this model is greatly simplified.
Both implement the digital waveguide as a ladder filter
consisting of 44 Kelly-Lochbaum scattering junctions.
In this model, the nasal component (known as velum control)
can be optionally enabled. The transient generator in PT,
used whenever there were blocking interferences in the vocal
tract, has been removed. An implementation
that more closely resembles PT can be found in Voc.
Tangled files
Tract tangles into two main files: tract.c and tract.h.
Defining SK_TRACT_PRIV will expose the struct.
#include <stdlib.h>
#include <math.h>
#include <string.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define SK_TRACT_PRIV
#include "tract.h"
<<static_funcdefs>>
<<funcs>>
#ifndef SK_TRACT_H
#define SK_TRACT_H
#ifndef SKFLT
#define SKFLT float
#endif
<<typedefs>>
<<funcdefs>>
#ifdef SK_TRACT_PRIV
<<structs>>
#endif
#endif
Struct and Initialization
The core struct is defined as sk_tract.
typedef struct sk_tract sk_tract;
struct sk_tract {
<<sk_tract>>
};
The struct is initialized with sk_tract_init.
void sk_tract_init(sk_tract *tr);
void sk_tract_init(sk_tract *tr)
{
<<init>>
calculate_reflections(tr);
}
Components and Variables
Glottal and Lip Reflection constants
The glottal and lip reflection constants are set to
0.75 and -0.85, respectively.
SKFLT glottal_reflection;
SKFLT lip_reflection;
tr->glottal_reflection = 0.75;
tr->lip_reflection = -0.85;
Tract Size
The active size of the vocal tract is determined by
a variable n. It can have a maximum value of 44.
int n;
tr->n = 44;
Diameters
The diameter array holds the inputs used
to manipulate the vocal tract.
SKFLT diameter[44];
memset(tr->diameter, 0, 44 * sizeof(SKFLT));
<<initial_shape>>
The diameters get set to an initial shape. This is the one Neil Thapen uses in Pink Trombone:
{
int i;
for(i = 0; i < tr->n; i++) {
SKFLT diameter = 0;
if(i < 7 - 0.5) {
diameter = 0.6;
} else if( i < 12) {
diameter = 1.1;
} else {
diameter = 1.5;
}
tr->diameter[i] = diameter;
}
}
Areas
Diameters get converted into areas and stored in
the array A, and areas get converted into reflection
coefficients stored in the array reflection.
SKFLT A[44];
SKFLT reflection[44];
memset(tr->A, 0, 44 * sizeof(SKFLT));
memset(tr->reflection, 0, 44 * sizeof(SKFLT));
Scattering Junction Outputs and Bidirectional Delay Line
Each scattering junction has a left and a right output,
corresponding to the direction of pressure, as well
as filter memory. The junction output arrays are known
as junction_outL and junction_outR, respectively.
Additional left/right arrays (L and R) are used to implement
the bidirectional delay line for the underlying waveguide.
SKFLT junction_outL[44];
SKFLT L[44];
SKFLT junction_outR[44];
SKFLT R[44];
memset(tr->junction_outL, 0, 44 * sizeof(SKFLT));
memset(tr->L, 0, 44 * sizeof(SKFLT));
memset(tr->junction_outR, 0, 44 * sizeof(SKFLT));
memset(tr->R, 0, 44 * sizeof(SKFLT));
User-Supplied Shape Callback
A user-supplied shape callback can be used to shape
the diameters at audio rate, allowing for things like
smoothing filters. A user-data void pointer (ud) is also
included here.
void *ud;
void (*shape)(sk_tract *, SKFLT *, void *);
tr->shape = NULL;
tr->ud = NULL;
Use Diameters Flag
The use_diameters flag can be used to enable/disable
the diameter control. If disabled, areas can be
directly manipulated.
int use_diameters;
sk_tract_use_diameters(tr, 1);
It is set with the function sk_tract_use_diameters,
where mode is true (1) or false (0).
void sk_tract_use_diameters(sk_tract *tr, int mode);
void sk_tract_use_diameters(sk_tract *tr, int mode)
{
tr->use_diameters = mode;
}
Diameter Manipulation
The vocal tract is controlled by manipulating the
underlying diameter sizes. These can be directly accessed
via sk_tract_diameters.
Get Diameters
Returns the array holding the diameter values.
SKFLT* sk_tract_diameters(sk_tract *tr);
SKFLT* sk_tract_diameters(sk_tract *tr)
{
return tr->diameter;
}
Tract Size
The number of diameters can be retrieved with
sk_tract_size. Usually, this is 44.
int sk_tract_size(sk_tract *tr);
int sk_tract_size(sk_tract *tr)
{
return tr->n;
}
Shaper Function
More often than not, one wants to apply sample-accurate
smoothing to the diameters rather than work with them directly.
This is done using a callback interface, known as a shaper.
The function sk_tract_shaper sets up a shaper callback.
It takes in the shaper callback as well as any external
user data needed to manage state in that callback.
The callback gets called at every sample, and takes in
three arguments: the sk_tract struct, the output array to
write to, and the externally managed user data.
In practice, one potential approach is to use a filterbank of 44 smoothing filters to control the diameter shapes. This allows diameters to be controlled without producing artifacts caused by large discontinuities.
void sk_tract_shaper(sk_tract *tract,
void (*shape)(sk_tract *, SKFLT *, void *),
void *ud);
void sk_tract_shaper(sk_tract *tract,
void (*shape)(sk_tract *, SKFLT *, void *),
void *ud)
{
tract->shape = shape;
tract->ud = ud;
}
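The smoothing-filterbank idea can be sketched as a shaper callback that runs one one-pole lowpass per segment. This is only an illustration under stated assumptions: smooth_data, smooth_init, and smooth_shape are hypothetical names, the segment count is fixed at 44, and the tract struct is treated as opaque so the sketch compiles on its own.

```c
#include <assert.h>

#ifndef SKFLT
#define SKFLT float
#endif
typedef struct sk_tract sk_tract; /* opaque here; defined in tract.h */

#define NSEG 44

/* Hypothetical user data: per-segment targets and smoother state. */
typedef struct {
    SKFLT target[NSEG]; /* where each diameter should end up */
    SKFLT state[NSEG];  /* current smoothed diameters */
    SKFLT coef;         /* smoothing amount in [0, 1); larger is slower */
} smooth_data;

static void smooth_init(smooth_data *sd, SKFLT initial, SKFLT coef)
{
    int i;
    assert(coef >= 0 && coef < 1);
    for (i = 0; i < NSEG; i++) {
        sd->target[i] = initial;
        sd->state[i] = initial;
    }
    sd->coef = coef;
}

/* Matches the shaper signature: called once per sample, writes the
 * smoothed values into out. The tract pointer is unused here. */
static void smooth_shape(sk_tract *tr, SKFLT *out, void *ud)
{
    smooth_data *sd = ud;
    int i;
    (void)tr;
    for (i = 0; i < NSEG; i++) {
        /* one-pole lowpass: y = (1 - a) * x + a * y */
        sd->state[i] = (1 - sd->coef) * sd->target[i]
                     + sd->coef * sd->state[i];
        out[i] = sd->state[i];
    }
}
```

Such a callback would be registered with sk_tract_shaper(tract, smooth_shape, &sd). After that, writing new values into sd.target moves the tract shape gradually instead of in a single jump.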
Tongue Control
Neil Thapen's Pink Trombone employs a curious "tongue control" functionality, which allows one to use 2 dimensions of control to shape the entire tract. This is also the underlying control mechanism for Voc.
This behavior is ported in the function sk_tract_tongue_shape,
where position and diameter are both
normalized floating point values.
void sk_tract_tongue_shape(sk_tract *tract,
SKFLT position,
SKFLT diameter);
static void set_diameters(sk_tract *tract,
int blade_start,
int lip_start,
int tip_start,
SKFLT tongue_index,
SKFLT tongue_diameter,
SKFLT *diameters)
{
int i;
SKFLT t;
SKFLT fixed_tongue_diameter;
SKFLT curve;
int grid_offset = 0;
for(i = blade_start; i < lip_start; i++) {
t = 1.1 * M_PI *
(SKFLT)(tongue_index - i)/(tip_start - blade_start);
fixed_tongue_diameter = 2+(tongue_diameter-2)/1.5;
curve = (1.5 - fixed_tongue_diameter + grid_offset) * cos(t);
if(i == blade_start - 2 || i == lip_start - 1) curve *= 0.8;
if(i == blade_start || i == lip_start - 2) curve *= 0.94;
diameters[i] = 1.5 - curve;
}
}
void sk_tract_tongue_shape(sk_tract *tract,
SKFLT position,
SKFLT diameter)
{
position = 12 + 16.0 * position;
diameter = 3.5 * diameter;
set_diameters(tract, 10, 39, 32,
position, diameter, tract->diameter);
}
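To make the two normalized parameters more concrete, the sketch below mirrors the scaling used above: position is mapped onto a segment index between 12 and 28, and diameter determines the value the cosine curve takes at that index (where the cosine term equals 1). The helper names tongue_index and tongue_peak_diameter are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Maps the normalized position in [0, 1] onto a segment index,
 * mirroring the scaling in sk_tract_tongue_shape. */
static float tongue_index(float position)
{
    return 12.0f + 16.0f * position;
}

/* Diameter the curve takes at the tongue index, mirroring the
 * fixed_tongue_diameter term in set_diameters. */
static float tongue_peak_diameter(float diameter)
{
    float scaled = 3.5f * diameter;
    assert(diameter >= 0);
    return 2.0f + (scaled - 2.0f) / 1.5f;
}
```

For example, a position of 0.5 centers the curve on segment 20, and a normalized diameter of 1.0 sets the diameter at that segment to 3.0.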
Area Manipulation
Sometimes it might be more advantageous to set the areas directly, rather than using the diameters (which are squared to produce the areas).
To use areas directly, diameter control must be turned off.
This is done by setting the use_diameters flag to false via
sk_tract_use_diameters.
Area shapes can be set using the function
sk_tract_set_area_shape, which takes an array of areas
of size sz. If the array is larger than
the current tract size, it will be truncated. If it is
smaller, the remaining tract segments are padded with the
last value in the array.
void sk_tract_set_area_shape(sk_tract *tr, SKFLT *areas, int sz);
void sk_tract_set_area_shape(sk_tract *tr, SKFLT *areas, int sz)
{
int n;
SKFLT last;
last = 0;
for (n = 0; n < tr->n; n++) {
if (n >= sz) {
tr->A[n] = last;
} else {
tr->A[n] = areas[n];
last = areas[n];
}
}
}
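The truncate-or-pad behavior can be tried out in isolation with the following sketch. It is a standalone restatement of the loop above rather than a replacement for it, and the name fit_shape is hypothetical.

```c
#include <assert.h>

/* Copies sz source values into an n-element destination. Extra source
 * values are ignored; missing ones repeat the last source value. */
static void fit_shape(float *dst, int n, const float *src, int sz)
{
    int i;
    float last = 0;
    assert(n >= 0 && sz >= 0);
    for (i = 0; i < n; i++) {
        if (i < sz) last = src[i];
        dst[i] = last;
    }
}
```

Fitting the three values {1, 2, 3} into five slots yields {1, 2, 3, 3, 3}, while fitting seven values into five slots keeps only the first five.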
Get the area array directly with sk_tract_areas.
SKFLT* sk_tract_areas(sk_tract *tr);
SKFLT* sk_tract_areas(sk_tract *tr)
{
return tr->A;
}
Velum Control
In articulatory synthesis, a velum parameter is used to
simulate the airflow through the nasal passageways that
occurs when the soft palate (the velum) is lowered.
Increasing the velum parameter makes the resulting
output sound more nasal.
Variables
Nose size is hard-coded to 28.
Similar to the main vocal tract, the nasal airway is also
made of a waveguide (a bidirectional delay line represented
in noseL and noseR) and scattering junctions
(nose_junc_outL and nose_junc_outR). The nasal passageway
is shaped with nose_diameter, whose values
are then converted to areas in noseA.
The velum variable stores the velum amount, typically
a value between 0 and 1.
The nasal waveguide component is hooked up at one of
the cylindrical segments of the tract waveguide component,
indicated by nose_start. This is hard-coded to be 17.
SKFLT nose_diameter[28];
SKFLT noseL[28];
SKFLT noseR[28];
SKFLT noseA[28];
SKFLT nose_reflection[28];
SKFLT nose_junc_outL[28];
SKFLT nose_junc_outR[28];
SKFLT velum;
SKFLT reflection_left;
SKFLT reflection_right;
SKFLT reflection_nose;
int nose_start;
tr->velum = 0;
tr->nose_start = 17;
tr->reflection_left = 0;
tr->reflection_right = 0;
memset(tr->noseL, 0, 28 * sizeof(SKFLT));
memset(tr->noseR, 0, 28 * sizeof(SKFLT));
memset(tr->noseA, 0, 28 * sizeof(SKFLT));
Use Velum Flag
int use_velum;
sk_tract_use_velum(tr, 0);
void sk_tract_use_velum(sk_tract *tr, int mode);
void sk_tract_use_velum(sk_tract *tr, int mode)
{
tr->use_velum = mode;
}
Setting the Velum
void sk_tract_velum(sk_tract *tract, SKFLT velum);
void sk_tract_velum(sk_tract *tract, SKFLT velum)
{
tract->velum = velum;
}
Shaping The Nose
The nasal airway is shaped only once, at init time. The function used produces a clipped triangle wave shape.
This was adapted from the original Pink Trombone source. It's unclear how this particular shape was derived, but it looks empirically discovered.
{
int i;
for (i = 0; i < 28; i++) {
SKFLT d;
d = 2 * ((SKFLT)i / 28);
if(d < 1) {
d = 0.4 + 1.6 * d;
} else {
d = 0.5 + 1.5*(2-d);
}
d = d < 1.9 ? d : 1.9;
tr->nose_diameter[i] = d;
}
calculate_nose_reflections(tr);
}
Calculate Nose Reflections
Similar to calculate_reflections, the
calculate_nose_reflections function converts nose
diameters to areas, and then computes the reflection
coefficients.
static void calculate_nose_reflections(sk_tract *tr);
static void calculate_nose_reflections(sk_tract *tr)
{
int i;
for(i = 0; i < 28; i++) {
tr->noseA[i] = tr->nose_diameter[i] * tr->nose_diameter[i];
}
for(i = 1; i < 28; i++) {
tr->nose_reflection[i] = (tr->noseA[i - 1] - tr->noseA[i]) /
(tr->noseA[i-1] + tr->noseA[i]);
}
}
Velum Computation (Nasal Waveguide Computation)
Calculate Nasal Reflection Coefficients
Called from inside calculate_reflections, this uses the
velum parameter to calculate the reflection coefficients
used by the nasal waveguide. Velum in this context sets
the diameter of the first nasal segment, which has the effect
of controlling the opening of the nasal airway. As the
velum value gets larger, the amount of nasal sound in
the output increases.
if (tr->use_velum) {
SKFLT sum;
tr->nose_diameter[0] = tr->velum;
tr->noseA[0] = tr->nose_diameter[0] * tr->nose_diameter[0];
sum = tr->A[tr->nose_start] + tr->A[tr->nose_start + 1] + tr->noseA[0];
tr->reflection_left = (SKFLT)(2 * tr->A[tr->nose_start] - sum) / sum;
tr->reflection_right = (SKFLT)(2 * tr->A[tr->nose_start + 1] - sum) / sum;
tr->reflection_nose = (SKFLT)(2 * tr->noseA[0] - sum) / sum;
}
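The three-way junction can be examined on its own with the sketch below, which restates the formulas above for arbitrary areas. The struct and function names are hypothetical. Two properties fall out of the arithmetic: the three coefficients always sum to -1, and with a nose area of 0 (velum closed) the left coefficient reduces to the ordinary two-tube form, the difference of the two tract areas over their sum.

```c
#include <assert.h>

typedef struct {
    float left;  /* for the tract segment at the branch index */
    float right; /* for the next tract segment, toward the lips */
    float nose;  /* for the first segment of the nasal waveguide */
} branch_reflections;

/* a_left and a_right are the tract areas on either side of the
 * branch point; a_nose is the area of the first nasal segment. */
static branch_reflections branch_coefficients(float a_left,
                                              float a_right,
                                              float a_nose)
{
    branch_reflections k;
    float sum = a_left + a_right + a_nose;
    assert(sum > 0);
    k.left = (2 * a_left - sum) / sum;
    k.right = (2 * a_right - sum) / sum;
    k.nose = (2 * a_nose - sum) / sum;
    return k;
}
```

With equal tract areas and a closed velum, the left and right coefficients are both 0 and the nose coefficient is -1.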
Compute The Nasal Waveguide
If use_velum is enabled, the nasal waveguide is computed
as well. This happens after the main tract waveguide
has been computed.
First, the junction outputs for the tract and nose are updated. In a way, this "connects" the nasal waveguide to the tract waveguide.
After that, the segments in the nasal waveguide are updated.
if (tr->use_velum) {
i = tr->nose_start;
r = tr->reflection_left;
tr->junction_outL[i - 1] = r*tr->R[i-1] +
(1+r)*(tr->noseL[0]+tr->L[i]);
r = tr->reflection_right;
tr->junction_outR[i] = r*tr->L[i] +
(1+r)*(tr->R[i-1]+tr->noseL[0]);
r = tr->reflection_nose;
tr->nose_junc_outR[0] =
r * tr->noseL[0]+(1+r)*(tr->L[i]+tr->R[i-1]);
tr->nose_junc_outL[28 - 1] =
tr->noseR[28 - 1] * tr->lip_reflection;
for(i = 1; i < 28; i++) {
w = tr->nose_reflection[i] * (tr->noseR[i-1] + tr->noseL[i]);
tr->nose_junc_outR[i] = tr->noseR[i - 1] - w;
tr->nose_junc_outL[i - 1] = tr->noseL[i] + w;
}
for(i = 0; i < 28; i++) {
tr->noseR[i] = tr->nose_junc_outR[i];
tr->noseL[i] = tr->nose_junc_outL[i];
}
}
Add nasal component to signal output
if (tr->use_velum) {
out += tr->noseR[28 - 1];
}
Computing Audio
Tick Function
A single sample of audio is computed with sk_tract_tick.
It expects an input signal in, and returns a single
sample.
SKFLT sk_tract_tick(sk_tract *tract, SKFLT in);
There are two main things that happen here. First, the reflection coefficients for the ladder filter are calculated. Then, the input is run through the tract computation. The output is then scaled and returned.
Note that tract_compute is called twice, an artifact from
Pink Trombone. Most likely this is done to simulate
forward and backward propagation.
SKFLT sk_tract_tick(sk_tract *tr, SKFLT in)
{
SKFLT tmp;
SKFLT out;
out = 0;
calculate_reflections(tr);
tmp = 0;
/* compute twice for forwards/backwards propagation */
tmp += tract_compute(tr, in);
tmp += tract_compute(tr, in);
out = tmp * 0.125;
return out;
}
Calculate Reflections
static void calculate_reflections(sk_tract *tr);
static void calculate_reflections(sk_tract *tr)
{
int i;
SKFLT *diam;
diam = tr->diameter;
<<shapeit>>
<<calculate_areas>>
<<calculate_reflection_coefficients>>
<<calculate_nasal_reflection_coefficients>>
}
If the shape function exists, call it.
If use_diameters is enabled (by default it is),
the callback is passed the diameters as its output array.
Otherwise, it is passed the areas A directly.
if (tr->shape != NULL) {
if (tr->use_diameters)
tr->shape(tr, tr->diameter, tr->ud);
else
tr->shape(tr, tr->A, tr->ud);
}
The cross-sectional areas are calculated by squaring
the input diameters. This will only happen if
use_diameters is enabled.
if (tr->use_diameters) {
for(i = 0; i < tr->n; i++) {
tr->A[i] = diam[i] * diam[i];
}
}
The reflection coefficients are calculated from the computed areas. This is the difference between neighboring areas over their sum:
$$k_i = \frac{A_{i-1} - A_i}{A_{i-1} + A_i}$$
Where $k_i$ is known as the scattering coefficient or reflection coefficient, and $A_{i-1}$ and $A_i$ are the areas of neighboring segments.
(Adapted from Perry Cook's Real Sound Synthesis for Interactive Applications, found on pg. 230)
To prevent numerical issues, a reflection is set to a close-to-1 value if its segment's area is exactly 0.
for(i = 1; i < tr->n; i++) {
if(tr->A[i] == 0) {
tr->reflection[i] = 0.999; /* to prevent bad behavior if 0 */
} else {
tr->reflection[i] =
(tr->A[i - 1] - tr->A[i]) / (tr->A[i - 1] + tr->A[i]);
}
}
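As a small worked example of the area-ratio calculation, the sketch below computes the coefficient for a single boundary directly from two diameters. The function name reflection_from_diameters is hypothetical. A widening tube gives a negative coefficient and a narrowing tube a positive one. Going from diameter 0.6 to 1.1, the first step in the initial shape, gives roughly -0.54.

```c
#include <assert.h>

/* Reflection coefficient at the boundary between two tube segments,
 * given their diameters. Areas are taken as diameter squared, as in
 * calculate_reflections. */
static float reflection_from_diameters(float d_prev, float d_cur)
{
    float a_prev = d_prev * d_prev;
    float a_cur = d_cur * d_cur;
    assert(d_prev >= 0 && d_cur >= 0);
    if (a_cur == 0) return 0.999f; /* same guard as the tract code */
    return (a_prev - a_cur) / (a_prev + a_cur);
}
```

Two segments of equal diameter produce a coefficient of 0, meaning the wave passes through the boundary unchanged.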
Tract Compute
A single pass of tract computation is done with
tract_compute.
static SKFLT tract_compute(sk_tract *tr, SKFLT in);
static SKFLT tract_compute(sk_tract *tr, SKFLT in)
{
SKFLT r, w;
int i;
SKFLT out;
out = 0;
<<initial_junction_outputs>>
<<compute_scattering_junctions>>
<<update_delay_lines>>
<<get_output>>
<<add_velum_component>>
return out;
}
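Air pressure in the tract travels in two directions. The right-moving wave starts at the glottis (index 0), where the input is added to the reflection of the left-moving wave. The left-moving wave starts at the lips (index n - 1), as the reflection of the right-moving wave.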
tr->junction_outR[0] = tr->L[0] * tr->glottal_reflection + in;
tr->junction_outL[tr->n - 1] = tr->R[tr->n - 1] * tr->lip_reflection;
First, the left/right junction outputs are computed.
The variable names used here are adapted from Jack Mullen's PhD dissertation, from the section on BiDirectional Waveguide Composition (section 2.5.2, figure 2.77):
for(i = 1; i < tr->n; i++) {
r = tr->reflection[i];
w = r * (tr->R[i - 1] + tr->L[i]);
tr->junction_outR[i] = tr->R[i - 1] - w;
tr->junction_outL[i - 1] = tr->L[i] + w;
}
<<velum_computation>>
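A single junction update can be isolated as in the sketch below, which may help when checking the loop by hand. The function name scatter and its parameter names are hypothetical. Because w is subtracted from one output and added to the other, the sum of the two outputs always equals the sum of the two inputs.

```c
#include <assert.h>

/* One Kelly-Lochbaum junction update. r_in is the right-moving wave
 * arriving from the previous segment (R[i - 1]); l_in is the
 * left-moving wave arriving from the current segment (L[i]). */
static void scatter(float k, float r_in, float l_in,
                    float *out_r, float *out_l)
{
    float w;
    assert(k >= -1 && k <= 1);
    w = k * (r_in + l_in);
    *out_r = r_in - w; /* continues right into segment i */
    *out_l = l_in + w; /* continues left into segment i - 1 */
}
```

With a coefficient of 0 both waves pass straight through. With a coefficient of 0.5 and a unit right-moving wave, half continues to the right and half is sent back to the left.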
The left and right delay lines are updated as attenuated versions of the junction outputs.
for(i = 0; i < tr->n; i++) {
tr->R[i] = tr->junction_outR[i]*0.999;
tr->L[i] = tr->junction_outL[i]*0.999;
}
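To get a feel for how gentle the 0.999 attenuation is, the sketch below counts how many updates it takes for a gain factor to bring a unit amplitude down to a given floor. The helper name updates_to_decay is hypothetical. The count only reflects this one loss factor and ignores energy lost at the glottal and lip reflections.

```c
#include <assert.h>

/* Counts how many times a signal of amplitude 1 must be multiplied
 * by gain (0 < gain < 1) before it falls to or below floor_amp. */
static int updates_to_decay(double gain, double floor_amp)
{
    double amp = 1.0;
    int n = 0;
    assert(gain > 0 && gain < 1);
    assert(floor_amp > 0 && floor_amp < 1);
    while (amp > floor_amp) {
        amp *= gain;
        n++;
    }
    return n;
}
```

For a gain of 0.999 and a floor of 0.001 (60 dB down), this comes out to roughly 6900 updates. Since tract_compute runs twice per call to sk_tract_tick, that corresponds to about half as many output samples.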
The output signal is the last sample in the right-moving delay line.
out = tr->R[tr->n - 1];