Tract

Overview

Tract implements a rudimentary physical model of a vocal tract for use in articulatory synthesis. It is based on the classic Kelly-Lochbaum model developed at Bell Labs in the early 60s. In recent years, such a model has gotten new breath (so to speak) in programs like Pink Trombone, from which this particular was derived from.

In order to be both "physically based" and practically computable, the model crudely pretends that our vocal tract is a series of cylindrical tubes with varying diameters. Changing the diameters in various ways will result in different phonemes to come out the other side. Almost by magic.

What combinations of diameters produce what kind of phonemes? It's not a straightforward answer, and new methods for finding ideal shapes are still an ongoing area of research. In the early days, measurements were originally obtained from real vocal tracts!

The tract is a filter that expects some kind of external input source (in speech synthesis, this relationship is known as a "source-filter" model). The closer that input resembles the kind of sound our glottis makes, the more vocal-like the output sounds will turn out. In programs like Pink Trombone, an analytical model of the glottis was used as the source signal for it's vocal tract filter.

Compared to Pink Trombone, this model is greatly simplified. Both implement the digital waveguide as a ladder filter consisting of 44 Kelly-Lochbaum scattering junctions. In this model, the nasal component (known as velum control) can be optionally enabled. Transient generator in PT, used whenever there were blocking interferences in the vocal tract, has been removed. An implementation that more closely resembles PT can be found in Voc.

Tangled files

Tract tangles into two main files: tract.c and tract.h. Defining SK_TRACT_PRIV will expose the struct.

<<tract.c>>=

#include <stdlib.h>
#include <math.h>
#include <string.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define SK_TRACT_PRIV
#include "tract.h"

<<static_funcdefs>>
<<funcs>>

<<tract.h>>=

#ifndef SK_TRACT_H
#define SK_TRACT_H
#ifndef SKFLT
#define SKFLT float
#endif
<<typedefs>>
<<funcdefs>>
#ifdef SK_TRACT_PRIV
<<structs>>
#endif
#endif

Struct and Initialization

The core struct is defined as sk_tract.

<<typedefs>>=

typedef struct sk_tract sk_tract;

<<structs>>=

struct sk_tract {
    <<sk_tract>>
};

The struct is initialized with sk_tract_init.

<<funcdefs>>=

void sk_tract_init(sk_tract *tr);

<<funcs>>=

void sk_tract_init(sk_tract *tr)
{
    <<init>>
    calculate_reflections(tr);
}

Components and Variables

Glottal and Lip Reflection constants

Glottal and lip reflection constants. These are set to be 0.75 and -0.85, respectively.

<<sk_tract>>=

SKFLT glottal_reflection;
SKFLT lip_reflection;

<<init>>=

tr->glottal_reflection = 0.75;
tr->lip_reflection = -0.85;

Tract Size

The active size of the vocal tract is determined by a variable n. It can have a maximum value of 44.

<<sk_tract>>=

int n;

<<init>>=

tr->n = 44;

Diameters

The diameters array are inputs available to manipulate the vocal tract.

<<sk_tract>>=

SKFLT diameter[44];

<<init>>=

memset(tr->diameter, 0, 44 * sizeof(SKFLT));
<<initial_shape>>

The diameters get set to an initial shape. This is the one Neil Thapen uses in Pink Trombone:

<<initial_shape>>=

{
    int i;

    for(i = 0; i < tr->n; i++) {
        SKFLT diameter = 0;
        if(i < 7 - 0.5) {
            diameter = 0.6;
        } else if( i < 12) {
            diameter = 1.1;
        } else {
            diameter = 1.5;
        }

        tr->diameter[i] = diameter;
    }
}

Areas

Diameters get converted into areas and stored in the array A, and areas get converted into reflection coefficients stored in the array reflection.

<<sk_tract>>=

SKFLT A[44];
SKFLT reflection[44];

<<init>>=

memset(tr->A, 0, 44 * sizeof(SKFLT));
memset(tr->reflection, 0, 44 * sizeof(SKFLT));

Scattering Junction Outputs and Bidirectional Delay Line

Each scattering junction has an left and right output corresponding to the direction of pressure, as well as filter memory. The junction output arrays are known as junction_outL and junction_outR, respectively. Additional Left/Right arrays are used to implement the bidirectional delay line for the underlying waveguide.

<<sk_tract>>=

SKFLT junction_outL[44];
SKFLT L[44];
SKFLT junction_outR[44];
SKFLT R[44];

<<init>>=

memset(tr->junction_outL, 0, 44 * sizeof(SKFLT));
memset(tr->L, 0, 44 * sizeof(SKFLT));
memset(tr->junction_outR, 0, 44 * sizeof(SKFLT));
memset(tr->R, 0, 44 * sizeof(SKFLT));

User-Supplied Shape Callback

A user-supplied shape callback can be used to shape the diameters at audio-rate, allowing for things like smoothing filters. A user-data void struct is also included here.

<<sk_tract>>=

void *ud;
void (*shape)(sk_tract *, SKFLT *, void *);

<<init>>=

tr->shape = NULL;
tr->ud = NULL;

Use Diameters Flag

The use_diameters flag can be used to enable/disable the diameter control. If disabled, areas can be directly manipulated.

<<sk_tract>>=

int use_diameters;

<<init>>=

sk_tract_use_diameters(tr, 1);

It is set with the function sk_tract_use_diameters, where mode is true (1) or false (0).

<<funcdefs>>=

void sk_tract_use_diameters(sk_tract *tr, int mode);

<<funcs>>=

void sk_tract_use_diameters(sk_tract *tr, int mode)
{
    tr->use_diameters = mode;
}

Diameter Manipulation

The vocal tract is controlled by mainpulating the underlying diameter sizes. These can be directly accessed via sk_tract_diameters.

Get Diameters

Returns the array holding the diameter values.

<<funcdefs>>=

SKFLT* sk_tract_diameters(sk_tract *tr);

<<funcs>>=

SKFLT* sk_tract_diameters(sk_tract *tr)
{
    return tr->diameter;
}

Tract Size

The number of diameters can be retrieved with sk_tract_size. Usually, this is 44.

<<funcdefs>>=

int sk_tract_size(sk_tract *tr);

<<funcs>>=

int sk_tract_size(sk_tract *tr)
{
    return tr->n;
}

Shaper Function

More often than not, one wants to apply sample-accurate smoothing to the diameters rather than work them directly. This is done using a callback interface, known as a shaper.

The function sk_tract_shaper sets up a shaper callback. It takes in the shaper callback as well as any external user data needed to manage state in that callback.

This function gets called at every sample, and takes in three arguments: the sk_tract struct, the output array to write to, and the externally managed user data.

In practice, one potential approach is to use a filterbank of 44 smoothing filters to control the diameter shapes. This allows diameters to be controlled without producing any artificats caused by large discontinuities.

<<funcdefs>>=

void sk_tract_shaper(sk_tract *tract,
                     void (*shape)(sk_tract *, SKFLT *, void *),
                     void *ud);

<<funcs>>=

void sk_tract_shaper(sk_tract *tract,
                     void (*shape)(sk_tract *, SKFLT *, void *),
                     void *ud)
{
    tract->shape = shape;
    tract->ud = ud;
}

Tongue Control

Neil Thapen's Pink Trombone employs a curious "tongue control" functionality, which allows one use 2 dimensions of control to shape the entire tract. This is also the underlying control mechanism for Voc.

This behavior is ported in the function sk_tract_tongue_shape, where position and diameter are both normalized floating point values.

<<funcdefs>>=

void sk_tract_tongue_shape(sk_tract *tract,
                           SKFLT position,
                           SKFLT diameter);

<<funcs>>=

static void set_diameters(sk_tract *tract,
                          int blade_start,
                          int lip_start,
                          int tip_start,
                          SKFLT tongue_index,
                          SKFLT tongue_diameter,
                          SKFLT *diameters)
{
    int i;
    SKFLT t;
    SKFLT fixed_tongue_diameter;
    SKFLT curve;
    int grid_offset = 0;

    for(i = blade_start; i < lip_start; i++) {
        t = 1.1 * M_PI *
            (SKFLT)(tongue_index - i)/(tip_start - blade_start);
        fixed_tongue_diameter = 2+(tongue_diameter-2)/1.5;
        curve = (1.5 - fixed_tongue_diameter + grid_offset) * cos(t);
        if(i == blade_start - 2 || i == lip_start - 1) curve *= 0.8;
        if(i == blade_start || i == lip_start - 2) curve *= 0.94;
        diameters[i] = 1.5 - curve;
    }
}

void sk_tract_tongue_shape(sk_tract *tract,
                           SKFLT position,
                           SKFLT diameter)
{
    position = 12 + 16.0 * position;
    diameter = 3.5 * diameter;
    set_diameters(tract, 10, 39, 32,
                  position, diameter, tract->diameter);
}

Area Manipulation

Sometimes it might be more adventageous to set the areas directly, rather than using the diameters (which are then squared and set as the area).

To use areas directly, diameter control must be turned off. This done by setting the use_diameters flag to be false via sk_tract_use_diameters.

Area shapes can be set using the function sk_tract_set_area_shape. This will set the shape of the area to be an array of size sz. If the array is larger than the current tract size, it will be truncated. If it is smaller, the last values will be padded with the last tract sample in the array.

<<funcdefs>>=

void sk_tract_set_area_shape(sk_tract *tr, SKFLT *areas, int sz);

<<funcs>>=

void sk_tract_set_area_shape(sk_tract *tr, SKFLT *areas, int sz)
{
    int n;
    SKFLT last;

    last = 0;

    for (n = 0; n < tr->n; n++) {
        if (n >= sz) {
            tr->A[n] = last;
        } else {
            tr->A[n] = areas[n];
            last = areas[n];
        }
    }
}

Get the area array directly with sk_tract_areas.

<<funcdefs>>=

SKFLT* sk_tract_areas(sk_tract *tr);

<<funcs>>=

SKFLT* sk_tract_areas(sk_tract *tr)
{
    return tr->A;
}

Velum Control

In articulatory synthesis, a velum parameter is used to simulate the airflow that runs through the nasal passageways that occurs from blocking the soft palette. Increasing the velum parameter will end up making the resulting output sound more nasally.

Variables

Nose size is hardcoded to be a size of 28.

Similar to the main Vocal tract, the Nasal airway also is made of a waveguide (a bidirectional delay represented in noseL and noseR) and scattering junctions (nose_junc_outL and nose_junc_outR). The nasal passaged way is shaped with nose_diameter, which are then converted to areas in noseA.

The velum variable stores the velum amount, typically a value between 0 and 1.

The nasal waveguide component is hooked up at one of the cylindrical segments of the tract waveguide component, indicated by nose_start. This is hard-coded to be 17.

<<sk_tract>>=

    SKFLT nose_diameter[28];
    SKFLT noseL[28];
    SKFLT noseR[28];
    SKFLT noseA[28];
    SKFLT nose_reflection[28];
    SKFLT nose_junc_outL[28];
    SKFLT nose_junc_outR[28];
    SKFLT velum;
    SKFLT reflection_left;
    SKFLT reflection_right;
    SKFLT reflection_nose;
    int nose_start;

<<init>>=

tr->velum = 0;
tr->nose_start = 17;
tr->reflection_left = 0;
tr->reflection_right = 0;
memset(tr->noseL, 0, 28 * sizeof(SKFLT));
memset(tr->noseR, 0, 28 * sizeof(SKFLT));
memset(tr->noseA, 0, 28 * sizeof(SKFLT));

Use Velum Flag

<<sk_tract>>=

int use_velum;

<<init>>=

sk_tract_use_velum(tr, 0);

<<funcdefs>>=

void sk_tract_use_velum(sk_tract *tr, int mode);

<<funcs>>=

void sk_tract_use_velum(sk_tract *tr, int mode)
{
    tr->use_velum = mode;
}

Setting the Velum

<<funcdefs>>=

void sk_tract_velum(sk_tract *tract, SKFLT velum);

<<funcs>>=

void sk_tract_velum(sk_tract *tract, SKFLT velum)
{
    tract->velum = velum;
}

Shaping The Nose

The shape of the nasal airway is only done once at init-time. The function used produces a clipped triangle wave shape.

This was adapted from the original Pink Trombone source. It's unclear how this particular shape was derived, but it looks empiricially discovered.

<<init>>=

{
    int i;
    for (i = 0; i < 28; i++) {
        SKFLT d;
        d = 2 * ((SKFLT)i / 28);
        if(d < 1) {
            d = 0.4 + 1.6 * d;
        } else {
            d = 0.5 + 1.5*(2-d);
        }
        d = d < 1.9 ? d : 1.9;
        tr->nose_diameter[i] = d;
    }
    calculate_nose_reflections(tr);
}

Calculate Nose Reflections

Similar to calculate_reflections, the calculate_nose_reflections function converts nose diameters to areas, and then computes the reflection coefficients.

<<static_funcdefs>>=

static void calculate_nose_reflections(sk_tract *tr);

<<funcs>>=

static void calculate_nose_reflections(sk_tract *tr)
{
    int i;

    for(i = 0; i < 28; i++) {
        tr->noseA[i] = tr->nose_diameter[i] * tr->nose_diameter[i];
    }

    for(i = 1; i < 28; i++) {
        tr->nose_reflection[i] = (tr->noseA[i - 1] - tr->noseA[i]) /
            (tr->noseA[i-1] + tr->noseA[i]);
    }
}

Velum Computation (Nasal Waveguide Computation)

Calculate Nasal Reflection Coefficients

Called from inside calculate_reflections, this uses the velum parameter to calculate the reflection coefficients used by the nasal waveguide. Velum in this context sets up the initial diameter segment, which has the effect of controlling the opening of nasal airway. As the velum value gets larger, the amount of nasal sound in the output increases.

<<calculate_nasal_reflection_coefficients>>=

if (tr->use_velum) {
    SKFLT sum;
    tr->nose_diameter[0] = tr->velum;
    tr->noseA[0] = tr->nose_diameter[0] * tr->nose_diameter[0];
    sum = tr->A[tr->nose_start] + tr->A[tr->nose_start + 1] + tr->noseA[0];
    tr->reflection_left = (SKFLT)(2 * tr->A[tr->nose_start] - sum) / sum;
    tr->reflection_right = (SKFLT)(2 * tr->A[tr->nose_start + 1] - sum) / sum;
    tr->reflection_nose = (SKFLT)(2 * tr->noseA[0] - sum) / sum;
}

Compute The Nasal Waveguide

Is use_velum is enabled, it will kick on computation for the Nasal Waveguide, which gets computed after the main tract waveguide.

First, the junction outputs for the tract and nose are updated. In a way, this "connects" the nasal waveguide to the tract waveguide.

After that, the segments in nasal waveguide are updated.

<<velum_computation>>=

if (tr->use_velum) {
    i = tr->nose_start;
    r = tr->reflection_left;

    tr->junction_outL[i - 1] = r*tr->R[i-1] +
        (1+r)*(tr->noseL[0]+tr->L[i]);

    r = tr->reflection_right;
    tr->junction_outR[i] = r*tr->L[i] +
        (1+r)*(tr->R[i-1]+tr->noseL[0]);

    r = tr->reflection_nose;
    tr->nose_junc_outR[0] =
        r * tr->noseL[0]+(1+r)*(tr->L[i]+tr->R[i-1]);
    tr->nose_junc_outL[28 - 1] =
        tr->noseR[28 - 1] * tr->lip_reflection;

    for(i = 1; i < 28; i++) {
        w = tr->nose_reflection[i] * (tr->noseR[i-1] + tr->noseL[i]);
        tr->nose_junc_outR[i] = tr->noseR[i - 1] - w;
        tr->nose_junc_outL[i - 1] = tr->noseL[i] + w;
    }

    for(i = 0; i < 28; i++) {
        tr->noseR[i] = tr->nose_junc_outR[i];
        tr->noseL[i] = tr->nose_junc_outL[i];
    }
}

Add nasal component to signal output

<<add_velum_component>>=

if (tr->use_velum) {
    out += tr->noseR[28 - 1];
}

Computing Audio

Tick Function

A single sample of audio is computed with sk_tract_tick. It expects an input signal in, and returns a single sample.

<<funcdefs>>=

SKFLT sk_tract_tick(sk_tract *tract, SKFLT in);

There are two main things that happen here. First, the reflection coefficients for the ladder filter are calculated. Then, the input is computed. The output is then scaled and returned.

Note that tract_compute is called twice, an artifact from Pink Trombone. Most likely this is done to simulate forward and backward propogation.

<<funcs>>=

SKFLT sk_tract_tick(sk_tract *tr, SKFLT in)
{
    SKFLT tmp;
    SKFLT out;

    out = 0;

    calculate_reflections(tr);
    tmp = 0;

    /* compute twice for forwards/backwards propogation */
    tmp += tract_compute(tr, in);
    tmp += tract_compute(tr, in);

    out = tmp * 0.125;

    return out;
}

Calculate Reflections

<<static_funcdefs>>=

static void calculate_reflections(sk_tract *tr);

<<funcs>>=

static void calculate_reflections(sk_tract *tr)
{
    int i;
    SKFLT *diam;

    diam = tr->diameter;

    <<shapeit>>
    <<calculate_areas>>
    <<calculate_reflection_coefficients>>
    <<calculate_nasal_reflection_coefficients>>
}

If the shape function exists, call it. If use_diameters is enabled (by default it is), it will pass in the diameters as an output. Otherwise, it will pass in the areas A directly.

<<shapeit>>=

if (tr->shape != NULL) {
    if (tr->use_diameters)
        tr->shape(tr, tr->diameter, tr->ud);
    else
        tr->shape(tr, tr->A, tr->ud);
}

The cross-sectional areas are calculated by squaring the input diameters. This will only happen if use_diameters is enabled.

<<calculate_areas>>=

if (tr->use_diameters) {
    for(i = 0; i < tr->n; i++) {
        tr->A[i] = diam[i] * diam[i];
    }
}

The reflection coefficients are calculated from the computed areas. This is the difference between neighboring areas over their sum:

$k_n = {{A_n - A_{n - 1}} \over {A_{n} + A_{n - 1}}}$

Where k_n is known as the scattering coefficient or reflection coefficient, and are the areas.

(Adapted from Perry Cook's Real Sound Synthesis for interactive Applications, found on pg. 230)

To prevent numerical issues, reflections are sent to a close-to-1 value if the area is exactly 0.

<<calculate_reflection_coefficients>>=

for(i = 1; i < tr->n; i++) {
    if(tr->A[i] == 0) {
        tr->reflection[i] = 0.999; /* to prevent bad behavior if 0 */
    } else {
        tr->reflection[i] =
            (tr->A[i - 1] - tr->A[i]) / (tr->A[i - 1] + tr->A[i]);
    }
}

Tract Compute

A single pass of tract computation is done with tract_compute.

<<static_funcdefs>>=

static SKFLT tract_compute(sk_tract *tr, SKFLT in);

<<funcs>>=

static SKFLT tract_compute(sk_tract *tr, SKFLT in)
{
    SKFLT  r, w;
    int i;
    SKFLT out;

    out = 0;

    <<initial_junction_outputs>>
    <<compute_scattering_junctions>>
    <<update_delay_lines>>
    <<get_output>>
    <<add_velum_component>>

    return out;
}

The tract has air flow moving in two directions. The right direction is glottis. The left direction are the lips.

<<initial_junction_outputs>>=

tr->junction_outR[0] = tr->L[0] * tr->glottal_reflection + in;
tr->junction_outL[tr->n - 1] = tr->R[tr->n - 1] * tr->lip_reflection;

First, the left/right junction outputs are computed.

The variable names used here are adapted from Jack Mullen's PhD dissertation, on the section on BiDirectional Waveguide Composition, in section 2.5.2, figure 2.77:

$\eqalign{ w &= r[p^{+}_i - p^{+}_{i + 1}] \cr p^{-}_{i} &= p^{+}_{i + 1} + w \cr p^{-}_{i + 1} &= p^{+}_{i} + w \cr }$

<<compute_scattering_junctions>>=

for(i = 1; i < tr->n; i++) {
    r = tr->reflection[i];
    w = r * (tr->R[i - 1] + tr->L[i]);
    tr->junction_outR[i] = tr->R[i - 1] - w;
    tr->junction_outL[i - 1] = tr->L[i] + w;
}
<<velum_computation>>

The left and right delay lines are updated as attenuated versions of the junction ouputs.

<<update_delay_lines>>=

for(i = 0; i < tr->n; i++) {
    tr->R[i] = tr->junction_outR[i]*0.999;
    tr->L[i] = tr->junction_outL[i]*0.999;
}

The output signal is the last sample in the right-moving delay line.

<<get_output>>=

out = tr->R[tr->n - 1];