Voc is a physical model of the vocal tract. It is based off of Neil Thapen's Pink Trombone, but rewritten in portable ANSI C code.
Voc uses the classic Kelly-Lochbaum vocal synthesis technique, one of the earliest attempts at digital physical modelling. The Kelly-Lochbaum approach approximates the vocal tract as tube-model, consisting of a series of cylinders with varying diameters. The excitation signal used with the Tract is the Liljencrants-Fant glottal pulse model, a highly accurate mathematical approximation of the glottal wave.
Voc is written using Literate Programming, a programming paradigm created by Donald Knuth of TeX and The Art of Computer Programming Fame. In literate programming, one melts natural language and computer code together in order to better articulate what the program does. In some ways, it can be thought of a very integrated form of code documentation. Voc uses a program called CWEB, the classic literate documentation system developed by Knuth himself and Silvio Levy.
As a result of literate programming, Voc has two formats. One format is the C code itself, designed to be read by compilers. The other format is TeX code, which can be compiled into a readable PDF document.
The main motiviation behind using a literate documentation system like CWEB is is in the fact that while C is great language for implementing audio DSP algorithms, it is a sore language for trying to understand them. With CWEB, one can use plain english and TeX's math mode to better convey some of concepts in the implementation.
The source code to Voc can be found on github.
The generated output of cweave of can be found here. The title page has the git commit hash, which can be used to see how updated the document is with the upstream code.
For those wishing to try out Voc with minimum hassle, there is a realtime voc demo which is able to compile for both Mac OSX and Linux systems running JACK. Soundpipe and GLFW3 are needed dependencies.
Sporth Examples and Visualizations
The videos below showcase Voc being used in a musical context. The sounds were synthesized using Sporth and the Sporth plugin implementation of Voc. The visuals were generated via Cairo, FFMPEG, and Runt via Pixku.
Each example has two videos that accompany it.
The first video visualizes the four main parameters of the Voc Sporth Ugen.
- Both position and diameter are macro controls for the vocal tract which loosely approximate tongue control.
- The parameter Tenseness controls the tenseness of the glottis, as well as the amount of aspiration noise in the signal.
- The parameter velum controls the area of the velum, or soft pallette. Larger values cause more nasal values to happen.
Frequency is not visualized.
The second video visualizes how the vocal tract is changing as a two dimensional plot of each of the diameters.
The first real patch created in Sporth using Voc was Babble. Babble makes good use of interpolated random number generators and jitter to produce sounds that mimic a babbling person.
The patch Chant was an attempt to build a constrasting patch to Babble, this time focusing on slow moving periodic modulations instead of fast and random ones. Voc is able to turn into something resembling a chanting monk by picking a low fundamental frequency, a high velum, low frequency sinusoidal modulation of the position, as well as a high diameter parameter. A lowend boost and reverb is added to taste.
This was an attempt to find interesting values from the tongue control parameters, as well as build something tonal. Both diameter and position are mapped to clocked envelope generators, which have the effect of going between two vowel states. In this case, they make a sound which approximates the nonsensical word "unya".