Weekly Update 3: Synthesizing the Voice of a Tiny Creature
Suddenly, a Tiny Being
This week, I found myself synthesizing the voice of a Tiny Creature. Here's what it ended up sounding like:
What's special for me here is that the voice is tiny. It is a new trick I managed to teach my singing computer this week. It is something I managed to figure out while exploring some other singing synthesizers and trying to dig into how they work.
Tiny Creatures Are Narrative Magnets
Constructed sounds like these interest me because of the way they conjure up story and character. This sound is entirely synthesized from math and code, yet my brain can't help but put a face and backstory on it. For example, I like to imagine this voice coming from some kind of Tree Sprite, excitedly telling you about their morning adventure searching for mushrooms up by the swamp. This kind of sound design, with a character narrative attached, is a mechanic I like to play with, as can be seen/heard with goblins and computer on the phone with his mother.
Choir: A Compelling Interactive Singing Barbershop Quartet
The Choir interactive musical interface found on Adult Swim is a big inspiration for me, right up there with Pink Trombone. It's made by the same people who did Blob Opera, only this one was made first, and to me it sounds better (Blob Opera sounds a little over-engineered to me: they got too into making different phonemes with fricatives, and there's too much reverb).
My Previous Attempts To Synthesize Vocal Ensembles
Before we can talk about synthesizing tiny voices, we need to talk about synthesizing singing voice ensembles.
By Singing Computer standards, Choir is an excellent sounding ensemble, and this is one of the primary reasons why I love it so much. Human singers work very hard to blend when they are singing together. Robot singers tend to have the opposite problem of being so precise that the voices melt into each other, resulting in a thin texture.
I know how robot singing ensembles fail from personal experience. You could say there's some history.
In an experiment I did back in 2021, I worked out a barbershop tag arrangement using 4 instances of an iteration of my vocal synthesizer. In my attempt, I added variations to pitch, vibrato depth, and maybe even vowel shape to try and make the voices pop out more in the mix.
The end result was okay, but it certainly wasn't "Choir", and not even close to the richness of the human-performed version. Golly, what a sound. Wish I could get stuff to sound like that.
This wasn't my only attempt at ensembles. Later that year, I would try to do ensembles a few more times for looptober. You can hear how the synthesized voices singing together start to sound like an organ:
You can also hear this organ-like quality in the background vocals in this doo-woppy arrangement here:
Things got a little bit better when I started using more "breathy alto" and less "intense tenor/baritone". The textures in these little "bossa nova" and "jazzy" sketches are a bit of an improvement, but still the voices "clump".
I started trying to add small differences to the voices in ways I knew how, like in pitch and timing. This also helped somewhat in this piece inspired by Howard Shore's Passing of the Elves:
This piece, which makes me think of singing angels, has some good vocal textures to it as well, but I think it's less to do with the voices and more to do with "pretty chords" and slides (the pitch sliding was a great suggestion by someone, and it helped a lot):
So, what does Choir do to get their singing synthesizers to sound so good together? Truthfully, I am still not 100% sure, but this week I found some pretty strong clues. (And yes, this will lead up to the tiny voices).
Hunting for Clues in "Choir"
One particular parameter of interest to me was something called "tract_scale". Each of the 4 voices had a slightly different scale, set to some value close to 1, like 1.02, 1.03, and 0.93. As an experiment, I isolated the "lead" vocalist, and cranked its "tract_scale" up to 2. When I reloaded the Choir program, I had a voice that sounded like a singing child. Not only that, but it seemed to be singing the same "ah" vowel sound at the same pitch, just from a smaller voice. Counterintuitively, a larger tract scale seemed to create a vocal tract with a smaller length. Different vowel sounds have particular shapes. This seemed to be taking a particular shape, and stretching or squashing it to fit the overall tract.
Tract scale seems to be one of the ways Choir is able to get such a rich texture. By slightly varying the size of each singer's vocal tract, each singer is given a slightly different timbre and vocal quality, which in turn adds more distinctness to the mix.
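As a rough illustration of the stretch-or-squash idea, here is a minimal Python sketch. It is not Choir's code, and not the code of my synthesizer either. It assumes a vowel shape is just a list of tract diameters, that a tract is a fixed number of sections (the 44 here is a made-up resolution), and that a larger scale means fewer sections, which matches what I heard but is only a guess at the mechanism. The names resample_shape, scaled_tract, BASE_SECTIONS, and AH_SHAPE are all hypothetical, and the "ah" numbers are invented for the example.

```python
BASE_SECTIONS = 44  # hypothetical number of sections in a "normal" tract

# Made-up diameters standing in for an "ah"-like vowel shape (glottis to lips).
AH_SHAPE = [0.6, 0.4, 0.5, 1.2, 1.8, 2.0, 1.6, 1.5]


def resample_shape(shape, n_sections):
    """Stretch or squash a vowel shape onto n_sections using linear interpolation."""
    if n_sections < 1 or not shape:
        raise ValueError("need a non-empty shape and at least one section")
    if len(shape) == 1 or n_sections == 1:
        return [shape[0]] * n_sections
    last = len(shape) - 1
    out = []
    for i in range(n_sections):
        # Map section i onto a fractional position in the original shape.
        pos = i * last / (n_sections - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(shape[lo] * (1 - frac) + shape[hi] * frac)
    return out


def scaled_tract(shape, tract_scale, base_sections=BASE_SECTIONS):
    """Fit a vowel shape onto a tract whose length depends on tract_scale.

    Assumption: a larger scale gives a shorter tract (fewer sections).
    """
    n_sections = max(2, round(base_sections / tract_scale))
    return resample_shape(shape, n_sections)


if __name__ == "__main__":
    # Three scales near 1, plus the "cranked to 2" experiment.
    for scale in (1.02, 1.03, 0.93, 2.0):
        tract = scaled_tract(AH_SHAPE, scale)
        print(f"tract_scale={scale}: {len(tract)} sections")
```

The same vowel shape goes in every time; only the number of sections it gets fitted onto changes. In a real waveguide-style tract, the shorter tube would then shift the resonances upward, which is where the smaller-sounding voice would come from.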
Teaching the singing computer new tricks...
But was it really that easy? Treating a vowel like a rubber mask and stretching it out to fit on a tract of particular length? That mechanical singing voice that I accidentally miniaturized was a siren call for me. There was no way I wasn't going to have this kind of sound in my system. I now knew what I was missing. I needed tiny voices. I had to try.
With my text editor opened up to the code of my singing synthesizer, I quickly chiselled out a crude equivalent of "cranking the tract scale to 2" like I had done in Choir. Sure enough, the chipmunkification of the voice could be heard. Go figure. I garnished it with a bit of character and enthusiasm, and created the tiny being played at the beginning of this blog post.
This was only a proof of concept, and I had indeed proven the concept. The next phase was actually building something I could use more than once. So, I coded up a new version of the vocal tract component, where the overall length could be adjusted. I taught a critter how to say "oo", then "ah", then how to wildly alternate between the two:
Overall, this week has had some great unexpected discoveries. Vocal tract scaling gives some good ammunition for the Gestlings project, whose goal is to design sounds with personality.