Noob Question on Vocal Synthesis

I know how to create Vowels using Formant Wavetables, but how do you create Consonants?

The rabbit hole you are likely heading down is in Phonetics:

Formants/vowels behave very differently than consonants.

Whereas formants are usually heard as a sustained tone, consonants like the plosives (T/D) are short transients, so they are only really noticeable in the attack envelope.
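To make the "plosive lives in the attack" idea concrete, here is a minimal sketch in plain Python. It is not a real consonant model: the function name, the 16 kHz sample rate, the 15 ms burst length, and the decay constant are all assumptions picked for illustration, and a bare sine stands in for the formant/vowel part.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed sample rate in Hz


def plosive_then_vowel(burst_ms=15.0, vowel_ms=200.0, pitch_hz=120.0, seed=0):
    """Return samples for a decaying noise burst followed by a sustained tone."""
    rng = random.Random(seed)
    burst_len = int(SAMPLE_RATE * burst_ms / 1000)
    vowel_len = int(SAMPLE_RATE * vowel_ms / 1000)
    samples = []

    # Burst: white noise shaped by a fast exponential decay (the "attack" part).
    for n in range(burst_len):
        env = math.exp(-5.0 * n / burst_len)
        samples.append(env * rng.uniform(-1.0, 1.0))

    # Vowel stand-in: a plain sine with a 10 ms linear fade-in.
    fade = SAMPLE_RATE // 100
    for n in range(vowel_len):
        gain = min(1.0, n / fade)
        phase = 2 * math.pi * pitch_hz * n / SAMPLE_RATE
        samples.append(0.5 * gain * math.sin(phase))

    return samples


samples = plosive_then_vowel()
```

Shortening or lengthening `burst_ms` and changing the decay constant is the kind of hand-tweaking that moves the result between a harder and a softer attack. Swap the sine for your formant wavetable output to hear it in context.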

Consonants are also referred to as articulations. Wind instruments articulate notes with tonguing, so syllables like “da” or “ta” produce subtly different attack envelopes on a wind instrument like a saxophone.

Besides the envelopes there’s also the “breathiness” part (as it is usually called), which is non-vocal (unvoiced) noise after (or before) some consonant envelopes.

So say “kick” slowly and listen to the breath noise you make at the end. Notice how similar and different that is to the sound after “kit”.

(Say everything very slowly, so you can hear what is happening.)

The same goes for the sound made before an “s” or “sh”. Listen to the breath portion before the words “so” and “show”. Then make up a new word that rhymes with “show” but starts with “ch”. Notice how the preceding “breathiness” is shorter with “ch”, and how the envelope parts are different.
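As a rough sketch of that "sh" versus "ch" difference, the breath part can be treated as noise with a fade-in envelope, where only the length and the attack time change. The function name and every duration below are guesses for illustration, not measured values. Real fricatives also have their own spectral shape, so the unfiltered white noise here is a simplification.

```python
import random

SAMPLE_RATE = 16000  # assumed sample rate in Hz


def breath_noise(duration_ms, attack_ms, seed=0):
    """White noise with a linear fade-in of attack_ms, then held at full level."""
    rng = random.Random(seed)
    total = int(SAMPLE_RATE * duration_ms / 1000)
    attack = max(1, int(SAMPLE_RATE * attack_ms / 1000))
    out = []
    for n in range(total):
        gain = min(1.0, n / attack)  # ramps from 0 to 1 over the attack time
        out.append(gain * rng.uniform(-1.0, 1.0))
    return out


# Hypothetical settings: "sh" fades in slowly and lasts longer,
# "ch" starts abruptly and is shorter.
sh = breath_noise(duration_ms=180, attack_ms=60)
ch = breath_noise(duration_ms=80, attack_ms=5)
```

Running each result through a band-pass or high-pass filter before the vowel would be the next step to try by ear.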

You make complex combinations by putting these things together. Things in the middle get squashed. For example, say “oshkosh” and listen to the two “sh” sounds at the ends of the syllables.

There is software that will make these sounds and sing synthesized vocals, but it is interesting to break things down into parts and do things by hand.


In the original concept of the vocoder (actually intended to encode speech for communication purposes) there were separate blocks for voiced sounds, using a pitched oscillator, and for unvoiced sounds, using a noise source with particular envelopes.
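The voiced/unvoiced split can be sketched as a source that switches between a pitched pulse train and noise. This only illustrates the excitation side under assumed values (the function name, 16 kHz sample rate, 110 Hz pitch, and noise level are all made up for the example). The filter bank that would shape the source is not shown.

```python
import random

SAMPLE_RATE = 16000  # assumed sample rate in Hz


def excitation(segments, pitch_hz=110.0, seed=0):
    """Build a raw source signal from (is_voiced, duration_ms) segments."""
    rng = random.Random(seed)
    period = int(SAMPLE_RATE / pitch_hz)  # samples between pulses
    out = []
    for is_voiced, duration_ms in segments:
        count = int(SAMPLE_RATE * duration_ms / 1000)
        for n in range(count):
            if is_voiced:
                # Pitched oscillator stand-in: one impulse per pitch period.
                out.append(1.0 if n % period == 0 else 0.0)
            else:
                # Unvoiced stand-in: low-level white noise.
                out.append(0.3 * rng.uniform(-1.0, 1.0))
    return out


# 50 ms of noise (a consonant-like onset) followed by 200 ms of pitched pulses.
source = excitation([(False, 50), (True, 200)])
```

Feeding only the voiced segments through the filters, and dropping the noise segments, roughly corresponds to a setup that carries vowels but loses the consonant cues.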

Not 100% related to completely artificial vocal sounds: The fact that most vocoders in musical use only incorporate the voiced part of speech is the major reason why it is so hard to make out the lyrics in vocoder “singing”. This is unlike a talkbox where a synth sound is fed into the player’s mouth with a tube and the player can make some fricatives and plosives in addition to filtering the sound with their mouth.
