Meaning of asymmetrical waveform for scientific paper

I am writing a scientific paper on the ability of the human voice to outperform current AI voice generators in the expression of emotion. In one section I focus on a single word, and am including a picture of the Audacity waveform of that word to show, very approximately, some of the detail in a spoken word, as here:

The asymmetry between top and bottom is also interesting, but I do not understand what it means. I don’t want to remove it; I like it, and it supports the point that the spoken voice has great detail. But I don’t want to misrepresent it either. What does this asymmetry mean? Is it innate to the voice, or merely an artifact of processing?

AI speech-to-speech retains expression, see … https://youtu.be/0UVppC0Ihjk?&t=72
i.e. if you can convey the expression you want, it can reproduce it.


The asymmetry between top and bottom doesn’t mean anything. A change in the position of the microphone can make the symmetry change dramatically, because the distance between the sound source and the microphone changes the phase of different frequencies.

A better representation of how we hear is the spectrogram view.
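To make that concrete: a bare-bones spectrogram needs nothing beyond NumPy. This is my own illustrative sketch (the 100 Hz test tone and the frame/hop sizes are arbitrary choices, not anything from the thread): it slices the signal into overlapping windowed frames and takes the magnitude of each frame’s FFT.

```python
import numpy as np

fs = 8000                      # sample rate in Hz
t = np.arange(fs) / fs         # one second of time
# synthetic test tone: 100 Hz fundamental plus a quieter second harmonic
x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)

# simple magnitude spectrogram: windowed FFTs of successive frames
frame, hop = 512, 256
frames = [x[i:i + frame] * np.hanning(frame)
          for i in range(0, len(x) - frame, hop)]
spec = np.abs(np.fft.rfft(frames, axis=1))   # shape: (num frames, freq bins)

# the strongest bin across frames sits near the 100 Hz fundamental
peak_bin = spec.mean(axis=0).argmax()
print("strongest frequency bin:", peak_bin * fs / frame, "Hz")
```

A real spectrogram view (like Audacity’s) just renders `spec` as an image, time along one axis and frequency along the other, which is much closer to how the ear analyzes sound than the raw waveform.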


Yes, speech-to-speech can retain expression. However, my paper is on the difference between my recorded voice and the AI-generated voice when the AI generator is given the same written words as input, producing much less emotional expression (of course).

So the key text from that link seems to be:

This asymmetry is due mainly to two things, the first being the relative phase relationships between the fundamental and different harmonic components in a harmonically complex signal. In combining different frequency signals with differing phase relationships, the result is often a distinctly asymmetrical waveform, and that waveform asymmetry often changes and evolves over time, too. That’s just what happens when complex related signals are superimposed.

The other element involved in this is that many acoustic sources inherently have a ‘positive air pressure bias’ because of the way the sound is generated. To talk or sing, we have to breathe out, and to play a trumpet, we have to blow air through the tubing. So, in these examples, there is inherently more energy available for the compression side of the sound wave than there is for the rarefaction side, and that can also contribute to an asymmetrical waveform.

Is there an “explain like I am five” description of what this means? If not, and it would take months of audio-engineering training to comprehend this, I understand; just asking.
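The first mechanism in the quoted passage (relative phase between the fundamental and its harmonics) can be demonstrated with a few lines of NumPy. This is a synthetic sketch, not anyone’s recording: the same two frequency components, combined with different relative phases, give either a balanced or a lopsided waveform.

```python
import numpy as np

t = np.linspace(0, 1, 48000, endpoint=False)   # one second at 48 kHz
f0 = 100.0                                     # fundamental frequency in Hz

# fundamental plus second harmonic, both starting as sines:
# peak and trough come out with equal magnitude
in_phase = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)

# identical components, but the harmonic shifted by 90 degrees:
# now the trough dips twice as far as the peak rises
shifted = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t + np.pi / 2)

print("in phase: peak %.2f, trough %.2f" % (in_phase.max(), in_phase.min()))
print("shifted : peak %.2f, trough %.2f" % (shifted.max(), shifted.min()))
```

Nothing about the frequency content changed between the two signals, only the timing of the harmonic relative to the fundamental, yet one waveform is symmetrical and the other is not. A voice contains many harmonics whose phases drift continuously, which is why the asymmetry "changes and evolves over time".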

If you instruct the AI how to say the written text, it will add that emotion
(otherwise it cannot tell how you want it to be read: it’s not telepathic).

Asymmetry is normal; it does not affect how the audio sounds.
Its presence is not unique to real human speech.


You can’t hear this asymmetry, so it’s unrelated to expression or emotion.

Some more unhelpful information for you… Electronics (like an amplifier or preamp) sometimes invert the polarity (which will invert the asymmetry). There is about a 50/50 chance of it being inverted somewhere in the recording and playback chain.
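That polarity flip is easy to check numerically. In this sketch (my own synthetic example, assuming NumPy), negating an asymmetric waveform swaps its peaks and troughs, while the magnitude spectrum, which is what we actually hear, is unchanged:

```python
import numpy as np

t = np.linspace(0, 1, 8000, endpoint=False)
# deliberately asymmetric test tone: 100 Hz sine plus a cosine harmonic
x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.cos(2 * np.pi * 200 * t)
flipped = -x   # polarity inversion, as an amplifier stage might apply

print("original peak/trough:", x.max(), x.min())              # ~0.75 / ~-1.5
print("inverted peak/trough:", flipped.max(), flipped.min())  # ~1.5 / ~-0.75
# identical magnitude spectra: the two versions sound the same
print(np.allclose(np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(flipped))))
```

So whether the "heavy" side of a recorded waveform points up or down is essentially a coin flip determined by the gear, which is one more reason not to read meaning into which side is bigger.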


Sure, I understand asymmetry is normal, but is there some way to explain what it means? In other words, why is the bottom portion different from the top? Is there a way to explain what causes that in terms that a non-audio expert could understand?

A sound wave is a (small) change in atmospheric pressure. The positive part of the wave is above normal atmospheric pressure and the negative part is below normal.

Your eardrum has equal pressure on both sides, so you don’t feel or hear the constant pressure, but quick changes are heard as sound. (Or you can feel a slow change when you go up and down in an airplane, until the pressure equalizes.)

If you’ve ever seen a woofer moving, it’s compressing the air when it moves out and decompressing it when it moves in. If it moves out and stays out, it can’t hold that compression, because the room isn’t sealed and can’t hold pressure, but it can make a temporary change as a “wave”.

If you connect a flashlight battery to a speaker, you’ll hear a “click” when the speaker moves in or out (depending on polarity), and another “click” when the battery is disconnected and the speaker moves back to its resting position.

It’s something like a wave in the water… The wave goes both above and below the normal-average water level.

When you hit a drum, the air on the opposite side of the “hit” is compressed and the air on the top is decompressed for a fraction of a second, until the drum head bounces back in the opposite direction and you get compression on top and decompression on the bottom. The hit usually creates asymmetry because the bounce-back, and each successive vibration, is attenuated.

When you pick a guitar string, the first movement is obviously the strongest, and this will also create asymmetry.

Recording engineers sometimes put a microphone on both sides of a drum (or in front of and behind a guitar speaker), and then the signal from one of the mics is inverted so they are in phase when mixed.

…I don’t understand the physiology of what makes some voices asymmetric, but there’s no reason that vocal cords have to be perfectly symmetrical.


Awesome, now I understand. Thanks so much.

The asymmetry in the recording is correctable with phase rotation …

but the only advantage is that it gains more headroom for those competing in the loudness war.
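For the curious, here is a rough sketch of the idea behind phase rotation (my own NumPy illustration, not a description of any particular plugin): shifting every frequency component’s phase by the same angle leaves the magnitude spectrum untouched but can rebalance the peaks and troughs, which is where the extra headroom comes from.

```python
import numpy as np

t = np.linspace(0, 1, 8000, endpoint=False)
# asymmetric test tone: the trough reaches -1.5 while the peak only reaches 0.75
x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.cos(2 * np.pi * 200 * t)

def phase_rotate(signal, angle):
    """Shift the phase of every frequency component by `angle` radians."""
    spectrum = np.fft.rfft(signal)
    return np.fft.irfft(spectrum * np.exp(1j * angle), n=len(signal))

y = phase_rotate(x, np.pi / 2)   # 90-degree rotation

print("peak level before:", np.abs(x).max())   # ~1.5
print("peak level after: ", np.abs(y).max())   # ~1.3: same sound, more headroom
# the magnitude spectrum (what we hear) is unchanged
print(np.allclose(np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(y)), atol=1e-6))
```

The rotated signal looks different in the waveform view but contains exactly the same frequency content at the same levels, so it sounds identical; the only practical gain is the lower absolute peak, hence the loudness-war remark above.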
