How to edit sound in the byte level

cpptrialanderror · January 19, 2022, 3:53pm

First of all, I am new to Audacity. What I have in mind is how a simple sound like an “aaaa” (a human sound) looks at the level of the bytes it is stored in.
Example: I record myself saying “aaaa”. Now I want to see the bytes that produce that sound and the pattern they follow. Is this possible?
If it is, where can I see and manipulate them?
After seeing the pattern that produces the “aaaa” sound (sorry for saying that all the time), I would like to reproduce it to make a kind of talking robot, with my voice as a template.
I hope what I said makes sense. If it does, please point me in the right direction; if not, I will try to clarify further. Thank you.

Trebor · January 19, 2022, 4:48pm

Voice cloning is possible, but not via Audacity … https://youtu.be/VnFC-s2nOtI?t=45

steve · January 19, 2022, 5:30pm

The bytes won’t tell you much.
Digital audio is a sequence of samples that follow one after another at the sample rate. For example, if the sample rate is 44100 Hz, then the samples are spaced at intervals of 1/44100th of a second. You can see the samples represented in an audio track by zooming in very close (see: Zooming Overview - Audacity Manual).
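To make that spacing concrete, here is a minimal Python sketch. The helper name `sample_time` is made up for this illustration; it simply converts a sample index into a time in seconds for a given sample rate.

```python
def sample_time(index, sample_rate=44100):
    """Return the time in seconds at which sample number `index` occurs."""
    return index / sample_rate

# Interval between two neighbouring samples at 44100 Hz.
interval = sample_time(1) - sample_time(0)
print(f"interval: {interval * 1e6:.2f} microseconds")
print(f"sample 44100 occurs at {sample_time(44100)} s")
```

At 44100 Hz, sample number 44100 lands exactly one second into the track.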

By default, Audacity tracks are “32-bit float”, which means that each sample is represented as a 32-bit (4 bytes) floating point number.
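As a rough illustration of what "4 bytes per sample" means, the sketch below packs a few sample values as 32-bit floats with Python's `struct` module. The little-endian byte order and the helper name `float32_bytes` are assumptions made for the example; it says nothing about how Audacity lays out its own project files.

```python
import struct

def float32_bytes(value):
    """Pack one sample as a little-endian IEEE 754 32-bit float (assumed layout)."""
    return struct.pack("<f", value)

for sample in (0.0, 0.5, -0.5, 1.0):
    raw = float32_bytes(sample)
    # Show the sample value next to the four bytes that encode it.
    print(f"{sample:+.1f} -> {raw.hex(' ')}")
```

The byte patterns for 0.5 and -0.5 differ only in the sign bit, which hints at why the raw bytes are harder to read than the decimal sample values.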

More useful might be the sample values (converted to decimal), which you can get with “Sample Data Export”. See: Sample Data Export - Audacity Manual
Even then it may not be very useful as it is hard to determine audio qualities such as pitch and timbre from a sequence of numbers.

cpptrialanderror · January 19, 2022, 5:43pm

Listen up, my good sir. My voice is like a sweet, sweet melody to anyone who hears it, but when I record it, something utterly mysterious happens and I sound like a baby seal with a sore throat, which is preposterous. We must fix this immediately!
Now, joking aside, what is preventing Audacity from doing this? It loads the audio file, and it has the interface to zoom in to the individual samples as far as I can tell.
Can we make it open these samples up in an interface that shows the +1/-1 possible values the bytes have, or am I saying stupid stuff?

Trebor · January 20, 2022, 1:38pm

If you just want a robotic version of your voice,
bit-crushing is the generic way of doing that …
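For anyone wondering what bit-crushing does to the samples, here is a minimal sketch of the general idea: reduce the number of distinct sample values and hold each value for several samples. The function name `bit_crush` and its parameters are invented for this example, and it is not a description of any particular Audacity effect or plug-in.

```python
import math

def bit_crush(samples, bits=4, hold=4):
    """Quantise float samples (-1..+1) to `bits` bits and hold each value for `hold` samples."""
    levels = 2 ** (bits - 1)          # quantisation steps per polarity
    crushed = []
    held = 0.0
    for i, s in enumerate(samples):
        if i % hold == 0:             # sample-and-hold: update only every `hold` samples
            held = round(s * levels) / levels
            held = max(-1.0, min(1.0, held))
        crushed.append(held)
    return crushed

# A 440 Hz test tone standing in for a recorded voice.
rate = 8000
tone = [0.8 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 100)]
crushed = bit_crush(tone, bits=3, hold=4)
print(sorted(set(crushed)))           # only a handful of distinct values remain
```

The smooth tone becomes a staircase with very few distinct values, which is where the harsh, "robotic" character comes from.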

DVDdoug · January 25, 2022, 10:42pm

Can we make it open these samples up in an interface that shows the +1/-1 possible values the bytes have, or am I saying stupid stuff?

The first problem is that there are 44,100 samples per second (or whatever the sample rate is). It’s too much data to comprehend.

That said, you can export audio as a numerical text file with Tools → Sample Data Export. You can look at the numbers with Windows Notepad, and you can change the values and re-import them. You are limited to 1 million samples, so that’s about 22 seconds at the CD sample rate of 44,100 Hz… and a rather unmanageable text file!
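The duration figure is just the sample limit divided by the sample rate. A quick check in Python, assuming a single (mono) channel at 44,100 Hz; how the limit applies to stereo tracks is not covered here:

```python
MAX_SAMPLES = 1_000_000   # limit quoted above for Sample Data Export
SAMPLE_RATE = 44_100      # CD sample rate in Hz

# Duration covered by the maximum number of samples, mono assumed.
seconds = MAX_SAMPLES / SAMPLE_RATE
print(f"{seconds:.1f} seconds of mono audio")
```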

If you record yourself saying “aaaa” twice, the data will be completely different: first because of normal human analog variations, and second because the samples will line up at different points in the waveform. Even if you digitize the same recording twice (eliminating all of the analog variations, except maybe for noise), you’ll be sampling different points on the waveform.
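The effect of sampling at different points can be sketched with a synthetic tone. In the example below, the same 440 Hz sine is "digitized" twice, with the second pass starting half a sample period later; the function name `sample_sine` and the chosen offset are assumptions made purely for illustration.

```python
import math

def sample_sine(freq, rate, count, offset=0.0):
    """Sample a sine wave of `freq` Hz at `rate` Hz, starting `offset` seconds in."""
    return [math.sin(2 * math.pi * freq * (n / rate + offset)) for n in range(count)]

rate = 44100
first = sample_sine(440, rate, 5)
# Second "digitization" of the same tone, starting half a sample period later.
second = sample_sine(440, rate, 5, offset=0.5 / rate)

for a, b in zip(first, second):
    print(f"{a:+.5f}  {b:+.5f}")
```

The two columns describe the same sound, yet no pair of numbers matches, so the stored bytes differ too.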

Audio is “complicated”… Real-world sounds contain many simultaneous frequencies. The harmonics and overtones are what make a guitar sound different from a trumpet when they are playing the same note, and they are what make two singers sound different when they are singing the same notes of the same song. If you look at a pure sine wave, you can easily figure out (or approximate) the frequency by looking at the time for one cycle (the “period”). But since normal audio contains simultaneous frequencies, and these frequencies change from moment to moment, it’s difficult to get meaningful frequency information by inspection.
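For the pure sine wave case, measuring the period can be sketched as counting samples between rising zero crossings. This is only a toy (the function name `estimate_frequency` is made up here), and it is expected to give poor results on real voice recordings for exactly the reason given above.

```python
import math

def estimate_frequency(samples, rate):
    """Estimate the frequency of a clean sine from its rising zero crossings."""
    crossings = [
        n for n in range(1, len(samples))
        if samples[n - 1] < 0 <= samples[n]
    ]
    if len(crossings) < 2:
        return None                   # not enough cycles to measure a period
    # Average period, in samples, between the first and last rising crossing.
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return rate / period

rate = 44100
# One tenth of a second of a 440 Hz sine with an arbitrary starting phase.
sine = [math.sin(2 * math.pi * 440 * n / rate + 1.0) for n in range(rate // 10)]
print(estimate_frequency(sine, rate))
```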

Plot Spectrum will show you the frequency content for a selected section of audio.
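To give a feel for what a spectrum plot computes, here is a naive discrete Fourier transform in plain Python applied to a made-up signal with two simultaneous tones. It is a teaching sketch only; Audacity's own analysis is more elaborate, and the function name `dft_magnitudes` and the test signal are assumptions for this example.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the magnitude of each DFT bin up to the Nyquist frequency (naive O(n^2))."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples))
        mags.append(abs(total) / n)
    return mags

rate = 8000
n = 400                                   # bin spacing = rate / n = 20 Hz
# Two simultaneous tones: 200 Hz plus a quieter 600 Hz overtone.
signal = [math.sin(2 * math.pi * 200 * i / rate) + 0.5 * math.sin(2 * math.pi * 600 * i / rate)
          for i in range(n)]
mags = dft_magnitudes(signal)
# Indices of the two strongest bins, converted to frequencies in Hz.
peaks = sorted(range(len(mags)), key=mags.__getitem__, reverse=True)[:2]
print([p * rate / n for p in sorted(peaks)])
```

The two tones are impossible to separate by eye in the sample values, but they show up as two distinct peaks in the spectrum.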

…in the byte level

Note that one byte is 8 bits, so with 16-bit audio every sample is 2 bytes. Audacity uses 32-bit floating-point (4 bytes per sample) for internal processing, and it’s just not useful to look at the actual bytes, especially for floating-point numbers.
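If you still want to see the 2 bytes per sample for yourself, the sketch below writes three 16-bit samples to an in-memory WAV file with Python's standard `wave` module and reads the raw bytes back. The sample values are arbitrary, and dividing by 32768 to reach the -1..+1 range is one common convention, assumed here for illustration.

```python
import io
import struct
import wave

# Three 16-bit samples: silence, half of full scale, negative full scale.
values = [0, 16384, -32768]

buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)        # mono
    wav.setsampwidth(2)        # 2 bytes = 16 bits per sample
    wav.setframerate(44100)
    wav.writeframes(struct.pack("<3h", *values))

buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    raw = wav.readframes(wav.getnframes())

print(raw.hex(" "))                        # the bytes as stored
decoded = struct.unpack("<3h", raw)        # back to integer sample values
print(decoded)
print([v / 32768 for v in decoded])        # scaled to the -1..+1 float range
```

Even in this simple integer format the bytes only become meaningful once they are grouped in pairs and decoded, which is why the sample values are the more practical thing to inspect.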

Take a look at Digital Audio Fundamentals.

myjond · August 3, 2022, 10:45am

Thanks Trebor, I’ve been looking for this solution for so long