Distorted Voice Audio Recovery

So I am pretty new to audio processing. However, I have been playing with Audacity for a few days now and it’s pretty amazing software. I started out with a heavily distorted and overlaid track and managed to extract the audio from it. Now I cannot figure out how to adjust it to get it to sound like normal speech.

Full disclosure: this is part of a cryptography problem/competition, but I have been told it is not against the rules to seek advice from experts. For that reason, however, I am not going to attach the full clip I have, just a short snippet of it. Just in case.

Also, this started out as a picture filled with pixels that I pulled the frequencies from. I converted them into several text files, imported each file into Audacity as raw data in stereo, and combined them into one stereo file. Then I stripped away, amplified, and stripped away again until I got what I have now, which I think is mostly vocals, after combining everything into one track.

I have tried Paulstretch to slow it down and played with different pitches, but I think the distortion may be just too much.

There may be some other “channels” hidden in the picture that I have not found yet, but I wanted to come here first to see if anybody had any specific tricks I am missing.

There is still some residual noise in there - it sounded like crickets when sped up or slowed down.

I thought about an inverse FFT, but I don’t have the phase angles (in radians) it needs.



Sounds like a man saying “the number is ten … the sequen[ce]”.

But given the convoluted processing you’ve described, which could scramble the data,
the man’s voice I hear could just be audio pareidolia.

I can’t build in my head what you’re doing. Wasn’t the original a sound track?

Most “Help Me Clean Up” jobs fail. This isn’t Audacity’s strong suit.


Hey @Trebor, that could definitely be it! What exactly did you do to the track to squeeze this extra out of it? It’s just difficult, as I do not want to share the entire track. I am really trying to solve this mostly on my own with relatively little input.

@kozikowski no, I actually started with a text file of frequencies that I imported as raw data, then gradually worked my way down to what I have now. I am just trying to clear up the speech that I now have left over. Its original length was something like 0.072 seconds and it was super high-pitched and bright. I used Paulstretch and just experimented a bunch with brute force, pitch changes and voice isolation.

I did manage to isolate it a little more within the 2000 Hz band last night, which removed most of the other noise.

To give you guys a little more context: I have a list of 256 frequencies or numbers (indexed 0 to 255), and I found a start (i.e. 0) and an end hidden within it. They correspond to the RED value pulled from a picture. There are also green and blue values, i.e. three separate channels hidden within the picture, each with 256 values.

There could be more channels but I think they may be red herrings - this guy is tricky!

The first value is huge, as it represents 0 on the RGB scale and there are a lot of black pixels, or RGB values that have a zero in them, but the others rise and fall in a sequence. The last number, at 255, also does not seem to fit.

This is an example of the first and last sets of numbers for the red frequency. FYI, I am only using the second column as my frequency or raw data set:

0 270469
1 8330
2 3414
3 8006
4 8228
5 3377
6 7685
7 7898
8 7970
9 3500
10 7459
11 7499
12 3071
13 7177
14 7089

241 1436
242 1355
243 1271
244 1323
245 1299
246 1205
247 1268
248 1269
249 1165
250 1042
251 1195
252 1124
253 983
254 1055
255 17163

I think there is some significance to the first and last numbers that would clean up the audio considerably. So any ideas or input there would be amazing!
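For what it’s worth, here is a rough Python sketch of how much those two end values dominate the scale. It only uses the rows quoted above (indices 15 to 240 are left out), and `peak_normalize` is just a helper name made up for this example, not anything Audacity does internally.

```python
# Second column of the rows quoted above (red channel, first and last sets only).
first_rows = [270469, 8330, 3414, 8006, 8228, 3377, 7685, 7898,
              7970, 3500, 7459, 7499, 3071, 7177, 7089]
last_rows = [1436, 1355, 1271, 1323, 1299, 1205, 1268, 1269,
             1165, 1042, 1195, 1124, 983, 1055, 17163]
values = first_rows + last_rows  # indices 15..240 are not included here


def peak_normalize(samples):
    """Scale samples so the largest magnitude becomes 1.0 (hypothetical helper)."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples]


# Keeping the end values: everything in between is squashed near zero.
with_ends = peak_normalize(values)
inner_peak_with_ends = max(with_ends[1:-1])

# Dropping the first and last entries: the rest uses the full range.
trimmed = peak_normalize(values[1:-1])
inner_peak_trimmed = max(trimmed)

print(f"inner peak with end values kept: {inner_peak_with_ends:.3f}")
print(f"inner peak with end values dropped: {inner_peak_trimmed:.3f}")
```

With the end values kept, the largest of the other quoted values sits at about 3% of full scale, which would be very quiet next to the first entry.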

I have all but been told I am on the right track, so I need to continue. Again, I just need a little assistance: while I am a mechanical engineer, signal and sound processing is not something I have had a lot of experience with.

Would anybody have insight into how Audacity converts a text file into frequencies, i.e. what the relationship is between that raw data and what comes in after import?

Thanks in advance for any assistance, ideas or comments!

Changing the pitch by ~ +15% … Change Pitch - Audacity Manual
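If it helps to think of that percentage in semitones, the usual conversion is 12 times the base-2 logarithm of the frequency ratio. A minimal sketch of the arithmetic (the function name is made up for this example):

```python
import math


def percent_to_semitones(percent):
    """Convert a pitch change in percent to semitones (equal temperament)."""
    ratio = 1 + percent / 100  # +15% means frequencies are multiplied by 1.15
    return 12 * math.log2(ratio)


semitones = percent_to_semitones(15)
print(f"+15% is about {semitones:.2f} semitones")
```

So +15% works out to a little under two and a half semitones up.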

That type of data-bending usually just produces cacophony.
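To make that concrete, here is a rough Python sketch of the difference between the numbers written in a text file and what a raw import makes of the same file. A raw import reads the file’s bytes, so the characters themselves become the samples. The settings here (signed 16-bit, little-endian, mono) are an assumption for illustration; the result depends on whatever was chosen in the import dialog.

```python
import struct

# First rows of the listing, as they would sit in a plain text file.
text = "0 270469\n1 8330\n2 3414\n"
raw = text.encode("ascii")

# What a raw import sees: byte pairs read as signed 16-bit little-endian
# samples (assumed settings). A trailing odd byte is dropped.
usable = len(raw) - len(raw) % 2
as_pcm = list(struct.unpack("<%dh" % (usable // 2), raw[:usable]))

# What the file actually says: the second column parsed as integers.
as_numbers = [int(line.split()[1]) for line in text.splitlines()]

print("samples from raw bytes:", as_pcm)
print("numbers in the file:   ", as_numbers)
```

The first sample comes out as 8240, which is just the characters "0" and " " (bytes 0x30 and 0x20) read as one 16-bit number. It has no relation to 270469. To hear the values themselves, they would have to be parsed as numbers and written out as samples first.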
