So I am pretty new to audio processing, but I have been playing with Audacity for a few days now and it's a pretty amazing piece of software. I started out with a heavily distorted and overlaid track and managed to extract the audio from it. Now I cannot figure out how to adjust it to get it to sound like normal speech.
Full disclosure: this is part of a cryptography problem/competition, but I have been told it is not against the rules to seek advice from experts. For that reason, however, I am not going to attach the full clip I have, just a short snippet of it. Just in case.
Also, this started out as a picture full of pixels that I pulled the frequencies from: I converted them into several text files, imported each file as raw data into Audacity in stereo, combined them into one stereo track, and then alternately stripped away noise and amplified until I got what I have now, which I think is mostly vocals.
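For anyone curious about the picture-to-text step, it was something along these lines (only a rough Python sketch; the file name is a placeholder, and the reduction to the per-channel lists I describe further down is simplified here to just dumping each channel):

```python
# Rough sketch of the picture-to-text step (assumes Pillow and NumPy).
# "puzzle.png" is a placeholder, not my actual file.
from PIL import Image
import numpy as np

img = np.array(Image.open("puzzle.png").convert("RGB"))

# Split the image into its three colour channels and write each one out as a
# text file, one value per line, so it can later be handed to Audacity.
for idx, name in enumerate(["red", "green", "blue"]):
    channel = img[:, :, idx].flatten()
    np.savetxt(f"{name}.txt", channel, fmt="%d")
```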
I have tried Paulstretch to slow it down and played with different pitches, but I think the distortion may just be too much.
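(For anyone who wants to try the same thing outside Audacity, what I was doing is roughly a time stretch plus a pitch shift. The sketch below uses librosa's phase-vocoder stretch rather than Paulstretch itself, and the file name, stretch factor, and shift amount are placeholders, not the exact settings I used.)

```python
# Not Paulstretch itself, just a comparable stretch/pitch shift using librosa's
# phase vocoder. File name, stretch factor and pitch shift are placeholders.
import librosa
import soundfile as sf

y, sr = librosa.load("snippet.wav", sr=None)                        # keep the original rate
slowed = librosa.effects.time_stretch(y, rate=0.25)                 # 4x slower
shifted = librosa.effects.pitch_shift(slowed, sr=sr, n_steps=-12)   # down one octave
sf.write("slowed_shifted.wav", shifted, sr)
```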
There may be some other “channels” hidden in the picture that I have not found yet, but I wanted to come here first to see if anybody had any specific tricks I am missing.
There is still some residual noise in there - it sounded like crickets when sped up or slowed down.
I thought about an inverse FFT, but I don’t have the radian angles (phases) needed.
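To show what I mean: if the values really are FFT magnitudes, reconstruction needs the phases as well, and all I could do is assume them. A sketch of the zero-phase guess I had in mind (the file name and sample rate are placeholders):

```python
# Assumption: the values are FFT magnitudes with unknown phase. Zero phase at
# least produces something audible, even though the true phases are lost.
import numpy as np
from scipy.io import wavfile

magnitudes = np.loadtxt("red.txt")        # placeholder list of values
samples = np.fft.irfft(magnitudes)        # zero-phase inverse real FFT
peak = np.max(np.abs(samples))
if peak > 0:
    samples = samples / peak              # normalise to -1..1
wavfile.write("zero_phase_guess.wav", 8000, samples.astype(np.float32))  # rate is a guess
```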
Hey @Trebor, that could definitely be it! What exactly did you do to the track in order to squeeze this extra detail out of it? It's just difficult, as I do not want to share the entire track; I am really trying to solve this mostly on my own with relatively little input.
@kozikowski no, I actually started with a text file of frequencies that I imported as raw data, then gradually worked my way down to what I have now. I am just trying to clear up the speech that I have left over. Its original length was something like 0.072 seconds, and it was super high-pitched and bright. I used Paulstretch and just experimented a bunch with brute force, pitch changes, and voice isolation.
Last night I did manage to isolate it a little more within the 2000 Hz band, which removed most of the other noise.
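If anyone wants to reproduce that isolation step outside Audacity, it is essentially a band-pass around 2000 Hz. The sketch below uses SciPy; the cutoffs, file name, and assumed layout are only guesses at what I dialled in by ear:

```python
# Band-pass around 2 kHz to suppress the "cricket" noise outside the voice band.
# The 1.5-2.5 kHz cutoffs and the file name are placeholders for what I used.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

rate, audio = wavfile.read("extracted.wav")
audio = audio.astype(float)

sos = butter(4, [1500, 2500], btype="bandpass", fs=rate, output="sos")
filtered = sosfiltfilt(sos, audio, axis=0)        # zero-phase band-pass

filtered /= np.max(np.abs(filtered))              # normalise before writing
wavfile.write("bandpassed.wav", rate, filtered.astype(np.float32))
```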
To give you guys a little more context: I have a list of 255 frequencies or numbers, with a start (i.e. 0) and an end hidden within it. They correspond to the RED values pulled from a picture. There are also green and blue values, i.e. three separate channels hidden within the picture, and each of them has 256 values.
There could be more channels, but I think they may be red herrings - this guy is tricky!
The first value is huge, as it represents 0 on the RGB scale and there are a lot of black pixels (or RGB values that contain a zero), but the others rise and fall in a sequence. Then the last number, 255, also does not seem to fit.
This is an example of the first and last sets of numbers for the red frequencies (FYI, I am only using the second column as my frequency/raw data set):
I think there is some significance to the first and last numbers that would clean up the audio considerably, so any ideas or input there would be amazing!
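(In case it is relevant to the first/last number question: the way the value for 0 dwarfs everything else makes me suspect each channel list is just a count of how many pixels take each value, i.e. a per-channel histogram. A quick way I could check that against the picture, with a placeholder file name:)

```python
# Check whether the 256 values per channel are simply pixel-value counts
# (a histogram). "puzzle.png" is a placeholder name.
from PIL import Image
import numpy as np

img = np.array(Image.open("puzzle.png").convert("RGB"))
for idx, name in enumerate(["red", "green", "blue"]):
    counts = np.bincount(img[:, :, idx].flatten(), minlength=256)
    print(name, counts[0], counts[-1])   # compare with the first/last numbers in my lists
```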
I have all but been told I am on the right track, so I need to continue on, but again I just need a little assistance: while I am a mechanical engineer, signal/sound processing is not something I have had a lot of experience with.
Would anybody have insight into how Audacity converts a text file into frequencies, i.e. what the relationship is between that raw data and what comes in after import?
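My current (possibly wrong) understanding is that Import > Raw Data does not parse the numbers at all: it just reads the bytes of the file and interprets them with whatever encoding and sample rate you pick, so for a text file it is the ASCII character codes that become the sample values. This is the quick check I have been using to convince myself of that (placeholder file name):

```python
# Sanity check of my "raw import" understanding: the bytes of the text file
# (digit characters, spaces, newlines) are what become samples, not the
# numbers written in it. "red.txt" is a placeholder name.
import numpy as np

raw_bytes = np.fromfile("red.txt", dtype=np.uint8)     # the file's actual bytes
as_samples = raw_bytes.astype(float) / 127.5 - 1.0     # as unsigned 8-bit PCM -> -1..1
print(raw_bytes[:20])    # digits "0"-"9" are bytes 48-57, newline is 10, space is 32
print(as_samples[:20])
```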
Thanks in advance for any assistance, ideas, and comments!