Split recordings

I am doing some research for a project and was wondering if Audacity is suitable…I couldn’t find the answer directly…what I have to look into is:

Recording radio commercials. These commercials have a start tune, a split tune (always the same one) between the commercials, and a separate end tune…in the ideal world I would like to record and split them automatically from streaming audio…But what would be great to start with is automatic splitting of these commercials.

Is Audacity capable of doing that? Or is there another way of doing that?

Any suggestion is welcome.

Kind regards,

Audacity can record any audio that is sent to it. You can then edit the recording any way you like but Audacity cannot edit it automatically.

What exactly do you want to do?
It is possible to search for a specific sound within a track by convolving: if the searched clip (not longer than about 2 seconds) is found, the amplitude will be extremely high at the matching point. These points can afterwards be tagged with labels.
The plug-in would work in two steps:
First you store the sound that has to be found.
Then you select the whole track to be searched, and the occurrences of the sound will be labeled.
The accuracy is somewhat restricted and it might be necessary to set a proper bias/threshold.
Furthermore, the execution is very slow when the clip is long.
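
To make the two steps a bit more concrete, here is a minimal sketch in plain Python (not Nyquist, and not the plug-in itself). The names `correlate` and `find_labels`, the sample values and the threshold of 3.5 are all made up for illustration. It also shows why long clips are slow: every position in the track costs one multiplication per clip sample.

```python
def correlate(track, clip):
    """Slide the clip along the track; return one dot product per alignment."""
    positions = len(track) - len(clip) + 1
    return [sum(track[i + k] * clip[k] for k in range(len(clip)))
            for i in range(positions)]


def find_labels(track, clip, threshold):
    """Return the start indices whose score reaches the threshold."""
    scores = correlate(track, clip)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Step 1: store the sound to be found (a made-up 4-sample "jingle").
jingle = [1.0, -1.0, 1.0, -1.0]

# Step 2: search a made-up track that contains the jingle twice.
track = [0.0, 0.1, 1.0, -1.0, 1.0, -1.0, 0.0,
         0.2, -0.1, 1.0, -1.0, 1.0, -1.0, 0.0]

labels = find_labels(track, jingle, threshold=3.5)
print(labels)  # start positions of the two occurrences

# Rough cost: one multiplication per track position per clip sample.
print((len(track) - len(jingle) + 1) * len(jingle), "multiplications")
```

With real audio the threshold would have to be found by trial, which is the "bias/threshold" setting mentioned above.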


Thank you very much for the replies.

Robert, what I am looking into is a way to save commercials as separate files…they always start with the same beginning tune, are separated by the same kind of tune, and the end is also marked with a specific end tune.
Since I am more into building websites, the whole more advanced audio thing is kind of new to me…I’ll check the convolving plug-in.
What I am able to do so far is record at specific times and save the audio…but then I still have to split the audio files separately.

Thanks for the input so far!

All the best,

Hi Robert,

I am kind of confused about how the convolving works…could you explain it a little?

Kind regards,

So, we are looking for three jingles in total, am I right? Or do the tunes change substantially between commercials and from recording to recording?
How long are those tunes?
How it could work:

  • You have to build a “database” with the typical tunes or portions of them.
  • These samples should be put in the first track.
  • The track(s) to be analysed is/are underneath, and all tracks are selected prior to the plug-in call.
  • A label track is returned for each analysed track, and the labels indicate where those tunes are present.
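
As a rough illustration of that workflow, here is a sketch in plain Python (not Nyquist). The "database" is just a dictionary of three invented tunes, the track is invented too, and the rule "a score of at least 90% of the tune's own energy counts as a hit" is my assumption, not something the plug-in is known to do.

```python
def correlate(track, clip):
    """Slide the clip along the track; return one dot product per alignment."""
    positions = len(track) - len(clip) + 1
    return [sum(track[i + k] * clip[k] for k in range(len(clip)))
            for i in range(positions)]


def label_track(track, database, fraction=0.9):
    """Return sorted (position, name) labels for every tune found in the track."""
    labels = []
    for name, tune in database.items():
        energy = sum(v * v for v in tune)  # score of a perfect match
        for position, score in enumerate(correlate(track, tune)):
            if score >= fraction * energy:
                labels.append((position, name))
    return sorted(labels)


# Made-up "database" of the three tunes (4 samples each).
database = {
    "start": [1.0, 1.0, -1.0, -1.0],
    "split": [1.0, -1.0, 1.0, -1.0],
    "end":   [1.0, -1.0, -1.0, 1.0],
}

# Made-up recording: start tune, quiet ad, split tune, quiet ad, end tune.
track = ([1.0, 1.0, -1.0, -1.0] + [0.1, -0.2, 0.1] +
         [1.0, -1.0, 1.0, -1.0] + [0.2, 0.0, -0.1] +
         [1.0, -1.0, -1.0, 1.0])

labels = label_track(track, database)
for position, name in labels:
    print(position, name)
```

Each label marks the sample position where a tune begins; in Audacity these would end up in a label track instead of a Python list.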

There are still some uncertainties though.
Unfortunately, the separating tunes can’t be eliminated automatically, in case you wanted that.
The algorithm is correlation rather than convolution. The sound we are searching for is first reversed. We then move it along the second track by means of convolution: at each position, every sample of the clip is multiplied by the corresponding sample of the track and the products are added up. Where the clip exactly matches the audio in the second track, the sum will be highest.
A numerical example might illustrate this. Put the following code in the Nyquist prompt and press the Debug button.

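;; The clip we are looking for is 3 4 2; it is stored reversed as 2 4 3.
(setf sought (snd-from-array 0 44100 (vector 2 4 3)))
;; The sequence to be searched; it contains 3 4 2 near the middle.
(setf examined (snd-from-array 0 44100 (vector 0 5 3 0 2 5 0 6 3 4 2 1 3 3 0)))
(print "The values 3 4 2 are reversed to:")
(print (snd-samples sought ny:all))
(print "And are searched for in:")
(print (snd-samples examined ny:all))
(print "And the resulting sequence is:")
;; Convolving with the reversed clip amounts to a correlation.
(print (snd-samples (convolve examined sought) ny:all))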

As you can see, the highest peak is in the middle of the sequence we were looking for. I hope you’ve got the idea.
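
For anyone without the Nyquist prompt at hand, the same arithmetic can be followed in plain Python. This is only a sketch of the textbook operations with the same numbers; the functions `convolve` and `correlate` are written out by hand here and are not Audacity's, so I make no claim that Nyquist prints exactly the same list.

```python
def convolve(signal, kernel):
    """Full discrete convolution: out[n] = sum over k of kernel[k] * signal[n - k]."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for k, h in enumerate(kernel):
            out[i + k] += s * h
    return out


def correlate(signal, clip):
    """Sliding dot product of the unreversed clip, one value per full overlap."""
    return [sum(signal[i + k] * clip[k] for k in range(len(clip)))
            for i in range(len(signal) - len(clip) + 1)]


clip = [3, 4, 2]
sought = clip[::-1]  # reversed clip: [2, 4, 3]
examined = [0, 5, 3, 0, 2, 5, 0, 6, 3, 4, 2, 1, 3, 3, 0]

result = convolve(examined, sought)
direct = correlate(examined, clip)

# Where the clip overlaps the track completely, convolving with the
# reversed clip gives the same numbers as correlating with the original.
offset = len(clip) - 1
same = result[offset:offset + len(direct)] == direct

peak = result.index(max(result))
print(result)
print("peak at index", peak, "with value", result[peak])
print("convolution with reversed clip equals correlation:", same)
```

In this list the largest value falls in the middle of the three positions that 3 4 2 occupies in the track (indices 8 to 10). Note that with such a short clip a loud neighbouring sample (the 6 just before) contributes a lot to that sum; the alignment that covers exactly 3 4 2 is the value one position later.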
For longer sequences, the difference between the peak and the rest will increase, and so will the computing time.
The actual problem is to differentiate between several peaks and the rest of the audio; that’s where the threshold comes into play.
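
One possible way to make a threshold easier to choose, sketched in plain Python: divide each sum by the loudness of the clip and of the stretch of audio it is compared with, so that an exact match scores 1.0 regardless of volume. This normalisation is my own suggestion for illustration, not what the plug-in does, and the names `normalised_scores` and `pick_peaks` as well as the threshold of 0.95 are invented.

```python
import math


def normalised_scores(track, clip):
    """Correlation divided by the energy of clip and window (1.0 = same shape)."""
    clip_energy = sum(c * c for c in clip)
    scores = []
    for i in range(len(track) - len(clip) + 1):
        window = track[i:i + len(clip)]
        dot = sum(w * c for w, c in zip(window, clip))
        norm = math.sqrt(sum(w * w for w in window) * clip_energy)
        scores.append(dot / norm if norm > 0 else 0.0)  # silent window scores 0
    return scores


def pick_peaks(scores, threshold):
    """Indices that reach the threshold and are local maxima."""
    peaks = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if s >= threshold and s >= left and s > right:
            peaks.append(i)
    return peaks


clip = [3, 4, 2]
examined = [0, 5, 3, 0, 2, 5, 0, 6, 3, 4, 2, 1, 3, 3, 0]

scores = normalised_scores(examined, clip)
labels = pick_peaks(scores, threshold=0.95)
print([round(s, 2) for s in scores])
print(labels)  # start index of each detected occurrence
```

With real recordings the matches will never be perfect, so the threshold would still need some trial and error.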
I presume that the number of commercials varies from recording to recording.

I’ve assembled a sound example that shows what the signal is like after the correlation. It also shows the difficulties that have to be overcome when extracting the correct label positions.
I’ve only taken the averaged peaks (resulting in a sample rate of 4410 Hz) in order to speed up the process. The result has afterwards been multiplied with the original sound, i.e. it has been used as an artificial envelope.
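
In case the last step is unclear, here is a small sketch of the idea in plain Python. I am assuming blocks of 10 samples (44100 Hz down to 4410 Hz) and the mean absolute value per block; the exact averaging used for the sound example may differ, and all names and numbers below are invented.

```python
def block_average(signal, block):
    """Mean absolute value of each full block of `block` samples."""
    return [sum(abs(v) for v in signal[i:i + block]) / block
            for i in range(0, len(signal) - block + 1, block)]


def apply_envelope(sound, envelope, block):
    """Multiply each sample by the envelope value of the block it falls in."""
    last = len(envelope) - 1
    return [s * envelope[min(i // block, last)] for i, s in enumerate(sound)]


SOURCE_RATE = 44100
BLOCK = 10
reduced_rate = SOURCE_RATE // BLOCK  # one envelope value per 10 samples

# Made-up correlation result: nothing found in the first block, a match in the second.
correlation = [0.0] * 10 + [1.0] * 10
# Made-up original sound at constant level.
sound = [0.5] * 20

envelope = block_average(correlation, BLOCK)
shaped = apply_envelope(sound, envelope, BLOCK)
print(reduced_rate, envelope)
print(shaped)
```

The shaped sound stays audible only where the correlation is high, which is what you hear in the example.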