Export waveform envelope to CSV


This post can count as a plugin request, but it’s a weird one because I need it very soon and am more than willing to help in making it happen.

I am currently attempting to take a recording of a walk through different locations and map the noise levels to a GPS track to plot on a map. To do this, I need to extract this data from the audio track once per second to match. Basically, I have found that I need a cross between this Wave Stats plugin and the “Sample data export” plugin, where each line in a CSV file is the output from the wave stats for one second of the recording.

It doesn’t seem incredibly hard to do, but with my skill level it will take quite a bit longer than reasonable. So before I attempted doing this myself, I thought I should ask here to make sure I’m not wasting my effort and that I’m on the right track. Is there an easier way to do this, or maybe even some other program that can already do this directly?

Thank you for your help.

P.S. My super dream would be to include other data about the sound such as frequency distributions and auto-correlation, but that is complicated and unnecessary.

Let’s do it the easy way then :smiley:

Yes that would be much harder so let’s leave that out for now.

I think the easiest way will be simply to convert the audio into “one second values”, then use “Sample data export” to convert those values into a csv file.
If you do this as two steps, then all you need is a small bit of code to convert the recording into “one second values”, and then apply the existing (unmodified) “Sample data export”.

The “small bit of code” can be run in the Nyquist Prompt effect (easiest and quickest) or converted into a plug-in (not much harder).

For convenience, you could modify the “control” lines in “Sample data export” so that it uses csv export as the default. (see Missing features - Audacity Support)

I’ll assume that you are working with mono recordings.

If you want peak levels each 1 second:

(setq step (truncate *sound-srate*))
(snd-avg s step step op-peak)

If you want rms levels each 1 second:

(rms s 1)
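For cross-checking the numbers outside Audacity, here is a rough Python sketch of the same idea: one peak and one rms value per one-second block, written out as CSV text. This is not Nyquist and not part of any plug-in. The function names (`to_db`, `per_second_levels`, `levels_to_csv`) and the -1000 dB floor for silence are assumptions made for illustration, and the samples are assumed to be mono floats between -1 and 1.

```python
import csv
import io
import math


def to_db(value, floor_db=-1000.0):
    """Convert a linear level to dB, clamping silence to floor_db (assumed floor)."""
    if value <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(value))


def per_second_levels(samples, rate):
    """Return (second, peak_db, rms_db) for each complete one-second block."""
    rows = []
    for sec in range(len(samples) // rate):
        block = samples[sec * rate:(sec + 1) * rate]
        peak = max(abs(x) for x in block)
        rms = math.sqrt(sum(x * x for x in block) / len(block))
        rows.append((sec, to_db(peak), to_db(rms)))
    return rows


def levels_to_csv(rows):
    """Format the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Time (s)", "Peak (dBFS)", "Rms (dBFS)"])
    for sec, peak_db, rms_db in rows:
        writer.writerow([sec, f"{peak_db:.2f}", f"{rms_db:.2f}"])
    return out.getvalue()


# Toy example: 4 samples per second, one loud second and one silent second.
demo = levels_to_csv(per_second_levels([0.5, -0.5, 0.5, -0.5, 0, 0, 0, 0], 4))
```

Only complete seconds are reported, so a trailing partial block is dropped.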

And if you don’t mind copying the output from the Debug screen and pasting it directly into e.g. Excel, then you can use this code in the Nyquist Prompt:

(setf *float-format* "%.2f")
(Format t "Time (s)\tPeak (dBFS)\tRms (dBFS)\n")
(do* ((sr (truncate *sound-srate*)) (ti 0 (1+ ti)) 
   (pks (snd-samples (linear-to-db (snd-avg s sr sr 2)) ny:all))
   (rms (snd-samples (linear-to-db (snd-avg s sr sr 1)) ny:all)))
   ((>= ti (1- (get-duration 1))) "Done")
     (Format t "~a\t~a\t~a~%" ti (max -1000 (aref pks ti)) (max -1000 (aref rms ti))))

Which auto-correlation coefficient do you have in mind? Lag 1 (one sample delay) is usually a good indicator for different types of noise.

Thank you to both of you.

I tried running the version for pasting into Excel (which is fine by me), but the rms didn’t seem to work properly. I’m guessing the simple (rms) version is fine.

I’m making a heatmap sort of like a weather/cloud map, where the idea is to plot where the annoying noises are in the area (like annoying weather). So, frequency would be used to color the area with the frequency spectrum (i.e. pink noise is pink because it has the same frequency spectrum as pink light). Maybe RGB values would be the best way to represent it. For auto-correlation, I want to show the annoying pulsing that the human brain notices more than it does pure white noise. Upon some comparisons, it does seem like the first sample does show a large difference between the “annoying” sounds and the others, so that would be cool.

Thanks again for the help.

What does not work? Is it that the rms values are in dB, with -1000 as limit, rounded to 2 places?
Everything can be arranged the way you like it.
I’m a little uncertain how to implement the auto-correlation.
Do you want the average just like peak and rms, over the same block length?
For instance:

(defun autocorr (x lag blk &aux (sr (snd-srate x)))
  (let* ((y (seq (snd-const 0 0 sr (/ lag sr)) x))
     (xy (prod x y))
     (s-x2y2  (recip (s-max 1e-20 (s-abs xy)))))
   (snd-avg (mult xy s-x2y2) blk blk op-average)))
(snd-display (autocorr s 1 2205))

That’s the simple lag-1 version. White noise is around 0, pink around 0.5 and so on.
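As I read the snippet above, it multiplies each sample by the sample `lag` positions earlier, keeps only the sign of that product, and averages the signs over each block. Here is a small Python sketch of that reading, just to make the arithmetic visible. It is not Nyquist, and the name `sign_autocorr` is made up for this example.

```python
def sign_autocorr(samples, lag, blk):
    """Average sign of x[n] * x[n - lag] over consecutive blocks of blk samples."""
    # Delay the signal by `lag` samples, padding the start with zeros.
    delayed = [0.0] * lag + list(samples[:len(samples) - lag])
    signs = []
    for x, y in zip(samples, delayed):
        xy = x * y
        # Dividing by |xy| (floored at 1e-20) leaves +1, -1, or 0 for silence.
        signs.append(xy / max(1e-20, abs(xy)))
    # One averaged value per complete block.
    return [sum(signs[i:i + blk]) / blk
            for i in range(0, len(signs) - blk + 1, blk)]


alternating = sign_autocorr([1.0, -1.0] * 4, 1, 4)  # sign flips every sample
constant = sign_autocorr([1.0] * 8, 1, 4)           # sign never flips
```

A signal that flips sign every sample heads towards -1 and a signal that never flips heads towards +1. The very first product is 0 because of the zero padding, which is why the first block comes out a little closer to 0 than the second.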

The rms values are pretty much always reported as -1000. Taking the average sounds about right.

For the frequency (if you want to try it; I really don’t mind if you don’t), maybe just return the three values (red, green, and blue) by getting the rms dB values after running the frame through a band-pass filter. So, for red have a filter from 20Hz to 3.25kHz, for green have a filter from 2.79kHz to 6.82kHz, and for blue have a filter from 5.31kHz to 10kHz.

Sorry I can’t be more helpful. I’ve been working on other parts of this project (this map is a side goal), and so I haven’t even been able to get out to record yet.

That isn’t right for the rms value.
(snd-avg s sr sr 1) gives the “average” sample value, not the rms value.
It is the same as (snd-avg s sr sr op-average).
For rms you need:
(snd-sqrt (snd-avg (mult s s) sr sr op-average))

(setf *float-format* "%.2f")
(Format t "Time (s)\tPeak (dBFS)\tRms (dBFS)\n")
(do* ((sr (truncate *sound-srate*))
      (ti 0 (1+ ti))
      (pks (snd-samples (linear-to-db (snd-avg s sr sr 2)) ny:all))
      (rms (snd-samples (linear-to-db (snd-sqrt (snd-avg (mult s s) sr sr 1))) ny:all)))
   ((>= ti (1- (get-duration 1))) "Done")
     (Format t "~a\t~a\t~a~%" ti (max -1000 (aref pks ti)) (max -1000 (aref rms ti))))

(Personally I usually prefer to use “OP-AVERAGE” and “OP-PEAK” rather than the equivalent “1” and “2” because I think it makes the code clearer.)
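To see why the plain average is the wrong thing to feed into a dB conversion, here is a quick Python check (outside Audacity, purely illustrative; `block_mean` and `block_rms` are names made up for this example). For one second of a sine tone, the positive and negative halves cancel in the average, while squaring first keeps the energy.

```python
import math


def block_mean(block):
    """Plain average of the sample values."""
    return sum(block) / len(block)


def block_rms(block):
    """Root of the mean of the squared sample values."""
    return math.sqrt(sum(x * x for x in block) / len(block))


# One second of a 100 Hz sine at amplitude 0.5, sampled at 8000 Hz.
rate = 8000
sine = [0.5 * math.sin(2 * math.pi * 100 * n / rate) for n in range(rate)]

mean = block_mean(sine)  # very close to 0
rms = block_rms(sine)    # close to 0.5 / sqrt(2)
```

An average that sits around zero turns into a huge negative dB figure, or an undefined one if it lands below zero, which would fit the -1000 readings.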

The -1000 value indicates silence; normally -#ind would be written there because of the log of zero.
But I guess that you want the values to be on another scale.

For instance integer values 0 to 255 = Rms -100 dB to 0 dB.
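A linear mapping like that could look as follows in Python (a sketch only; `db_to_byte` and its `floor_db` parameter are hypothetical names, and clamping anything below -100 dB to 0 is an assumption):

```python
def db_to_byte(db, floor_db=-100.0):
    """Map floor_db..0 dB linearly onto 0..255, clamping values outside that range."""
    clamped = min(0.0, max(floor_db, db))
    return round(255 * (clamped - floor_db) / (0.0 - floor_db))
```

With this, the -1000 used for silence simply ends up as 0, and anything at or above 0 dB ends up as 255.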

Are those values just examples, or do they have a special meaning?

The sound should obviously be divided into 3 parts. This is normally accomplished by simple low or high pass filters. Those have a cut-off frequency where the gain is -3 dB. This would mean about 3050 Hz for the first cut-off frequency. There’s always a transition band from one EQ band to the other.
However, you can change those values to get the most colourful output.

Steve’s new version works perfectly for the rms values.

I chose the filter ranges based on this chart of the visible spectrum. Similarly useful is this chart of how each cone cell type responds to light. You can see it has a falloff and is centered.

Ahh, yes, I see, I’ve used only the copied form. Negative values of course produce NaN values.