How to quickly approximate a waveform?


I am creating an open source web app for audio data discovery and easy labelling of samples. It computes various audio features and then embeds them into 2D space. The idea is that audio that sounds similar should be close together in the scatter plot, while different sounds should be further apart. The main goal is to be able to easily label different bird species in the vast audio recordings I have from a nature conservation charity. Here’s the repo: (heavy work in progress).

In the web app (Python + React) I’d like to, among other things, plot the waveform. The problem is that a 2 h mono recording at 16 kHz means over 100M points to render (2 × 3600 s × 16,000 samples/s ≈ 115M). How can I display the waveform in a smart way, the way Audacity does, for example? Can you perhaps point me to the relevant piece of the source code? I could not find it, but obviously it’s there.

Currently I am using Datashader, which essentially treats the data points like a raster and then bins points that fall close together. It works, but it’s not fast enough. It does not need to be very accurate; I am looking for an approximate method. I like how Audacity displays audio. One small twist: the user should be able to zoom into a fragment, which should then be drawn at higher resolution.
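One common approximation, sketched here under assumptions and not taken from Audacity's source, is to draw one vertical line per pixel column from the minimum to the maximum sample that falls into that column. The function name `minmax_envelope` and its parameters are hypothetical; `start` and `end` are sample indices of the visible (zoomed) range and `width` is the plot width in pixels.

```python
import numpy as np

def minmax_envelope(samples, start, end, width):
    """Return (mins, maxs) with one value per pixel column for samples[start:end]."""
    view = samples[start:end]
    n = len(view)
    if n == 0 or width <= 0:
        empty = np.array([], dtype=samples.dtype)
        return empty, empty
    # Never use more columns than there are samples in view.
    width = min(width, n)
    # Integer bin starts: column i covers view[starts[i]:starts[i + 1]],
    # and the last column runs to the end of the view.
    starts = (np.arange(width, dtype=np.int64) * n) // width
    mins = np.minimum.reduceat(view, starts)
    maxs = np.maximum.reduceat(view, starts)
    return mins, maxs

# Usage: a full view at 4 columns, then a zoom into samples 2..6 at 2 columns.
samples = np.array([0, 1, -1, 2, -2, 3, -3, 4], dtype=np.float32)
full_mins, full_maxs = minmax_envelope(samples, 0, len(samples), 4)
zoom_mins, zoom_maxs = minmax_envelope(samples, 2, 6, 2)
```

Whatever the length of the recording, only `2 * width` numbers need to be sent to the browser, and zooming simply means calling the function again with a narrower `start`/`end` range. Each call still scans every sample in the visible range, so the fully zoomed-out view is the slowest case.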


There was a recent post from someone trying to pull bird vocalizations out of background noise and environmental sounds. We weren’t a lot of help. Audacity does have Menu on the left > Spectrogram View. That displays your timeline as colored patches of Frequency versus Time instead of the normal waveform of Volume versus Time.

Analyze > Plot Spectrum shows volume versus frequency and ignores time.

That any help at all?
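For the web app side, the two views described above can be approximated with a short-time Fourier transform. This is a minimal sketch using only NumPy, not Audacity's implementation; the function name `spectrogram` and the `frame` and `hop` defaults are assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(samples, sample_rate, frame=512, hop=256):
    """Magnitude spectrogram: rows are time frames, columns are frequency bins."""
    if len(samples) < frame:
        raise ValueError("need at least one full frame of samples")
    window = np.hanning(frame)
    n_frames = 1 + (len(samples) - frame) // hop
    # Index matrix: row i selects samples[i * hop : i * hop + frame].
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    mag = np.abs(np.fft.rfft(samples[idx] * window, axis=1))
    freqs = np.fft.rfftfreq(frame, d=1.0 / sample_rate)           # Hz per column
    times = (np.arange(n_frames) * hop + frame / 2) / sample_rate  # s per row
    return freqs, times, mag

# Usage: one second of a 2 kHz tone sampled at 16 kHz.
rate = 16000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 2000 * t)
freqs, times, mag = spectrogram(tone, rate)
peak_hz = freqs[mag.argmax(axis=1)]   # strongest frequency in each frame
avg_spectrum = mag.mean(axis=0)       # volume versus frequency, time ignored
```

Here `mag` corresponds to the frequency-versus-time picture, and averaging it over the time axis gives a volume-versus-frequency curve for the whole selection.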


Birdsong identification software already exists … Need advice about a bio-acoustics project using Audacity - #6 by Trebor

See TrackArtist
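A related idea for keeping redraws fast is to precompute a reduced summary once and draw from that instead of from the raw samples. The sketch below illustrates that idea only; it is not a transcription of TrackArtist, and the block size of 256 and the names `build_summary` and `envelope_from_summary` are assumptions.

```python
import numpy as np

BLOCK = 256  # assumed summary block size, in samples

def build_summary(samples, block=BLOCK):
    """Precompute per-block min and max once, e.g. when the file is loaded.

    A trailing partial block is dropped to keep the sketch short.
    """
    n_blocks = len(samples) // block
    blocks = samples[:n_blocks * block].reshape(n_blocks, block)
    return blocks.min(axis=1), blocks.max(axis=1)

def envelope_from_summary(mins, maxs, width):
    """Reduce the per-block summary to at most `width` pixel columns."""
    n = len(mins)
    if n == 0 or width <= 0:
        return mins[:0], maxs[:0]
    width = min(width, n)
    # Column i covers summary blocks starts[i] .. starts[i + 1] - 1.
    starts = (np.arange(width, dtype=np.int64) * n) // width
    return np.minimum.reduceat(mins, starts), np.maximum.reduceat(maxs, starts)

# Usage: summarise once, then redraw cheaply at any zoomed-out width.
samples = np.arange(1024, dtype=np.float32)
block_mins, block_maxs = build_summary(samples)
col_mins, col_maxs = envelope_from_summary(block_mins, block_maxs, 2)
```

With a block of 256 samples, a 115M-sample recording shrinks to about 450k min/max pairs, so zoomed-out redraws touch far less data. Once the user zooms in past one block per pixel, the raw samples for that short range can be read directly.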

Thanks! That’s what I was looking for.

Sure it does, but it is closed source and does not address our problem. The goal is to automatically count the calls of a specified species. Before that’s possible, we need to be able to label hundreds of hours of recordings - and fast. Here’s a very basic demo of the functionality: Scroll down for the description. Select e.g. UMAP (Uniform Manifold Approximation and Projection), an unsupervised machine learning method. The reasoning here is that the dozens of audio features you can calculate per audio fragment can in fact be mapped onto a manifold of much lower dimensionality. A manifold looks like ordinary Euclidean space locally, but is non-linear over larger distances. Once you select an embedding, you can quite clearly see clusters, and bird calls are quite nicely separated from the noise, all in one 2D scatter plot. All in all, it’s quite different from Raven.

I am familiar with the functionality, and plotting a spectrogram is on my TODO list. You got me thinking, though, about whether I need a waveform in the first place; it’s not that terribly useful. Thanks!