AUP File format information

I searched high and low for the AUP file format information, but other than the DTD I was not able to get the info I needed. I hope my question here will get answered:

I am writing a tool for working on AUP files from outside of Audacity. Can someone tell me the details of the XML file format?

If that is not possible, could someone at least explain the following snippet from an actual AUP file, to begin with:

<waveblock start="6107281">
					<simpleblockfile filename="" len="262144" min="-1" max="1" rms="0.192749"/>

I need to know what the attributes such as start, min, max, rms, etc. mean.

Thank you

I don’t know the full specification, but here’s some of it. The simpleblockfile filename is the data file that is being referenced.
start=“6107281” is the start position of this data file (in samples)
len=“262144” is the length in samples
min=“-1” max=“1” rms=“0.192749” minimum, maximum and rms values of the samples in the file.
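If you want to pull these attributes out in your own tool, here is a minimal sketch using Python's standard xml.etree.ElementTree. The snippet is wrapped into well-formed XML (the post above shows the opening waveblock tag without its close), the empty filename is kept as it appeared, and describe_block is a hypothetical helper, not part of any Audacity API.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the snippet quoted above.
SNIPPET = """
<waveblock start="6107281">
  <simpleblockfile filename="" len="262144" min="-1" max="1" rms="0.192749"/>
</waveblock>
"""


def describe_block(xml_text):
    """Return the waveblock/simpleblockfile attributes as Python numbers."""
    block = ET.fromstring(xml_text)
    sbf = block.find("simpleblockfile")
    return {
        "start": int(block.get("start")),   # first sample of this block
        "len": int(sbf.get("len")),         # number of samples in the data file
        "min": float(sbf.get("min")),       # lowest sample value
        "max": float(sbf.get("max")),       # highest sample value
        "rms": float(sbf.get("rms")),       # root mean square of the samples
        "filename": sbf.get("filename"),
    }


info = describe_block(SNIPPET)
print(info["start"] + info["len"])  # sample index just past the end of this block
```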

Thank you for your reply. I may be asking some dumb questions here, so please forgive me.

What is the duration of one sample?

I guess “rms” is the “root mean square” value. I am still learning this (maybe slowly). How does this affect, say, the volume of that specific AU file?
What do “max” and “min” refer to? Is it to do with the volume, and is the rms a value within that volume range? (Wild guess)

The application I plan to develop is to use the information in the AUP file to externally mix the AU files using ecasound on Linux boxes.

How do I stitch up each of those au files in the AUP xml file and bring out the final mix using ecasound?


<wavetrack name="Narration" channel="2" linked="0" mute="0" solo="0" height="154" minimized="0" rate="44100" gain="1" pan="0">
		<waveclip offset="18.55777090">
			<sequence maxsamples="524288" sampleformat="131073" numsamples="780681">
				<waveblock start="0">
					<simpleblockfile filename="" len="337046" min="-0.778839" max="0.725403" rms="0.090997"/>
				<waveblock start="337046">
					<simpleblockfile filename="" len="443635" min="-0.711731" max="0.792633" rms="0.114152"/>
			<envelope numpoints="1">
				<controlpoint t="17.702494331066" val="1.000000000000"/>

Disclaimer: this is only what I can infer from the above snippet - I haven’t been able to find a specification of the Audacity XML format.

Wavetrack: a track in Audacity
In this case, the track is named “Narration”, [don’t know what ‘channel=“0”’ and ‘linked=“0”’ mean], it is not muted or soloed, its height on the screen is 154 pixels, it is not minimized, its sample rate is 44100 samples per second, its gain is set to 1, and its pan is set to 0 (panned to the middle).

Waveclip: a clip within a track. A track may contain many clips, but will always contain at least one.
In this case this waveclip is offset 18.55777090 seconds from the start of the track

Sequence: begins a series of waveblocks (?)
Maxsamples is (?) the maximum number of samples a waveblock (?) can contain. Don’t know what the ‘sampleformat’ codes are, but this one means 16-bit integer PCM. Numsamples is the number of samples in the sequence.

Waveblock: a block of audio - seems to usually (or always?) contain one simpleblockfile or aliasblockfile
Start is the start time (in samples) from the start of the sequence.

Simpleblockfile: finally we’re down to specifying the .au files contained in the track
Filename is simply the filename of the simpleblockfile. Note that the path is incomplete. If this Narration.aup file was in a folder named “NarrationFolder” then the complete path for this file (remember, this is an example!) is “NarrationFolder/Narration_data/e00/d00/”. Also note that these .au files (as per my understanding) have a header containing data for drawing the waveform on the screen.
Len is the length of the audio in samples. The length of one sample is the inverse of the sample rate of the wavetrack.
Min is the minimum peak value of the audio in the file, max is the maximum peak value of the audio in the file. rms is indeed the root-mean-square value of the audio in the file, probably the maximum.

Envelope: if the track has an active volume envelope the control points for the envelope will be specified here.
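On the sampleformat codes: 131073 is 0x00020001 in hexadecimal. A minimal sketch of decoding it, assuming the constants from Audacity's own source (its sampleFormat enum, where the upper 16 bits give the bytes per stored sample); these values are not documented in the .aup file itself, and SAMPLE_FORMATS/describe_sampleformat are hypothetical names.

```python
# Assumed values, taken from Audacity's source rather than any .aup specification.
SAMPLE_FORMATS = {
    0x00020001: "16-bit integer PCM",
    0x00040001: "24-bit integer PCM",
    0x0004000F: "32-bit float",
}


def describe_sampleformat(code):
    """Return (format name, bytes per stored sample) for a sampleformat code."""
    bytes_per_sample = code >> 16          # upper 16 bits: bytes per stored sample
    name = SAMPLE_FORMATS.get(code, "unknown format")
    return name, bytes_per_sample


print(describe_sampleformat(131073))  # ('16-bit integer PCM', 2)
```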

And that’s what I can surmise. Perhaps someone who really knows can correct or confirm my assumptions.

Good luck with your project.

– Bill

That depends on the sample rate.
The “sample” itself has no duration - it’s just a number.
When audio is converted to digital data, it is “sampled” (measured) several thousand times each second. The number of samples that are taken each second is called the “sample rate”.
For CD quality sound, the sample rate is 44100 samples per second.

If you record at 44100 samples per second (also called 44100Hz or 44.1 kHz) then to play back the data so as to reconstruct the sound, you will need to convert the samples back at the same rate - so the time between one sample and the next will be 1/44100 or 0.000022676 seconds.
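Combining this with the offset and start attributes discussed above gives each block's position in seconds. A minimal sketch, assuming (as inferred earlier in the thread, not from a spec) that a waveblock's start counts samples from the beginning of its clip and that the clip begins offset seconds into the track; block_time_range is a hypothetical helper.

```python
def block_time_range(clip_offset, block_start, block_len, rate):
    """Return (start, end) in seconds from the start of the track.

    Assumes 'start' counts samples from the start of the clip and the
    clip begins 'offset' seconds into the track.
    """
    sample_period = 1.0 / rate             # time between consecutive samples
    start = clip_offset + block_start * sample_period
    end = start + block_len * sample_period
    return start, end


# Second waveblock of the "Narration" track quoted earlier in the thread.
start, end = block_time_range(18.55777090, 337046, 443635, 44100)
print(f"{start:.3f} s to {end:.3f} s")  # 26.201 s to 36.260 s
```

As a consistency check, the two block lengths in that excerpt (337046 + 443635) add up exactly to the sequence's numsamples of 780681, so the second block ends where the clip ends.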

Each sample is a measurement of the amplitude of the (analogue) sound signal, measured on a scale of +1.0 to -1.0.
Each sample will lie within this range. The min and max refer to the lowest and highest values for any sample in the file.
The rms is the “root mean square” (average) value of all the samples in the file.
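The three statistics can be computed from a list of samples like this. It is a minimal sketch using a plain arithmetic RMS (square each sample, average, take the square root); Audacity's own calculation may differ, and block_stats is a hypothetical helper.

```python
import math


def block_stats(samples):
    """min, max and RMS of a sequence of samples on the -1.0..+1.0 scale."""
    lo = min(samples)
    hi = max(samples)
    # RMS: square each sample, take the mean, then the square root.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return lo, hi, rms


print(block_stats([0.5, -0.5, 0.5, -0.5]))  # (-0.5, 0.5, 0.5)
```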

Play it in Audacity :wink:

<<<[don’t know what ‘channel=“0”’ and ‘linked=“0”’ mean]>>>

Channel 0 in English is “Left.” Did you note that the code is for Left on top and Right on the bottom?

Here’s that really, really, really simple AUP file I produced and then captured as a graphic. I put extra spaces and carriage returns in to make it clearer.

I opened up one music file (the famous piano2.wav), saved the project, and then went for a cuppa. So that’s as simple a project as you can get, and it’s good for getting a handle on what the basic parts do. The sample is 48000 Hz, 16-bit and it’s full-on stereo.


RMS is Root Mean Square. That, stripped of the engineering, is how loud the signal is, in a fuzzy way. The sound level meters will not show that because they follow the digital peak values of the sound, which are notoriously unreliable.
This is the one and only place where the US type VU meters shone. They would measure, roughly again, RMS value of the show and could be relied upon to gauge loudness.

RMS is the energy of the waveform, not the peak value. In the US, the voltage coming out of the wall socket swings just over 300 volts from peak to peak (about 170 volts peak), but it’s the RMS value of 120 volts (the “horsepower”) that actually turns motors and makes the lifts (elevators) go up and down.
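For an idealised sine wave of peak value V_p, averaging over a whole number of periods T gives the usual relation between RMS and peak, and the mains figures follow from it:

```latex
V_{\mathrm{rms}} = \sqrt{\frac{1}{T}\int_{0}^{T} \bigl(V_{p}\sin\omega t\bigr)^{2}\,dt} = \frac{V_{p}}{\sqrt{2}}
\quad\Rightarrow\quad
V_{p} = \sqrt{2}\times 120\ \mathrm{V} \approx 169.7\ \mathrm{V},
\qquad
V_{pp} = 2V_{p} \approx 339.4\ \mathrm{V}
```

The same relation applies to digital audio: a full-scale sine of amplitude 1.0 has an RMS of 1/√2, about 0.707.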

I’m surprised that value is in there and it leads me to think that’s not what RMS really means in this case.


Yes, that occurred to me too. Just why do you need to play the mix in ecasound?

– Bill

Thank you all for your detailed replies.

I am working on a project (it’s an experiment actually) where I want to put the AUP file and its data folder on a server, and then I plan to make an app (calling ecasound) on the server which parses the AUP file and uses the .AU files, etc. in order to produce the final mixed file. Such an app will not play the final mixed version on the server. It will simply piece together the final mixed version and make the final audio file. Since Audacity is a GUI program, it would not be suitable for the server machine.

Hmmm… the obvious solution could be this: if Audacity could also be run as a command-line program, just to produce the final mixed file, that would be neat. It would remove my need to write a separate ecasound script, and I need not even get into the AUP file format.

However, when I checked this is what the Audacity man page recommends:

Audacity is primarily an interactive, graphical editor, not a batch-
processing tool. Whilst there is a basic batch processing tool it is
experimental and incomplete. If you need to batch-process audio or do
simple edits from the command line, using sox or ecasound driven by a
bash script will be much more powerful than audacity.

I browsed some more, and I got this page on the Audacity wiki

If anyone can shed some light on that, I would be grateful. Maybe that may do the trick for me. I’ll post my discoveries here.

Thanks for all inputs

I can’t shed any light on the scripting module, other than to say that it is very new and very experimental. Unless you are something of an expert with Perl and have a good grasp of the Audacity code base, it would probably be best to leave it alone for now.

I’d agree with that.
One problem that you may run into if you try to work from the Audacity .AU files is that they are “Audacity data files”. They are not just “audio files”.

You may have noticed that if you import a large file into Audacity, it takes a certain amount of time for the waveform display to be calculated and drawn. However, if you open an Audacity project that contains a large audio track, the waveform is displayed almost instantly. This is because the waveform does not need to be calculated. The waveform data has already been calculated and is stored in some of the .AU data files. So the .AU files do not just contain audio data - some of them contain data for the waveform display.

It took a bit of tracking down, but sure enough rms appears to be just that, the root mean square value. But there is one thing that confuses the issue which is that rms values are measured against a window function, so for small blocks the rms value will be smaller than you may expect. If you generate a sine tone of amplitude 1.0 with a duration of over 300000 samples (more than 6 seconds at 44.1kHz sample rate), then you will see the expected rms value in the region of 0.707 for most of the .au files, but if there is a small block at the end then the rms value will be lower. I’ve no idea what Audacity uses this for.
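The expected 0.707 is what a plain RMS gives for a long full-scale sine block. A quick check, assuming an arbitrary 440 Hz tone at 44.1 kHz and a 262144-sample block like the one in the first snippet; this plain calculation does not model the window function described above, and sine_block/rms are hypothetical helpers.

```python
import math


def sine_block(n_samples, freq=440.0, rate=44100, amplitude=1.0):
    """Generate n_samples of a sine tone as a hypothetical test signal."""
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n_samples)]


def rms(samples):
    """Plain root mean square: no windowing."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


print(round(rms(sine_block(262144)), 3))  # 0.707
```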

You may have noticed that if you import a large file into Audacity, it takes a certain amount of time for the waveform display to be calculated and drawn. However, if you open an Audacity project that contains a large audio track, the waveform is displayed almost instantly. This is because the waveform does not need to be calculated. The waveform data has already been calculated and is stored in some of the .AU data files. So the .AU files do not just contain audio data - some of them contain data for the waveform display.

I hope that is not the case (i.e. not all AU files are audio). If what you say is true, then my work will be a lot more difficult :slight_smile: because I may have to check if the AU file was audio or not.

I have an alternative explanation for why Audacity loads the waveform faster: it makes a lot of very small files. A very small file is possibly quick to read for the waveform, so Audacity builds the waveform quickly, and depending on the zoom factor of the view it need not examine all of them (my hypothesis, which could be wrong). Also, it may be quickly reading one AU file and rendering that portion of the view, then moving to the next, and so on, making the process seem quite fast. Maybe when we load a large AU file into a new project, Audacity takes a long time because of activities other than the waveform display (e.g. cutting the imported file into parts?).

When I tried a few of the AU files, they were indeed truly audio AU files. But somewhere else I saw reference to AUF files. I don’t know what those are.

Thanks for keeping the investigation on.

I’m pretty sure that every AU file has a header containing precomputed waveform data. My testing is not exhaustive but importing 10 AU files showed that all 10 had this data at the start. The length of this header appears to be proportional to the amount of audio the file contains. It looks like you have to take the file apart to find the header and strip it out before passing the audio on to be processed. One thing we are told is this: the audio data in the AU files is 32-bit uncompressed floating point.
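Rather than locating the waveform data by hand, it may be possible to rely on the AU header itself. The standard AU header is six 32-bit fields (magic, data offset, data size, encoding, sample rate, channels), and the data-offset field says where the audio starts. A minimal sketch, assuming that in Audacity's block files this offset already skips the waveform summary (which would explain why ordinary players play them normally), and that a byte-reversed magic (dns.) means little-endian fields and samples; split_au is a hypothetical helper.

```python
import struct

# A few standard Sun/NeXT AU encoding codes.
AU_ENCODINGS = {1: "8-bit mu-law", 2: "8-bit PCM", 3: "16-bit PCM",
                4: "24-bit PCM", 5: "32-bit PCM", 6: "32-bit float", 7: "64-bit float"}


def split_au(data):
    """Split AU file bytes into (header info, audio bytes)."""
    if data[:4] == b".snd":
        endian = ">"          # standard big-endian AU
    elif data[:4] == b"dns.":
        endian = "<"          # byte-reversed magic: assume little-endian
    else:
        raise ValueError("not an AU file")
    offset, size, encoding, rate, channels = struct.unpack(endian + "5I", data[4:24])
    # Assumption: the data offset skips any waveform summary stored after the header.
    audio = data[offset:]
    if size != 0xFFFFFFFF:    # 0xFFFFFFFF means "size unknown"
        audio = audio[:size]
    info = {"endian": endian, "encoding": AU_ENCODINGS.get(encoding, encoding),
            "rate": rate, "channels": channels, "offset": offset}
    return info, audio


# Synthetic little-endian block file: header, 16 bytes of fake summary, 2 floats.
summary = bytes(16)
payload = struct.pack("<2f", 0.25, -0.5)
header = b"dns." + struct.pack("<5I", 24 + len(summary), len(payload), 6, 44100, 1)
info, audio = split_au(header + summary + payload)
print(info["encoding"], struct.unpack(info["endian"] + "2f", audio))  # 32-bit float (0.25, -0.5)
```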

AUF files are aliasblockfiles. They are used when uncompressed audio files are imported into a project but not copied into the _data folder. They point to bits within the original audio file, and also contain the waveform data.

– Bill

I seem to remember that in old versions of Audacity the waveform data was just in a few of the AU files (for short recordings this would typically be the first two files). Looking at the AU files from a current version of Audacity, it does indeed look like there’s a bit of waveform data at the top of each file.

May we ask why? I mean, why use Audacity at all in this? Why not just use WAV files?

Thank you for all the help. If the AU files contain both audio and waveform information, I guess they must be using some location in the AU file for storing the waveform because when I play the AU file in Linux it seems to sound just fine. I guess such AU files may not interfere with the final mix (Again, I hope I am right here)

The reason why I would rather not use the final WAV file from the local computer and upload the final mix is because I need to process the tracks in several different ways on one central server that runs my own application. I am addressing an unusual need in the way the tracks are to be used: one that requires an external app to decide which of the tracks are to be used and which not for the final mix. And that ‘big’ reason is: asynchronous, collaborative generation of sounds. The individual tracks in the final mix may be produced by different people on different Audacity projects on their local machines, where each project may share some common tracks and may also contain some tracks which are individualistic to each user participating in such an asynchronous collaboration.

So why not export the individual tracks from Audacity as WAV files (or perhaps RAW headerless would be even better), then process those files with your application?

Sorry to bump an old post, but I had relevant information that was best placed here rather than in a new thread.

I understand the original poster’s problem. I have a number of users that just don’t have the skills to process the files from Audacity and complete the other steps required for our workflow, so it was easier just to write a script to automate the process.

The way I’ve been doing it is to process the project file, because it contains a lot of useful information.

What I’ve done is (in pseudocode) -

For each wavetrack:
    For each sequence in the wavetrack:
        For each simpleblockfile in the sequence:
            get the filename from the simpleblockfile XML element
            if this is the first simpleblockfile:
                read the first 24 bytes                  // this is the AU header
            get the length of the audio data in bytes    // 'len' attribute * 4 (4 bytes per 32-bit sample)
            read from offset (size of file in bytes - length of audio data in bytes) to the end of the file

This gives you an AU file per channel, per wavetrack. I then mux all of these tracks using sox.

There are a lot of assumptions made in this code (particularly that the encoding is 32-bit; this can be refined by reading the encoding value in the header). The other important thing is to make sure the data you read out is in BIG ENDIAN FORMAT. This caused me some headaches. The best place to start looking is the Wikipedia entry on the file format.
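The pseudocode above can be sketched in Python roughly as follows. It keeps the post's assumption of 32-bit float block files and takes the audio from the tail of each file; the byte order of each block is guessed from its AU magic (.snd big-endian, dns. assumed little-endian) and the samples are rewritten as big-endian. All function names are hypothetical.

```python
import struct
import xml.etree.ElementTree as ET
from pathlib import Path

BYTES_PER_SAMPLE = 4  # assumption carried over from the post: 32-bit float block files
AU_FLOAT = 6          # standard AU encoding code for 32-bit IEEE float


def local_name(tag):
    """Drop any XML namespace, e.g. '{...}wavetrack' -> 'wavetrack'."""
    return tag.rsplit("}", 1)[-1]


def block_endian(data):
    """Guess a block file's byte order from its AU magic number."""
    if data[:4] == b".snd":
        return ">"
    if data[:4] == b"dns.":
        return "<"    # byte-reversed magic: assume little-endian samples
    raise ValueError("not an AU block file")


def track_audio(track, data_dir):
    """Concatenate the samples of every simpleblockfile in a wavetrack as big-endian floats."""
    chunks = []
    for elem in track.iter():
        if local_name(elem.tag) != "simpleblockfile":
            continue
        name = elem.get("filename")
        path = next(Path(data_dir).rglob(name), None)   # search the e00/d00/... subfolders
        if path is None:
            raise FileNotFoundError(name)
        data = path.read_bytes()
        n = int(elem.get("len"))
        raw = data[len(data) - n * BYTES_PER_SAMPLE:]   # the audio is the tail of the file
        samples = struct.unpack(f"{block_endian(data)}{n}f", raw)
        chunks.append(struct.pack(f">{n}f", *samples))
    return b"".join(chunks)


def export_tracks(aup_path, data_dir, out_dir):
    """Write one mono, big-endian float AU file per wavetrack; return the paths written."""
    root = ET.parse(aup_path).getroot()
    tracks = [e for e in root.iter() if local_name(e.tag) == "wavetrack"]
    written = []
    for i, track in enumerate(tracks):
        audio = track_audio(track, data_dir)
        rate = int(float(track.get("rate")))
        header = b".snd" + struct.pack(">5I", 24, len(audio), AU_FLOAT, rate, 1)
        out = Path(out_dir) / f"track{i}.au"
        out.write_bytes(header + audio)
        written.append(out)
    return written
```

For example, export_tracks("Narration.aup", "Narration_data", "out") would write out/track0.au, out/track1.au and so on, ready to be combined with sox. Rather than copying the first block's 24-byte header verbatim, the sketch writes a fresh one, because if the data-offset field in Audacity's block files counts the waveform summary (an assumption), a copied header would point past the start of the audio. Like the pseudocode, it ignores waveclip offsets, envelopes, track gain and pan, and aliasblockfiles.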

This worked for my limited number of cases on a very simple Audacity project, so YMMV. If you want to supply the project files and audio data, I don’t mind trying to refine it to cater for more use cases.

Thanks! This looks useful. I’ll post here on the developments on my project