Extracting data from aup3

hackbarer · September 6, 2025, 11:06am

Hello,

I would like to programmatically extract data from an aup3 file (specifically, the label data). This was easier in previous versions of Audacity, when the project file format was XML-like. Now that it is binary, is there any way to parse the track and label info from, for instance, Python? Thanks, Ben

romontschun · September 6, 2025, 3:41pm

.aup3 is an “Audacity Project” file. It cannot be played. You need to export it to a “playable” format (.mp3, .aiff, .wav, etc.).

DVDdoug · September 6, 2025, 4:21pm

File → Export Other → Label Data.

steve · September 6, 2025, 5:44pm

You would need to reverse engineer Audacity’s “binary XML” format. Are you a software developer?

The intended way to extract label tracks is to export them from within Audacity.
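Once the labels have been exported, reading them from Python is straightforward. Below is a minimal sketch, assuming the exported file is tab-separated with one label per line (start time, end time, text, times in seconds) and that any line beginning with a backslash carries an optional frequency range. The names `Label` and `parse_labels` are made up for illustration.

```python
from typing import NamedTuple


class Label(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def parse_labels(content: str) -> list[Label]:
    """Parse the text of an exported label file into Label tuples."""
    labels = []
    for line in content.splitlines():
        # Skip blank lines and the optional frequency-range lines,
        # which are assumed to start with a backslash.
        if not line.strip() or line.startswith("\\"):
            continue
        fields = line.split("\t", 2)
        if len(fields) < 2:
            continue  # not a label line
        text = fields[2] if len(fields) > 2 else ""
        labels.append(Label(float(fields[0]), float(fields[1]), text))
    return labels
```

Typical use would be `parse_labels(open("Labels.txt", encoding="utf-8").read())`, where `Labels.txt` is whatever name you chose when exporting.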

steve · September 6, 2025, 6:05pm

I’ve looked to see if we can get enough from the database to figure out the labels without fully reverse engineering the format. Unfortunately it appears that the data isn’t simply UTF-encoded text: it interleaves offsets, IDs, and other data. There doesn’t appear to be a quick or easy way to do what you want from outside Audacity.
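For anyone who wants to look at the raw data themselves: an .aup3 file is an SQLite database, so Python's built-in `sqlite3` module can open it. The sketch below is only a starting point. It assumes the serialized project lives in a table named `project` with BLOB columns `dict` and `doc`; those names are not confirmed anywhere in this thread, so check them against the actual schema (for example with `SELECT name, sql FROM sqlite_master`) before relying on them. `read_project_blobs` is a hypothetical helper name.

```python
import sqlite3
from pathlib import Path


def read_project_blobs(path):
    """Return (dict_blob, doc_blob) from the assumed `project` table."""
    # Open read-only so the project file cannot be modified by accident.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # Table and column names are assumptions; verify them first.
        row = conn.execute("SELECT dict, doc FROM project LIMIT 1").fetchone()
    finally:
        conn.close()
    if row is None:
        raise ValueError("no row found in the project table")
    return bytes(row[0] or b""), bytes(row[1] or b"")
```

This only gets you the raw bytes. Decoding them into tracks and labels is the hard part, and it is best done on a copy of the project while Audacity is closed.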

steve · September 6, 2025, 6:22pm

In case you are a programmer, here is the relevant part of the Audacity code:

github.com/audacity/audacity

au3/libraries/lib-project-file-io/ProjectSerializer.cpp

ed3cc385c

/**********************************************************************

   Audacity: A Digital Audio Editor
   Audacity(R) is copyright (c) 1999-2010 Audacity Team.
   License: GPL v2 or later.  See License.txt.

   ProjectSerializer.cpp

*******************************************************************//**

\class ProjectSerializer
\brief a class used to (de)serialize the project catalog

*//********************************************************************/

#include "ProjectSerializer.h"

#include <algorithm>
#include <cstdint>
#include <mutex>

This file has been truncated.

hackbarer · September 6, 2025, 6:59pm

Thanks Steve. I am indeed a software dev. Thanks for the link to the code; I will take a look and investigate further.

It would be amazing if there were a way to programmatically read/write aup3 files outside of Audacity (as one can with AAF or Reaper files), but I can understand that would be of marginal use and not a priority for Audacity devs.

Best, Ben

steve · September 6, 2025, 7:34pm

Good luck with it Ben. I’ve been playing with this idea for a couple of hours, and I’ve come to the conclusion that to do this we would need to implement a full FT parser. Heuristic approaches to extracting readable characters do not give us enough to be useful. Nested sub-trees have to be handled correctly, otherwise the offsets and character lengths are wrong and we get gibberish.

The comment near the top of that file describes the rationale for why it has been done this way:

// Simple "binary xml" format used exclusively for project documents.
//
// It is not intended that the user view or modify the file.
//
// It IS intended that very little work be done during auto save, so numbers
// and strings are written in their native format. They will be converted
// during recovery.
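To illustrate why a sequential parser is needed rather than a scan for readable characters, here is a toy sketch in Python. The layout is entirely hypothetical and is NOT Audacity's actual format: the field codes (`NAME`, `START`, `END`, `STRING`, `DOUBLE`), the little-endian byte order, and the field widths are all invented for the example. The real definitions are in ProjectSerializer.cpp. What it does show is the general shape: each field starts with a type byte, names are stored once in a dictionary and referenced by ID, and the reader has to consume exactly the right number of bytes per field or everything after it is misread.

```python
import struct

# Hypothetical field codes, invented for this illustration only.
NAME, START, END, STRING, DOUBLE = range(5)


def parse(buf: bytes) -> dict:
    """Decode the toy binary-XML buffer into a nested dict tree."""
    names = {}  # ID -> name, filled in by NAME fields
    root = {"tag": None, "attrs": {}, "children": []}
    stack = [root]  # currently open elements, innermost last
    pos = 0
    while pos < len(buf):
        kind = buf[pos]
        (ident,) = struct.unpack_from("<H", buf, pos + 1)
        pos += 3
        if kind == NAME:
            # Dictionary entry: 16-bit length followed by UTF-8 bytes.
            (n,) = struct.unpack_from("<H", buf, pos)
            names[ident] = buf[pos + 2 : pos + 2 + n].decode("utf-8")
            pos += 2 + n
        elif kind == START:
            node = {"tag": names[ident], "attrs": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == END:
            stack.pop()
        elif kind == STRING:
            # String attribute: 32-bit length followed by UTF-8 bytes.
            (n,) = struct.unpack_from("<I", buf, pos)
            value = buf[pos + 4 : pos + 4 + n].decode("utf-8")
            stack[-1]["attrs"][names[ident]] = value
            pos += 4 + n
        elif kind == DOUBLE:
            # Numeric attribute stored as a raw 8-byte float.
            (value,) = struct.unpack_from("<d", buf, pos)
            stack[-1]["attrs"][names[ident]] = value
            pos += 8
        else:
            raise ValueError(f"unknown field type {kind} at offset {pos - 3}")
    return root
```

Note that the numeric attribute never appears as readable text in the buffer, which is why pulling out printable characters loses the label times even when the label titles happen to be visible.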