I would like to programmatically extract data from an aup3 file (specifically, the label data). This was easier in previous versions of Audacity, when the project file format was XML-like. Now that it is binary, is there any way to parse the track and label info from, for instance, Python? Thanks, Ben
I’ve looked to see if we can get enough from the database to figure out the labels without fully reverse engineering the format. Unfortunately it appears that the data isn’t just UTF-encoded text; it interleaves offsets, IDs, and other data. There doesn’t appear to be a quick or easy way to do what you want from outside of Audacity.
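For anyone who wants to look at the database themselves: an aup3 file is an SQLite database, so Python's built-in sqlite3 module can open it. Below is a minimal sketch that only lists the tables and their row counts. It assumes nothing about the schema beyond what SQLite itself reports. The function name summarize_aup3 is just a placeholder, and it is safest to run this on a copy of a project that is not currently open in Audacity.

```python
import sqlite3
from pathlib import Path


def summarize_aup3(path):
    """Return {table_name: row_count} for every table in the SQLite file at `path`."""
    # Open read-only through a file: URI so the project cannot be modified by accident.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # Table names are read from the database itself, so quote them defensively.
        return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
                for name in names}
    finally:
        conn.close()
```

Calling summarize_aup3("copy-of-project.aup3") (a hypothetical file name) shows which tables exist. Getting at the rows this way is the easy part; decoding the binary blobs stored in them is the hard part.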
Thanks Steve. I am indeed a software dev. Thanks for the link to the code; I will take a look and investigate further.
It would be amazing if there were a way to programmatically read/write aup3 files outside of Audacity (as one can with AAF or REAPER files), but I can understand that would be of marginal use and not a priority for Audacity devs.
Good luck with it, Ben. I’ve been playing with this idea for a couple of hours, and I’ve come to the conclusion that to do this we would need to implement a full FT parser. Heuristic approaches to extracting readable characters do not give us enough to be useful. Nested sub-trees have to be handled correctly, otherwise the offsets and character lengths are wrong and we get gibberish.
The comment near the top of that file describes the rationale for why it has been done this way:
// Simple "binary xml" format used exclusively for project documents.
//
// It is not intended that the user view or modify the file.
//
// It IS intended that very little work be done during auto save, so numbers
// and strings are written in their native format. They will be converted
// during recovery.
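To illustrate why the heuristic approach mentioned above falls short, here is a minimal Python sketch of a `strings`-style scan that pulls runs of printable characters out of a blob. The sample bytes are made up for illustration and are not the real aup3 layout. The function name printable_runs and the min_len threshold are assumptions of this sketch, and it only looks for single-byte ASCII.

```python
import re

# Runs of printable single-byte ASCII, in the spirit of the Unix `strings` tool.
_PRINTABLE = re.compile(rb"[\x20-\x7e]+")


def printable_runs(blob, min_len=4):
    """Return every run of at least `min_len` printable ASCII bytes in `blob`."""
    return [m.group().decode("ascii")
            for m in _PRINTABLE.finditer(blob)
            if len(m.group()) >= min_len]


# Made-up bytes for illustration only; this is NOT the real aup3 layout.
sample = b"\x01\x05label\x00\x02\x03hi\x00track1"
print(printable_runs(sample))  # prints ['label', 'track1']
```

The scan recovers some text but throws away everything around it: the short string is lost, and nothing says which string is a name, which is a value, or how they nest. Because the comment says strings are written in their native format, the text in a real project may also be stored as wide characters, which a single-byte scan like this one would miss.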