Character encoding of project table

Hi Everyone!

I use Audacity to annotate music, then use the file format to analyze the labels. The last time I used Audacity, it was before version 3.0. With the new file format, I need to update my process to take the sqlite3 file format into account. I assume that the project table has what I need. I’ve been attempting to parse the string because I’m expecting it to look like the xml file format of Audacity pre-3.0.

I’ve been executing:

$ sqlite3 QuincyJones-CominHomeBaby.aup3 "select hex( doc ) from project limit 1;" | xxd -r -p | iconv -f utf-8 -t ascii//TRANSLIT
iconv: illegal input sequence at position 241

I assume that the output is supposed to be closer to:

<?xml version="1.0" standalone="no" ?>
<!DOCTYPE project PUBLIC "-//audacityproject-1.3.0//DTD//EN" "" >

I’d prefer to not have to clone the Audacity repository and use it to write my own document extractor.

Does anyone have any suggestions on how I can decode the string properly?

Alternatively, does anyone have any suggestions on alternative mechanisms for getting a list of label tracks and the labels in them?


Can you not just export the labels from the project?

Good question. When I export the labels, it does not export the names of the label tracks. And I need a programmatic solution to automatically export the data without any user interaction.

I’m not sure what you mean by “without any user interaction”, or why.

I think I posted a plug-in somewhere that will export labels from the first label track only. Would that fit your use case if I can find it?

Another good question. I have a makefile that runs through all of my *.aup files, and converts them into something readily readable by my code. Essentially:

SCHEDULES=$(subst .aup,.schedule,$(wildcard music/*.aup))

.PHONY: all

music/%.schedule: music/%.aup java Makefile
        java AudToSchedule $< > $@

I’m looking for a similar mechanism that can take an aup3 file and output a list of label tracks (with names), and the labels for each label track.

It seems to me that you have two options:

  1. Use Audacity’s scripting API. Scripting commands may be used via Audacity’s built-in scripting language “Nyquist”, by “Macros”, or from any external language that supports “named pipes”.
  2. Devise your own method.

Using option 1, labels can be extracted with:
"GetInfo: Type=Labels"

I can’t help you with the makefiles part of this, but since you seem quite technically capable, I have this to offer:

You can get to your project table if you care to compile Audacity. Near line 1857 in ProjectFileIO, put a breakpoint immediately AFTER the line "project = ProjectSerializer::Decode(buffer);", then run to that point.

Hover over “project” in that now previous line, then (hover/click) on the open triangle in the pop-up window. This will bring up a 2nd pop-up window.
Click on the magnifying glass in the first line of that 2nd pop-up window. Click in the Text Visualizer window which then appears, then Ctrl+A, Ctrl+C, open Notepad, Ctrl+V and Bob’s your uncle!

I was able to hack at it and get something that works for me. I wrote a small ProjectSerializerDecode test harness that takes in the binary XML stored in the sqlite3/aup3 file, decodes it, and prints out the resulting string.

#include <FileNames.h>
#include <ProjectSerializer.h>
#include <xml/XMLFileReader.h>

#include <array>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <stdexcept>

// Empty stand-ins for Audacity symbols; presumably they exist only so the
// harness links, and they are never expected to be called.
TranslatableString AudacityMessageBoxCaptionStr() { return {}; }
void ShowErrorDialog( wxWindow *                    parent,
                      const TranslatableString &    dlogTitle,
                      const TranslatableString &    message,
                      const wxString &              helpPage,
                      bool                          close,
                      const wxString &              log ) { }
void ShowExceptionDialog( wxWindow *                    parent,
                          const TranslatableString &    dlogTitle,
                          const TranslatableString &    message,
                          const wxString &              helpPage,
                          bool                          close,
                          const wxString &              log ) { }
wxString TranslatableString::DoGetContext( const Formatter & formatter ) { return {}; }
wxString TranslatableString::DoSubstitute( const Formatter &    formatter,
                                           const wxString &     format,
                                           const wxString &     context,
                                           bool                 debug ) { return {}; }
wxString FileNames::AbbreviatePath( const wxFileName & fileName ) { return {}; }

int main()
{
    const std::size_t INIT_BUFFER_SIZE = 1024;

    try
    {
        // Reopen stdin in binary mode; the input is not text.
        std::freopen( nullptr, "rb", stdin );

        if( std::ferror( stdin ) )
            throw std::runtime_error( std::strerror( errno ) );

        std::size_t                             len;
        std::array< char, INIT_BUFFER_SIZE >    buf;
        wxMemoryBuffer                          input;

        // Read all of stdin into one buffer, a chunk at a time.
        while( (len = std::fread( buf.data(), sizeof( buf[0] ), buf.size(), stdin ) ) > 0 )
        {
            if( std::ferror( stdin ) && !std::feof( stdin ) )
                throw std::runtime_error( std::strerror( errno ) );

            input.AppendData( buf.data(), len );
        }

        // Decode the binary dict + doc blobs into the XML text.
        wxString project = ProjectSerializer::Decode( input );
        std::cout << project;
    }
    catch( std::exception const & e )
    {
        std::cerr << e.what() << "\n";
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

set( TARGET ProjectSerializerDecode )
set( TARGET_ROOT ${topdir}/ProjectSerializerDecode )

message( STATUS "========= Configuring ${TARGET} =========" )

add_executable( ${TARGET}
                ProjectSerializerDecode.cpp )

target_include_directories( ${TARGET} PRIVATE
    ${topdir}/libraries/lib-utility/ )

target_compile_definitions( ${TARGET} PUBLIC STRINGS_API= AUDACITY_DLL_API= )
target_link_libraries( ${TARGET} PUBLIC wxwidgets::wxwidgets )
target_compile_options( ${TARGET} PRIVATE -DPROHIBITED==delete )

djshaw@culsu:~/svn/misc/wizardsInWinter$ diff audacity-Audacity-3.0.5/CMakeLists.txt djshaw-Audacity-3.0.5/CMakeLists.txt
> add_subdirectory( "ProjectSerializerDecode" )

I can compile with

$ mkdir -p build && cd build && cmake -G "Unix Makefiles" -Daudacity_use_ffmpeg=loaded .. && make ProjectSerializerDecode

And I run with:

$ sqlite3 music/QuincyJones-CominHomeBaby.aup3 "SELECT HEX( dict ) || HEX( doc ) FROM project WHERE id = 1 LIMIT 1;" | xxd -p -r | ./ProjectSerializerDecode
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE project PUBLIC "-//audacityproject-1.3.0//DTD//EN" "" >
<project xmlns="" version="1.3.0" audacityversion="3.0.5" sel0="84.7168713477" sel1="84.7168713477" selLow="1309.4169921875" selHigh="1309.4169921875" vpos="0" h="120.0388672121" zoom="26.8246450069" rate="44100.0" snapto="off" selectionformat="hh:mm:ss + milliseconds" frequencyformat="seconds" bandwidthformat="seconds">

I’ll be piping the output of ProjectSerializerDecode into further programs for analysis.

Any feedback and criticisms on the code and CMake changes are welcome.

Congratulations - Thanks for the report back and for sharing. I’m glad to hear that is working for you. :smiley:

The end result of my project is […]. This song was one of my projects last Christmas. We’re closing in on November, so it was time to dust off the code and make sure it all still works :slight_smile:

I use Audacity to view the spectrogram of the music, and I create labels representing which channels to turn on and off. I find this process lets me get very accurate sequences, where other pre-existing software doesn’t offer such fine-grained control (often restricting on-and-off commands to eighth notes on a time signature).