Character encoding of project table

Hi Everyone!

I use Audacity to annotate music, then parse the project file to analyze the labels. The last time I used Audacity was before version 3.0. With the new file format, I need to update my process to handle the sqlite3-based .aup3 files. I assume that the project table has what I need. I’ve been trying to parse its doc column as a string because I expected it to look like the XML file format of pre-3.0 Audacity.

I’ve been executing:

$ sqlite3 QuincyJones-CominHomeBaby.aup3 "select hex( doc ) from project limit 1;" | xxd -r -p | iconv -f utf-8 -t ascii//TRANSLIT
<?xml
version="1.0"
 standalone="no"
?>
<!DOCTYPE
project
PUBLIC
H"-//audacityproject-1.3.0//DTD//EN"
iconv: illegal input sequence at position 241

I assume that the output is supposed to be closer to:

<?xml version="1.0" standalone="no" ?>
<!DOCTYPE project PUBLIC "-//audacityproject-1.3.0//DTD//EN" "http://audacity.sourceforge.net/xml/audacityproject-1.3.0.dtd" >

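The iconv error is consistent with the doc column not being UTF-8 text at all: in .aup3 files it holds Audacity’s binary-serialized project XML, which contains length and type bytes between the readable strings. The following minimal sketch checks where a blob stops being valid UTF-8 under the RFC 3629 rules. That offset is the kind of position iconv reports. first_invalid_utf8() is a hypothetical helper name, not part of Audacity.

```cpp
#include <cstddef>
#include <string>

// Returns the offset of the first byte that does not start a well-formed UTF-8
// sequence (RFC 3629), or std::string::npos if the whole buffer is valid UTF-8.
std::size_t first_invalid_utf8(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size())
    {
        const auto lead = static_cast<unsigned char>(bytes[i]);
        std::size_t length = 0;
        unsigned char lower = 0x80, upper = 0xBF;  // allowed range for the second byte

        if (lead < 0x80)                         length = 1;
        else if (lead >= 0xC2 && lead <= 0xDF)   length = 2;
        else if (lead >= 0xE0 && lead <= 0xEF)   length = 3;
        else if (lead >= 0xF0 && lead <= 0xF4)   length = 4;
        else return i;  // stray continuation byte, overlong 2-byte lead, or beyond U+10FFFF

        if (lead == 0xE0)      lower = 0xA0;  // reject overlong 3-byte forms
        else if (lead == 0xED) upper = 0x9F;  // reject UTF-16 surrogates
        else if (lead == 0xF0) lower = 0x90;  // reject overlong 4-byte forms
        else if (lead == 0xF4) upper = 0x8F;  // reject code points above U+10FFFF

        if (bytes.size() - i < length)
            return i;  // sequence truncated by the end of the buffer

        for (std::size_t k = 1; k < length; ++k)
        {
            const auto b = static_cast<unsigned char>(bytes[i + k]);
            const unsigned char lo = (k == 1) ? lower : 0x80;
            const unsigned char hi = (k == 1) ? upper : 0xBF;
            if (b < lo || b > hi)
                return i;
        }
        i += length;
    }
    return std::string::npos;
}
```

Feeding it the output of xxd -r -p (read from stdin in binary mode) should show the doc blob is not text. Decoding it properly needs Audacity’s own ProjectSerializer::Decode.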
I’d prefer not to have to clone the Audacity repository and use it to write my own document extractor.

Does anyone have any suggestions on how I can decode the string properly?

Alternatively, does anyone have suggestions for other ways to get a list of label tracks and the labels in them?

Thanks!

Can you not just export the labels from the project?

Good question. When I export the labels, the names of the label tracks are not exported. Also, I need a programmatic solution that exports the data automatically, without any user interaction.

I’m not sure what you mean by “without any user interaction”, or why.

I think I posted a plug-in somewhere that will export labels from the first label track only. Would that fit your use case if I can find it?

Another good question. I have a makefile that runs through all of my *.aup files and converts them into something my code can read easily. Essentially:

SCHEDULES=$(subst .aup,.schedule,$(wildcard music/*.aup))

.PHONY: all
all: $(SCHEDULES)

music/%.schedule: music/%.aup java Makefile
	java AudToSchedule $< > $@

I’m looking for a similar mechanism that can take an aup3 file and output a list of label tracks (with their names) and the labels in each label track.

It seems to me that you have two options:

  1. Use Audacity’s scripting API (https://manual.audacityteam.org/man/scripting.html). Scripting commands may be used via Audacity’s built-in scripting language “Nyquist”, or by “Macros”, or from any external language that supports “named pipes”.
  2. Devise your own method.


Using option 1, labels can be extracted with:
"GetInfo: Type=Labels"

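A minimal sketch of option 1 from a compiled program, talking to the named pipes directly. It assumes Audacity is running with mod-script-pipe enabled. It also assumes the pipes live at the Linux/macOS paths used by Audacity’s pipe_test.py example, /tmp/audacity_script_pipe.to.<uid> and /tmp/audacity_script_pipe.from.<uid>. send_command() and read_response() are hypothetical helper names. As in pipe_test.py, the reply is read line by line up to the first blank line. Track names should presumably be available from "GetInfo: Type=Tracks", though that is worth checking against the scripting reference.

```cpp
#include <fstream>
#include <istream>
#include <sstream>     // std::istringstream, handy for testing read_response offline
#include <stdexcept>
#include <string>
#include <unistd.h>    // getuid() (POSIX only)

// Reads one reply: every line up to, but not including, the first blank line
// that follows some content (mirrors the read loop in pipe_test.py).
std::string read_response(std::istream& from)
{
    std::string result;
    std::string line;
    while (std::getline(from, line))
    {
        if (line.empty() && !result.empty())
            break;
        result += line;
        result += '\n';
    }
    return result;
}

// Sends one scripting command, e.g. "GetInfo: Type=Labels", and returns the reply.
// Assumption: the pipe paths below match the running Audacity's mod-script-pipe.
std::string send_command(const std::string& command)
{
    const std::string uid = std::to_string(getuid());
    std::ofstream to("/tmp/audacity_script_pipe.to." + uid);  // blocks until Audacity opens its end
    std::ifstream from("/tmp/audacity_script_pipe.from." + uid);
    if (!to || !from)
        throw std::runtime_error("cannot open Audacity script pipes; is mod-script-pipe enabled?");

    to << command << '\n' << std::flush;
    return read_response(from);
}
```

For example, std::cout << send_command("GetInfo: Type=Labels"); prints the label information for the open project. Note that Audacity itself has to be running for this approach to work.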
I can’t help you with the makefile part of this, but since you seem quite technically capable, I have this to offer:

You can get at the decoded project XML if you are willing to compile Audacity (see https://github.com/audacity/audacity/blob/master/BUILDING.md). Near line 1857 of ProjectFileIO.cpp, put a breakpoint immediately AFTER the line "project = ProjectSerializer::Decode(buffer);", then run to that point.

In the Visual Studio debugger, hover over “project” on the line that was just executed, then hover over or click the open triangle in the pop-up window. This brings up a second pop-up window.
Click the magnifying glass on the first line of that second pop-up window. Click in the Text Visualizer window that appears, press Ctrl+A and Ctrl+C, open Notepad, press Ctrl+V, and Bob’s your uncle :exclamation:

I was able to hack at it and get something that works for me. I wrote a small test harness, ProjectSerializerDecode, that reads the binary XML stored in an sqlite3/aup3 file from stdin, decodes it, and prints the resulting XML as text.

#include <FileNames.h>
#include <ProjectSerializer.h>
#include <xml/XMLFileReader.h>

#include <array>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <stdexcept>

// Link-time stubs: the Audacity sources compiled into this tool reference these
// symbols, but the tool never needs their behavior, so they only have to satisfy
// the linker. Functions with a non-void return type must still return a value.
TranslatableString AudacityMessageBoxCaptionStr() { return {}; }
void ShowErrorDialog( wxWindow *                    parent,
                      const TranslatableString &    dlogTitle,
                      const TranslatableString &    message,
                      const wxString &              helpPage,
                      bool                          close,
                      const wxString &              log ) { }
void ShowExceptionDialog( wxWindow *                    parent,
                          const TranslatableString &    dlogTitle,
                          const TranslatableString &    message,
                          const wxString &              helpPage,
                          bool                          close,
                          const wxString &              log ) { }
wxString TranslatableString::DoGetContext( const Formatter & formatter ) { return {}; }
wxString TranslatableString::DoSubstitute( const Formatter &    formatter,
                                           const wxString &     format,
                                           const wxString &     context,
                                           bool                 debug ) { return {}; }
wxString FileNames::AbbreviatePath( const wxFileName & fileName ) { return {}; }

int main()
{
    const std::size_t INIT_BUFFER_SIZE = 1024;

    try
    {
        // Reopen stdin in binary mode: the input is the raw dict+doc bytes, not text.
        if( std::freopen( nullptr, "rb", stdin ) == nullptr )
        {
            throw std::runtime_error( std::strerror( errno ) );
        }

        std::size_t                             len;
        std::array< char, INIT_BUFFER_SIZE >    buf;
        wxMemoryBuffer                          input;

        // Read all of stdin into the buffer that ProjectSerializer::Decode expects.
        while( (len = std::fread( buf.data(), sizeof( buf[0] ), buf.size(), stdin ) ) > 0 )
        {
            input.AppendData( buf.data(), len );
        }

        // fread returns 0 on both end-of-file and error; only an error is a failure.
        if( std::ferror( stdin ) )
        {
            throw std::runtime_error( std::strerror( errno ) );
        }

        wxString project = ProjectSerializer::Decode( input );

        // Write UTF-8 explicitly so non-ASCII label text survives regardless of locale.
        std::cout << project.utf8_str().data();
    }
    catch( std::exception const & e )
    {
        std::cerr << e.what() << "\n";
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}



set( TARGET ProjectSerializerDecode )
set( TARGET_ROOT ${topdir}/ProjectSerializerDecode )

message( STATUS "========= Configuring ${TARGET} =========" )
def_vars()

add_executable( ${TARGET}
                ${topdir}/libraries/lib-strings/Internat.cpp
                ${topdir}/src/AudacityException.cpp
                ${topdir}/src/FileException.cpp
                ${topdir}/src/ProjectSerializer.cpp
                ${topdir}/src/xml/XMLWriter.cpp
                ProjectSerializerDecode.cpp )

target_include_directories( ${TARGET} PRIVATE
    ${topdir}/src 
    ${topdir}/libraries/lib-strings/
    ${topdir}/include
    ${topdir}/libraries/lib-utility/ )

target_compile_definitions( ${TARGET} PUBLIC STRINGS_API= AUDACITY_DLL_API= )
target_link_libraries( ${TARGET} PUBLIC wxwidgets::wxwidgets )
target_compile_options( ${TARGET} PRIVATE -DPROHIBITED==delete )



djshaw@culsu:~/svn/misc/wizardsInWinter$ diff audacity-Audacity-3.0.5/CMakeLists.txt djshaw-Audacity-3.0.5/CMakeLists.txt
522a523
> add_subdirectory( "ProjectSerializerDecode" )

I can compile with

$ mkdir -p build && cd build && cmake -G "Unix Makefiles" -Daudacity_use_ffmpeg=loaded .. && make ProjectSerializerDecode

And I run with:

$ sqlite3 music/QuincyJones-CominHomeBaby.aup3 "SELECT HEX( dict ) || HEX( doc ) FROM project WHERE id = 1 LIMIT 1;" | xxd -p -r | ./ProjectSerializerDecode
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE project PUBLIC "-//audacityproject-1.3.0//DTD//EN" "http://audacity.sourceforge.net/xml/audacityproject-1.3.0.dtd" >
<project xmlns="http://audacity.sourceforge.net/xml/" version="1.3.0" audacityversion="3.0.5" sel0="84.7168713477" sel1="84.7168713477" selLow="1309.4169921875" selHigh="1309.4169921875" vpos="0" h="120.0388672121" zoom="26.8246450069" rate="44100.0" snapto="off" selectionformat="hh:mm:ss + milliseconds" frequencyformat="seconds" bandwidthformat="seconds">
 <tags>
...

I’ll be piping the output of ProjectSerializerDecode into further programs for analysis.
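Since the decoder prints ordinary project XML, a downstream step can pick out the label tracks and their labels. The sketch below is minimal and makes one main assumption: the decoded XML uses <labeltrack name="..."> elements containing <label t="..." t1="..." title="..."/> children, as pre-3.0 .aup files do. Label, LabelTrack, attribute() and extract_label_tracks() are hypothetical names. This is a string/regex scan rather than a real XML parser, so it has two limitations: entities such as &amp; are left unescaped, and it assumes '>' never appears unescaped inside an attribute value. A proper parser such as expat, which Audacity itself uses, would be more robust.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct Label      { double start; double end; std::string title; };
struct LabelTrack { std::string name; std::vector<Label> labels; };

// Returns the value of attribute `name` within a single tag, or "" if absent.
std::string attribute(const std::string& tag, const std::string& name)
{
    const std::regex pattern("\\s" + name + "=\"([^\"]*)\"");
    std::smatch match;
    return std::regex_search(tag, match, pattern) ? match[1].str() : std::string();
}

// Scans the XML tag by tag, starting a new track at each <labeltrack> and
// attaching every following <label> to the most recent track.
std::vector<LabelTrack> extract_label_tracks(const std::string& xml)
{
    std::vector<LabelTrack> tracks;
    std::size_t open = 0;
    while ((open = xml.find('<', open)) != std::string::npos)
    {
        const std::size_t close = xml.find('>', open);
        if (close == std::string::npos)
            break;

        const std::string tag = xml.substr(open, close - open + 1);
        // Element name runs from after '<' up to whitespace, '/' or '>' ("" for end tags).
        const std::string element = tag.substr(1, tag.find_first_of(" \t\r\n/>", 1) - 1);

        if (element == "labeltrack")
            tracks.push_back({attribute(tag, "name"), {}});
        else if (element == "label" && !tracks.empty())
            tracks.back().labels.push_back({std::stod(attribute(tag, "t")),
                                            std::stod(attribute(tag, "t1")),
                                            attribute(tag, "title")});
        open = close + 1;
    }
    return tracks;
}
```

To use it in the pipeline, read stdin into a string (for example std::string xml{std::istreambuf_iterator<char>(std::cin), std::istreambuf_iterator<char>()};) and print each track name with its labels’ start, end and title.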

Any feedback and criticisms on the code and CMake changes are welcome.

Congratulations, and thanks for reporting back and sharing. I’m glad to hear that it is working for you. :smiley:

The end result of my project is https://www.youtube.com/watch?v=HTj9_EMPsEU. This song was one of my projects last Christmas. We’re closing in on November, so it was time to dust off the code and make sure it all still works :slight_smile:

I use Audacity to view the spectrogram of the music, and I create labels representing which channels to turn on and off. This process lets me build very accurate sequences, whereas other existing software lacks such fine-grained control (it often restricts on/off commands to eighth notes of the time signature).
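To illustrate how such labels could become on/off commands, here is a minimal sketch. It assumes, hypothetically, that each label track is named after a channel and each region label spans the time that channel is on. ChannelLabel, Event and to_events() are illustrative names, not the actual AudToSchedule format. At equal timestamps, "off" sorts before "on", so a channel stays lit across back-to-back regions. Point labels are skipped because they have no duration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical input: one region label on a label track named after a channel.
struct ChannelLabel { std::string channel; double start; double end; };

// Hypothetical output: switch `channel` on or off at `time` seconds.
struct Event { double time; std::string channel; bool on; };

std::vector<Event> to_events(const std::vector<ChannelLabel>& labels)
{
    std::vector<Event> events;
    for (const auto& label : labels)
    {
        if (label.end <= label.start)
            continue;  // point labels carry no duration; skipped in this sketch
        events.push_back({label.start, label.channel, true});
        events.push_back({label.end, label.channel, false});
    }

    // Time order; at equal times "off" comes before "on".
    std::stable_sort(events.begin(), events.end(), [](const Event& a, const Event& b) {
        if (a.time != b.time)
            return a.time < b.time;
        return !a.on && b.on;
    });
    return events;
}
```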