Sample Printer

A simple utility plug-in to print out the values of samples within the selected audio.

Downloads:
SamplePrinter.ny (obsolete version)
sample-data-export.ny (obsolete version)

FINAL VERSION sample-data-export.ny

NOTE: Sample Data Export is included in Audacity from version 2.0.1 onwards. Get the latest Audacity release here.

This is an “Analyze” plug-in (appears in the Analyze menu) which I made for testing purposes in another project.
It creates a simple html file that lists the sample values for the first x samples of the selection.
Do not set the number of samples too high or your computer will probably lock up (very memory hungry). Should be OK for up to a few thousand samples.
It is currently for mono tracks only.

[Update: The current version resolves the memory problems and may be used for mono or stereo tracks. Many additional enhancements - read this topic for full information]

It probably needs an error forbidding more than 1000 samples. I had to reboot the computer even on 8000 samples on Win 7 x64.

Maybe info on the number of channels (when > 1 is supported) and the sample rate would be good to have in the file. I think it’s more usual to have time offset information even though an ID number for each sample is handy.

Can you strip the opening and closing quotation marks (") out of the file?

Thanks


Gale

I’d rather not do that as the number of samples that can be processed is dependant on the available RAM.
I think that 1000 samples is a sensible limit for “average” computers, though some users that have heaps of RAM may want to process larger amounts of audio.
I’ve added a clear warning to the main interface.

The original version was intentionally very simple for 2 reasons - 1) because that was all that I needed when I wrote it, and 2) so that it would be easy for other users to modify for their own use. However, since you’ve asked…

This new version supports mono or stereo tracks and has options for additional information in the output file.


Yes.

Here’s version 2 of the plug-in with the additional features.
This plug-in requires Audacity 1.3.x (it’s a version 3 plug-in).

[Edit] Attachment removed - see next post

Your call, but people who want to push the envelope could instead be made to change the ny file…

I noticed a small typo “dependant”.

A few more ideas:

  • Maybe the path control example should be a Windows style path?
  • Maybe “Include track info” should be “include sample rate” (unless more features are coming like the track name?)
  • Stereo works fine but not of course split stereo, nor multiple mono tracks (each file is successively overwritten by that for the next track, leaving just the last file). Could it handle this better, or even produce one file per track?
  • Would text be a better output format?

Thanks for this.



Gale

I think that most novice users are likely to accept the limits of the slider range (many users do not even realise that values can be typed into the text box of slider widgets), but the dire warning at the top of the GUI should be sufficient to discourage reckless experimentation.

Oops :blush: (spelling was never my strongest subject)

Unfortunately back-slashes cannot be used with the current Audacity-Nyquist interface.
There was a discussion about the difficulty of setting paths in Nyquist on the audacity-nyquist mailing list last year http://audacity.238276.n2.nabble.com/Setting-Nyquist-working-directory-tp4134136p4134136.html
In response to this issue Roger Dannenberg updated Nyquist with a function for reading environment variables (get-env), but unfortunately Nyquist in Audacity has not been updated.
The best workaround that I can think of is to use forward slashes (which also worked on Windows last time I tested it).


Edgar highlighted an additional difficulty on Windows operating systems:

IMPORTANT: On MS Windows you cannot reliably reconstruct the user’s HOME directory from the username alone, because on Windows the path of the HOME directory is localized:

In an English Windows the user’s home directory is something like:
C:\documents and settings\[username]\my files

In a German Windows the same home directory is something like:
C:\Dokumente und Einstellungen\[username]\Meine Dateien

You have no doubt guessed that the default path used in this plug-in is set for my (Linux) computer.
Can you suggest a better default path? One that works on Windows? (The old truncated 8.3 DOS file names may be required.)

When stereo tracks are analysed the output also prints which channel the data belongs to. This is omitted for mono tracks for obvious reasons.

It’s possible, but until the problem with *SCRATCH* (property lists not surviving from one invocation to another) is fixed, I’d rather stick with Roger’s advice:

I understand that the property list might not be working now, but it’s a bad idea to design on top of bugs.

The full discussion can be found here: http://audacity.238276.n2.nabble.com/Patch-improving-Nyquist-input-behavior-tp3874068p4068444.html


Possibly for some applications it would, but for other cases XML may be better. The advantages of HTML are that text is easily formatted and easily displayed, there are no LF/CR confusions across different platforms, and the output can easily be copied and pasted from a web browser into a text editor if plain text is required. The original version of this plug-in was deliberately rudimentary so that users with little Nyquist experience could easily tailor it to their own requirements. The more complex the plug-in becomes, the harder that will be. Although I’m happy to add useful functionality, I’d like to keep the code relatively simple and easy to follow.

Thanks for the feedback and ideas Gale. The attached file has the spelling mistake corrected and the “dire warning” more prominent.
sampleprinter2.ny.zip (1.09 KB)
(This replaces the previous version of sampleprinter2.ny.zip).

I found another typo “amout” (fixed in attached).

Not really, and using truncated file names or ones that overflow probably won’t help. However, if this is to go public I don’t think we can leave it as it is. For example, if one just uses the current default, the confirmation will say the file has been written to that default path when of course it hasn’t. The best suggestion I have at the moment is the attached “Unique, pre-existing output path\n(writes to current folder if empty/invalid)” with a suggested path of C:/ (or even leave the path empty). What happens on XP if the path is empty or invalid is that the file is written to the Audacity folder that contains the plug-in folder, unless you have written to a valid path previously in that session, in which case it reverts to that previous path if the path you enter is invalid.


I still think “include sample rate” gives more idea what you’ll get than “include track info” :slight_smile:

Trouble with that is that it just looks like a bug unless you document it or maybe write the first track only if they select more than one.


Gale
sampleprinter3.ny (2.25 KB)

Thanks.

That sounds as good as anything else. I prefer it to leaving the path empty as it at least gives a hint of the correct format.

I think that is what should happen, but is frequently broken and goes to the modules folder instead. :frowning:

OK, I’ve no problem with that. I’ll change it in the next version.

I’ve had an idea that may work and does not require the variable SCRATCH but I need to try it to find out if it works.

Steve, Sample Printer is proving popular with feedback@ enquirers. Unless you have a fix in progress for handling multiple tracks, it may be good to bring it forward a bit for promotion to “Download Nyquist Plug-ins”.

Feedback is pretty consistent on these points:

  1. Option for text output is wanted. HTML is OK but not as only choice (the rationale is that text is easy to convert to a graph in a spreadsheet)
  2. Output format descriptions unclear (you don’t know if it’s text, xml or whatever until you save, and wordings confusing)
  3. Option wanted for sample rate with sample values only (so maybe a separate control for sample rate or not)?

One of my own - call it “Sample Data Export”?



Gale

That’s encouraging, and somewhat unexpected, as there appear to be very few downloads from the forum.

That was a problem at the time due to *SCRATCH* not working correctly, but I think that has now been fixed.

Not sure what you mean :confused:

Like it :stuck_out_tongue:

I just attach the file to the e-mail as it’s quicker.

The problem I had in mind was if you select multiple tracks the plug-in writes a file for each track, but overwrites to the same file name. Could it not do like export multiple and add a suffix to the file name for subsequent files in the same process, while still overwriting the file if it already exists?

You can export a file that has sample values without indices, but not one like that which also has the sample rate.

Also I notice unless you choose “Include track info” it doesn’t say which channel is which in a stereo track. For the HTML it might be nice for stereo to have the two channels side by side in a table.



Gale

Yes it could, but the plug-in would need to keep a count of which track it is processing.

Try this code on multiple tracks:

(if (boundp 'count)
   (progn
      (setq count (1+ count))
      (format nil "Writing track number ~a as filename-~a.txt~%" count count))
   (progn
      (setq count 0)
      (format nil "Writing first track as filename-0.txt~%")))

One might expect that when processing the first track “count” would be initialised to zero, then for subsequent tracks “count” would be incremented. However that does not happen because Nyquist runs as a separate instance for each track.

To make this work we need to use a variable that will survive from one instance of Nyquist to the next, and that’s why we need *SCRATCH*.

(This is NOT the best way to do this, just a simple example.)
If we replace “count” with *SCRATCH* then it works as expected:

(if (boundp '*scratch*)
   (progn
      (setq *scratch* (1+ *scratch*))
      (format nil "Writing track number ~a as filename-~a.txt~%" *scratch* *scratch*))
   (progn
      (setq *scratch* 0)
      (format nil "Writing first track as filename-0.txt~%")))

However, *SCRATCH* will survive for as long as the current Audacity session is open, so running the code again will continue counting from the last value of *SCRATCH*. So we need to be able to reset *SCRATCH* manually, which we can do with:

(setf *scratch* '*unbound*)

As a side note, this is another case where it would be extremely useful to be able to run a plug-in and keep the plug-in open after it has finished (the other classic example being Noise Removal).

The “correct” way to use *SCRATCH* is to assign the value to a key in its property list, but this is currently broken and does not work up to and including Audacity 1.3.12.
It has been fixed in Audacity 1.3.13.
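A minimal sketch of that property-list approach, assuming Audacity 1.3.13 or later, where the property list of *SCRATCH* survives from one run to the next. The property name SDE-FILE-NUM and both function names are hypothetical, and numbering here starts at 1. The value is stored with (setf (get ...)); (putprop '*scratch* n 'sde-file-num) is the equivalent older XLISP form.

```lisp
;; Hypothetical helpers: keep a per-session file counter on the
;; property list of *SCRATCH* rather than in its value.
;; Assumes Audacity 1.3.13+, where this property list survives
;; from one Nyquist run (one track) to the next.

(defun next-file-number ()
  ;; GET returns NIL when the property has not been set yet.
  (let ((n (get '*scratch* 'sde-file-num)))
    (setq n (if n (1+ n) 1))              ; first track gets number 1
    (setf (get '*scratch* 'sde-file-num) n)
    n))

(defun reset-file-number ()
  ;; Removing the property makes the next run start again at 1.
  (remprop '*scratch* 'sde-file-num))
```

Each per-track run would build its name with something like (format nil "data~a.txt" (next-file-number)), and a separate “reset” run would call (reset-file-number). Because only one property is touched, other plug-ins using *SCRATCH* are not disturbed.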

The options that are available to us are:

  1. Write the plug-in using *SCRATCH* the “correct” way but do not officially release the plug-in until the release of Audacity 1.3.13.
  2. Write the plug-in using *SCRATCH* and hack around the problem for Audacity 1.3.12, then update the plug-in when Audacity 1.3.13 is released.
  3. Write the output from processing multiple tracks into one file (append data to current file if file exists)
  4. Write sequential files (filename-number) without using *SCRATCH*
  5. Keep it as a plug-in for processing one track.

I’m not keen on option 2 because the hacked code would probably be floating around for a long time (with my name on it) and bad programming should not be encouraged.

The only way I can think of achieving Option 4 (sequential list without using *SCRATCH*) would be to check for the existence of “filename-0.txt” and, if it exists, check for the existence of “filename-1.txt” and so on until an unused file name is found. This method would never allow the plug-in to overwrite an existing file, which I think is a rather ugly option.

Option 5 (one track only) could check if the file already exists before writing to it. There could be an option in the GUI for “Overwrite previous file: Yes / No”

Isn’t that what sampleprinter3.ny does if “Sample Values Only” is selected?

I get the impression that HTML output is not generally very useful (though it suited the original purpose of the plug-in).
For the sake of simplicity, would it be better to drop that option altogether and stick with plain text output?

I guess that for importing into a database the most useful output would be:

  1. A list of sample values only (plain text file)
  2. Sample values only (.CSV file)
  3. Index and sample value (.CSV file)

For stereo files, would it be better to:
Alternate values as “Left channel sample > Right channel sample > Left channel sample > Right channel sample > …”
or
List the Left channel values first, then list the Right channel values?

Perhaps also an HTML-formatted output for “human readable” output. This could include additional information such as the sample rate, number of channels and optional user input text.


Regarding export path.
This can be much improved now that we have the (get-env) function. However (get-env) was introduced after the release of Audacity 1.3.12 so it requires Audacity 1.3.13 or later.

Regarding the limited number of samples that can be analysed.
With recent improvements to memory management I think it may be possible to work around this limitation. How much of an issue is this for users? Is it worth the effort?

I’m currently thinking along these lines:
sample-data-export.ny (3.39 KB)

Steve,
Concerning your option 4: the (listdir path) will return a list of all files in a directory, so the names can be searched quickly to see what number should come next.

Thanks Storer, I’d not thought of that.

I presume you’re thinking of something like this:

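;; Find the first unused file name of the form sample-data<n>.txt
;; in the folder PATH (assumed to hold the output folder), from n = 0.
(setq dirlist (listdir path))
(setq basename "sample-data")

(setq count 0)
;; Keep incrementing COUNT while that file name is already taken.
(while
  (member (format nil "~a~a.txt" basename count) dirlist :test 'string=)
  (setq count (1+ count)))

(setq filename (format nil "~a~a.txt" basename count))

(print filename)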

I’ve been through this topic and there’s not really much to be done.

The main thing from a programming standpoint is to see if I can remove the sample number limitation.
Other than that, it’s mostly decisions about what features to include and what to miss out.

Decision Time:

  1. Decide on Output Path.
  2. Decide Multiple tracks vs. Append. (prefer append if not too slow)
  3. Decide on output file formats.
  4. Decide on output data formats.
  5. Decide on additional information (header)

1)
How do we want to handle the output file path:
Valid paths that are common to all platforms are

  • The user’s “Home” folder
  • Root folder (often not writeable)
  • Desktop (could be a problem)
  • Fully qualified existing folder (must use forward slash)

“Home” folder is probably the safest.
“Append to Home” would be very convenient for many users (I would probably use this option).
“Fully Qualified Existing Path” is the most flexible, but may be too complex for some users.
Rather than just failing with an invalid path I think I could use the home folder as a fall-back.

2)
I think “append” to file is less problematic than multiple files, and may be more convenient for many users (saves needing to open multiple files). I don’t know how slow this will get for large files so I’ll check that.

3)
Plain text is the easiest (.txt)
Comma Separated Values (.csv) may be useful.
HTML can be prettier and easier to read.

4)
Current options include “Data Only” and “Indexed Data”
Additional information could be included as an additional user option.

5)
Additional information could precede the data (for any of the format options) and could include:

  • Sample Rate
  • Name (from file name) Append number for multiple tracks.
  • Mono/Stereo
  • Output file path/name
  • Number of samples
  • Duration (seconds)
  • Average sample value
  • RMS value
  • Maximum / Minimum values
  • Optional user text (text field in GUI)

I don’t think that we can have “switches” for each of these in the interface, so it’s a matter of deciding which will be most useful and giving the option of including all of those or none of those (or some other subset of combinations).

I’ve just printed 10,000,000 samples (stereo). The file was only about 4 million samples long, so there are a lot of “NIL” results.
Processing and writing the file took about 3 minutes and Audacity memory usage never went above 35MB.
Opening the file in GEdit took about a minute. Memory usage in GEdit is now up to 1.7 GB and I’m still waiting to scroll to the bottom of the page (currently down to line 14,000,000 of about 20,000,000).
I think we can safely say the sample number limitation is fixed.

[Update: just got to the bottom of the list (line 20,000,005)]

I suppose the question now is whether we still want to have the “Number of Samples” slider.
I think that it is probably still good to keep the slider so that the unwary user does not print out a multi-Gigabyte file and crash their computer trying to open it.
Perhaps we should just increase the range of the slider up to say 10,000 samples, or are most of the users happy with a range up to 1,000?

Thanks for spending time on this, Steve.

No, it produces sample values without indices and without sample rate.


Nobody I have conversed with wanted HTML, but there seems no harm including it.


No-one has objected to “list left, then right” but I think “alternate values” is more usual. If someone wants “left then right” they can split the stereo track I guess.


What is the workaround for the limitation? I assume Nyquist still doesn’t release the memory until processing is finished?

I think the slider is best kept for the reason you state, but if the limitation does not exist any more, don’t limit the value that can be entered in the box. I think the help needs to state that exceeding the slider range will produce very large output files.

However I notice that in Sample Printer if I select say 50 samples and the slider is still at the default (100) then “Nyquist did not return audio” occurs. I guess at the least that needs an error message. In fact I wonder if it would be more convenient if the number of samples specified in the plug-in was the first in the selection as now, but export continues if the selection is shorter than . In other words, the control could be “limit output to: [samples]” or similar and there would be much less need to fiddle with the slider if you had already selected exactly the samples you wanted on the screen.


Personally I would keep it simple as in 78rpm EQ Curve with only one “output” box for path, without options to choose. Your suggested “use home as fall-back” would I think be better for 78rpm EQ Curve too.

The users I’ve come across so far seem to prefer multiple files. Appending could limit the number of samples you could usefully specify. I think though it’s important enough to offer a choice for “multiple track write mode” as a control (more so than having two “output” boxes).

I’m a bit concerned about the “reset” choice being confusing until I see it working. It could be easier (though much less functional) just to have the plug-in export the first selected track.

Are we still facing a choice about “waiting for 1.3.13”? I don’t think it matters to wait.


I think the current list of five “file types” is fine (though maybe “Samples List” would be a better term than “Data List”). I’m not sure if HTML really needs to be “extended” or user-defined given there is little demand for HTML anyway. I think additional header information should be an option for all file types.


My take is that the first three in the list above should be included as standard - you get them whether you like it or not. I assume most would like it since there are complaints that some current options don’t include sample rate. I think 4 and 10 are unimportant. A couple of people asked for sample format.

A couple of people asked for output in dB instead of dB FS which would have to be another control. If this is a feasible request I think it’s an important one.

One person asked for the index value to be an actual time position (he was selecting small numbers of samples at arbitrary regions in the track, so 1,2,3… wasn’t that helpful). I guess the only sensible way to accommodate that if we wanted to is to accept that if someone asked for indices it would be both the index number of the sample and its time position.




Gale

So the data formats that we definitely want are:

  1. Sample Values only.
  2. Indexed Sample Values

and the following “header” information should be included:
“Sample Rate, Name (from file name) Append number for multiple tracks, Mono/Stereo”.

So the output could look something like:

indexed list .txt

Sample-data01
Mono 44100 Hz

1   0.032451
2   0.042184
3   0.062345
....
....

values only list .txt

Sample-data01
Mono 44100 Hz

0.032451
0.042184
0.062345
....
....
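The two layouts above could be produced by something like the following sketch. WRITE-HEADER, WRITE-INDEXED and WRITE-VALUES are hypothetical names, and the values are passed as a list only to keep the example short. In the plug-in, the sample rate would come from (snd-srate ...) and the channel count from whether s is an array (stereo) or a single sound (mono). Only the basic ~a and ~% format directives are used.

```lisp
;; Hypothetical helpers for the plain-text layouts proposed above.
;; FP is an open output stream (a file stream in the plug-in).

(defun write-header (fp name channels rate)
  ;; e.g. "Sample-data01", then "Mono 44100 Hz", then a blank line.
  (format fp "~a~%~a ~a Hz~%~%"
          name
          (if (= channels 1) "Mono" "Stereo")
          (truncate rate)))

(defun write-indexed (fp values)
  ;; One "index   value" line per sample, numbered from 1.
  (do ((i 1 (1+ i))
       (v values (cdr v)))
      ((null v))
    (format fp "~a   ~a~%" i (car v))))

(defun write-values (fp values)
  ;; One value per line, no index.
  (dolist (v values)
    (format fp "~a~%" v)))
```

For long selections the values would be fetched one at a time rather than collected into a list first, but the line format stays the same.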

I’m getting a lot less keen on .CSV due to all of the different variations. I think best to keep it simple - if users want CSV they should be able to easily convert a plain text list to the particular CSV format they require. If we’re leaving out CSV, then may as well leave out HTML also and that eliminates one control altogether (and simplifies the code).

I think that splitting the track is probably unnecessary hassle for the user. If you can think of a good, clear, brief description I could add an option to output stereo sample values as either “Left Channel then Right Channel” or “Alternate L,R,L,R…”

It’s obvious now, I don’t know why I didn’t think of it before :stuck_out_tongue: (probably because when I originally wrote the plug-in I only required a small number of sample values).
The “workaround” is that there is really no need to load the sound into memory in the first place. We don’t need to return the sound, so we can just read samples (destructively) from “s” as we need them. This way, “s” is not loaded into RAM so there is no problem with long selection.
Unfortunately this approach cannot be used to solve the “Normalize long selections” issue because for normalising we need to retain “s” so that we can return it (normalised) back to the track.
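A minimal sketch of that idea: samples are pulled from the sound one at a time with SND-FETCH, which returns NIL once the sound is exhausted, and written straight to the output stream, so the sound is never held in RAM. WRITE-SAMPLE-LIST is a hypothetical name. Because SND-FETCH consumes the sound, this only suits effects like this one that do not return audio to the track.

```lisp
;; Stream up to MAX-SAMPLES values from mono sound SND to stream FP.
;; Returns the number of values actually written, which is smaller
;; than MAX-SAMPLES if the selection runs out first.
(defun write-sample-list (snd max-samples fp)
  (do ((n 0 (1+ n))
       (val (snd-fetch snd) (snd-fetch snd)))
      ((or (null val) (>= n max-samples)) n)
    (format fp "~a~%" val)))
```

For a mono track this would be called as (write-sample-list s 1000 fp); for stereo, each channel, (aref s 0) and (aref s 1), would be fetched separately. The returned count could also go into the confirmation message, so the user can see whether the selection was shorter than the number of samples requested.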

I would have thought that it should be pretty obvious that if you have millions of lines of text the file will be quite large?
I would expect users to try the effect within the slider range before they start experimenting with extreme numbers. Once they’ve processed a few thousand samples they will see the size of the files produced. However, if there’s room in the help file there’s no harm in mentioning it. (It’s that darned small help screen issue again :frowning: )

Yes, spotted that one.

Personally I find the “Append to Home” option really convenient.

What I may be able to do (need to test this) is just have one “Output Path” box without options, and:
If <path> is empty, use “Home”.
If home/<path> exists, use home/<path>.
If home/<path> does not exist, look for <path> (fully qualified output path).
If no valid path is found, fall back to “Home”.

For example, on my computer:
Desktop or /Desktop or /Desktop/ would output to /home/<username>/Desktop.
/home/<username>/Desktop would output to /home/<username>/Desktop (because /home/<username>/home/<username>/Desktop does not exist).
An empty path would output to /home/<username>/.
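One way that lookup order might be expressed, as a sketch only. HOME would come from get-env, for example (get-env "HOME") on Linux and Mac, and probably (get-env "USERPROFILE") on Windows, which is an assumption that needs testing. EXISTS-P stands in for whatever folder-existence test turns out to work in Nyquist; both it and RESOLVE-OUTPUT-DIR are hypothetical. Forward slashes are assumed throughout.

```lisp
;; Hypothetical sketch of a single "Output Path" box with fall-back.
;; PATH     - text typed by the user (may be empty)
;; HOME     - the user's home folder, without a trailing slash
;; EXISTS-P - caller-supplied predicate: non-NIL if a folder exists
(defun resolve-output-dir (path home exists-p)
  (let ((relative (if path
                      ;; Strip leading/trailing slashes so that
                      ;; "Desktop", "/Desktop" and "/Desktop/" all
                      ;; become HOME/Desktop.
                      (format nil "~a/~a" home (string-trim "/" path))
                      home)))
    (cond ((or (null path) (string= path "")) home)  ; empty: use Home
          ((funcall exists-p relative) relative)     ; Home + path
          ((funcall exists-p path) path)             ; fully qualified
          (t home))))                                ; fall back to Home
```

The order of the two existence checks is what makes “/home/<username>/Desktop” work: the Home-relative form is tried first, and only when it does not exist is the text treated as a fully qualified path.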

Yes I think it could be a useful option. I’ll keep it unless user feedback indicates that everyone prefers one or the other (once they have the choice, which they don’t yet).

Yes, me too.
It is required if writing to multiple files is supported so that the file counter can be reset to zero.

For example, with 5 tracks in Audacity, multi-file export, base file name “data”:
Output files will be:
data0.txt, data1.txt, data2.txt, data3.txt, data4.txt (or would it be better to start numbering from “data1.txt” ?)

Then you want to process a longer section of those tracks and overwrite the existing files.
Using Storer’s suggestion, the user could manually delete the files before re-exporting, but otherwise the files will not be overwritten and the exported files will be:
data5.txt, data6.txt, data7.txt, data8.txt, data9.txt

The “Reset” button provides a convenient method to reset the file number counter (it removes the property from *SCRATCH*).
A separate control “Reset file number counter: Yes / No” would be no more clear, because it is not possible to reset the counter and run the effect. It has to be done as a separate “run” of the plug-in. As I mentioned before, this is a case where it would be extremely useful to be able to run a plug-in and keep the plug-in open after it has finished.

The “Reset” button is only required if exporting to multiple files. Multiple tracks could be written (appended) to a single file without the need to use *SCRATCH* and so no need to have a Reset button. However, without using *SCRATCH* the “header” information could not include a track number.

*SCRATCH* and (get-env) are not in Audacity 1.3.12.
*SCRATCH* is used as a track/file counter and (get-env) is used to locate the “home” directory. It is not possible to determine the user’s home folder without the (get-env) function.

Yes I think so.

I can include “Name (+ number), Mono/stereo and Sample Rate” as standard.
My reasoning for making them optional was if anyone was wanting to directly import the files into another application. If they were doing that then they would probably just want the data and no other text in the file. If no-one is doing that then there’s no need for the option. It would probably be easier to provide the option now but comment it out rather than adding the option later.

I think that “4 Output file path/name” needs to be displayed after the export has completed. Particularly important if the path entered was invalid and the fall-back path was used. Number of samples and duration would probably be useful too so that the user can see that they have processed what they intended. Something like:

Data written to:
C:\Documents and Settings\<username>\data.txt
27 samples - 0.000 seconds.



You mean the bit-depth?
That’s a “Feature Request” or an item for the Nyquist Wish List
Nyquist does not know what the track bit-depth is. Nyquist always receives the data as 32-bit float.

It would often be useful if Nyquist knew more about “s”.
Audacity has information about the track that is not available to Nyquist.
The number of samples is passed to Nyquist in the variable “LEN” and the actual audio data in “S”, but that’s all.
What I would like to see is additional track data passed to Nyquist, possibly as a property list, to include:

  • Track name,
  • Channel allocation (for mono tracks),
  • Peak Amplitude (solves the Normalising issue for some situations),
  • RMS,
  • Start time,
  • Envelope Points,
  • Bit-depth.


Do you mean dBFS instead of linear?
Current output is the linear value. It could be output as “dB” (which would be dBFS as in the “Waveform (dB)” view).
I can add that as an option.
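As a sketch of the conversion, dB here means 20 * log10(|x|) relative to full scale, the same scale as the Waveform (dB) view. SAMPLE-TO-DB is a hypothetical name. Nyquist’s own LINEAR-TO-DB could be used instead, but it takes the log of its argument directly, so ABS would still be needed for negative samples. A zero sample has no finite dB value, so the sketch returns NIL for it.

```lisp
;; Convert a linear sample value (-1.0 to 1.0) to dB relative to
;; full scale: 20 * log10(|x|).  Returns NIL for a zero sample.
(defun sample-to-db (x)
  (if (zerop x)
      nil
      (* 20.0 (/ (log (abs x)) (log 10.0)))))
```

For example, 0.5 and -0.5 both give about -6.02 dB, and 1.0 gives 0 dB.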

That’s another Nyquist wish list / Feature Request.
Nyquist does not know the track time. As far as Nyquist is concerned the beginning of the selection is t=0.

We could have the “index” as a time value relative to the start of the selection, but that’s probably not what the user wants, and at 48 kHz sample rate the index would be incrementing in steps of 0.000020833333 seconds.

If the user wants the track time value they need to look in the Selection Toolbar. Sample times will then be “start time + (index/sample rate)”
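That arithmetic as a small sketch. START is the selection start read from the Selection Toolbar, RATE is the sample rate, and INDEX counts from 0 at the first selected sample; for a listing numbered from 1, pass (1- index). SAMPLE-TIME is a hypothetical name. FLOAT is applied because XLISP divides integers with truncation.

```lisp
;; Track time of a sample: start + index / rate.
;; INDEX is 0 for the first selected sample.
(defun sample-time (start index rate)
  (+ start (/ (float index) rate)))
```

With RATE = 48000, consecutive indices differ by 1/48000, about 0.0000208 seconds, which is the step size mentioned above.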

I think HTML could be a time saver for web publication and I might be +0.5 on keeping it even though none of my small sample of users were interested in it. Your call.


My best shot so far:

;control chan "Channel layout for stereo" choice "L  -  R lines,L block above R" 0
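A sketch of how the two choices in that control might be written out, assuming CHAN is 0 for “L - R lines” and 1 for “L block above R”, as in the ;control line. WRITE-STEREO is hypothetical, and the channels are given as lists for brevity. With SND-FETCH, the first layout would fetch from (aref s 0) and (aref s 1) alternately, and the second would read the whole left channel before the right.

```lisp
;; LEFT and RIGHT are lists of sample values of equal length.
;; CHAN 0: one line per sample, "left   right".
;; CHAN 1: all left values, then all right values.
(defun write-stereo (fp left right chan)
  (if (= chan 0)
      (do ((l left (cdr l))
           (r right (cdr r)))
          ((null l))
        (format fp "~a   ~a~%" (car l) (car r)))
      (progn
        (dolist (v left) (format fp "~a~%" v))
        (dolist (v right) (format fp "~a~%" v)))))
```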



It may depend whether this goes in an Audacity release (ATM I think there is a strong case especially as Analyze menu is not yet overloaded). If someone downloaded it optionally, the assumption would be they have some idea what it will do. If it’s in a release it could be open to a naïve user who just “tries” the effect to see what happens and opens the resulting file. They’ll be used to 3 MB for an MP3, so 3 MB for a text file won’t on the face of it seem outlandish.


I think the drawback is the amount of explanation this would need in the Help. In particular I think few on Windows would understand “Append to Home”. They can understand typing a path in a box (probably).


I think better to start from “data1.txt”. A “0” file is only used as far as I recall in export multiple for a file before the first label.


OK. I think the benefits of *SCRATCH* probably outweigh the potential confusion. I think the worst confusions are a) the word “Reset” itself; b) you can’t “see” the counter to know what value it has.

I think a “phrase” rather than just “Reset” would help for a), as would not having it as the first choice. I don’t think “Reset Counter” will help much. Does “Multi-file from 1” or similar convey any more, so that people leave that choice alone if they are not writing multiple files?

For b), could a choice “Show Multi-file name” or similar show you the counter, as Show Path shows you the path?


And even the multi-choice boxes (of which we’ll have many here) require 1.3.x, so I guess we accept you need latest Audacity for this plug-in, and also possibly consider a stripped down, simple legacy version if there is any demand.


I’m sure people are doing that but those same people still seem to want “sample rate” in the header. I think the reason is that in e.g. spreadsheets you can simply select the data to be used and then type what you want for the legend. If so then it’s handy to have text for the legend visible in the input. I suppose you could have the choice for header of “none”, “minimal” or “full”, but if only two choices are desired then “minimal” or “full”, not “none” or “full”.

Agreed.


Yes “sample format” as in the Quality Preferences = bit depth.

I’ve added that list (and “track time” as per below) to Nyquist Wish List.


Sorry for terminology mix up - just copied what someone wrote. But yes, the logarithmic Waveform (dB) values would be very welcome as an option.


Perhaps that might be a useful “tip” for the “Help” and possibly not that hard to get a spreadsheet to calculate it? I think “time from selection start” and “track time” are both useful, but not worth adding either until we can support both?



Gale