Sample Printer

Edgar’s patch for handling slashes and back-slashes correctly in “string widgets” is implemented in 1.3.13, so that’s another good reason to make this a plug-in for 1.3.13 and later.
(This means that Windows users can use the standard path separator character.)

Probably just as easy to use plain text and wrap it in tags.
I’m inclined to just lose the option and have one less control.

How about

;control chan "Channel layout for stereo" choice "Alternate Lines,L Channel First" 0



I put “Reset” first to draw your attention to it - I agree it’s problematic and I wanted to be sure that you would notice it as such but without biasing your view of what it should be. :slight_smile:
Roger’s recommendation regarding using SCRATCH is that there should always be a “clean up” option. However, what I think he had in mind was ensuring that SCRATCH did not get left with a large amount of data. In this case the amount of data is extremely small, and the “property” being used is almost certainly unique, making data collision extremely unlikely, so I don’t think that it’s really an issue if it does not get cleaned up.

I think that “Multi-file from 1” would suggest that the plug-in would run and generate files starting at file number 1, but that is not the case. Resetting SCRATCH has to be done as a separate “run” of the plug-in from exporting multiple files.
How about

;control fcount "Multi-file counter" choice "Continue,Show Counter,Reset Counter" 0

Revised interface. The only thing that works is the help menu.
Does this cover all of the options that we want?
The help menu should make clear what each of the options is supposed to do.


OBSOLETE VERSION
sample-data-export.ny (3.95 KB)
NEW VERSION sample-data-export6.ny

Here’s a completely reworked “Sample Data Export” effect.

There’s no manual for it, but there are 4 help screens built in.
It’s a far cry from the simple effect that I originally intended, but now has a lot of cool features.

As it has become a rather complex piece of code I’ll be surprised if it’s bug free, so please test and let me know if there are any problems.

Unfortunately, Audacity does not currently support Nyquist plug-ins in Chains, and the current alpha version only supports “Effect” plug-ins and not “Analyze” plug-ins.
Users of Audacity 2.0.1 that wish to use this plug-in in a chain will need to edit the file in a plain text editor (such as NotePad).
Change line 3 from:

;type analyze

to:

;type process

On restarting Audacity, Sample Data Export will then appear in the Effect menu and may be used in a Chain for batch processing.
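If several copies of the plug-in need this change, the edit can also be scripted. This is a minimal Python sketch rather than part of the plug-in: `switch_type_to_process` and `patch_plugin_file` are hypothetical names, and it assumes the directive sits on its own line exactly as `;type analyze` in a UTF-8 (or plain ASCII) file.

```python
from pathlib import Path


def switch_type_to_process(source: str) -> str:
    """Return the plug-in source with a ';type analyze' line changed to ';type process'."""
    out = []
    for line in source.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep the original line ending (LF or CRLF)
        if body.strip() == ";type analyze":
            body = ";type process"
        out.append(body + ending)
    return "".join(out)


def patch_plugin_file(path: str) -> None:
    """Apply the change in place (assumes a UTF-8 or plain ASCII file)."""
    p = Path(path)
    # newline="" on both sides so existing line endings pass through unchanged
    with p.open(encoding="utf-8", newline="") as f:
        text = f.read()
    with p.open("w", encoding="utf-8", newline="") as f:
        f.write(switch_type_to_process(text))
```

Only an exact `;type analyze` line is touched, so comments that merely mention "analyze" are left alone.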
sample-data-export5.ny (18.2 KB)

Thanks! I would certainly like to get this released into Audacity.

I have not tested extensively but I have a few suggestions:

  • Do something more useful with an empty or zero samples value. Probably people who can’t work out how many samples they need to enter may enter “0” thinking this will give them the maximum in the selection or one million, whichever is less. Would that be good? Or it could give them one second for an empty value. I don’t think it would hurt to mention in the help that the project rate is the number of samples per second.
  • An option to show the header information in a message box (and/or possibly in Debug output) instead of writing a file might be useful. It could be another choice in “Show Help File”.
  • Your reported offset is reduced but never zero even after running “Remove DC offset” in “Normalize” - different calculation system?
  • “Overwrite” isn’t useful with multiple tracks as only the file for the last track remains. Can you still produce separate files for each channel, but still overwrite each of those files when re-running?
  • Following on from that, I suppose some people will want an option to append results for subsequent tracks to the same file. If so, maybe that should limit the maximum number of samples accordingly?
  • Can the “Alternate Lines” layout append L or R after the value, or separate the pair values with a space?

I thought I might catch it out in no overwrite mode by renaming an already saved text file to the next number that it would save to, but it was clever enough to save to the number vacated by the rename. :slight_smile:

PS: typo “Miscelaneous” in “Show Help File”.


Gale

Thanks as always for the comments.
“Miscellaneous” wasn’t a typo, it was a spelling error :wink:

Before starting this rewrite I read through this entire thread several times to consider and reassess each suggested feature. I don’t claim to have got it exactly right and it’s inevitable that some users will want slightly different options from whichever we decide on. The code is very complex now because of all of the included features - I have tried to structure it as clearly and simply as possible, but with over 600 lines of code it is not easy for a beginner to customise unless they know what they are doing. For these reasons I will also upload a very basic version (similar to the original version) that users can easily customise.

No, that would be bad :stuck_out_tongue:
The default is set to 100 (and will be restored to 100 on launching Audacity because Nyquist plug-ins don’t remember their settings from previous sessions).
1 million samples produces a huge file. The minimum file size for 1 million samples is 4.8 MB (5 million bytes), and depending on the file data format it may be many times bigger. If a user enters “0” then we have no idea what they intended - perhaps they intended to enter “10” but missed the “1”, in which case they would be very surprised to get a massive file that may take several minutes to open.
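A quick check of the size arithmetic, as a Python sketch that assumes one line per sample value and a single `\n` line ending (Windows line endings add one byte per line). The 5-byte line is inferred from the "5 million bytes" figure above; the six-decimal line is only an assumed example format, not necessarily what the plug-in writes.

```python
MIB = 1024 * 1024  # "MB" in the figure above is taken to mean 2**20 bytes


def output_size(samples: int, example_line: str) -> int:
    """Bytes written if every sample produces a line as long as example_line."""
    return samples * len(example_line.encode("ascii"))


minimum = output_size(1_000_000, "0.25\n")       # 5 bytes per line
typical = output_size(1_000_000, "-0.123456\n")  # 10 bytes per line (assumed format)

print(minimum, round(minimum / MIB, 1))  # 5000000 4.8
print(typical, round(typical / MIB, 1))  # 10000000 9.5
```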

The other ways of ending up with no samples are (a) analysing an empty file or (b) having no selection. (This can occur with analyze type plug-ins, but for process plug-ins Nyquist will throw an error.) In such cases the plug-in will correctly warn the user that there are no samples selected.

Of these possibilities I think the most likely is that the user has not made a selection.
On balance I think that if they type “0” then it is best to treat it as “0” but warn them of what has happened.

If the “Limit output to:” is left empty, it may be better to fall back to the default (100) rather than throw an error. I’ve added that now.
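The input handling settled on here (an empty field falls back to 100, "0" is honoured but warned about, and a number over 1 million is an error, as explained later in the thread) can be summarised as a small Python sketch. The plug-in itself is written in Nyquist; the function names and message wording are hypothetical, and rejecting negative numbers is an assumption.

```python
from typing import Optional, Tuple

DEFAULT_LIMIT = 100      # default of the "Limit output to" control
MAX_LIMIT = 1_000_000    # hard limit discussed in this thread


def resolve_limit(text: str) -> Tuple[int, Optional[str]]:
    """Interpret the 'Limit output to' field; return (limit, warning or None)."""
    text = text.strip()
    if text == "":
        return DEFAULT_LIMIT, None  # empty field: fall back to the default
    limit = int(text)  # non-numeric input raises ValueError
    if limit < 0:
        raise ValueError("The number of samples cannot be negative.")  # assumed rule
    if limit > MAX_LIMIT:
        # an error rather than silent clamping, so the limit is discoverable
        raise ValueError("The maximum number of samples is 1 million.")
    if limit == 0:
        return 0, "Output limited to 0 samples, so no sample data was written."
    return limit, None


def samples_to_write(limit: int, selected: int) -> int:
    """The limit never asks for more samples than the selection contains."""
    return min(limit, selected)
```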


I’d expect that most people that use this plug-in will already know that. We have the usual problem with Nyquist help screens, very little space. If you really want that, where do you suggest putting it? (I don’t think it is necessary, but if released in Audacity that probably should be mentioned in the manual).


I originally had the entire output in the debug window, but took that out as it made the effect much slower for large numbers of samples. Showing the header in the debug window is fast, so that could be always enabled. I’ve enabled that for text/csv output options and added a note in the “Output Files” help screen (not very useful for html as the header info is in html format).


Three points here:

  1. The dc offset is for the specific number of samples that are analysed, not for the entire selection.
  2. The plug-in sums the sample values using the (integrate) function, which is single precision, whereas I think the Audacity Normalize function uses double precision.
  3. I may have got the calculation wrong. I’ve tried some more tests and the discrepancy looks too great so I think this is a bug. I’ll look into it.
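Points 1 and 2 can be illustrated with a Python sketch. It is not the plug-in's Nyquist code and does not model how (integrate) works internally: the DC offset is taken as the mean of the analysed sample values, and keeping the running sum in single precision (emulated here with the standard `struct` module) drifts noticeably from an accurately rounded double-precision sum on a long selection.

```python
import math
import struct
from typing import Sequence


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dc_offset_single(samples: Sequence[float]) -> float:
    """Mean sample value with the running sum kept in single precision."""
    total = 0.0
    for s in samples:
        total = to_float32(total + to_float32(s))
    return total / len(samples)


def dc_offset_double(samples: Sequence[float]) -> float:
    """Mean sample value using an accurately rounded double-precision sum."""
    return math.fsum(samples) / len(samples)


if __name__ == "__main__":
    # A pure DC signal: the true offset is 0.1. Only the analysed samples count,
    # so pass samples[:limit] when the output is limited.
    constant = [0.1] * 200_000
    print("single:", dc_offset_single(constant))
    print("double:", dc_offset_double(constant))
```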


Correct, and that’s why it isn’t the default. The option to overwrite is only really useful when writing one file at a time; with multiple tracks, “Allow files to be overwritten” means that each file overwrites the previous one.

Not without creating other problems.
You may have noticed that there is no “Reset Counter” option; in fact, none of the file-numbering complexity of the previous version remains. File numbering is now fully automatic, at the cost of losing a little flexibility. With the default setting, the plug-in checks whether the file already exists. If it does, it adds a number to the end of the file name and checks whether that name exists, incrementing the number until it finds a unique file name. The result can get a bit odd if the user has a lot of numbered files and deletes some of them, but that will be a problem with any form of automatic file naming. By default the plug-in displays the full name of the file written.
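A minimal Python sketch of the numbering rule just described. The naming scheme (number placed directly before the extension, counting up from 1) is an assumption for illustration, and `unique_path` is a hypothetical name; searching from 1 upwards is what makes a number vacated by a rename get reused.

```python
import os
import tempfile


def unique_path(folder: str, name: str, ext: str) -> str:
    """Return folder/name+ext, or the first folder/name<N>+ext (N = 1, 2, ...)
    that does not already exist."""
    candidate = os.path.join(folder, name + ext)
    number = 1
    while os.path.exists(candidate):
        candidate = os.path.join(folder, f"{name}{number}{ext}")
        number += 1
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # simulate earlier runs, with sample-data2.txt renamed away by the user
        for existing in ("sample-data.txt", "sample-data1.txt", "sample-data3.txt"):
            open(os.path.join(folder, existing), "w").close()
        print(os.path.basename(unique_path(folder, "sample-data", ".txt")))
        # prints sample-data2.txt: the gap left by the rename is reused
```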

As you say, that will limit the number of samples.
With the current code the plug-in could handle up to 2.2 million samples. With modification it could be unlimited (the rms calculation crashes out at 2.2 million samples).
Beyond about 1 million samples the output file becomes too big to be manageable. The “SciTE” text editor will open big files quickly, but NotePad and GEdit take ages.
For smaller files, copy and paste is a wonderful thing :slight_smile:


Can we get some user feedback on that?
Yes it can be done (easily).
How will users use the data? Simple alternation without additional characters may be better for further analysis.
Will adding L/R or line breaks be a help or a hindrance?
Along the same lines I was wondering whether “Left Channel/Right Channel” should be left out of the “No header” option. We don’t have room in the interface for many more options, and no room in the help screens without adding “Miscellaneous 2” (which I’d rather not do) so users need to say what they want.

Update.
There is a bug in my dc-offset code AND a bug in the Audacity dc-offset (Normalize) code.

I’ll need to investigate further.

I’ve raised a new bug on Bugzilla http://bugzilla.audacityteam.org/show_bug.cgi?id=519
I’ll see if I can correct my code next week (busy weekend).

I’ve looked again at my code and it appears to be accurate to within the precision available for the (integrate) function, but it can still be quite a way out so I’ve rewritten the function. It’s a bit slower, but as we’re only dealing with a maximum of 1 million samples that’s not too bad, and it is a lot more accurate.

Certainly that is an improvement, thanks.

My first thought, knowing that 100 samples was a minuscule length, was to enter 0 to see if I would get the maximum (or to see what it was). The reasoning is that if I have e.g. 19.65 seconds of audio at 44100 Hz I really don’t want to start making those calculations to ensure I get the entire selection. I don’t know what the maximum is until I read the Help screens.
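The calculation in question is simply seconds multiplied by the sample rate. A short Python sketch; the function name is hypothetical, and the 1 million cap is the plug-in's limit discussed in this thread.

```python
MAX_SAMPLES = 1_000_000


def seconds_to_samples(seconds: float, rate: float) -> int:
    """Number of samples (per channel) in `seconds` of audio at `rate` Hz."""
    # round() rather than int(): the binary floating-point product can land a
    # hair below the exact whole number, and int() would then drop a sample
    return round(seconds * rate)


needed = seconds_to_samples(19.65, 44100)
print(needed)                          # 866565
print(needed <= MAX_SAMPLES)           # True
print(round(MAX_SAMPLES / 44100, 2))   # 22.68 seconds fit under the limit at 44.1 kHz
```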

I wonder if it would be useful to be able to enter seconds instead (I see someone asked for this a while ago but I have not added it to Wiki Feature Requests yet). One existing FR for Sample Printer asks to “include RMS data, not just peak values (1 votes)”.

Wavosaur has samples export as text. I don’t know what if any limit it has, but it produced a 90 MB file from 1 min 50 seconds of stereo music.

I would hope we can encourage some new explorers by putting this in Audacity. There isn’t much in the Analyze Menu now.

Here is my attempt (one more line than currently, three short of the maximum 25 lines):

 
'LIMIT OUTPUT TO FIRST' SAMPLES MAXIMUM:
This limits the number of samples even if the 
selected length exceeds this number. The maximum
samples written is 1 million, but files of this
size may be hard to open. To calculate the number
of samples to enter for a given number of seconds, 
multiply the project rate by the number of seconds.

Why not have this fourth screen as the first (including “Overview”, minus “OPTIONAL HEADER TEXT”) then re-order the screens so you have a logical progression (more or less) from top to bottom through the controls?

I thought a message box might perhaps be a quick way to find peak and RMS without running Amplify and Contrast.

I think that behaviour is a bit unexpected unless the user stops to think “why”. So I suggest the help could be a bit clearer on this limitation. This is three more words on one extra line:

By default, files will not be overwritten. If you
select multiple tracks, they will be saved to
separate files with a number appended to the
name. If you set "Allow files to be overwritten" 
to "Yes", only the last file for multiple tracks 
will be retained.



It would help me to scan visually without software analysis. The relevant help screen has seven spare lines, and I think only one line of explanation is needed if we had a second, annotated “Alternate Lines”. I guess L/R may be less disruptive than line breaks for analysis software.


Thanks



Gale

The problem that I see with this is that “6 seconds” does not seem like much, but to print out every sample value it is “much” (about 530 thousand lines of text for a stereo track, or just short of the number of words in War and Peace).


I’ve not seen that Wavosaur feature, but 90 MB seems about right (about 9.7 million samples, and plain text being 1 byte per character). How long did it take NotePad to open that file?
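Both estimates follow from sample rate × seconds × channels, with one line of text per sample value. A Python sketch that assumes 44.1 kHz stereo in both cases (the Wavosaur file's sample rate is not stated in the thread) and treats 90 MB as 90 × 2^20 bytes; `sample_values` is a hypothetical helper name.

```python
def sample_values(seconds: float, rate: int, channels: int) -> int:
    """Total sample values (one output line each) in a selection."""
    return round(seconds * rate) * channels


six_seconds = sample_values(6, 44100, 2)
wavosaur = sample_values(110, 44100, 2)  # 1 min 50 s, assuming 44.1 kHz source audio

print(six_seconds)  # 529200
print(wavosaur)     # 9702000
print(90 * 1024 * 1024 / wavosaur)  # roughly 9.7 bytes of text per sample value
```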


It won’t be particularly quick. For anything other than a very short selection “Amplify” can calculate the peak very much faster than Nyquist can because it has access to the “summary data” (which Nyquist does not have access to).
I think that a simple analysis tool to show peak and rms levels would be very useful, but it would be much better coded in C++. Peak level can be calculated virtually instantly even for very large selections within Audacity because most of the work is already done within the project.

The other problem for Nyquist is that calculating peak level is normally done in memory, which limits the length of audio that can be computed (the old “Normalize big files” problem). It is possible to compute the peak value without running into memory problems, but to do so Nyquist must discard the audio data, which then prevents further processing.

I have a little plug-in for calculating peak levels (based on some code that Roger Dannenberg wrote a couple of years ago). The advantage that it has over the Amplify effect is that it gives separate left/right peak levels for stereo tracks. The disadvantage is that it’s not very quick (though it can handle extremely long selections).
peak-amplitude.ny (765 Bytes)
I’ve also previously looked at calculating rms level for large files, but doing so in Nyquist is either very complicated or very slow and I gave it up as not worth the effort.
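The constant-memory approach described above keeps only running totals rather than the audio itself. This is a Python illustration of that idea, not Roger Dannenberg's code or the attached peak-amplitude.ny; the audio is assumed to arrive as a sequence of blocks, each holding one list of samples per channel, and `peak_and_rms` is a hypothetical name.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def peak_and_rms(blocks: Iterable[Sequence[Sequence[float]]],
                 channels: int) -> Tuple[List[float], List[float]]:
    """Per-channel peak (linear) and unweighted RMS (dB) from audio given block by block.

    Only running totals are kept, so memory use does not grow with the
    length of the selection.
    """
    peak = [0.0] * channels
    sum_sq = [0.0] * channels
    count = 0
    for block in blocks:
        for ch in range(channels):
            for s in block[ch]:
                peak[ch] = max(peak[ch], abs(s))
                sum_sq[ch] += s * s
        count += len(block[0])
    if count == 0:
        raise ValueError("No samples selected.")
    # 10*log10(mean square) is the same as 20*log10(rms)
    rms_db = [10 * math.log10(sq / count) if sq > 0 else -math.inf for sq in sum_sq]
    return peak, rms_db


# two blocks of a stereo square wave: left at +/-0.5, right at +/-0.25
blocks = [[[0.5, -0.5], [0.25, -0.25]], [[0.5, -0.5], [0.25, -0.25]]]
peak, rms_db = peak_and_rms(blocks, 2)
print(peak)                             # [0.5, 0.25]
print([round(x, 2) for x in rms_db])    # [-6.02, -12.04]
```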


Thanks for the suggestions regarding the help screens. I’ll spend some time on the help screens after the weekend.

Milliseconds? But then not quite so understandable?



Gale

Assuming that the tracks are the same sample rate as the project rate, look in the Selection Toolbar.

That’s why I decided to throw an error for too big a number rather than silently limiting the number. It makes the limit more easily discoverable.
Also, one of the reasons that I chose 1 million rather than 750,000 or 1.3 million is because it is memorable. The user sees the error message once and then they know that the limit is 1 million.

Reworked Help Screens.

OVERVIEW.
Sample Data Export reads the values of successive
samples from the selected audio and prints to a
file. Additional information may be added as a
‘header’ at the top of the page.

LIMIT OUTPUT TO FIRST (maximum number of samples):
Enter a number to limit the number of samples
processed from the selection. The maximum number
of samples is 1 million, but files this large may
be hard to open. The track sample rate indicates
the number of samples per second.

LINEAR/dB SCALE:
Sample values may be displayed on a linear scale
+/- 1 (as in the Audacity audio track “Waveform”
view) or on a dB scale relative to full scale (as
in the “Waveform (dB)” view).

HELP SCREENS:
Select only one track before viewing to avoid
repeated help screens. To run the plug-in set the
help option to “No”.
Select “Save Help File” to write all help
screens to a printable file.



FILE FORMAT.

Following any header information:

SAMPLE LIST: produces a list of sample values.

INDEXED LIST: includes the sample number.

TIME INDEXED: includes the sample time.
Both types of index are relative to the start of
the selection.

DATA (csv): prints the sample values separated
by commas.

WEB PAGE (html): produces an HTML 5 document that
contains all of the header information and a table
of sample data with sample number, time, linear
and dB values. Browsers that are not HTML 5
compliant may not display the page correctly.

CHANNEL LAYOUT: for text/csv output, stereo tracks
may be printed as alternate left/right samples or all
of left channel then all of right channel.
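The two channel layouts in the help text above amount to two orderings of the same values. A Python sketch for illustration only: the six-decimal number format is an assumption, and the hypothetical `labels` flag stands in for the “Left Channel”/“Right Channel” lines that the thread later decides to omit when there is no header.

```python
from typing import List, Sequence


def channel_layout(left: Sequence[float], right: Sequence[float],
                   alternate: bool, labels: bool) -> List[str]:
    """Order stereo sample values for text/csv output.

    alternate=True  -> "Alternate Lines": L0, R0, L1, R1, ...
    alternate=False -> "L Channel First": all left values, then all right values.
    """
    fmt = "{:.6f}".format  # assumed number format
    if alternate:
        lines = []
        for l, r in zip(left, right):
            lines += [fmt(l), fmt(r)]
        return lines
    lines = ["Left Channel"] if labels else []
    lines += [fmt(v) for v in left]
    if labels:
        lines.append("Right Channel")
    lines += [fmt(v) for v in right]
    return lines
```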



OPTIONAL HEADER TEXT:
This is provided for adding notes to the output
file. For HTML output, an HTML line-break tag
may be used to start a new line.

NO HEADER: Prints only the optional header text
(leave blank for none) followed by the sample data.

MINIMAL HEADER:
The sample rate.
Units (linear or dB).
Optional header text (leave blank for none).

STANDARD HEADER: minimal header plus:
File name.
Number of samples.
Duration (seconds).
Mono/Stereo.

FULL HEADER: standard header plus:
peak amplitude linear and dB.
Unweighted rms level (dB).
DC offset.



OUTPUT FILES.

The default output folder is the “home folder”:

To select a different output folder, enter the
full path name. The output folder must exist.

By default, files will not be overwritten. If you
select multiple tracks, they will be saved to
separate files with a number appended to the
name. If you set “Allow files to be overwritten”
to “Yes”, only the last file for multiple tracks
will be retained.

A notification message is displayed on completion
indicating the name and location of the file.

If the plug-in is used in a Chain (Audacity 2.0.1
or later) it may be useful to disable messages.

For text/csv output the file header is shown in
the debug window.
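The output folder rules in the help screen above (home folder by default, otherwise a full path to a folder that already exists) can be sketched in Python. This illustrates those stated rules only: `output_path` is a hypothetical name, and treating an empty field as "use the home folder" is an assumption about how the default is selected.

```python
import os


def output_path(folder: str, filename: str) -> str:
    """Resolve where the output file goes.

    An empty folder means the home folder; otherwise the folder must already
    exist, because it is not created.
    """
    folder = folder.strip()
    if folder == "":
        folder = os.path.expanduser("~")
    # normpath also turns "/" into "\" on Windows, so either separator can be typed
    folder = os.path.normpath(folder)
    if not os.path.isdir(folder):
        raise FileNotFoundError(f"The output folder does not exist: {folder}")
    return os.path.join(folder, filename)
```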

I’ve been thinking about this and it quickly gets complicated.

For easy viewing (visual analysis) the html format is probably best - all of the information nicely laid out in a table. The downside is that for 1 second 48 kHz stereo, the html file weighs in at a whopping 5.2 MB.

If we had L/R before alternate lines, would we want that for all of the text formats?
Would the L/R come before or after the index?
What about the csv format?
Would it need to be an option rather than pre-set?
Would we want “L/R” or “Left/Right” or a line break?

Not only would providing these options complicate the user interface but would probably need quite extensive documentation in the (tiny) help screens. To date we really don’t know if there is any demand for these features.

I think that for now we leave that as it is. Once the plug-in is in circulation we may get some useful feedback and we can update it if necessary.

Yes a reasonable workaround (but assuming it is already set to samples).

These look good, except maybe put the Help Screen info after the “Overview”?

OVERVIEW.
Sample Data Export reads the values of successive
samples from the selected audio and prints to a
file. Additional information may be added as a
‘header’ at the top of the page.

HELP SCREENS:
Select only one track before viewing to avoid
repeated help screens. To run the plug-in set the
help option to “No”.
Select “Save Help File” to write all help
screens to a printable file.

LIMIT OUTPUT TO FIRST (maximum number of samples):
Enter a number to limit the number of samples
processed from the selection. The maximum number
of samples is 1 million, but files this large may
be hard to open. The track sample rate indicates
the number of samples per second.

LINEAR/dB SCALE:
Sample values may be displayed on a linear scale
+/- 1 (as in the Audacity audio track “Waveform”
view) or on a dB scale relative to full scale (as
in the “Waveform (dB)” view).
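The dB option converts each linear value with the usual amplitude formula, 20 × log10(|x|), where a value of ±1 is full scale (0 dB). A Python sketch; how the plug-in itself prints silent (zero) samples is not stated in the thread, so returning minus infinity here is an assumption.

```python
import math


def to_db(sample: float) -> float:
    """Sample value in dB relative to full scale (|x| = 1.0 is 0 dB)."""
    magnitude = abs(sample)
    if magnitude == 0:
        return -math.inf  # digital silence has no finite dB value
    return 20 * math.log10(magnitude)


for x in (1.0, -0.5, 0.1, 0.0):
    print(x, to_db(x))
```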


Probably you would just have it for all text formats. People who want a csv list probably really don’t want any interpolations.

I don’t think we need to go overboard; if we did it I would just append an L or an R to the values (no choice in the matter). It isn’t completely clear now that HTML files for “L Channel first” and “Alternate lines” actually have the same layout, so we don’t need to get hung up on descriptions of what “Alternate marked lines” or whatever means.

To be fair I don’t think the two indexed text files have much of a display problem, but it is more of a problem for the “sample list”. And if you had to submit any of the text or csv files to someone else, the other person doesn’t actually know for sure from the alternate lines versions that left comes first. So for all but html, how about putting “Left channel and right channel follow on alternate lines” or similar, not as a header, but just above the data, as you do with “Left Channel” and “Right Channel” for “L Channel first”?


I like the text export of the help file. I wonder if that is worth progressively including in all plug-ins that have help screens, without any choice of location? Users could see the location from the help file.


Gale

Yes, good idea.


The same may well be true for people that want a plain list with no header.

I’m inclined to change:

NO HEADER: Prints only the optional header text
(leave blank for none) followed by the sample data.

To:

NO HEADER: Prints only the sample data list
unless optional header text is entered.

and, as implied by this wording, not include L/R or Left/Right text.

If there is a header, then as you suggest, add a line above the data to the effect of “Left channel and right channel follow on alternate lines”.


Yes, I thought that was a nice touch.
If the help screen is printed to the “home” folder then the code is fairly straightforward. If the destination folder is user defined then it adds a lot of code to the plug-in, but there may be a way to make writing files to user defined paths a lot more straightforward. More about this later.

That sounds reasonable. To be consistent then, shouldn’t you remove “Left Channel” and “Right Channel” from above the data when choosing “L Channel First” and “no header”?



Gale

Yes, that’s the plan.
When “no header”, only the sample value, index (if selected) and optional header (if any) are written.
If the user wants the file to start with a message “Left Channel First” (or any other text) they can enter the text that they want in the “optional header”.


Another minor change:

OPTIONAL HEADER TEXT:
This is provided for adding notes to the output
file. In text files, use ~% to start a new line,
in HTML files use an HTML line-break tag.

After trying this a few different ways, it seems to make sense to add this information immediately after the “number of channels”,
so (for example, with a stereo track and “standard” header)

sample-data1.txt   2 channels (stereo)
Sample Rate: 44100 Hz. Sample values on dB scale.
Length processed: 100 samples 0.00227 seconds.

will become:

sample-data1.txt   2 channels (stereo)
Left and right channels on alternate lines.
Sample Rate: 44100 Hz. Sample values on dB scale.
Length processed: 100 samples 0.00227 seconds.
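The header change is one extra line, written only for stereo output in the “Alternate Lines” layout. This Python sketch reproduces the example above; `header_lines` is a hypothetical name, and the mono wording and the five-decimal duration format are assumptions chosen to match that example, not taken from the plug-in source.

```python
from typing import List


def header_lines(filename: str, channels: int, rate: int, scale: str,
                 samples: int, alternate: bool) -> List[str]:
    """Build a 'standard' header; scale is 'linear' or 'dB'."""
    kind = "stereo" if channels == 2 else "mono"
    unit = "channel" if channels == 1 else "channels"
    lines = [f"{filename}   {channels} {unit} ({kind})"]
    if channels == 2 and alternate:
        lines.append("Left and right channels on alternate lines.")
    lines.append(f"Sample Rate: {rate} Hz. Sample values on {scale} scale.")
    lines.append(f"Length processed: {samples} samples {samples / rate:.5f} seconds.")
    return lines


print("\n".join(header_lines("sample-data1.txt", 2, 44100, "dB", 100, True)))
```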

I don’t expect there to be much demand for this, but just in case anyone really does want alternate lines to be prefixed L/R, they can uncomment line 29 (code from lines 26 to 29):

;; To enable L/R prefix before alternate L/R channels 
;; (text output with header only)
;; remove the semicolon from the start of the next line:
;(setq LR-prefix 1)
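What the switch changes can be sketched in Python: the prefix is applied only when it is enabled and the output has a header, as the code comment above says. `LR_PREFIX` and `alternate_lines` are hypothetical stand-ins for the Nyquist variable, and the "L "/"R " format (letter then a space) is an assumption; the plug-in's actual prefix text is not shown in the thread.

```python
from typing import List, Sequence

LR_PREFIX = False  # plays the role of uncommenting (setq LR-prefix 1)


def alternate_lines(left: Sequence[str], right: Sequence[str],
                    has_header: bool, lr_prefix: bool = LR_PREFIX) -> List[str]:
    """Alternate left/right lines, prefixed only when enabled and a header is written."""
    use_prefix = lr_prefix and has_header
    lines = []
    for l, r in zip(left, right):
        lines.append(f"L {l}" if use_prefix else l)
        lines.append(f"R {r}" if use_prefix else r)
    return lines
```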

Here’s the revised version.
sample-data-export6.ny (20.1 KB)