Help With Rounding Number of Samples

I’ve just come on board helping a team with an app that generates lots of different tones with a sine function and plays them at various intervals (in the interest of health, apparently). There are no external libraries being used to do this; we’re doing it in Java with a straight-up Math.sin call and writing bytes to a 16-bit .wav file.

We are having a tremendous amount of trouble with an issue where, under certain circumstances that we have partly narrowed down, sounds that ought to be gentle sine waves come out sounding like machine clanking. The head engineer for the project has traced this to the number of samples used. Depending on the total duration for the entire list of frequencies, the calculation of the number of samples per tone is not a whole number, and in that case the program rounds up. When this happens, the distortion is produced. (Oddly, though, this isn’t an issue at all if the number of frequencies generated, which affects the number of samples in each pass, is less than 71. Very strange.)

We have been comparing this to the ‘Generate Tone’ functionality of Audacity. I have experimented with creating tones at arbitrary lengths so that the number of seconds I specify results in a non-integer number of samples, forcing Audacity to round. While Audacity will round up or down according to basic rounding principles, our engineer insists that rounding down would render our program unable to produce the .wav file. (We’re going to be talking about this a lot tomorrow. I don’t know that I buy it.)
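To give a rough sense of the arithmetic with our actual figures (1020 seconds, 169 frequencies, 44.1 kHz):

double sampleRate = 44100.0;
double lengthSeconds = 1020.0;                    // 17 minutes
int numTones = 169;

double secsPerTone = lengthSeconds / numTones;    // ~6.0355 s per tone
double exactSamples = secsPerTone * sampleRate;   // ~266165.68, not a whole number
int numSamples = (int) Math.ceil(exactSamples);   // 266166 after rounding up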

The team lead wants me to find out more about what Audacity is doing differently from us. Here is the extent of the code I’ve been shown so far (I am an outsider, after all). If you need to see more, say so and I can request it.

double durationSecsEachFreq = lengthSeconds / freqList.size();
// get the sample count required for each freq
int numSamples = (int) Math.ceil(durationSecsEachFreq * sampleRate);

double[] frequencyArray = new double[numSamples];
byte[] frequencyByteArray = new byte[2 * numSamples];

// fill out the array
for (int i = 0; i < numSamples; ++i) {
    frequencyArray[i] = Math.sin(2 * Math.PI * i / (sampleRate / freqOfTone));
}

// convert to 16-bit PCM sound array
// assumes the sample buffer is normalised
int idx = 0;
for (final double dVal : frequencyArray) {
    // scale to maximum amplitude
    final short val = (short) (dVal * 32767);
    // in 16-bit PCM WAV, the first byte is the low-order byte
    frequencyByteArray[idx++] = (byte) (val & 0x00ff);
    frequencyByteArray[idx++] = (byte) ((val & 0xff00) >>> 8);
}
return frequencyByteArray;

Java is not one of my languages, and you’ve only given a fragment of code, so you’ll have to help me out a bit here.

Where does “freqOfTone” come from? Are you iterating over frequency values in “freqList”?
If so, are you simply concatenating the sounds from each “frequencyByteArray”?
If so, are you expecting a click each time you start a new array? Even if each frequencyByteArray ends at zero there will be a glitch as the next tone starts at zero. If a tone does not end at zero there will be a very significant click when the next tone starts.
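If clicks at the joins do turn out to matter, the usual cure is a short fade at each end of every tone. A minimal sketch in Java (applyFades is just an illustrative name, and the fade length is arbitrary):

// Apply a short linear fade-in and fade-out to a normalised tone buffer
// so that consecutive tones join without an abrupt step.
void applyFades(double[] samples, int fadeSamples) {
    int n = samples.length;
    int f = Math.min(fadeSamples, n / 2);
    for (int i = 0; i < f; i++) {
        double gain = (double) i / f;
        samples[i] *= gain;           // fade in at the start
        samples[n - 1 - i] *= gain;   // fade out at the end
    }
}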

Please give some example numbers for “freqList” and “lengthSeconds”.


For my reference, this is a literal translation of what I think you are doing, written in Nyquist (though it would not normally be written like this):

(setf freqList (list 440 300 1000 400))
(setf lengthSeconds 2.0)

(setf durationSecsEachFreq (/ lengthSeconds  (length freqList)))
; get the sample count required for each freq
(setf numSamples (truncate (+ 0.5 (* durationSecsEachFreq *sound-srate*))))

(setf frequencyArray (make-array numSamples))

(defun one-tone (hz)
  (dotimes (i numSamples (snd-from-array 0 *sound-srate* frequencyArray))
    (setf (aref frequencyArray i)
          (sin (* 2 PI (/ i (/ *sound-srate* hz)))))))

(seqrep (i (length freqList))(cue (one-tone (nth i freqList))))

Please post a short sample in WAV format. Just a few seconds will be fine, so long as it clearly shows the problem.
See here for how to attach files to forum posts: https://forum.audacityteam.org/t/how-to-attach-files-to-forum-posts/24026/1


Hi, Steve. Thanks for the help on such limited information. Yes, freqList is a list of frequencies, such as:

[0] 7.83
[1] 62
[2] 9.83

lengthSeconds is the total length in seconds of the resulting audio file. We’ve been shooting for 1020, which results in a file exactly 17 minutes long.

Yes, we are concatenating all of the sounds from each frequencyByteArray. Minor clicks between frequencies are alright. This isn’t for a media application really; the idea is that people listen to these frequencies to ‘tune their health’ or something, so it really is meant to just sound like a bunch of frequencies slapped together.

I’ve attached two samples of the same frequency, one as it should sound and one “distorted.”

Oh, and as for your code translation, it’s not in a format I am terribly familiar with, but it all looks right to me except for one crucial difference: where you use truncate (which, if I understand correctly, simply drops the fractional part, so with the + 0.5 added first it rounds to the nearest whole number), ours uses a ceiling function, which rounds up to the next whole number every single time.
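In Java terms, here is the difference I mean (the value is illustrative):

double exact = 266165.68;               // a fractional sample count
int ceiled = (int) Math.ceil(exact);    // 266166, what our code always does
int truncated = (int) exact;            // 266165, plain truncation
int nearest = (int) Math.round(exact);  // 266166, round to nearest, which is
                                        // what the + 0.5 in the Nyquist version achieves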

That is very curious “distortion”. It’s not at all random, but has a very clear cyclical pattern that can be clearly seen in the Spectrogram view (http://manual.audacityteam.org/man/spectrogram_view.html)
Set the “Window Size” to a small value (say: 128) to see the pattern.
My first guess is that you have a coding error that is reversing the low and high bytes.
I need to go for a while - back later.
[Attachment: firsttrack000.png]

Does this sound familiar?
It is a sine wave with the low and high bytes reversed as signed 16-bit values.
Check that your code is always using an even number of bytes per tone.
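To see why swapped bytes sound so harsh, here is a quick sketch with an illustrative value:

short sample = 1000;                                         // 0x03E8
byte lo = (byte) (sample & 0xff);                            // 0xE8
byte hi = (byte) ((sample >> 8) & 0xff);                     // 0x03
short swapped = (short) (((lo & 0xff) << 8) | (hi & 0xff));  // 0xE803 = -6141
// One byte of misalignment turns a small positive sample into a large
// negative one, so a smooth sine becomes a harsh, buzzing waveform.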

Brilliant. That is pretty definitive. We’re very confused as to how that’s happening, but we’re on the lookout now. I’ll post updates as they come. Thanks so much, Steve.

So I think our problem boils down to this:

A file 17 minutes long (1020 seconds) at 44.1kHz requires 89964000 bytes.
We have 169 frequencies, meaning 532331.3609467456 bytes per tone.
How can we round that to a whole, even number of bytes per tone while also keeping our file at exactly 17 minutes?

Any thoughts?

Each 16-bit sample has 2 bytes.
Round your durations to a whole number of samples. It does not matter whether you round up, down, or to nearest, but it is essential that you have 2 bytes for each sample.
The code that you posted already appears to do that, so far as I can tell, so my guess is that the problem occurs when you switch from one tone to the next.
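Something along these lines, as a sketch (with sampleRate, lengthSeconds, and freqList standing in for whatever your real variables are):

double secsPerTone = lengthSeconds / freqList.size();
int samplesPerTone = (int) Math.round(secsPerTone * sampleRate);  // up, down, or nearest all work
int bytesPerTone = 2 * samplesPerTone;                            // even by construction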

To answer your original question “about what Audacity is doing differently from us”, Audacity works internally with 32-bit floats. Conversion to 16-bit only occurs on mix-down and export (though in my opinion it should only occur on export).

I think we’ve got it now. I can’t post as much detail as I’d like, because the solution ended up being in a part of the code I’d not been shown. The rounding itself wasn’t the issue; rather, when the rounding occurred, the total number of bytes the file header expected was never adjusted for the rounded value, so the byte counts did not match. Really could not have done it without you, Steve. A very tired and frustrated engineer (not me) now gets to rest easy.
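For anyone who finds this thread later, the shape of the bug was roughly this (a sketch, not our actual code, assuming a standard RIFF/WAVE header):

// bytes actually written, using the rounded-up per-tone sample count:
int bytesPerTone = 2 * (int) Math.ceil(durationSecsEachFreq * sampleRate);
int actualDataBytes = bytesPerTone * freqList.size();            // 89964108 for our numbers

// the bug: the header's data size was computed from the nominal duration
// rather than from what was actually written:
int nominalDataBytes = (int) (lengthSeconds * sampleRate) * 2;   // 89964000

// the fix: write actualDataBytes into the header's size fields so that
// they match the bytes that follow.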