"Change Tempo" is also increasing volume

When I use the “Change Tempo” effect, Audacity is also greatly increasing volume and as a result introducing major clipping.

See the screenshot. Top is the original track, middle is a copy of the top track with the speed doubled via “Change Tempo” (100% change), and the bottom is a clipping analysis. Looking at the top track, it’s obvious that the volume level is much lower, and there is no clipping whatsoever.

I’ve used “Change Tempo” before, and don’t remember ever running into this. Is it possibly a new bug, or maybe I fat-fingered another setting that also needs to be changed?

Thanks in advance for any help you can offer!
Audacity screen shot - Volume increase and clipping with Change Tempo effect.png

Which version of Audacity?
What settings are you using?
That audio does not look like a “normal” audio recording. What is it?
Why are you working in 16-bit?
What’s wrong with your sound card?

> Which version of Audacity?

Sorry, I neglected to include that info. I was going to update the post, but since I’m a new forum user I had to wait for it to be approved.

It’s the latest version (2.1.3), running on Windows 10.

> What settings are you using?

There’s an awful lot of settings in Audacity, which one(s) are you referring to?

> That audio does not look like a “normal” audio recording. What is it?
It’s a perfectly normal WAV file, recorded on a Tascam DP32-SD multi-track recorder at 44.1 kHz with a bit depth of 16 bits, which is CD quality.

It contains a lead guitar track, which was played at half speed. That was due to the guitarist’s inability to play the entire thing fast enough without flubs here and there. So, he decided the best approach was to record at half speed and increase the tempo by 100% later via a DAW, allowing the band to move on. That’s also why you see gaps, since lead guitar is not played throughout the whole song.

That is the entire reason I need to use the “Change Tempo” function in the first place.

> Why are you working in 16-bit?

See above.

> What’s wrong with your sound card?

Not sure what gave you the idea there is a problem with my sound card, but there’s not a thing wrong with it.

Audacity has no problem with playing back audio through it. I’m in the middle of mixing down the song that this track belongs to (along with 21 other tracks) using SONAR Cakewalk DAW software, and everything works just fine there as well.

The settings that I need to set to be able to reproduce the problem that you are describing. We could start with the settings that you are using in the Change Tempo effect.

Perhaps you could post a few seconds of that WAV file (see here for how to attach an audio sample: https://forum.audacityteam.org/t/how-to-post-an-audio-sample/29851/1)

For editing / processing audio it is better to use 32-bit float format. Audacity uses 32-bit float internally, so using 32-bit float tracks avoids unnecessary conversion losses when applying effects / processing the audio. 32-bit float format also has the major advantage that it does not clip at 0 dB, so if you inadvertently stray over 0 dB, it can be corrected by “amplifying” back down below 0 dB with no damage done.
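A minimal Python sketch of the difference described above, assuming a simplified 16-bit conversion (scale by 32767 and clip) rather than Audacity's actual code. Python's `array('f')` stores genuine 32-bit floats, so a +20 dB boost followed by -20 dB restores the signal, while the integer path keeps the flattened peaks:

```python
from array import array
import math

def db_to_gain(db):
    """Convert a level change in dB to a linear gain factor."""
    return 10 ** (db / 20)

def to_int16(samples):
    """Simplified float -> 16-bit conversion: scale by 32767 and clip to the integer range."""
    return [max(-32768, min(32767, round(v * 32767))) for v in samples]

# One cycle of a sine peaking at 0.9 (just under 0 dBFS), stored as 32-bit floats.
original = array('f', (0.9 * math.sin(2 * math.pi * i / 100) for i in range(100)))

# Float path: +20 dB pushes the peak to about 9.0 (far "over" 0 dB), then -20 dB brings it back.
boosted = array('f', (v * db_to_gain(20) for v in original))
restored = array('f', (v * db_to_gain(-20) for v in boosted))

# 16-bit path: the same boost, but an integer format cannot hold values above full scale.
boosted_int = to_int16(boosted)
restored_int = [s / 32767 * db_to_gain(-20) for s in boosted_int]

float_error = max(abs(a - b) for a, b in zip(original, restored))
int_error = max(abs(a - b) for a, b in zip(original, restored_int))
print("32-bit float round-trip error:", float_error)
print("16-bit round-trip error:      ", int_error)
```

The float error is down at rounding-noise level, while the 16-bit error is most of the signal's amplitude, because the clipped tops were lost at the conversion step.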

There is no recording input device listed in the device toolbar.

> The settings that I need to set to be able to reproduce the problem that you are describing. We could start with the settings that you are using in the Change Tempo effect.

“Change Tempo” settings:

  • “Percent Change” set to 100 to double the speed.
  • “Beats per minute” left blank.
  • “Length (seconds)” moved from 12.05 to 6.02, as I would expect (speed doubled), and was left untouched.
  • “Use high quality stretching (slow)” box is ticked.
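The length change follows directly from the percentage. This is a small sketch with a hypothetical helper, assuming the dialog's length field follows new = old / (1 + percent / 100), which matches the 12.05 → 6.02 s reading above (12.05 / 2 = 6.025):

```python
def stretched_length(length_s, percent_change):
    """Hypothetical helper: clip length after a tempo change of `percent_change` percent.

    +100 % doubles the tempo (half the length); a negative value slows it down.
    """
    if percent_change <= -100:
        raise ValueError("percent_change must be greater than -100")
    return length_s / (1 + percent_change / 100)

print(stretched_length(12.05, 100))   # the OP's selection at +100 %
print(stretched_length(10.0, -6.1))   # a 6.1 % slow-down makes 10 s about 10.65 s long
```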

Using these settings, the speed doubling works fine, except for this volume boost and clipping problem.

FYI, I’ve been doing a bit of experimenting and here’s a few additional things I’ve found:

  • If I un-tick the “Use high quality stretching (slow)” box, the speed is doubled with absolutely no change to the volume. But of course, since we’re talking about a music recording here, I want this effect to output the highest quality audio possible, which is why I ticked that box. It’s definitely not a matter of processing power on this DAW computer, as it’s running an Intel Core i7-7700 CPU at 3.6GHz. It only took 17 seconds to process “high quality stretching” of this entire track, which is about 02:30 long. I think that’s pretty quick!

  • I was also able to accomplish the same speed doubling using the “Sliding Time/Pitch Shift” effect by simply entering 100 as the “Initial Tempo Change”. Unfortunately, it also produced what appears to be exactly the same clipped waveform as “Change Tempo”.

  • I thought maybe the length of the track was too much for whatever algorithm the effect uses, so I tried applying it to much smaller sections. Same deal, the issue persists.

> Perhaps you could post a few seconds of that WAV file (see here for how to attach an audio sample: https://forum.audacityteam.org/viewtopi … 49&t=72887)

  • Sure thing, attached. Curiously, that page instructs: ‘Ensure that “WAV (Microsoft) signed 16 bit PCM” is selected as the file type’. No matter here, since that’s what it is anyway.

> For editing / processing audio it is better to use 32-bit float format. Audacity uses 32-bit float internally, so using 32-bit float tracks avoids unnecessary conversion losses when applying effects / processing the audio. 32-bit float format also has the major advantage that it does not clip at 0 dB, so if you inadvertently stray over 0 dB, it can be corrected by “amplifying” back down below 0 dB with no damage done.

Hmm… I guess I’m misunderstanding, but that seems to be a contradictory statement. If 32-bit float has the advantage of not clipping at 0 dB (I assume similar to a limiter), how could the waveform possibly stray above that? Also, once a track is clipped, reducing the volume will do nothing to actually remove the clipping on any DAW I’ve ever worked with. Yes, the volume will be reduced, but the waveform will still have a crew-cut that is impossible to remove since that data has already been lost.

Actually, most professional recording studios today use 24-bit recording, but point taken. I’m no pro, this is all being done in a home studio (although a very well equipped one).

> There is no recording input device listed in the device toolbar.

That’s because the track was recorded on the Tascam, copied to the DAW PC, then imported into Audacity.

All of that aside, I still don’t see why using a 16-bit track would cause this issue, especially since it doesn’t happen at all when high quality is not selected. The original waveform is fine, nowhere close to any sort of clipping. That certainly seems to point to the effect as the cause, and specifically something to do with “Use high quality stretching” because of the results of the experimenting I previously described.

> Hmm… I guess I’m misunderstanding, but that seems to be a contradictory statement. If 32-bit float has the advantage of not clipping at 0db (I assume similar to a limiter), how could the waveform possibly stray above that? Also, once a track is clipped, reducing the volume will do nothing to actually remove the clipping on any DAW I’ve ever worked with. Yes, the volume will be reduced, but the waveform will still have a crew-cut that is impossible to remove since that data has already been lost.

In the normal floating-point mode, Audacity is showing you potential clipping. You can do your own experiment… Normalize to 0 dB, then amplify by +20 dB (allowing “clipping”). Then “amplify” by -20 dB and notice that there’s no clipping!

It’s not limiting… 32-bit float can go up to roughly +770 dB. For all practical purposes, there is no upper or lower limit with floating point. With integer formats 0 dB is “as high as you can count”, so at 16 bits 0 dB is -32,768 or +32,767. In floating point 0 dB is 1.0. Your drivers take care of any scaling, so a 0 dB file is equally “loud” in 8-bit, 24-bit, or floating point.
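The numbers in the paragraph above can be checked with a few lines of Python. This sketch assumes signed integer PCM and reads the largest finite IEEE 754 single-precision value from its bit pattern; it is an illustration, not anything Audacity runs:

```python
import math
import struct

def int_range(bits):
    """Smallest and largest sample values of a signed integer PCM format."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

# Largest finite IEEE 754 single-precision value (bit pattern 0x7F7FFFFF).
FLOAT32_MAX = struct.unpack('<f', struct.pack('<I', 0x7F7FFFFF))[0]

for bits in (16, 24):
    lo, hi = int_range(bits)
    # Ratio between full scale and one step of the integer format, in dB.
    print(f"{bits}-bit: {lo} .. {hi}, about {20 * math.log10(hi + 1):.1f} dB of range")

# In floating point 0 dBFS is 1.0, so everything above that is headroom.
print(f"32-bit float headroom above 0 dBFS: about {20 * math.log10(FLOAT32_MAX):.1f} dB")
```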

> Actually, most professional recording studios today use 24-bit recording, but point taken. I’m no pro, this is all being done in a home studio (although a very well equipped one).

Yes, the analog-to-digital and digital-to-analog converters are 24-bit (integer) but digital signal processing is almost always done in floating-point.

> In the normal floating-point mode, Audacity is showing you potential clipping. You can do your own experiment… Normalize to 0dB, then amplify by 20dB (allowing “clipping”). Then “amplify” by -20dB and notice that there’s no clipping!

Thanks for all of the great information Doug, it’s much appreciated! I tried your experiment and I see what you’re talking about, but when using “Change Tempo” it’s a different ball game.

Try applying a 100% speed increase (double) with the “Change Tempo” effect on the track section I uploaded, and ensure the “Use high quality stretching (slow)” box is ticked. You’ll see clipping galore in the output waveform, and the Audacity “Analyze → Find Clipping” function confirms this (see my previous screenshot). Then, try the same thing with that box un-ticked and there is no increase in amplitude at all, and therefore no clipping.
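For readers wondering what a clipping check looks for, here is a rough sketch of the idea behind "Analyze → Find Clipping": runs of consecutive samples at or beyond full scale. The function name, the threshold, and the `min_run` parameter are hypothetical choices for illustration, not Audacity's implementation:

```python
def find_clipped_runs(samples, threshold=1.0, min_run=3):
    """Return (start, length) for each run of at least `min_run` consecutive
    samples whose magnitude is at or above `threshold` (1.0 = 0 dBFS in float)."""
    runs = []
    start = None
    for i, v in enumerate(samples):
        if abs(v) >= threshold:
            if start is None:
                start = i          # a new run of full-scale samples begins
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # A run may continue to the very end of the selection.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs

print(find_clipped_runs([0.2, 1.0, 1.0, 1.0, 0.5, -1.0, -1.0, 0.1]))
```

In a 32-bit float track, "over" samples show up as values above 1.0 rather than as flat tops, which is why the comparison uses `>=` rather than equality.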

I can positively confirm that true clipping is occurring when the “high quality stretching” box is ticked and then the processed track is imported back into the Tascam recorder. It’s indicating an overload (clipping), and the level meters are completely pinned wherever there is audio. You can also clearly see the same clipping when the output file is imported into SONAR Cakewalk, and I’ve attached a screen shot of the waveform panel from that. Compare track 10 (Strat Fill) to track 11 (Strat Lead 1), both of which are vertically zoomed in the display. Track 11 is the output from Audacity after the “Change Tempo” effect is applied, but only if the “Use high quality stretching (slow)” box is ticked.

With the original track, of course there is absolutely no clipping, as I try to keep things no higher than -9 to -12 dB when recording. That leaves me plenty of headroom for mastering the mixed track.

In conclusion, here’s a summary of what I’ve found.

“Use high quality stretching (slow)” box ticked:

  • Speed is doubled, but with a large increase in waveform level, resulting in clipping (the original problem reported in this thread).
  • Processing takes about 17 seconds on a 02:30 track.
  • Output file size is roughly half of the original, with an identical bit rate. That makes sense since the track is now doubled in speed, and therefore half the length of what it was originally.

“Use high quality stretching (slow)” box un-ticked:

  • Speed is doubled with absolutely no change to waveform level.
  • Processing time is nearly instantaneous. Keep in mind the DAW computer has an extremely fast processor, so YMMV.
  • Identical properties in the output regarding file size and bit rate as with the box ticked.
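The file size and bit rate observations follow from uncompressed PCM arithmetic. A small sketch, assuming the guitar track is mono (one channel of the multitrack) and ignoring the WAV header:

```python
def pcm_kbps(sample_rate, bits, channels):
    """Bit rate of uncompressed PCM audio in kbit/s."""
    return sample_rate * bits * channels / 1000

def pcm_bytes(sample_rate, bits, channels, seconds):
    """Size of the raw sample data in bytes (the WAV header is ignored)."""
    return sample_rate * (bits // 8) * channels * seconds

rate = pcm_kbps(44100, 16, 1)            # 705.6 kbit/s, the same before and after the stretch
before = pcm_bytes(44100, 16, 1, 150)    # about 2:30 of audio
after = pcm_bytes(44100, 16, 1, 75)      # half the length after doubling the tempo
print(rate, before, after)
```

The bit rate depends only on sample rate, bit depth and channel count, so a tempo change alters the size purely through the duration.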

My ears can’t detect any difference in the quality of sound with or without high quality stretching, listening using excellent headphones and studio monitors. That of course isn’t a conclusive way of checking quality, but it does help confirm the observations above. Using high quality stretching doesn’t seem to accomplish anything at all, except for causing the volume increase/clipping issue. So, I’m going with the output file with that box un-ticked.

Bottom line is, I still think someone should have a look at the “Use high quality stretching (slow)” option, as it doesn’t make any sense that increasing tempo should increase the waveform level as well. With that in mind, there definitely appears to be some sort of bug there.

Thanks once again all!

The increase in peak level is not a ‘fault’, but rather just a side effect of the time stretch algorithm on that particular (heavily clipped) waveform.

Looking at a close up of the guitar waveform, notice how the peaks are flattened horizontally by the distortion effect on the guitar sound:
firsttrack002.png
In this next image, the second track is a duplicate of the first, with an all-pass filter applied.
To try this yourself, you can use this code in the Nyquist Prompt effect:

```
;version 4
(allpass2 *track* 1000 1)
```

tracks002.png
An all-pass filter passes all frequencies, but changes the phase relationship depending on frequency.
The effect is particularly marked on square waves (and clipped waveforms):
tracks001.png
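The same experiment can be reproduced outside Audacity. This sketch uses a second-order all-pass biquad with coefficients from the RBJ "Audio EQ Cookbook" at 1000 Hz, Q = 1, as a stand-in for the Nyquist `(allpass2 *track* 1000 1)` call; it is not claimed to match Nyquist's implementation exactly. The test signal is a hypothetical hard-clipped 100 Hz sine. The filter leaves the RMS level unchanged but raises the peak above the original flat tops:

```python
import math

def allpass_biquad(x, fs, f0, q):
    """Second-order all-pass filter (RBJ Audio EQ Cookbook coefficients), Direct Form I."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b0, b1, b2 = (1 - alpha) / a0, -2 * cos_w0 / a0, (1 + alpha) / a0
    a1, a2 = -2 * cos_w0 / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def peak(s):
    return max(abs(v) for v in s)

def rms(s):
    return math.sqrt(sum(v * v for v in s) / len(s))

fs = 44100
# Hypothetical test signal: a 100 Hz sine driven 10x past full scale and hard-clipped,
# giving flat tops like an over-driven guitar.
clipped = [max(-1.0, min(1.0, 10.0 * math.sin(2 * math.pi * 100 * i / fs))) for i in range(fs // 2)]
filtered = allpass_biquad(clipped, fs, f0=1000, q=1)

# Measure over whole 441-sample periods, after the filter's start-up transient has died away.
start, stop = 4410, 4410 + 40 * 441
print("peak:", peak(clipped[start:stop]), "->", round(peak(filtered[start:stop]), 3))
print("rms: ", round(rms(clipped[start:stop]), 4), "->", round(rms(filtered[start:stop]), 4))
```

Because an all-pass filter only changes phase, each harmonic keeps its level, so the RMS matches; only the waveform shape, and therefore the peak, changes.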
The “High Quality” setting for time stretching has some advantages over the “standard” setting. Namely, the amount of stretch is much more accurate (when changing pitch, the duration remains the same, whereas the “standard” setting will usually change length a bit), and for percussive sounds, the standard setting tends to sound echoey, which is not a problem with the High Quality setting.

On the other hand, with some types of sound, the standard setting can sound just as good or even better than the “high quality” setting. The “standard” setting is also very much faster. The standard setting tends to sound better when large amounts of stretch are required.

The two algorithms (“standard” and “high quality”) each have their strengths and weaknesses, which is why both algorithms are available - use whichever works best for the job in hand.

Thanks for taking the time to document all of that good information, much appreciated!

A couple of points though…

> The increase in peak level is not a ‘fault’, but rather just a side effect of the time stretch algorithm on that particular (heavily clipped) waveform.

As I’m sure you’ll recall from the screenshot contained in my original post (Fri Sep 29, 2017 12:16 pm), the input waveform had no clipping whatsoever and plenty of headroom, and I confirmed this in several ways. So, I assume you’re referring to the “Change Tempo” high quality output waveform as the one that is “heavily clipped”, and that’s one heck of a side effect. The Audacity “Analyze → Find Clipping” function confirms this, in the same screenshot. I should have included the same Audacity clipping analysis on the original waveform in that screenshot as well, but I did indeed perform one and Audacity reported absolutely no clipping.

> The standard setting tends to sound better when large amounts of stretch are required.

Yes, because the output isn’t getting clipped, and clipping naturally causes distortion.

As I had previously mentioned, I tried the high quality setting on both long and very short clips, and the results were identical (a large increase in level, with clipping introduced). Without the high quality setting, neither long nor short clips showed any notable level change in the waveform. So, based on this experimentation, I can only conclude that the length of the clip has nothing to do with the issue.

When I have both my recorder and SONAR Cakewalk DAW (as well as Audacity) indicating heavy clipping on “Change Tempo” output and none on the original, it’s pretty obvious to me that this problem is being introduced by the effect, at least on this particular clip. I’m going to try the effect on some of the other tracks in the song when I get a chance just for the heck of it, to see what happens. Both the recorder and Cakewalk confirm that the original waveform was recorded somewhere between -3 dB and -6 dB, never approaching zero.

I disagree. The input waveform has much clipping, but it is intentional clipping - an important part of the “over-driven” electric guitar sound. This is a close-up of the input waveform that you provided, and the “clipping” (flattened peaks) is very clearly visible:

This is a very late response, but I replicated the behavior reported by CraigG58 that “Change Tempo” increases the volume. In my case I was decreasing the tempo by 6.1% and my peak level increased by roughly 2.5 dB, which was causing clipping (my original 24-bit, 88.2 kHz recording was mixed very “hot”).

The good news is that I also replicated the behavior reported by DVDdoug, in that I could amplify by -3 dB and the clipping disappeared: my nice “rounded” waveform was restored. Those extra 8 bits are quite handy. :slight_smile:
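The arithmetic behind that fix, as a short sketch. It assumes the original peak sat right at 0 dBFS, so a +2.5 dB increase puts it at +2.5 dBFS, a value a 32-bit float track can hold intact:

```python
import math

def db_to_gain(db):
    """Convert dB to a linear gain factor."""
    return 10 ** (db / 20)

def gain_to_db(g):
    """Convert a linear gain factor (or a peak relative to 1.0) to dB."""
    return 20 * math.log10(g)

peak_after_stretch = db_to_gain(2.5)                         # about 1.33, over full scale
peak_after_amplify = peak_after_stretch * db_to_gain(-3.0)   # the -3 dB Amplify step
print(round(peak_after_stretch, 3), round(gain_to_db(peak_after_amplify), 2))
```

The peak ends up about 0.5 dB below full scale, but Amplify applies the same -3 dB to every sample, so the whole track becomes 3 dB quieter, not just the peaks.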

My previous post was overly optimistic. The increased amplitude caused by Change Tempo occurs mostly on peaks, so if I amplify the final result by -3 dB the total sound is significantly lower. :frowning:

Apart from the obvious such as doing some peak limiting, does anyone have a better idea?

Well, my previous post was overly optimistic. It turns out that most of the volume increase, at least on my project, occurred on peaks, i.e., existing peaks in the original were accentuated. So simply reducing the gain by 3 dB has the effect of noticeably reducing the overall volume. Not good. :angry:

Apart from the obvious - such as some peak limiting - does anyone have a better idea?

To answer the obvious question - the instrumentation is Viola & Acoustic Piano

The clue is here:
> my original … recording was mixed very “hot”

The only way to get a really loud (“hot”) recording, is to heavily limit the peaks.
If you process audio that is heavily limited, with any effect that has a variable phase delay, then the peak level will increase.
The only way to restore the original loudness while avoiding clipping, is to limit the peaks again.

In short, if you wish to apply an effect that has variable phase delay (such as SBSMS time stretching), to a very “hot” (high level and heavily compressed / limited) audio track, then the way to do it is:

  1. Ensure that the track is in 32-bit float format.
  2. Apply the effect (in this example “Change Tempo” with “high quality” enabled).
  3. Apply a limiter to limit peaks to 0 dB or a tiny bit lower.
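Step 3 can be sketched as a very simple peak limiter. This is a hypothetical illustration, not Audacity's Limiter effect: it has instant attack with no look-ahead (real limiters usually look ahead to avoid audible distortion) and an exponential release. It assumes float samples where values above 1.0 survived steps 1 and 2:

```python
import math

def simple_peak_limiter(samples, fs, ceiling_db=-0.1, release_ms=50.0):
    """Hypothetical minimal limiter: instant attack, exponential release."""
    ceiling = 10 ** (ceiling_db / 20)
    release = math.exp(-1.0 / (release_ms / 1000.0 * fs))
    gain = 1.0
    out = []
    for v in samples:
        # Gain this sample needs so that its magnitude does not exceed the ceiling.
        needed = ceiling / abs(v) if abs(v) > ceiling else 1.0
        # Let the gain recover towards 1.0, but never above what this sample needs.
        recovered = 1.0 - (1.0 - gain) * release
        gain = min(needed, recovered)
        out.append(v * gain)
    return out

fs = 44100
stretched = [1.3, -1.25, 0.5] + [0.1] * fs   # hypothetical post-stretch samples with overs
limited = simple_peak_limiter(stretched, fs, ceiling_db=-0.1)
print(max(abs(v) for v in limited))
```

Unlike a single Amplify step, the gain reduction only lasts around the peaks and recovers afterwards, which keeps the overall loudness.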

Apologies if this response comes off as a bit hostile, but I don’t understand your response. Here was my question:

> Apart from the obvious - such as some peak limiting - does anyone have a better idea?

So clearly I did not need any instructions on how to implement peak limiting - and in fact that is exactly what I have done. But perhaps you were attempting to speak to a wider audience - perhaps there are folks who are not familiar with this technology. Or maybe you just wanted to emphasize that this is the only possible approach. If that is the case then I apologize for any perceived hostility.

All that aside, I have not made myself clear. As the OP (and I) have complained - a well designed audio effect algorithm should only do what it is advertised to do - and nothing more. Imagine (if you will) that a Peak Limiter unknowingly boosted certain frequencies by 3 dB. That would be highly undesirable. Yes, you could insert a filter after the fact to undo that - but unless you knew ahead of time that this was going to happen you would be hard pressed to figure out why your recording did not sound like it should.

Now it may be the case that the increased volume is an inherent property of SBSMS, but I could not find this documented anywhere. If you are aware of any such documentation, I would appreciate you pointing me in the right direction.

Getting back to my situation, the fact that my recording was mixed very hot was actually a good thing, since the peaks alerted me to the fact that something else was going on besides the time stretching. If my recording had been mixed at a lower level I likely would not have noticed anything until mastering, at which point it would have been a mystery where the additional peaks came from.

> a well designed audio effect algorithm should only do what it is advertised to do - and nothing more.

Changing tempo without changing pitch (and vice versa) is a very complex, imperfect process that requires trade-offs. And it’s not perfectly reversible.

It’s the only approach if you want to use SBSMS time stretch (the “high quality” algorithm) with audio that is heavily limited and you want to maintain the same loudness. It’s the combination of all three things that makes it necessary:

  1. an effect that has a variable phase delay
  2. heavily limited audio (or any “flat top” audio, such as square waves)
  3. maximum loudness

Conversely:

  1. If maximum loudness isn’t a condition, then no need to limit, just normalize below 0 dB.
  2. If the audio isn’t heavily compressed, then the problem is unlikely to occur.
  3. If you use the “SoundTouch” time stretch algorithm (“high quality” check box not selected), then the problem is unlikely to occur.
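For point 1, peak normalization is just one gain for the whole track. A minimal sketch with a hypothetical helper; Audacity's Normalize effect also offers options such as DC offset removal, which are not modelled here:

```python
def normalize(samples, target_db=-1.0):
    """Scale samples so the highest absolute peak lands at target_db dBFS (1.0 = 0 dBFS)."""
    peak = max((abs(v) for v in samples), default=0.0)
    if peak == 0.0:
        return list(samples)   # silence: nothing to scale
    gain = 10 ** (target_db / 20) / peak
    return [v * gain for v in samples]

after_stretch = [0.2, -1.4, 0.9, 1.25]   # hypothetical float samples with overs
normalized = normalize(after_stretch, -1.0)
print(normalized)
```

Since every sample gets the same gain, the waveform shape and the peak-to-loudness ratio are unchanged, which is why this route gives up some loudness compared with limiting.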

SBSMS does not “increase volume”. The “loudness” (i.e. “volume”) of the output is pretty close to the loudness of the input. The “limitation” is due to variable “phase delay”. This limitation is common to many effects, notably most IIR filters (such as Audacity’s “Low Pass” and “High Pass” filters).

As I illustrated in this post, if there is a frequency dependent phase shift, then although the “sound” may be the same, the “shape” of the waveform will change.

In the special case of “flat top” waveforms, the phases of the different frequency components are aligned to produce minimum peak “amplitude”, so if there is a frequency-dependent phase shift, then the peak amplitude will increase.
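A sketch of that point using additive synthesis rather than SBSMS. It builds one period of a band-limited square wave from its odd harmonics, then rotates every harmonic by 90 degrees. Each harmonic then moves by a quarter of its own period, so the time shift differs per frequency (a variable phase delay). The harmonic levels, and therefore the RMS, are unchanged, but the peak more than doubles:

```python
import math

def odd_harmonic_sum(n, phase, highest=15):
    """One period (n samples) of (4/pi) * sum over odd k <= highest of sin(k*x + phase(k)) / k."""
    out = []
    for i in range(n):
        x = 2 * math.pi * i / n
        out.append(sum(4 / math.pi * math.sin(k * x + phase(k)) / k for k in range(1, highest + 1, 2)))
    return out

def peak(s):
    return max(abs(v) for v in s)

def rms(s):
    return math.sqrt(sum(v * v for v in s) / len(s))

n = 1024
flat_top = odd_harmonic_sum(n, lambda k: 0.0)          # phases of a (band-limited) square wave
shifted = odd_harmonic_sum(n, lambda k: math.pi / 2)   # every harmonic rotated by 90 degrees
print("peak:", round(peak(flat_top), 3), "->", round(peak(shifted), 3))
print("rms: ", round(rms(flat_top), 4), "->", round(rms(shifted), 4))
```

In the rotated version all harmonics reach their maximum at the same instant, which is the opposite of the flat-top alignment.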


I agree, and that is why the Change Pitch / Tempo effects do NOT apply limiting after SBSMS time stretch. If you want the processed audio to be peak limited, then you can do that manually after applying the time stretch, but Change Pitch/Tempo does not assume that you want that.

Steve - Thank you for your timely response. I understand and agree with almost everything you said.

That said, I still haven’t made myself clear. My issue is with the documentation. If an effect that is labeled to alter X (the time domain) also significantly alters Y (peak level), then that should be documented.

I have put in a request for a documentation enhancement to the SoundTouch libraries. One additional sentence in the description of the SBSMS algorithm would have saved me several hours of time!