I recorded a podcast that is JUST ME,
I mean, just my voice recorded with a dynamic microphone (to cut background noise).
Question 1: Can you confirm there is no need for stereo in such a case? Mono seems lighter than stereo at the same quality for a simple podcast. Please note that I am not trying to pan speech to the right or left ear in this podcast. I just want a very clear, radio-style voice.
Question 2: What is the best bitrate to avoid losing quality while keeping the file reasonably small?
I tested all the bitrates, and 110 kbps seems very clean, but I have read everything and its opposite online on this topic. Some recommend 192 kbps minimum.
I would like the best voice quality for the minimum file size, obviously.
As an example, these are the results of my test for a 10 min podcast:
65 kbps: 3656 KB
80 kbps: 3986 KB
95 kbps: 4707 KB
110 kbps: 5583 KB
145 kbps: 6670 KB
155 kbps: 8118 KB
At 110 kbps, a 40 min podcast already weighs 21609 KB.
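These sizes can be sanity-checked against the basic relationship between bitrate and file size. A minimal sketch of the arithmetic for constant-bitrate encoding (the reported figures come out smaller than this, which suggests the encoder was using VBR presets, where the labelled rate is only nominal):

```python
def estimate_size_kb(bitrate_kbps: float, seconds: float) -> float:
    """Estimate CBR compressed-file size: bitrate (kilobits/s) x duration / 8 bits per byte.
    Result is in kilobytes (1 kB = 1000 bytes); container overhead is ignored."""
    return bitrate_kbps * seconds / 8

# A 10-minute mono podcast at a few common CBR bitrates:
for kbps in (32, 64, 96, 128):
    print(f"{kbps} kbps -> {estimate_size_kb(kbps, 600):.0f} kB")
# 32 kbps -> 2400 kB, 64 kbps -> 4800 kB, 96 kbps -> 7200 kB, 128 kbps -> 9600 kB
```

For comparison, a true CBR 110 kbps file of 10 minutes would be about 8250 kB, well above the 5583 KB reported, so the actual average bitrate of those exports was lower than the labelled figure.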
Can you give us a link to what you’re aiming for (e.g. your competitors’ podcasts)?
Then we can advise on the minimum sample-rate and bitrate requirements to achieve that result.
(e.g. high-quality music requires higher data rates than speech.)
It’s very common for voice only podcasts to be mono.
64 kbps is generally considered OK for mono, voice-only podcasts. If you find that 64 kbps doesn’t sound good enough with your voice, then you could go a bit higher. It’s unusual (and probably excessive) to go higher than 128 kbps for a mono, voice-only podcast. If you really want to keep the file size down to a minimum, then you may be able to go as low as 32 kbps, but probably no lower than that. At the end of the day, you need to decide what file-size/quality trade-off you are willing to make.
Variable bitrate (VBR) will often give slightly better sound quality for a given file size. However, constant bitrate (CBR) has better compatibility with other software, so podcasts are nearly always CBR.
What is the best bitrate to avoid losing quality while keeping the file reasonably small?
You should be careful to save all your original performances in perfect quality WAV uncompressed files. That will be your backup if anything happens to your Project files before you post the podcast. Do all your editing, filtering, effects and production in high quality and later, when everything is perfect, then create the compressed, small sound files and post them. You can’t edit or correct a tiny, compressed sound file without increasing the sound distortion, but you can correct WAV files or Audacity Projects.
The same applies if you decide to combine two or more shows into one compilation, or reuse pieces of one show in another. Use the WAV files or the Audacity Project to make the compilation, not the small posted files.
I’ve recently been working on a solo podcast and arrived at: MP3, Mono, 48kHz sampling, 96kbps constant bitrate.
I chose 96kbps as the podcast has some intro and outro music. You could use 44.1kHz sampling, but I use ffmpeg, external to Audacity, to set loudness at -16 LUFS with a -1 dBTP true peak. This processing upsamples to 192kHz, and using 48kHz (as an integer sub-multiple) makes for easier conversions.
This produces a file size of 1.5MiB for 2 minutes of material - so 750KB per minute. This I find acceptable for a 15 minute podcast having a file size of 11.25MiB.
While that certainly sounds logical, it is actually no more or less “difficult” whether it is an integer sub-multiple or not. The resampling calculations inside the computer will be done using floating-point arithmetic, regardless of whether it is an exact integer multiple or not.
(If you’re interested, you can view Audacity’s resampling code here: https://github.com/audacity/audacity/tree/master/lib-src/libsoxr/src)
Doesn’t your statement make assumptions about the sampling/interpolation method used? By adopting the sampling frequency of 48kHz when upsampling to 192kHz there is scope for optimization (it’s a simple x4 and /4 and also a power of two). Granted, a generalised algorithm may very well be used, in which case there’s no advantage, but even in that case my choice of sample frequency won’t break things.
Yes and no. It assumes that a high quality resampling algorithm is being used.
The over-simplified idea is that 96 kHz PCM can be converted to 48 kHz PCM simply by dropping every other sample. Indeed you “could” do that, but that’s not the way that modern resampling algorithms would do it. The reason that it’s not done like that is because doing so will cause “aliasing distortion”. For good quality “down-sampling” it is essential that the waveform is band limited to less than half the new sample rate. Typically the band limiting is performed with some kind of “sinc filter”.
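To illustrate the point, here’s a small NumPy sketch (not Audacity’s actual libsoxr code, just a brick-wall FFT filter standing in for the sinc filter): a 30 kHz tone sampled at 96 kHz, naively decimated by dropping every other sample, aliases down to 18 kHz (48 − 30) and lands in the audible band; band-limiting below 24 kHz first removes it cleanly.

```python
import numpy as np

fs, n = 96000, 9600                      # 96 kHz sample rate, 0.1 s of audio
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 30000 * t)        # 30 kHz tone, above the new Nyquist (24 kHz)

# Naive decimation: drop every other sample -> 48 kHz
y_naive = x[::2]
spectrum = np.abs(np.fft.rfft(y_naive))
peak_hz = np.argmax(spectrum) * 48000 / len(y_naive)   # where did the tone end up?

# Band-limit first (brick-wall low-pass at 24 kHz in the frequency domain), then decimate
X = np.fft.rfft(x)
X[int(24000 * n / fs):] = 0              # zero everything at or above the new Nyquist
y_clean = np.fft.irfft(X, n=n)[::2]

print(peak_hz)                  # 18000.0 -- the 30 kHz tone aliased into the audio band
print(np.max(np.abs(y_clean)))  # ~0 -- the tone was removed before decimation
```

Real resamplers use a windowed sinc filter rather than a brick-wall FFT cut, but the effect being avoided is the same.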
Sure. Using 48 kHz is not going to break anything. MP3 quality may be fractionally worse when using 48 kHz because the encoder may try a little harder to maintain high frequencies that are not really important for speech, but any subjective difference is likely to be extremely small.
The only reason that I mention it is because it is a common misconception that it is better to stick to integer sub-multiples.
It’s good to discuss these things. I should add that my background is in hardware engineering and looking for possible optimisations when designing hardware is advantageous. In the realm of software it’s a flexibility vs. performance perspective. In the hardware world cost and performance are generally the driving factors.
Just for fun, I’ve got an analogy from mechanical engineering:
Let’s say that you have a milling machine set up to make a gear with 200 teeth.
The milling machine cuts the first notch with a rotary cutter, then the gear is rotated by 1.8 degrees and the next notch is cut, and so on until there are 200 notches, and 200 teeth.
Then you are told to make a gear with less teeth.
Someone may naively think that it will be easier to make this new gear with 100 teeth, than say 90 teeth, because “you only need to skip every other notch”.
In fact, it is no easier to set up for 100 teeth than for 90 teeth, because the amount of rotation is only one small part of the job. You still need to change the rotary cutter to get the correct notch shape, and you still need to adjust the amount of rotation to match the number of teeth (in the case of 100 teeth, the rotation being 3.6 degrees, and for 90 teeth, 4 degrees).
The idea that “100 teeth will be easier” is based on an over-simplified idea about how the gear is made.
Similarly, the idea that sample rate conversion from 96 kHz to 48 kHz is “easier” than converting from 96 kHz to 44.1 kHz, is based on an over-simplified idea about how resampling is done.
Coming back to resampling, the idea that converting 96 kHz to 48 kHz is “easier and therefore better” than converting 96 kHz to 44.1 kHz, would suggest that converting from 88.2 kHz to 44.1 kHz is “easier and therefore better” than converting 96 kHz to 44.1 kHz. This is something that we can test fairly easily in Audacity, by generating test sounds into tracks with the specified sample rate, and then resampling to 44.1 kHz and analyzing.
Testing should demonstrate that whether the initial sample rate was 96 kHz or 88.2 kHz, the conversion to 44.1 kHz is virtually perfect for audio frequencies up to about 20 kHz, and above 20 kHz the frequencies are progressively cut to zero. We can also do timed tests with very long tracks, and see that there is very little difference in processing time (one might expect that converting from 96 kHz will be a bit faster because there are more samples to convert, but in practice, the time taken to write the 44.1 kHz samples to disk is the dominant factor, so there is in fact, virtually no observable difference).
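The claim can also be sketched in NumPy using simple Fourier-domain resampling (not the sinc-based algorithm Audacity actually uses, but band-limited in the same spirit): a 10 kHz test tone converted to 44.1 kHz survives equally well whether the source was 96 kHz or 88.2 kHz, integer ratio or not.

```python
import numpy as np

def fft_resample(x: np.ndarray, n_out: int) -> np.ndarray:
    """Band-limited resampling: truncate the half-spectrum to the new Nyquist."""
    X = np.fft.rfft(x)
    return np.fft.irfft(X[: n_out // 2 + 1], n=n_out) * (n_out / len(x))

def peak_hz(y: np.ndarray, fs: int) -> float:
    """Frequency of the strongest FFT bin."""
    return np.argmax(np.abs(np.fft.rfft(y))) * fs / len(y)

dur = 0.1  # seconds
for fs_in in (96000, 88200):
    n_in = int(fs_in * dur)
    t = np.arange(n_in) / fs_in
    x = np.sin(2 * np.pi * 10000 * t)        # 10 kHz test tone
    y = fft_resample(x, int(44100 * dur))    # convert to 44.1 kHz
    print(fs_in, "->", peak_hz(y, 44100))    # 10000.0 in both cases
```

The 2:1 case and the 320:147 case go through exactly the same code path, which is the point being made above.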
Here are the results of an experiment I performed to time an ffmpeg loudnorm operation that upsamples audio to 192kHz and converts back down to the original rate, for input at both 44.1kHz and 48kHz sampling. It is a two-pass process: the first pass gathers measurements which are used in the second pass to perform the loudness normalization.
The audio file was an 18 second clip initially recorded at 44.1kHz and a resampled copy at 48kHz was made. Both files were 32-bit float WAV format and the 48kHz sampled audio file was larger.
Pass 1:   44.1kHz    48kHz
real      0.883s     0.553s
user      0.507s     0.521s
sys       0.040s     0.033s
total     1.430s     1.107s    difference -0.323s

Pass 2:   44.1kHz    48kHz
real      0.615s     0.607s
user      0.579s     0.562s
sys       0.032s     0.045s
total     1.226s     1.214s    difference -0.012s
It’s interesting to note that the operations involving the 48kHz input and output files did execute faster. The first pass (analysis) showed quite a significant reduction in total time; the second pass (processing) difference isn’t as great. It should also be noted that both passes on the 48kHz audio are processing a larger file.
The basis for the experiment was the two ffmpeg commands documented thus (repeated, with adaptations for 48kHz):
LUFS mastering to EBU-R128 with ffmpeg
This is the procedure to master a 44.1kHz mono WAV track to -16 LUFS with a -1 dBTP true peak.
ffmpeg -hide_banner -i infile.wav -af loudnorm=I=-16:LRA=7:TP=-1:dual_mono=true:print_format=summary -f null -
We will use the Input_* and Offset values in Pass 2.
ffmpeg -hide_banner -i infile.wav -af loudnorm=I=-16:LRA=7:TP=-1:measured_I=**:measured_LRA=**:measured_TP=**:measured_thresh=**:offset=##:dual_mono=true:print_format=summary -ar 44.1k outfile.wav
** = taken from Input_* section of Pass 1
## = taken from Output_* section of Pass 1
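If you run this two-pass procedure often, the second-pass filter string can be built programmatically instead of pasting the values in by hand. A minimal Python sketch, assuming pass 1 is run with print_format=json instead of print_format=summary (loudnorm then reports the measurements on stderr under keys like input_i, input_lra, input_tp, input_thresh and target_offset; the summary format shows the same values under the Input_* and Offset labels used above):

```python
def build_pass2_filter(m: dict, I: float = -16, LRA: float = 7, TP: float = -1) -> str:
    """Build the second-pass loudnorm filter string from pass-1 measurements.

    `m` is the dict parsed from the JSON blob that loudnorm prints on stderr
    when pass 1 uses print_format=json (an assumption stated in the lead-in)."""
    return (
        f"loudnorm=I={I}:LRA={LRA}:TP={TP}"
        f":measured_I={m['input_i']}:measured_LRA={m['input_lra']}"
        f":measured_TP={m['input_tp']}:measured_thresh={m['input_thresh']}"
        f":offset={m['target_offset']}:dual_mono=true:print_format=summary"
    )

# Example with made-up measurement values:
measured = {"input_i": "-23.6", "input_lra": "6.0", "input_tp": "-8.1",
            "input_thresh": "-34.1", "target_offset": "0.5"}
print(build_pass2_filter(measured))
```

The resulting string is what goes after `-af` in the pass-2 ffmpeg command.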
So it does, or rather “did” (now updated). I do run macOS (High Sierra), but my main machine, which I ran these tests on, is Xubuntu 18.04 64-bit.
I think my results are more “expected” (10% longer to process 10% more data), so perhaps the question should be: why do your results appear to be contrary to mine?
If you only timed the run once on each file, perhaps it’s just that you got a particularly fast run on the 48 kHz file and a particularly slow run on the 44.1kHz file.
I presume that your test files were identical other than for the sample rate, and both were on the same disk partition?
Have you considered using the Opus codec (assuming you have the option to use Opus on your podcasting platform)? Its quality is far better than MP3 at the same bitrate (or the same quality at a much lower bitrate), and it’s competitive with (or better than) the best codecs out there (such as AAC). It’s also an IETF standard and is fairly well supported by most systems these days; even my 5-year-old Android phone supports it, at least in Chrome and VLC.
Based on https://wiki.hydrogenaud.io/index.php?title=Opus#Speech_encoding_quality and on my own limited experience with Opus, 24 to 32 kbps can give you “nearly transparent” to “transparent” speech quality. Since your podcast has incidental music, you may want to go with something like 40 to 48 kbps, which is about half the bitrate as what you picked for MP3.
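The bandwidth saving is easy to quantify. A quick sketch comparing a 96 kbps CBR MP3 with 48 kbps Opus for a 15-minute episode (the durations and bitrates here are just the figures discussed in this thread):

```python
def size_mib(bitrate_kbps: float, minutes: float) -> float:
    """Approximate compressed file size in MiB (container overhead ignored)."""
    return bitrate_kbps * 1000 / 8 * minutes * 60 / (1024 * 1024)

mp3 = size_mib(96, 15)    # ~10.3 MiB
opus = size_mib(48, 15)   # ~5.1 MiB -- half the size at comparable speech quality
print(f"MP3 96k: {mp3:.1f} MiB, Opus 48k: {opus:.1f} MiB")
```

Over a back catalogue of episodes, halving the per-file size roughly halves the download bandwidth for every listener on an Opus-capable player.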
Opus is a terrific compressed audio format, but unfortunately it is not supported “out of the box” by Windows Media Player, iTunes, Groove music app or Spotify, which are some of the most widely used computer audio players. It’s hard for a new podcast to gain a substantial audience, and sadly, using Opus format will make it even more difficult. For this reason, I wouldn’t recommend using Opus as the only format option, though it could be used as a secondary option in addition to CBR MP3.
Yeah, when it comes to compatibility, MP3 is the king of compressed formats. If storage space (on the server) is at a premium, then I’d go with only MP3. On the other hand, if bandwidth (for the server and/or users) is at a premium, I’d go with both Opus and MP3: Opus for the users who have a device that supports it, and MP3 for the rest.
Of course, a lot of this depends on the podcasting platform, but I’d just love to see greater adoption of Opus.