There are issues with the current zoomed-out waveform view, i.e., the waveform view that doesn't show individual samples. It's a kind of two-layer "monochrome" rendering (pixel on/pixel off) that shows two kinds of information about the samples in the time frame of a pixel column: the maximum peak and the RMS ("root mean square", a kind of average).
This waveform view can be rendered better in my opinion, i.e., with shades. The outer shape of the waveform would stay the same and the shades within the waveform would provide more information than before--potentially (and in my opinion indeed) more useful information. The RMS value now visible would be somehow contained in a newly rendered pixel column. In my understanding the RMS value is "just a kind of representation" and doesn't contain exact information about frequencies or the loudness (there might be frequencies in it too low for the human ear). An important part of a waveform is to provide a shape for orientation and hints to the acoustical characteristics. So in my opinion the RMS value isn't something to desperately keep.
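For reference, the two quantities mentioned above can be computed for one pixel column as in the following minimal Python sketch. The function name `peak_and_rms` is made up for illustration, and this is not Audacity's actual code, just the textbook definitions of peak and RMS.

```python
import math

def peak_and_rms(samples):
    """Return (peak, rms) for the samples that fall into one pixel column.

    samples: non-empty list of floats, assumed to be in the range -1.0 to 1.0.
    """
    # Peak: the largest absolute sample value in the column.
    peak = max(abs(s) for s in samples)
    # RMS: square every sample, average the squares, take the square root.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak, rms

# A column with one loud sample and otherwise quiet content:
peak, rms = peak_and_rms([0.9, 0.1, -0.1, 0.1, -0.1])
print(peak, rms)
```

The RMS never exceeds the peak, which is why the current view can draw it as an inner layer inside the peak layer.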
I'm no professional on this topic. The algorithm I propose might have to be refined with respect to edge cases. I expect that the developers can do this with the technical base provided here. The good news is that the algorithm is relatively easy to implement, not harder than the current one I think.
Let me try to explain how I suggest rendering a pixel column. Please see the image "Algorithm" (you might be able to view it better by dragging it into a separate browser tab). See also this explanation:
DESCRIPTION OF THE CALCULATION OF THE VALUE ASSOCIATED
WITH EVERY PIXEL (UNIT: SAMPLES; SEE ALSO IMAGE)

Pixel row   Samples in row     Value
  1         x x x               3
  2         x x x x x           8 = 3 + 5
  3         x x x              11 = 8 + 3
  4         x x                13 = 11 + 2
---------------------------------------------- Average sample value
  5         x x x x x x        18 = 12 + 6
  6         x x x x x x x x    12 = 4 + 8
  7         x x x x             4

Number of samples: 31. Each value associated with a pixel is divided by the
number of samples. A value of 0 always leads to full transparency; otherwise
the minimum visibility would also apply to empty pixels and lose its purpose.
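The counting in the diagram above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `pixel_values` and its parameters are made up here, and the samples are assumed to be already sorted into pixel rows, so the input is the number of samples per row from top to bottom plus the number of rows above the average line.

```python
from itertools import accumulate

def pixel_values(row_counts, rows_above_average):
    """Return the value associated with each pixel of one pixel column.

    row_counts:         number of samples per pixel row, top row first.
    rows_above_average: how many of those rows lie above the average line.
    """
    above = row_counts[:rows_above_average]
    below = row_counts[rows_above_average:]
    # Above the average line, count from the top row down towards the line,
    # so every pixel also includes the samples of the higher peaks.
    values_above = list(accumulate(above))
    # Below the line, count from the bottom row up towards the line.
    values_below = list(accumulate(reversed(below)))[::-1]
    return values_above + values_below

# The column from the diagram: 31 samples in 7 pixel rows.
counts = [3, 5, 3, 2, 6, 8, 4]
values = pixel_values(counts, rows_above_average=4)
print(values)  # [3, 8, 11, 13, 18, 12, 4]

# Dividing by the number of samples gives the raw intensity per pixel.
total = sum(counts)
print([round(v / total, 2) for v in values])
```

How samples that fall exactly on the average line are assigned to a row is one of the edge cases left open here.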
There are these concepts:
- Value associated with a pixel: This value is a sample count that is later divided by the number of samples of the pixel column to get the intensity factor of the pixel.
- Maximum value: When the intensity of a pixel is calculated this way, experience shows that the darkest value that actually occurs is gray. This is because of the alternating character of audio waves. The intensity, which has a maximum possible value of 1.0, therefore has to be stretched according to a maximum value. Example: With a maximum value of 0.5, an intensity of 0.4 becomes 0.8. The attached images apply a maximum value of 0.5. See also the image "Electric guitar, 1000 samples per pixel column, without applying maximum value".
- Minimum visibility/opacity/intensity: With a minimum visibility of 0 %, the waveform would fade into the background without giving information about the peak. So there has to be a minimum opacity over the background to be able to see peaks. The attached images apply a minimum intensity of 0.15, unless stated otherwise. See also the images "Electric guitar, 1000 samples per pixel column, minimum visibility of 0.0" and similar.
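The three concepts above could be combined as in the following Python sketch. The function name `pixel_intensity` and its parameter names are made up for illustration. Two details are assumptions rather than something the description fixes: stretched intensities above 1.0 are clamped to 1.0, and the minimum visibility is applied as a linear remap of the range 0.0 to 1.0 onto the range from the minimum visibility to 1.0.

```python
def pixel_intensity(value, sample_count, maximum_value=0.5, minimum_visibility=0.15):
    """Turn the sample count of one pixel into an opacity between 0.0 and 1.0.

    value:              sample count associated with the pixel.
    sample_count:       number of samples in the whole pixel column.
    maximum_value:      intensity that should already be drawn fully dark.
    minimum_visibility: opacity of the faintest non-empty pixel.
    """
    # Empty pixels stay fully transparent, so the minimum visibility
    # only affects pixels that belong to the waveform.
    if value == 0 or sample_count == 0:
        return 0.0
    intensity = value / sample_count
    # Stretch: with a maximum value of 0.5, an intensity of 0.4 becomes 0.8.
    # Clamping at 1.0 is an assumption for intensities above the maximum value.
    stretched = min(intensity / maximum_value, 1.0)
    # Assumed linear remap so that even faint peak pixels remain visible.
    return minimum_visibility + (1.0 - minimum_visibility) * stretched

# The stretching example from the list, with the minimum visibility switched off:
print(pixel_intensity(4, 10, maximum_value=0.5, minimum_visibility=0.0))  # 0.8
```

An alternative to the linear remap would be to simply raise every non-zero opacity below the minimum to the minimum; which of the two looks better is an open question.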
Conclusions:
- Most peak pixels are light gray. If someone has a mathematical/algorithmic alternative, let me know. For most of the waveforms shown, this doesn't bother me. Personally I can easily think of it as related to the data, since high peaks are rare in most files.
- This image shows that the data is represented correctly: "1000 Hz sine wave with 1000 Hz square wave, 500 samples per pixel column".
- This image shows why samples at higher peaks should also be counted for the pixels at lower peaks (the image doesn't do this): "Electric guitar, 1000 samples per pixel column, pixels without higher peaks".
- You might see the DC offset of a subrange better. See image "Female voice singing, 200 samples per pixel column".
- Dark pixels might reach a peak pixel. See this image for an explanation: "Subwaves on high peaks of low waves". The flute waveform shows the difference: dark parts are high and shrill, parts with only a little dark are low.
- If you wonder about the alternating character of waveforms, view them in Audacity to see the same effect, e.g.: "Male speech, 200 samples per pixel column".
- In my experience the current logarithmic waveform view removes too much shape information to still provide proper orientation hints. Otherwise I would use it more often. I tried a logarithmic scaling of the samples, but the result doesn't look like Audacity's shape. I'm unsure whether I'm doing it correctly, since I'm just handling intensity values in the range from 0.0 to 1.0. This is the formula (C#):
- sample = Math.Log(1 + Math.Abs(sample) * (logarithmBase/*10*/ - 1), logarithmBase) * Math.Sign(sample);
- In a nearly unrecognizable waveform of a song amplified to the maximum, you might be better able to see the beats. See the image: "Trance music with strong rhythm, 500 samples per pixel column".
- You are better able to identify areas of a song with alternating loud and silent parts. See image: "Orchestral music with strings, 5000 samples per pixel column".
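The logarithmic formula quoted in the list above can be tried in isolation. The following is a Python transcription of the C# line for experimenting; the function name `log_scale` is made up, and this is not how Audacity computes its own logarithmic view.

```python
import math

def log_scale(sample, logarithm_base=10.0):
    """Scale a sample from the range -1.0 to 1.0 logarithmically, keeping its sign.

    Mirrors the C# formula: Math.Log(1 + |sample| * (base - 1), base) * Math.Sign(sample)
    """
    magnitude = math.log(1.0 + abs(sample) * (logarithm_base - 1.0), logarithm_base)
    # copysign puts the sign of the original sample back on the scaled magnitude.
    return math.copysign(magnitude, sample)

# Print a few values to see how quiet samples are lifted.
for s in (0.0, 0.1, 0.5, 1.0, -0.5):
    print(s, round(log_scale(s), 3))
```

The mapping keeps 0.0 at 0.0 and 1.0 at 1.0 and lifts everything in between, so quiet passages move closer to loud ones. Audacity's dB view may work differently, which could explain the different shape.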
Note: There has been some discussion before about this topic, but mixed with other topics and not as thoroughly described as here. You can find the old discussion here if, e.g., you want to learn more about the future of the zoomed-in waveform view.
Please share your opinion, with some details, if it hasn't already been said. If you are also interested in Audacity having this feature, please share any technical insight that would improve the concept.
See also my second post as an addition to this one.
You can generate waveform images with the attached demo converter program. It's a command line program that is probably best used with a batch file.