I’ve just been getting an explanation from another forum too!

I thought that 1 bit per sample meant the value of the wave could only move one step up the ‘ladder’ per sample, which would slow down any transients dramatically. It was pointed out to me that it doesn’t work that way: a ‘1’ can move the value more than one step (in bit depth terms) at a time.

For simplicity, if we imagine 3-bit unsigned audio data, and we think of the analogue output as “voltage” (which ultimately it is), then 000 represents the lowest voltage, and 111 represents the highest voltage. The ‘ladder’ is: 000, 001, 010, 011, 100, 101, 110, 111, and we could say that an increase of “one” (that’s an increase of 001) represents one step of the ladder. In this example, where we have 3-bit samples, the ‘ladder’ has 8 rungs. However, when converting from digital data to analogue, we don’t actually increase the voltage in steps. A low-pass ‘reconstruction’ filter smooths the transition from one level to the next, so that the output contains no frequencies above half the sample rate.
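If it helps to see the ‘ladder’ as code, here is a minimal Python sketch. The function name code_to_voltage and the 0 V to 1 V range are made up for illustration; a real DAC has its own reference voltages and is followed by the reconstruction filter, which this sketch leaves out.

```python
def code_to_voltage(code, bits=3, v_low=0.0, v_high=1.0):
    """Map an unsigned PCM code to a voltage between v_low and v_high.

    v_low and v_high are arbitrary illustrative 'rail' voltages.
    """
    top = 2 ** bits - 1  # highest code, e.g. 7 (binary 111) for 3 bits
    if not 0 <= code <= top:
        raise ValueError("code out of range for this bit depth")
    return v_low + (v_high - v_low) * code / top


# The eight rungs of the 3-bit ladder, lowest to highest.
ladder = [format(code, "03b") for code in range(2 ** 3)]
print(ladder)

# Consecutive PCM samples are free to jump several rungs at once.
samples = [0b001, 0b110, 0b010]
voltages = [code_to_voltage(s) for s in samples]
for s, v in zip(samples, voltages):
    print(format(s, "03b"), round(v, 3))
```

Note that nothing forces consecutive samples to be neighbouring rungs: 001 followed by 110 is a jump of five rungs in a single sample.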

For DSD, and leaving aside “wide DSD”, we don’t have “samples” in the conventional sense; we just have a stream of 0s and 1s. When a 1 is encountered by the DAC, the output voltage rises, and conversely when a 0 is encountered, the output voltage falls. As with converting PCM to analogue, the rate of change is limited by a low-pass filter. A “1” in the stream tells the DAC to increase the output voltage, but not by how much. If the next, and the next, and the next bits are all ones, then the output voltage will continue to rise towards the upper “rail” voltage, and the rate of change is governed by the filter.
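A rough way to play with this idea is to treat each bit as a nudge towards one of the two rails and pass the stream through a very simple low-pass filter. The sketch below is only a toy model: the one-pole filter, the alpha value, the ±1.0 rails and the function name lowpass_bitstream are all assumptions for illustration, not how any particular DSD DAC is built.

```python
def lowpass_bitstream(bits, alpha=0.1, start=0.0):
    """Toy 1-bit DAC: each bit pulls the output towards a rail.

    A 1 pulls towards +1.0 (upper rail), a 0 towards -1.0 (lower rail).
    alpha sets how quickly the output may move; it stands in for the
    low-pass filter and is an arbitrary illustrative value.
    """
    y = start
    out = []
    for b in bits:
        target = 1.0 if b else -1.0
        y += alpha * (target - y)  # one-pole low-pass step
        out.append(y)
    return out


# A long run of ones: the output keeps rising towards the upper rail.
rising = lowpass_bitstream([1] * 50)
print(round(rising[0], 3), round(rising[9], 3), round(rising[-1], 3))

# Alternating ones and zeros: the output hovers around the midpoint.
hovering = lowpass_bitstream([1, 0] * 25)
print(round(min(hovering), 3), round(max(hovering), 3))
```

In this model a single “1” carries no size information at all. How far the output moves depends on where it already is and on the filter, which is why a run of ones gives a smooth climb rather than a staircase.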

Another way of looking at DSD is in purely mechanical terms. If we have a ‘black box’ with a DSD input at one end and a speaker cone at the other, then each “1” represents a little push on the speaker cone, and a “0” is “not a push”. The low pass filter removes the jerkiness of individual pushes so that the speaker moves smoothly at frequencies within the audio range.

Thanks for that, Steve. There are things you mention that I hadn’t picked up on from reading the general blurb available.

I need to read the PDM stuff more carefully. I don’t get why the middle of a group of 1s would be used as a peak, instead of the end (where the ‘real’ peak would be). No doubt there’s a mathematical reason.

In the picture you provide of PDM, you say: “The red line represents the analogue waveform, that is ‘high’ when the density of ‘ones’ is greatest, and ‘low’ when the density of ‘ones’ is least”.

The high is shown as being at the centre of the high density area, not at the end of the high density area as I would have expected?

OK, I see what you mean, but you’re drawing a rather stricter interpretation of the illustration than I intended. In reality it is a bit more complex, not least because of the low-pass filtering that I mentioned. The low-pass filter, in effect, averages the influence of the individual “bits”. What the illustration shows is that the amplitude of the analogue waveform is higher when the density of “on bits” is high. Due to filter delays and other factors, the DAC will not output its high voltage at that exact moment, but some time later.
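To make the “averaging plus delay” point concrete, here is a small Python sketch that uses a plain moving average as a stand-in for the low-pass filter (a real DAC filter is more sophisticated, and the bit pattern and window length here are invented for illustration). It compares a centred average, which is roughly what the illustration draws, with a causal average that can only look at bits that have already arrived.

```python
def causal_average(bits, window):
    """Density of ones over the most recent `window` bits (past only)."""
    out = []
    for n in range(len(bits)):
        start = max(0, n - window + 1)
        out.append(sum(bits[start:n + 1]) / window)
    return out


def centred_average(bits, window):
    """Density of ones in a window centred on each bit (window must be odd)."""
    half = window // 2
    out = []
    for n in range(len(bits)):
        lo = max(0, n - half)
        hi = min(len(bits), n + half + 1)
        out.append(sum(bits[lo:hi]) / window)
    return out


# Made-up stream: sparse ones, then dense ones, then sparse again.
bits = [0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1,
        1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0]
window = 7            # arbitrary illustrative filter length (odd)
half = window // 2

centred = centred_average(bits, window)
causal = causal_average(bits, window)

peak_centred = centred.index(max(centred))
peak_causal = causal.index(max(causal))
print("centred peak at bit", peak_centred)
print("causal peak at bit", peak_causal)
print("lag in bits:", peak_causal - peak_centred)
```

The causal curve is the same shape as the centred one, just shifted later by half the window length. So the output is highest where the density of ones is highest, only slightly after the fact.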

A more accurate representation would be to have the blue and white “bit stream” on the left going into the ‘black box’ and the red analogue waveform coming out of the black box (the colours are arbitrary).