left and right are reduced by a mere 6 dB.
I was admiring the flowers and clouds and semi-thinking about this. 6 dB is a piffle compared to the full range of sounds and production, but it is half.
Half the voltage in the presentation signal just vanished. Could you not use that as the cue for further processing? Sense the parts of the show that dip 6 dB and assume they are the ones that need further processing?
Or is that what you're already doing and I'm late to the ball?
Indeed, that is just what I'm doing:
The audio is first transformed into side (L-R) and mid (L+R).
In the short-time Fourier transform (STFT), the magnitude of the side signal is divided by the magnitude of the mid signal, separately for each of the 4096 frequency bands.
The resulting array of ratios is then remapped into a weighting for the mid channel, and the mid spectrum is multiplied by that weighting, still bin by bin.
The reconstructed mid channel is now indeed the center with a magnitude proportional to the distance from the center-panned position.
Subtracting the center from L and R respectively gives the stereo audio without center.
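
For anyone who wants to experiment, here is a rough NumPy sketch of the steps above. It is not the actual implementation, only an illustration under several assumptions: the function name remove_center, the frame size of 8192 samples (which gives 4097 bins, close to the 4096 bands mentioned), the square-root Hann window with 50 % overlap, the remapping weight = clip(1 - ratio, 0, 1) ** power, and the factor 0.5 that turns the weighted L+R into the amount subtracted from each channel are all guesses made for the example.

```python
import numpy as np

def remove_center(left, right, frame=8192, power=1.0, eps=1e-12):
    """Return (left_out, right_out, center) for two equal-length mono arrays.

    Assumed details (not from the original description): frame size,
    window, the remapping curve and the 0.5 scaling of the center.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = len(left)
    hop = frame // 2

    # Square-root Hann window, used for analysis and synthesis.
    # At 50 % overlap the product of the two windows sums to one.
    window = np.sin(np.pi * np.arange(frame) / frame)

    # Pad so that every input sample is covered by two overlapping frames.
    n_frames = int(np.ceil(n / hop)) + 1
    padded = (n_frames + 1) * hop
    mid = np.zeros(padded)
    side = np.zeros(padded)
    mid[hop:hop + n] = left + right    # mid  = L + R
    side[hop:hop + n] = left - right   # side = L - R

    center = np.zeros(padded)
    for k in range(n_frames):
        sl = slice(k * hop, k * hop + frame)
        M = np.fft.rfft(window * mid[sl])
        S = np.fft.rfft(window * side[sl])

        # Side magnitude divided by mid magnitude, bin by bin.
        ratio = np.abs(S) / (np.abs(M) + eps)

        # Assumed remapping: 1 for center-panned bins (ratio 0),
        # 0 for hard-panned bins (ratio 1 or more).
        weight = np.clip(1.0 - ratio, 0.0, 1.0) ** power

        # Weighted mid spectrum back to the time domain, overlap-add.
        center[sl] += window * np.fft.irfft(weight * M, n=frame)

    # L + R contains a center-panned source twice, hence the 0.5 (assumption).
    center = 0.5 * center[hop:hop + n]
    return left - center, right - center, center
```

Called as out_l, out_r, center = remove_center(left, right), it returns the two channels with the estimated center subtracted, plus the center estimate itself. The first and last half frame are less reliable because the signal starts and stops abruptly there.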