You have two problems. First, the swishing, talking-into-a-wine-glass sound is, sure enough, the work of echo cancellation, sound error management and noise suppression. Second, the actual human voice back there is so far off mic that it’s missing all the crisp, clear bits needed to make out the words.
It’s all MMM, NNN and AAA. No SS and TT.
Even with weapons-grade equalization and boost, all that happened was the trash got much clearer while the voices didn’t improve at all. The voice quality just never made it into the recording.
That’s what it sounds like when somebody tries to record from the back (wrong) side of a directional microphone. Besides being very quiet, the voice gets muddled. The better microphones come with instructions detailing this effect so you can intelligently place the microphone in a complicated recording.
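
If it helps to see why the equalization and boost couldn't rescue the voices, here is a rough sketch in Python. The amplitude numbers are made up purely for illustration (they are not measured from your recording), and it assumes a simple boost that raises everything in the treble band by the same amount. The point is only that a boost multiplies the voice and the trash by the same factor, so the voice never gains on the trash.

```python
import math

def db_to_gain(db):
    """Convert a boost in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)

def ratio_db(voice, noise):
    """Voice-to-noise ratio of two amplitudes, in decibels."""
    return 20 * math.log10(voice / noise)

# Hypothetical amplitudes in the treble band where SS and TT live.
# Assumption: the off-mic voice is ten times weaker than the trash there.
voice_treble = 0.001
noise_treble = 0.010

gain = db_to_gain(24)  # a heavy 24 dB treble boost, also an assumed figure

before = ratio_db(voice_treble, noise_treble)
after = ratio_db(voice_treble * gain, noise_treble * gain)

print(f"voice vs trash before boost: {before:.1f} dB")
print(f"voice vs trash after boost:  {after:.1f} dB")
```

Both lines print the same ratio. Everything got louder, but the voice is still buried just as deep under the trash as before.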