This “test” is a waste of time. Any voices present are so deeply buried below other sounds that they are inaudible. If you have money to waste, you could hire an audio forensics professional.
IMO there’s no speech whatsoever on that. The deep noise is the phone’s vibrate buzzer noise.
Noises processed by a phone codec designed for speech end up sounding like speech, even though no speech is there,
then the brain misinterprets this as speech, aka audio pareidolia, aka Rorschach audio.
The giveaway sign that it’s pareidolia is that there are no sentences of typical length: just occasional words/phrases.
If you pitch-shift the vibrate buzzer noise it does sound like a muffled voice repeatedly saying “please”,
but no people are involved, it’s just a mechanical noise …
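
If anyone wants to try the pitch-shift idea themselves, here is a rough sketch in plain Python. It is not the tool I used, just the simplest method I know: resampling with linear interpolation, which changes the duration along with the pitch. The square-wave "buzz", the 150 Hz frequency, the 8000 Hz sample rate and the function names are all assumptions made up for illustration, not measurements from the recording.

```python
import math

def make_buzz(freq_hz=150.0, duration_s=0.5, sample_rate=8000):
    """Synthesize a crude square-wave buzz (hypothetical stand-in for a vibrate motor)."""
    n = int(duration_s * sample_rate)
    return [1.0 if math.sin(2 * math.pi * freq_hz * i / sample_rate) >= 0 else -1.0
            for i in range(n)]

def pitch_shift_by_resampling(samples, semitones):
    """Shift pitch by reading the samples at a different rate.

    Positive semitones raise the pitch and shorten the clip;
    negative semitones lower the pitch and lengthen it.
    """
    ratio = 2.0 ** (semitones / 12.0)      # 12 semitones = one octave = 2x frequency
    out_len = int(len(samples) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio                    # fractional read position in the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1) # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

buzz = make_buzz()
shifted = pitch_shift_by_resampling(buzz, semitones=12)  # up one octave
```

To listen to the result you would still need to write the samples to a WAV file or load a real recording of the buzzer in an audio editor and use its pitch-shift effect instead.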