The reverb from the room is extending the sibilance.
That’s odd in itself, because room resonances and echoes usually make a voice muddy and less distinct.
That’s not to say you can’t have both. Sibilance happens up around 8000Hz and intelligibility tends to cluster around 3000Hz. So you can have essy, painful speech that nobody can understand. Everybody wins!
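If you want a number to go with your ears, here is a rough sketch that compares the energy up around 8000Hz against the energy around 3000Hz. The band edges (2000-4000Hz and 6000-10000Hz) and the function names are my own assumptions for illustration, not any standard, and it assumes a mono clip already loaded as a NumPy float array.

```python
import numpy as np

def band_energy_db(samples, sample_rate, low_hz, high_hz):
    """Energy between low_hz and high_hz, in dB against an arbitrary reference."""
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    energy = np.sum(np.abs(spectrum[mask]) ** 2)
    return 10.0 * np.log10(energy + 1e-12)  # tiny offset avoids log(0)

def sibilance_ratio_db(samples, sample_rate):
    """Positive result: more energy in the sibilance band than the 3000Hz band."""
    # Band edges are illustrative assumptions, not a standard.
    speech_band = band_energy_db(samples, sample_rate, 2000, 4000)
    sibilance_band = band_energy_db(samples, sample_rate, 6000, 10000)
    return sibilance_band - speech_band
```

Run it on a short stretch with a lot of S sounds and again on the same words from a known good recording. The difference between the two results is more useful than either number on its own.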
And just to cover it, that damage can pass ACX Check. Technically perfect trash. That’s why ACX has Human Quality Control.
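To see why, here is a minimal sketch of a levels-only test. This is not the actual ACX Check code, just an illustration built on the published ACX numbers: peak no higher than -3dB, RMS between -23dB and -18dB, and noise floor no higher than -60dB. The function names and the separate room_tone argument are assumptions of mine.

```python
import numpy as np

def to_db(value):
    """Convert a linear amplitude (1.0 = full scale) to dB."""
    return 20.0 * np.log10(max(value, 1e-12))

def levels_check(samples, room_tone):
    """Levels-only test: performance in samples, a silent stretch in room_tone."""
    peak_db = to_db(np.max(np.abs(samples)))
    rms_db = to_db(np.sqrt(np.mean(samples ** 2)))
    noise_db = to_db(np.sqrt(np.mean(room_tone ** 2)))
    return {
        "peak_ok": peak_db <= -3.0,
        "rms_ok": -23.0 <= rms_db <= -18.0,
        "noise_ok": noise_db <= -60.0,
    }
```

Nothing in there looks at what frequencies the sound is made of or how long it rings on in the room. A steady 8000Hz whistle at the right volume would pass all three tests.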
I ran across this recently. This is a very successful home studio. Note the furniture moving blankets behind the performer to cut down room noise and echoes.
Koz