Similar question … “Removing words from speech, and maintaining the tone”
An “envelope follower” is another possibility, but the result would be monotone, because it tracks only the loudness of the speech and not its pitch (whereas a kazoo can vary in pitch): /uploads/default/original/2X/5/591acbd12a1d12dd761a44cc185889b1c63103d5.mp3
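For anyone wondering what an envelope follower actually does, here is a rough sketch in plain Python. It is not how the attached clip was made, just an illustration of the idea: rectify the speech, smooth it with separate attack and release times, and use the resulting loudness curve to shape a fixed-pitch tone. The function names, the 5 ms / 50 ms time constants, the 150 Hz pitch and the synthetic "speech" signal are all assumptions chosen for the example.

```python
import math

def envelope_follower(samples, sr, attack_ms=5.0, release_ms=50.0):
    """Rectify the signal and smooth it with a one-pole filter.

    attack_ms / release_ms are illustrative defaults, not tuned values.
    """
    a_att = math.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (sr * release_ms / 1000.0))
    env = []
    level = 0.0
    for s in samples:
        mag = abs(s)  # full-wave rectification
        # Rise quickly when the input is louder, fall slowly when it is quieter
        coeff = a_att if mag > level else a_rel
        level = coeff * level + (1.0 - coeff) * mag
        env.append(level)
    return env

def monotone_hum(samples, sr, pitch_hz=150.0):
    """Apply the speech envelope to a sine tone of constant pitch."""
    env = envelope_follower(samples, sr)
    return [e * math.sin(2.0 * math.pi * pitch_hz * n / sr)
            for n, e in enumerate(env)]

# Stand-in for speech: half a second of silence, then a 300 Hz tone
sr = 8000
speech = [(1.0 if n >= sr // 2 else 0.0) * math.sin(2.0 * math.pi * 300.0 * n / sr)
          for n in range(sr)]

env = envelope_follower(speech, sr)
hum = monotone_hum(speech, sr)
```

The output follows the rhythm and loudness of the input, but its pitch is fixed at whatever `pitch_hz` is set to, which is why the result sounds monotone.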