How do I create and prepare IVR messages with/without music?

Hi Everyone,

I am using Audacity 2.2.2 on Windows 8.1.

I am a voice over talent and would like to ask if someone could explain how to create IVR messages, with or without background music, for a voice over client. Specifically, should each vocal prompt be sent to the client as a separate WAV or MP3 file? If so, how much ‘room tone’ should there be at the beginning and end? And how do I add the music? I know how to add music to a voice over in Audacity, but what are the nuts and bolts of doing that for an IVR? Do I add music to each vocal prompt? (My general understanding/recollection is that background music only plays while someone is on hold.)

I have already read and reread “Tutorial - Making Ringtones and IVR messages” but did not find the info I need.

And I have not found a forum post that addresses this matter.

Thanks in advance!


I don’t think there is a “standard” way to create IVR messages, though I believe that they are normally recorded without music, and music is added later if required. The best thing to do is to speak to your client and ask them to give you a brief / specification of what they require.

Hi Steve,

Thank you so much for your reply. I thought I had posted a new topic incorrectly and was worried about breaking the rules, as I find this forum so very special and helpful.

So, then, a potential client requested an IVR with music, but I would basically provide them with “dry” prompts, correct?

Would each prompt be a separate MP3 file (or whatever format their system uses)?

If so, how much silence would I leave at the beginning and end of each prompt? 1 second? 1/2 second?

How does background music get installed in an IVR? I have music that I can use but don’t know how to “prepare” it for the client. Do I just send them the music and they upload it into their IVR?

Thanks again for all your help.

Best regards,


You will need to ask your client about their specifics. I know that in some cases they have very precise specifications for the file format, duration, peak and RMS levels … that are “required” by the hardware / software that they use.
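
Once you have the client's numbers, it can help to check each exported prompt against them before sending. Below is a rough sketch in Python (standard library only) that reports the format, duration, peak and RMS level of a WAV file. It assumes 16-bit PCM WAV exports; the function name `wav_stats` and the file name in the usage note are made up for illustration. Note also that tools differ in how they reference RMS (some report a full-scale sine as 0 dB, others as about -3 dB), so confirm which convention the client's spec uses.

```python
import math
import struct
import wave


def wav_stats(path):
    """Report basic format and level figures for a 16-bit PCM WAV file.

    Assumption: the file is uncompressed 16-bit PCM. Other sample widths
    are rejected rather than guessed at.
    """
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        sample_width = wf.getsampwidth()  # bytes per sample
        rate = wf.getframerate()
        n_frames = wf.getnframes()
        raw = wf.readframes(n_frames)

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM WAV files")

    # Interleaved little-endian signed 16-bit samples (all channels together).
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    full_scale = 32768.0

    if samples:
        peak = max(abs(s) for s in samples) / full_scale
        rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / full_scale
    else:
        peak = rms = 0.0

    def to_db(x):
        # Level relative to digital full scale; silence maps to -infinity.
        return 20 * math.log10(x) if x > 0 else float("-inf")

    return {
        "channels": channels,
        "sample_rate": rate,
        "bit_depth": sample_width * 8,
        "duration_s": n_frames / rate,
        "peak_dbfs": to_db(peak),
        "rms_dbfs": to_db(rms),  # square-wave-referenced RMS
    }
```

For example, `wav_stats("prompt_01.wav")` (a hypothetical file name) returns a dictionary that you can compare by eye against the client's sheet, e.g. mono, 8000 Hz, peak below whatever ceiling they specify.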