How to record a phone interview in 2 tracks?

I’m running Kubuntu 12.04 LTS amd64 and Audacity 2.0.0 (Unicode).

I want to use Audacity to record audio for podcasting. The podcasts will be interviews conducted via voice-over-IP (SIP) telephone.

I want the caller on left channel and the local voice on right channel (or vice-versa) of a single 2-channel audio file.

My audio source hardware is listed below. From experimentation, I believe the two sources I want to record at the same time are these:

  1. alsa_output.usb-Focusrite_Scarlett_2i2_USB-00-USB.analog-stereo.monitor/#2: Monitor of Scarlett 2i2 USB Analog Stereo
  2. alsa_input.usb-Focusrite_Scarlett_2i2_USB-00-USB.analog-stereo/#3: Scarlett 2i2 USB Analog Stereo

Here’s my hardware info:

$ pacmd list-cards
Welcome to PulseAudio! Use “help” for usage information.

3 card(s) available.
[snip other cards]
index: 2
name: <alsa_card.usb-Focusrite_Scarlett_2i2_USB-00-USB>
driver: <module-alsa-card.c>
owner module: 6
alsa.card = “1”
alsa.card_name = “Scarlett 2i2 USB”
alsa.long_card_name = “Focusrite Scarlett 2i2 USB at usb-0000:04:00.0-2, high speed”
alsa.driver_name = “snd_usb_audio”
device.bus_path = “pci-0000:04:00.0-usb-0:2:1.0”
sysfs.path = “/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/usb3/3-2/3-2:1.0/sound/card1”
udev.id = “usb-Focusrite_Scarlett_2i2_USB-00-USB”
device.bus = “usb”
device.vendor.id = “1235”
device.vendor.name = “Novation EMS”
device.product.id = “8006”
device.product.name = “Scarlett 2i2 USB”
device.serial = “Focusrite_Scarlett_2i2_USB”
device.string = “1”
device.description = “Scarlett 2i2 USB”
module-udev-detect.discovered = “1”
device.icon_name = “audio-card-usb”
output:analog-stereo: Analog Stereo Output (priority 6000)
output:analog-stereo+input:analog-stereo: Analog Stereo Duplex (priority 6060)
output:analog-stereo+input:iec958-stereo: Analog Stereo Output + Digital Stereo (IEC958) Input (priority 6055)
output:iec958-stereo: Digital Stereo (IEC958) Output (priority 5500)
output:iec958-stereo+input:analog-stereo: Digital Stereo (IEC958) Output + Analog Stereo Input (priority 5560)
output:iec958-stereo+input:iec958-stereo: Digital Stereo Duplex (IEC958) (priority 5555)
input:analog-stereo: Analog Stereo Input (priority 60)
input:iec958-stereo: Digital Stereo (IEC958) Input (priority 55)
off: Off (priority 0)
active profile: output:analog-stereo+input:analog-stereo
alsa_output.usb-Focusrite_Scarlett_2i2_USB-00-USB.analog-stereo/#1: Scarlett 2i2 USB Analog Stereo
alsa_output.usb-Focusrite_Scarlett_2i2_USB-00-USB.analog-stereo.monitor/#2: Monitor of Scarlett 2i2 USB Analog Stereo
alsa_input.usb-Focusrite_Scarlett_2i2_USB-00-USB.analog-stereo/#3: Scarlett 2i2 USB Analog Stereo
analog-output: Analog Output (priority 9900, available: unknown)

analog-input: Analog Input (priority 10000, available: unknown)

iec958-stereo-input: iec958-stereo-input (priority 0, available: unknown)

iec958-stereo-output: Digital Output (S/PDIF) (priority 0, available: unknown)

You have a SIP phone. There’s no guarantee the person at the other end will have one.

I know the spells for analog land-line and Skype, but not SIP. We have a corporate SIP service at work and the best we were able to do is set up a conference and record that, but that puts both parties in the same sound file. Two people in a “conference” is a valid condition.

Does your SIP service have sidetone? Landline telephones send a little bit of your own voice back to you as a confirmation that the system is working. That prevents the “talking into a dead piece of wood” problem. Our system at work has sidetone because our handsets sound just like a regular phone – given a little bit of delay. That can mess up recordings because it’s more difficult to get isolation between the two directions.

It’s not just a matter of choosing the right sound drivers or modules (as far as I know). Recording a telephone conversation is not for the easily frightened.


To get started, what I really need to know is simply how to create a 2-track recording from the two sources I listed (or any similar sources) where one is a normal source and the other is a monitor.

I have been googling around, and it seems that JACK might be one way to go. Another possibility may be to edit /etc/pulse/ and create a null sink and two loopbacks. I have zero experience with either option, so I would appreciate knowing how to do it.
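For reference, here is a minimal sketch of the null-sink-plus-two-loopbacks idea, assuming PulseAudio's `pactl` command is available. It only builds and prints the commands and does not run them. The sink name `interview` is a made-up label, and the two source names are the ones listed earlier in this thread. As written, both loopbacks feed the same sink, so the two voices would be mixed rather than kept on separate channels; keeping them apart would need additional channel configuration that is not shown here.

```python
def build_pactl_commands(sink_name, sources):
    """Return pactl commands for one null sink plus one loopback per source.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    commands = [
        ["pactl", "load-module", "module-null-sink", "sink_name=" + sink_name]
    ]
    for source in sources:
        commands.append(
            ["pactl", "load-module", "module-loopback",
             "source=" + source, "sink=" + sink_name]
        )
    return commands


if __name__ == "__main__":
    # Source names copied from the post above; "interview" is a hypothetical name.
    sources = [
        "alsa_output.usb-Focusrite_Scarlett_2i2_USB-00-USB.analog-stereo.monitor",
        "alsa_input.usb-Focusrite_Scarlett_2i2_USB-00-USB.analog-stereo",
    ]
    for command in build_pactl_commands("interview", sources):
        print(" ".join(command))
```

If the commands were run, the null sink's monitor (which would be named `interview.monitor` under these assumptions) should show up as a recordable source.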

Once I can actually do it, I’ll find out what issues I encounter next (such as “sidetones” or other quality issues).

I do not want to use Skype, a Skype recorder, or a conference call solution. Thanks.

You just talked yourself into waiting for one of the other elves. Good luck, although I’m pretty sure you can’t split Left and Right to originate from two different devices.


Do you mean Audacity can’t do this? Other software can do it. For example, you mentioned the Skype call recorder (which I don’t want to use). It can split the sound so the microphone goes to right channel and the incoming voice goes to the left channel. That’s exactly what I want to achieve (with Audacity or similar open source tool).

I know of a way to do it with hardware and a landline, but I also want to do it with VoIP and a SIP softphone (Twinkle).

I also have a hunch that either avconv or another Linux CLI tool could do it if I knew how to use the tool sufficiently well. Here’s a post I started about that possible approach to solving this: alsa - audio recording - record two sources simultaneously, merge into a single 2-track recording - Unix & Linux Stack Exchange

I asked here because I was hoping that Audacity would be an easier solution for me, but maybe learning enough to do it with avconv will be easy enough too.
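The "merge into a single 2-track recording" step can also be sketched on its own, independent of how the audio is captured. The example below assumes the two sides were already saved as separate mono, 16-bit PCM WAV files at the same sample rate (an assumption, not something established in this thread) and interleaves them so the first file becomes the left channel and the second the right. The function and file names are hypothetical, and only Python's standard library is used.

```python
import struct
import wave


def interleave_mono_to_stereo(left_path, right_path, out_path):
    """Merge two mono 16-bit WAV files into one stereo WAV file."""
    with wave.open(left_path, "rb") as left, wave.open(right_path, "rb") as right:
        for src in (left, right):
            if src.getnchannels() != 1 or src.getsampwidth() != 2:
                raise ValueError("both inputs must be mono, 16-bit PCM")
        if left.getframerate() != right.getframerate():
            raise ValueError("sample rates differ")
        rate = left.getframerate()
        left_raw = left.readframes(left.getnframes())
        right_raw = right.readframes(right.getnframes())

    # Decode little-endian signed 16-bit samples.
    left_samples = struct.unpack("<%dh" % (len(left_raw) // 2), left_raw)
    right_samples = struct.unpack("<%dh" % (len(right_raw) // 2), right_raw)

    # Pad the shorter side with silence so both channels have equal length.
    n = max(len(left_samples), len(right_samples))
    left_samples += (0,) * (n - len(left_samples))
    right_samples += (0,) * (n - len(right_samples))

    # Stereo frames are stored as L, R, L, R, ...
    stereo = [s for pair in zip(left_samples, right_samples) for s in pair]

    with wave.open(out_path, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(struct.pack("<%dh" % len(stereo), *stereo))
```

A call such as `interleave_mono_to_stereo("caller.wav", "local.wav", "interview.wav")` would put the caller on the left and the local voice on the right. This does nothing about keeping the two recordings in sync; it assumes both files start at the same moment.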

Audacity can only record from one input device at a time. Can you attach a cable between the output and input of the Scarlett so that the output is sent to the second input of the Scarlett, then monitor from its headphones output?


I do not believe that would accomplish my goals (see OP). Thanks.

I saw:

I want the caller on left channel and the local voice on right channel (or vice-versa) of a single 2-channel audio file.

Hence my suggestion, assuming the input/output are as you identified. Or do you mean you are recording the podcast with two mics?