DTMF Decoder plugin

Hi guys,

I'm playing around with creating a DTMF decoder plugin.

Starting with a silence detector, I work out the start and end of the various tones (in terms of the sample number). My goal at this point is to perform an FFT of the sound between the start and end points to work out whether it's a DTMF tone.

I have found examples where an FFT is performed on the selection/whole file, but I'm stuck trying to perform the FFT on a subset of the selection/file. Is this possible?

Thanks in advance.


Why do you want to use FFT?
It will become rather complicated.
I would use the zero crossing rate (ZCR) as a pitch estimator.
Divide the signal into two bands, one below and one above ~1100 Hz.
The two ZCRs (with a silence threshold applied) give a signal at a low sample rate (e.g. 2205 Hz).
You can gather the values and look up the corresponding digit or character in an association list.
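
To make the idea concrete, here is a rough sketch in Python rather than Nyquist, for illustration only. It assumes the signal has already been split into a low band and a high band (the band-splitting filters are not shown), and all names (`zcr_frequency`, `lookup_key`, `tone`) are made up for this example:

```python
import math

ROWS = (697, 770, 852, 941)        # low-group DTMF frequencies in Hz
COLS = (1209, 1336, 1477, 1633)    # high-group DTMF frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")

def zcr_frequency(samples, rate):
    """Estimate the frequency of one band from its zero crossing rate."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    duration = (len(samples) - 1) / rate
    return crossings / (2 * duration)   # a sine crosses zero twice per cycle

def nearest(value, candidates):
    """Index of the candidate frequency closest to value."""
    return min(range(len(candidates)), key=lambda i: abs(candidates[i] - value))

def lookup_key(low_hz, high_hz):
    """Map the two estimated frequencies to a keypad character."""
    return KEYS[nearest(low_hz, ROWS)][nearest(high_hz, COLS)]

def tone(freq, rate=8000, seconds=0.1):
    """Test helper: a pure sine standing in for one filtered band."""
    return [math.sin(2 * math.pi * freq * n / rate)
            for n in range(round(rate * seconds))]

# Assumption: these two lists are the outputs of the band split around 1100 Hz.
low_band = tone(770)
high_band = tone(1336)
key = lookup_key(zcr_frequency(low_band, 8000), zcr_frequency(high_band, 8000))
print(key)
```

The estimate is only accurate to a few Hz on a short tone, which is why the lookup picks the nearest standard frequency instead of requiring an exact match.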

Alternatively you could use a filter bank in a similar way to how DTMF tones were originally decoded, for example along the lines of:

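(defun filter (sig n)
  ;; row and column frequencies (Hz) for key n
  (let ((hzlist
          (case n
            (1 (list 697 1209))
            (2 (list 697 1336))
            (3 (list 697 1477))
            (4 (list 770 1209))
            (5 (list 770 1336))
            (6 (list 770 1477))
            (7 (list 852 1209))
            (8 (list 852 1336))
            (9 (list 852 1477))
            (0 (list 941 1336)))))
    ;; the product is large only when both tones are present
    ;; (combining with mult is an assumption; that line is missing from the post)
    (mult
      (rms (bp sig (first hzlist)))
      (rms (bp sig (second hzlist))))))

(defun bp (sig hz)
  ;; narrow band-pass: repeated high-pass and low-pass at the same frequency
  ;; (the lowpass8 stage is a reconstruction; the post is cut off after highpass8)
  (dotimes (i 8 sig)
    (setf sig
      (mult 2.3 ; make up for lost gain
        (lowpass8 (highpass8 sig hz) hz)))))

(setq num 1)  ; the number being tested

;; filter signal and make distinct pulse
(mult 10
  (s-max 0 (sum -0.3 (filter s num))))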

Yes, originally the decoding is done with Goertzel filters, i.e. each of the eight frequencies is tested for its presence.
I’ve chosen the ZCR because it needs only two tests (or even one) and is not dependent on the amplitude (although the silence detector is).
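
For comparison, the Goertzel approach can be sketched as follows (Python rather than Nyquist, for illustration only). It tests the eight frequencies on just the samples between a start and an end index, which is the "subset of the selection" asked about. The function names, the 8000 Hz sample rate and the sample positions are assumptions made for this example:

```python
import math

ROWS = (697, 770, 852, 941)        # low-group DTMF frequencies in Hz
COLS = (1209, 1336, 1477, 1633)    # high-group DTMF frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")

def goertzel_power(samples, rate, freq):
    """Squared magnitude of the signal content at one frequency."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2 = s1
        s1 = s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def detect_key(samples, rate):
    """Pick the strongest row and column frequency and look up the key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, rate, ROWS[i]))
    col = max(range(4), key=lambda i: goertzel_power(samples, rate, COLS[i]))
    return KEYS[row][col]

def dtmf_tone(low, high, rate=8000, count=320):
    """Test helper: 'count' samples of a two-tone DTMF signal."""
    return [0.5 * math.sin(2 * math.pi * low * n / rate)
            + 0.5 * math.sin(2 * math.pi * high * n / rate)
            for n in range(count)]

# Hypothetical recording: silence, the tone for "5", silence.
recording = [0.0] * 100 + dtmf_tone(770, 1336) + [0.0] * 100
start, end = 100, 420          # as reported by a silence detector
key = detect_key(recording[start:end], 8000)
print(key)
```

This sketch always returns some key, even for noise; a real decoder would also need a threshold on the detected power, which depends on the amplitude as noted above.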

Here’s the realization of the decoder with the mentioned concept.
rjh-dtmfdec.ny (4.77 KB)
The main screen shows the 16 standard characters.
A space stands for silence.
The detection will fail for very fast sequences or for sequences with extreme tone/silence ratios.
The debug screen can display the possible letter replacements for 2-9.
It would be nice if probabilistic ordering were applied.
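
The letter replacement step can be sketched like this in Python, using the standard phone keypad letters. This is not how the plugin does it, only an illustration, and `letter_candidates` is a made-up name. No probabilistic ordering is applied; candidates come out in keypad order:

```python
from itertools import product

# Standard telephone keypad letters for the digits 2-9
LETTERS = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

def letter_candidates(digits):
    """All letter strings a decoded digit sequence could stand for.

    Characters without letters (0, 1, *, #, A-D) are passed through as-is.
    """
    pools = [LETTERS.get(d, d) for d in digits]
    return ["".join(combo) for combo in product(*pools)]

candidates = letter_candidates("44")
print(len(candidates), "candidates, including", "HI" in candidates)
```

The number of candidates grows quickly with the length of the sequence, so some kind of ranking would be needed for anything longer than a few digits.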

That’s a little tricky because syllables can range from “V” type to “CCCVCCCC” type.
Also, there are no spaces between words.

Thanks guys. I'm new to this stuff, so that's why FFT came up as the logical means of getting the frequencies for DTMF decoding. It never occurred to me to use any other method!

Still getting my head around the code posted, but it gives me a good start!



A most natural way of thinking.
In this special case, FFT has its drawbacks. Firstly, there are only 8, mostly stationary signals involved, while the FFT returns perhaps 2048 bins.
Secondly, our target frequencies do not match the FFT bins, so some kind of interpolation or zero padding is required.
Nowadays, the tone pairs would probably be defined with perfect bin matching in mind.
Resampling to, say, 6000 Hz helps a bit to reduce unnecessary calculations and to center the frequencies in the spectrum.
As you say, a perfect match of FFT size and a single pulse would be nice, but it won’t work because the sizes must be a power of 2 (at least in Nyquist).
It gets worse if the DTMF sequence is generated by a human.
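
The bin mismatch is easy to check with a little arithmetic. A small Python sketch, assuming a 44100 Hz sample rate and an FFT size of 2048 (both just example values, and `bin_positions` is a made-up name):

```python
DTMF_HZ = (697, 770, 852, 941, 1209, 1336, 1477, 1633)

def bin_positions(rate, fft_size):
    """For each DTMF frequency: nearest FFT bin and the offset from it in bins."""
    bin_width = rate / fft_size
    rows = []
    for hz in DTMF_HZ:
        position = hz / bin_width          # fractional bin index
        nearest_bin = round(position)
        rows.append((hz, nearest_bin, position - nearest_bin))
    return bin_width, rows

width, rows = bin_positions(44100, 2048)
print(f"bin width: {width:.3f} Hz")
for hz, nearest_bin, offset in rows:
    print(f"{hz:5d} Hz -> bin {nearest_bin:3d}, offset {offset:+.2f} bins")
```

None of the eight frequencies lands exactly on a bin centre with these values, so each tone leaks into neighbouring bins.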