I always use Audacity’s Plot Spectrum to see the frequency components of audio files. When I try to implement this function with a Python FFT, I have a problem measuring the amplitude at each frequency.
The vertical axis of the plot is in decibels and 0 dB is the maximum. How does Audacity define the 0 dB reference value? Is it a fixed or an adaptive value?
I also posted a question on stackoverflow yesterday: https://stackoverflow.com/q/51057369/1951254
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

rfft_max_ref = 7943.0  ################## How should I calculate this value?
fs, x = wavfile.read('sample.wav')  # assumes a mono file
x = x * np.hanning(len(x))  # apply a Hann window to reduce spectral leakage
rfft = np.abs(np.fft.rfft(x)) / len(x)  # magnitude spectrum, scaled by length
rfft_max = max(rfft)
p = 20*np.log10(rfft/rfft_max_ref)  # convert to dB relative to the reference
f = np.fft.rfftfreq(len(x), d=1/fs)  # frequency of each bin in Hz
Audacity normalizes the values such that a sine wave with a peak value of 0 dB shows a value of 0 dB in the spectrum plot.
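One way to reproduce that kind of normalization in NumPy is sketched below. It is only an illustration under stated assumptions, not Audacity's actual code: the helper name spectrum_db and the full_scale parameter are made up for this example, and the reference is derived from the Hann window used in the original post. A sine of amplitude A that sits exactly on a bin centre produces a windowed rfft magnitude of about A * sum(window) / 2, so dividing by that value makes a full-scale sine read 0 dB.

```python
import numpy as np

def spectrum_db(x, full_scale=1.0):
    """Magnitude spectrum in dB relative to a full-scale sine (hypothetical helper)."""
    n = len(x)
    window = np.hanning(n)
    # A sine of amplitude A centred on a bin gives |rfft| ~= A * sum(window) / 2,
    # so this reference maps a full-scale sine to 0 dB.
    ref = full_scale * window.sum() / 2.0
    mag = np.abs(np.fft.rfft(x * window))
    # Clamp to a tiny floor so empty bins do not produce log10(0).
    return 20 * np.log10(np.maximum(mag, 1e-12) / ref)

fs = 44100
n = 4096
k = 300  # bin index; the test tone sits exactly on this bin centre
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * (k * fs / n) * t)  # amplitude 1.0 = assumed full scale
peak_db = spectrum_db(tone).max()
print(round(peak_db, 3))
```

For 16-bit integer samples as returned by wavfile.read, full_scale would be 32768. With the scaling used in the original post (rfft divided by len(x)), the same reference works out to roughly 32768 * 0.25 = 8192, because the mean of a Hann window is about 0.5.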
Did you mean that I need to calculate the FFT of a sine wave of arbitrary frequency, say 3 kHz, and take the peak value of that FFT? Let’s denote it as fft_ref. Then convert the other amplitudes to decibels relative to that? Like:
Does the frequency matter?
I think it will depend on the implementation of FFT that you are using.
You need to be careful about frequencies that fall close to the edge of a bin, as they contribute to the bins on either side. If you pick a frequency that is the mid frequency of one bin, then you can count how many samples end up in that bin. For a sine wave whose frequency equals the mid point of a bin, all of the samples contribute to that bin. If the frequency of a sine wave is on the border between two bins, then you need to interpolate the two bin values to get the peak. Audacity uses cubic interpolation, though I recall a discussion where the case was made that other forms of interpolation may be better (sorry, but I don’t recall the exact details; it was several years ago).
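The bin-edge effect can be seen with a small experiment. The sketch below compares a tone on a bin centre with one halfway between two bins, then estimates the true peak from the three bins around the maximum. Note that it uses a simple parabolic fit on the dB values as a stand-in, not the cubic interpolation mentioned above, and the function names are made up for this example.

```python
import numpy as np

def hann_spectrum_db(x):
    """dB spectrum where a unit-amplitude, bin-centred sine reads about 0 dB."""
    window = np.hanning(len(x))
    mag = np.abs(np.fft.rfft(x * window)) / (window.sum() / 2.0)
    return 20 * np.log10(np.maximum(mag, 1e-12))

def parabolic_peak(db, k):
    """Fit a parabola through db[k-1], db[k], db[k+1]; return (bin offset, peak dB)."""
    a, b, c = db[k - 1], db[k], db[k + 1]
    offset = 0.5 * (a - c) / (a - 2 * b + c)
    return offset, b - 0.25 * (a - c) * offset

fs, n = 44100, 4096
t = np.arange(n) / fs
bin_width = fs / n

# One tone exactly on bin 300, one halfway between bins 300 and 301.
centred = hann_spectrum_db(np.sin(2 * np.pi * 300.0 * bin_width * t))
between = hann_spectrum_db(np.sin(2 * np.pi * 300.5 * bin_width * t))

raw_centred = centred.max()   # highest bin, tone on a bin centre
raw_between = between.max()   # highest bin, tone between two bins
k = int(np.argmax(between))
offset, interp_between = parabolic_peak(between, k)
print(raw_centred, raw_between, offset, interp_between)
```

With a Hann window, the highest raw bin for the in-between tone reads about 1.4 dB low, and the parabolic estimate lands closer to 0 dB, though it is not exact. That residual error is one reason the choice of interpolation method is worth discussing.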