I think you misinterpret me. For each FFT frame, I find a fixed percentile of the power spectrum: the frequency below which that fixed fraction of the frame’s total power lies. No logarithms are involved at that stage. That defines a frequency.
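A minimal sketch of that per-frame step, assuming "percentile of the power spectrum" means treating the spectrum as a distribution over frequency and taking the bin where cumulative power first reaches the chosen fraction. The names `power_spectrum` and `percentile_frequency` are hypothetical, no window is applied, and the naive DFT only stands in for a real FFT so the example needs nothing beyond the standard library.

```python
import cmath
import math

def power_spectrum(frame):
    """Power in DFT bins 0..N/2 of one real-valued frame.
    Naive O(N^2) DFT, used here only as a stand-in for an FFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def percentile_frequency(power, sample_rate, fft_size, fraction):
    """Frequency (Hz) of the lowest bin at which cumulative power reaches
    `fraction` (0..1) of the frame's total power. Returns 0.0 for silence."""
    total = sum(power)
    if total == 0:
        return 0.0
    target = fraction * total
    running = 0.0
    for k, p in enumerate(power):
        running += p
        if running >= target:
            return k * sample_rate / fft_size
    # Guard against rounding leaving `running` a hair below `target`.
    return (len(power) - 1) * sample_rate / fft_size

# Two equal tones at 500 Hz and 1500 Hz (bins 4 and 12 of a 64-point frame).
frame = [math.sin(2 * math.pi * 4 * t / 64) + math.sin(2 * math.pi * 12 * t / 64)
         for t in range(64)]
p = power_spectrum(frame)
print(percentile_frequency(p, 8000, 64, 0.25))  # first quartile
print(percentile_frequency(p, 8000, 64, 0.75))  # third quartile
```

With half the power in each tone, the first quartile lands on the lower tone and the third quartile on the upper one, which is the sense in which different percentiles can track different parts of the spectrum.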
That frequency varies from frame to frame, defining a frequency-valued function of time. I’m trying to detect the boundaries between phones by some criterion applied to the rate of change of that function. I take the logarithm of that function before differentiating once or twice, because I thought it was a good idea to make the criterion independent of pitch level. So what I really test is change measured in musical steps, not in Hz.
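To illustrate why the logarithm makes the criterion pitch-independent, here is a small sketch using simple finite differences on log2 of the per-frame frequency. The function name, the `floor_hz` guard against log of zero on silent frames, and the use of plain first differences (rather than any smoothed derivative) are all assumptions for illustration.

```python
import math

def log_derivatives(freqs_hz, frame_rate, floor_hz=1.0):
    """First and second finite differences of log2(frequency).
    freqs_hz: one frequency per frame; frame_rate: frames per second.
    Returns (d1 in octaves/s, d2 in octaves/s^2)."""
    logf = [math.log2(max(f, floor_hz)) for f in freqs_hz]
    d1 = [(b - a) * frame_rate for a, b in zip(logf, logf[1:])]
    d2 = [(b - a) * frame_rate for a, b in zip(d1, d1[1:])]
    return d1, d2

# An octave jump over one frame gives the same rate at any pitch level.
low, _ = log_derivatives([200.0, 400.0], 100.0)
high, _ = log_derivatives([1000.0, 2000.0], 100.0)
print(low, high)  # both about 100 octaves per second
```

Differencing in Hz would report a jump five times larger for the higher pair; in log2 units both are one octave per frame.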
Then I take the absolute value and trigger wherever it crosses above a cutoff value. So the cutoff has dimensions of log frequency per second, or log frequency per second squared for the second derivative. Instead of “log frequency” I can say “octaves,” with an appropriate choice of units (a base-2 logarithm).
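The triggering step might look like the following sketch: take the absolute value of the rate and report the indices where it rises above the cutoff. The name `upward_crossings` is hypothetical, and counting a first sample that already exceeds the cutoff as a crossing is a design choice made here, not something stated above. The helper `nats_to_octaves` shows the unit conversion if the logarithm was natural rather than base 2: divide by ln 2.

```python
import math

def upward_crossings(rates, cutoff):
    """Indices i where |rates[i]| rises above cutoff, i.e. the previous
    sample was at or below it. A first sample above cutoff also counts."""
    hits = []
    prev_above = False
    for i, r in enumerate(rates):
        above = abs(r) > cutoff
        if above and not prev_above:
            hits.append(i)
        prev_above = above
    return hits

def nats_to_octaves(value):
    """Convert a rate in natural-log units to octaves (log2 units)."""
    return value / math.log(2)

print(upward_crossings([0.1, -3.0, -4.0, 0.2, 5.0], 2.0))  # [1, 4]
```

Dividing the returned indices by the frame rate (and accounting for the half-frame offset that differencing introduces) turns them into candidate boundary times.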
I also made a Nyquist “effect” that replaces a sound with a waveform that graphs my frequency-valued function, so I can trick Audacity into displaying it for me. Examining that function gives me a better idea of how to devise my criterion. Labelling phones is my goal, and graphing the function is an aid to that goal.
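The same "graph as waveform" trick can be sketched outside Nyquist. This is a Python analogue, not the plug-in itself: each frame’s value is held for one hop of samples, and log2 frequency is mapped linearly onto amplitude so that a hypothetical display range `f_lo`..`f_hi` fills 0..1. The result can be written as a 16-bit mono WAV with the standard `wave` module and opened in Audacity.

```python
import math
import struct
import wave

def track_to_samples(freqs_hz, hop, f_lo=50.0, f_hi=8000.0):
    """Hold each frame's value for `hop` samples, mapping log2(f)
    linearly so f_lo -> 0.0 and f_hi -> 1.0 (values outside are clipped)."""
    lo, hi = math.log2(f_lo), math.log2(f_hi)
    out = []
    for f in freqs_hz:
        y = (math.log2(max(f, f_lo)) - lo) / (hi - lo)
        out.extend([min(max(y, 0.0), 1.0)] * hop)
    return out

def write_wav(file, samples, sample_rate):
    """Write samples in [-1, 1] as mono 16-bit PCM to a path or file object."""
    with wave.open(file, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"".join(struct.pack("<h", int(round(s * 32767)))
                               for s in samples))
```

Because the mapping is logarithmic, equal vertical distances on the displayed waveform correspond to equal musical intervals, matching the log-frequency criterion.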
In one of the examples I posted, I showed a sound waveform and, in parallel tracks, the results of applying my graphing effect to three duplicates of the sound, each with a different quartile for illustration. I can choose any percentile I like for a given graph; it’s a parameter of my plug-in. The three curves look similar in that example, but they suggest that the first quartile, say, might be better than the third for finding some sounds and worse for others. A weighted combination of percentiles might therefore be a more useful function to plot, but I have not added that capability.
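One hypothetical way such a weighted combination could work, sketched here only as an assumption since the capability does not exist yet: average the percentile tracks in the log domain (a weighted geometric mean), so the combined track stays consistent with the pitch-independent log criterion. The weights are arbitrary placeholders.

```python
import math

def weighted_log_combination(percentile_tracks, weights):
    """Combine per-frame percentile-frequency tracks (Hz) into one track:
    a weighted mean of their log2 values, converted back to Hz."""
    total_w = sum(weights)
    combined = []
    for values in zip(*percentile_tracks):
        mean_log = sum(w * math.log2(f) for w, f in zip(weights, values)) / total_w
        combined.append(2.0 ** mean_log)
    return combined

q1 = [100.0, 200.0]  # first-quartile track
q3 = [400.0, 800.0]  # third-quartile track
print(weighted_log_combination([q1, q3], [1.0, 1.0]))  # about [200.0, 400.0]
```

Averaging in Hz instead would weight the higher percentile more heavily simply because its values are larger.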
I think the proper linguist’s term is phone, not phoneme. When a native English speaker says “tot,” the two t’s are articulated differently, one aspirated and one not. They are different “phones.” Whether different phones count as one phoneme depends on the language. English does not contrast words by the aspiration of voiceless stops, but Hindi does. Hindi has a writing system that distinguishes those two t’s with different letters, but English has no need for such distinctions in writing. So we say aspiration is a phonemic distinction in Hindi but not in English.