Improvements to various effects

noise reduction: I'm not sure about the current algorithm, but maybe add non-local means (use similar-looking audio parts to guide the process, similar to image denoising); rough sketch after this paragraph
if it uses spectral analysis, maybe do multiple window sizes http://image.slidesharecdn.com/lca-opus1-120911231032-phpapp02/95/opus-codec-18-728.jpg http://www.pnas.org/content/103/16/6094/F6.expansion.html
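Roughly what I mean by non-local means on audio, as a Python sketch (not Audacity code; the window size and the h smoothing value are made-up example numbers): each spectrogram frame is replaced by a weighted average of the frames that look similar to it, the way NL-means averages similar patches in an image.

```python
import numpy as np
from scipy.signal import stft, istft

def nlm_denoise(audio, fs, h=0.5):
    """Very rough non-local means over spectrogram frames (O(T^2), sketch only)."""
    f, t, Z = stft(audio, fs, nperseg=1024)
    mag, phase = np.abs(Z), np.angle(Z)
    frames = mag.T                        # one row per time frame
    out = np.empty_like(frames)
    for i, frame in enumerate(frames):
        d2 = np.mean((frames - frame) ** 2, axis=1)   # distance to every other frame
        w = np.exp(-d2 / (h ** 2))                    # similar frames get large weights
        out[i] = w @ frames / w.sum()                 # weighted average of similar frames
    Z_dn = out.T * np.exp(1j * phase)                 # keep the original phase
    _, denoised = istft(Z_dn, fs, nperseg=1024)
    return denoised
```

A real version would restrict the search to a neighbourhood and compare patches rather than whole frames, but the idea is the same.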

click-removal: add a UI to preview the nature of the clicks (not just the removed and residual audio),
but a scatter plot of click threshold vs width vs (loudness), and allow the cut-off to be based on both dimensions in a fuzzy-logic manner
(and an option for a soft threshold instead of a hard trim/cut-off when a click is worse than the criteria)
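Something like this Python sketch (my own illustration, not the current Click Removal code; all thresholds are invented numbers): each candidate click gets a fuzzy score from its amplitude and its width, and is repaired in proportion to that score instead of an all-or-nothing cut.

```python
import numpy as np

def fuzzy_membership(x, lo, hi):
    """0 below lo, 1 above hi, linear ramp in between (a simple fuzzy set)."""
    return np.clip((x - lo) / (hi - lo), 0.0, 1.0)

def click_scores(amplitudes, widths_ms):
    loud_enough = fuzzy_membership(amplitudes, lo=0.2, hi=0.6)        # example values
    short_enough = 1.0 - fuzzy_membership(widths_ms, lo=1.0, hi=5.0)  # example values
    return loud_enough * short_enough       # fuzzy AND of both dimensions

# soft treatment: repair each candidate by its score rather than hard-removing it
scores = click_scores(np.array([0.3, 0.8]), np.array([0.5, 8.0]))
print(scores)   # first candidate is partially repaired, second is left alone
```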

equalizer for hum removal: allow removal of drifting transient hums
that gradually change in both frequency and amplitude (e.g. a microphone in a non-ideal setup, moving from location to…)
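A rough Python sketch of what I mean (not Audacity code; the block size, search band and Q are made-up numbers): estimate the strongest hum peak in each block and move a notch filter along with it.

```python
import numpy as np
from scipy.signal import iirnotch, lfilter

def remove_drifting_hum(x, fs, block=8192, band=(40.0, 200.0), q=30.0):
    """Track the loudest peak inside `band` per block and notch it out."""
    y = np.copy(x)
    freqs = np.fft.rfftfreq(block, 1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    for start in range(0, len(x) - block, block):
        seg = x[start:start + block]
        spec = np.abs(np.fft.rfft(seg * np.hanning(block)))
        hum_freq = freqs[in_band][np.argmax(spec[in_band])]   # hum frequency this block
        b, a = iirnotch(hum_freq, q, fs=fs)                   # notch follows the drift
        y[start:start + block] = lfilter(b, a, seg)
    return y
```

A real version would crossfade between blocks (or carry the filter state over) so the moving notch doesn't click at block boundaries, and would track the hum's amplitude as well.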


time/pitch stretching: doing it without making it sound blurry (similar to the tube-yoplait commercial effect)
is it the phase information being lost? (similar to STFT vs reassigned spectrogram)
http://photosounder.com/download.php edits sound by converting it into an image, but suffers from a similar quality loss?
(based on http://arss.sourceforge.net/)

Being able to have multiple spectrum windows open at once would be valuable. I do it with a screen capture and display. Desperation method.

DeNoising has the advantage that everybody knows what a half-tone screen looks like. Audio noise tends to be a good deal more free-flow. Your ability to do content recognition in your head goes a good long way. I recognize instantly the sound of a Chevy Nova with a bad tail-pipe, but there’s no way to tell the software what that is so it can be managed or removed.

We famously can’t split a performance apart into individual instruments, voices and sounds. The profile step in noise reduction is the closest—and it depends on you being able to get a terrific, clean profile.

removal of drifting transient hums gradually changes in both frequency and amplitude

That’s also singing. The minute the target starts moving, it snaps us back to content recognition and performance splitting. This is hum in motion, that’s my voice.

Not so far.

Koz

Are you going to write such an algorithm for us? You can read the Manual to see the tradeoffs between using Change Tempo/Change Pitch and Sliding Time Scale/Pitch Shift.

In the next version of Audacity (2.1.3), when released, Change Tempo and Change Pitch will have a new option to use the SBSMS algorithm that Sliding Time Scale uses. Otherwise they use a different algorithm called SoundTouch. You can research those two algorithms online if you require more information about them.


Gale

about transient hum removal, or any other effect whose parameters change with time:
maybe allow manually adding parameters or features to track, which are then automatically refined
over time, using keyframes (small sketch below)
(similar to what animation programs such as blender3d do https://www.blender.org/manual/editors/movie_clip_editor/tracking/clip/stabilization.html )
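Roughly, in Python (my illustration; the parameter here is a hypothetical hum frequency): the user places a few keyframes and the effect interpolates between them.

```python
import numpy as np

# (time in seconds, parameter value) pairs placed manually by the user,
# e.g. the centre frequency of a hum notch
keyframes = [(0.0, 60.0), (12.5, 61.5), (30.0, 59.0)]

def parameter_at(t, keyframes):
    times, values = zip(*keyframes)
    return np.interp(t, times, values)   # linear interpolation between keyframes

print(parameter_at(6.0, keyframes))      # value interpolated between the first two keys
```

The automatic refinement step would then nudge the interpolated curve toward whatever the tracker actually measures near each keyframe.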

the fan-chirp spectrogram seems to look good for speech (tilted uncertainty ellipses) http://iie.fing.edu.uy/~pcancela/fcht/
it would be nice if noise removal's spectral gating could make use of that

other than plain reassigned spectrograms there's ConceFT, which appears to have fewer artifacts

https://arxiv.org/pdf/1507.05366v1.pdf
https://github.com/HaizhaoYang/SST_compare
https://www.researchgate.net/publication/280323900_ConceFT_Matlab_Code

zplane may have said in the past (I don't remember) that they used pitch tracking to help time-stretching

for normalization maybe add an option for the EBU R128 standard (maybe not much difference, though) http://bs1770gain.sourceforge.net/
http://www.ffmpeg.org/ffmpeg-filters.html#loudnorm http://www.foobar2000.org/components/view/foo_r128norm
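For what it's worth, here is how it looks with the third-party pyloudnorm and soundfile Python libraries (not Audacity code; the file names are placeholders), normalising to the EBU R128 target of -23 LUFS integrated loudness:

```python
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("input.wav")             # placeholder file name
meter = pyln.Meter(rate)                      # BS.1770 meter, as used by EBU R128
loudness = meter.integrated_loudness(data)    # measured loudness in LUFS
normalized = pyln.normalize.loudness(data, loudness, -23.0)
sf.write("normalized.wav", normalized, rate)
```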

for compression maybe make it also frequency-dependent, similar to https://en.wikipedia.org/wiki/Auditory_fatigue
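What I have in mind is basically multiband compression; a rough Python sketch (not Audacity code; crossover frequencies, thresholds and ratios are all made-up example numbers):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def band(x, fs, lo=None, hi=None, order=4):
    """Split off one band with a Butterworth filter (a real crossover would sum flatter)."""
    if lo is None:
        sos = butter(order, hi, btype="lowpass", fs=fs, output="sos")
    elif hi is None:
        sos = butter(order, lo, btype="highpass", fs=fs, output="sos")
    else:
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfilt(sos, x)

def compress(x, threshold_db, ratio, win=2048):
    """Crude RMS envelope follower plus a static gain curve."""
    env = np.sqrt(np.convolve(x ** 2, np.ones(win) / win, mode="same")) + 1e-12
    over_db = np.maximum(20 * np.log10(env) - threshold_db, 0.0)
    gain_db = -over_db * (1.0 - 1.0 / ratio)
    return x * 10 ** (gain_db / 20)

def multiband_compress(x, fs):
    low = compress(band(x, fs, hi=200.0), threshold_db=-30, ratio=2)
    mid = compress(band(x, fs, lo=200.0, hi=4000.0), threshold_db=-25, ratio=3)
    high = compress(band(x, fs, lo=4000.0), threshold_db=-20, ratio=4)
    return low + mid + high    # the fatiguing bands can be compressed harder
```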

for the click-removal explanation see the attached diagram (s.png)

https://gstreamer.freedesktop.org/data/events/gstreamer-conference/2012/opus.pdf (pages 43-45)
Opus (https://github.com/xiph/opus) does adaptive time-window-size analysis


http://photosounder.com/blog/labels/denoising.html
whether to also use information from the spectrogram

image-style spectrogram analysis, similar to https://commons.wikimedia.org/wiki/File:Gabor-ocr.png
or PSNR or (MS-)SSIM comparisons
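For the SSIM part, a minimal sketch with the third-party scikit-image library (assuming the two spectrograms are already magnitude arrays in dB on the same grid):

```python
from skimage.metrics import structural_similarity

def spectrogram_ssim(spec_a_db, spec_b_db):
    lo = min(spec_a_db.min(), spec_b_db.min())
    hi = max(spec_a_db.max(), spec_b_db.max())
    return structural_similarity(spec_a_db, spec_b_db, data_range=hi - lo)
```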

one of the reasons I mentioned ConceFT was that when I tried reassigned spectrograms in Photosounder, beating caused artifacts…

if denoise could also show a heat map of frequency vs amplitude,
super-imposed on a vertical-area graph showing the noise floor being subtracted,
and an option for a soft threshold instead of a hard cut of the frequencies that don't make it
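The soft threshold I mean, as a Python sketch (my illustration, not the Noise Reduction code; the 6 dB knee is an arbitrary example): bins fade out as they approach the noise floor instead of being switched off.

```python
import numpy as np

def hard_gate(mag, noise_floor):
    """Hard cut: anything below the floor is zeroed."""
    return np.where(mag > noise_floor, mag, 0.0)

def soft_gate(mag, noise_floor, knee_db=6.0):
    """Attenuate smoothly over a `knee_db`-wide ramp around the floor."""
    excess_db = 20 * np.log10(np.maximum(mag, 1e-12) / noise_floor)
    gain = np.clip((excess_db + knee_db) / knee_db, 0.0, 1.0)
    return mag * gain
```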

make more internal details/processes visible (probably what commercial $oftware might want to hide, but here the code is open)
allow tuning the noise-floor cut for a better cut/split


also, optionally use the constant-Q transform for spectrograms (lower frequencies & longer wavelengths → longer windows)
http://www.tsi.telecom-paristech.fr/aao/en/2016/10/07/reassigned-cqt/
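For reference, the third-party librosa library already exposes a constant-Q transform, so a quick experiment looks roughly like this (the file name is a placeholder):

```python
import numpy as np
import librosa

y, sr = librosa.load("input.wav", sr=None)                  # placeholder file name
C = librosa.cqt(y, sr=sr, fmin=librosa.note_to_hz("C1"),
                n_bins=84, bins_per_octave=12)              # window grows as frequency drops
C_db = librosa.amplitude_to_db(np.abs(C), ref=np.max)       # ready to plot
```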

option for multitaper analysis instead of only one window instance https://github.com/melizalab/libtfr
ConceFT is a multi-taper synchrosqueezed transform; synchrosqueezing differs from reassignment in that it still allows reconstructing the signal from the spectrogram
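A small Python sketch of just the multitaper part (not ConceFT itself; taper count and bandwidth are example values): average spectrograms taken with several DPSS tapers instead of a single window.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.signal.windows import dpss

def multitaper_spectrogram(x, fs, nperseg=1024, n_tapers=4):
    tapers = dpss(nperseg, NW=(n_tapers + 1) / 2.0, Kmax=n_tapers)
    S = None
    for taper in tapers:
        f, t, Sxx = spectrogram(x, fs, window=taper,
                                nperseg=nperseg, noverlap=nperseg // 2)
        S = Sxx if S is None else S + Sxx      # accumulate over the tapers
    return f, t, S / n_tapers                  # averaged, less noisy estimate
```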

https://www.researchgate.net/figure/280243431_fig2_Figure-2-Top-left-STFT-based-synchrosqueezing-transform-SST-of-the-clean-signal-s-t#
(attachment: histo spectro.jpg)

You’re absolutely right, the source code is open and it is available here: https://github.com/audacity/audacity/blob/master/src/effects/NoiseReduction.cpp
Hack away to your heart’s content. I’ll be happy to test your improved version.

http://www.cmap.polytechnique.fr/~yu/software/AudioReferenceSoftware.zip (2008)
old code, but it does window-adaptive denoising

http://www.hackathon.io/extreme-time

https://www.researchgate.net/publication/234063389_Automatic_Adaptation_of_the_Time-Frequency_Resolution_for_Sound_Analysis_and_Re-Synthesis

https://arxiv.org/pdf/1512.04811v2# 2016 Entropy-Based Time-Varying Window Width Selection For Nonlinear-Type Time-Frequency Analysis
https://arxiv.org/abs/1109.6314 2011 An Entropy Based Method for Local Time-Adaptation of the Spectrogram

also maybe use a tonemapping-based filter on the spectrogram to function as a
compander (compressor/expander)
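Roughly like this Python sketch (my illustration; the gamma value is arbitrary): apply a compressive curve to the spectrogram magnitudes, the way gamma/tone mapping compresses image brightness, and resynthesize.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_compand(x, fs, gamma=0.7, nperseg=1024):
    f, t, Z = stft(x, fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    ref = mag.max() + 1e-12
    mag_c = ref * (mag / ref) ** gamma          # gamma < 1 compresses, > 1 expands
    _, y = istft(mag_c * np.exp(1j * phase), fs, nperseg=nperseg)
    return y
```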

What is the point of posting all these links?
Are you, for example, suggesting that the “Extreme Time Stretch in python” from your first link, or IRCAM’s SuperVP/AudioSculpt on which it is based, are better in some way than the extreme time stretch algorithm used in PaulStretch? If so, then in what way is it better? What are the pros and cons? What is your assessment of paulnasca’s response to why phase is randomised in PaulStretch? Have you considered or compared performance (speed) of this effect? What exactly do you want us to look at and why?

also, from looking at http://iie.fing.edu.uy/~haldos/downloads/pitch-visualization-fcht-v1.pdf
I don't see why reassignment-related transforms can't be combined with fan-chirp,
provided the resampling is done right

All your posted links will be made unclickable.


Gale

about Paul's response: but Amaz Slow Downer (IRCAM-based)
sounds relatively crisp; also it's like the argument that blurry photos look nice (a preference problem)

I'm attempting this for text-to-speech speedup

oops - didn't notice page 2 ~ careless sigh

Steve: “What is the point of posting all these links?”

so my “also from looking at …” post may have looked confrontational; it wasn't meant to be


sorry if I misled - I'm not that deep into programming (barely)

also I was rushing; I'm only occasionally available


fan-chirp and reassignment (especially) / synchrosqueezing are probably slower than other methods

a time-stretching test on text-to-speech (similar to an eye test using text) requires intelligibility, not just pleasantness
but music that sounds nice is fine too :wink:

here’s a comparison (2008) of a lot of proprietary time-stretchers
http://en.audiofanzine.com/pitch-shifter-time-stretcher/editorial/articles/time-stretching-pitch-shifting-comparison-part-i.html

non-spectral time-stretchers are faster (WSOLA, such as SoundTouch); comparison: https://smartech.gatech.edu/bitstream/handle/1853/54587/WAC2016-48.pdf https://codereview.chromium.org/19111004 https://src.chromium.org/viewvc/chrome/trunk/src/media/filters/wsola_internals.cc?view=log&pathrev=261323#

but IRCAM-based Amaz Slow Downer,
for 3 minutes of 48 kHz stereo audio,
takes 10 seconds for ×2 and 1:45 for ÷2

Audacity's Sliding Time Scale / Pitch Shift takes 0:45 for ×2

Topic moved to the Audio Processing board.

for effects that are configured to use a certain window size expressed as a number of samples,
the output would probably differ a lot depending on the sample rate of the audio track (small sketch below)
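What I mean, as a tiny sketch: specify the window in milliseconds and convert per track, instead of a fixed sample count that means a different duration at every rate.

```python
def window_samples(window_ms, sample_rate):
    return int(round(window_ms * sample_rate / 1000.0))

print(window_samples(23.2, 44100))   # ~1023 samples at 44.1 kHz
print(window_samples(23.2, 96000))   # ~2227 samples at 96 kHz, the same duration
```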

also, the uncertainty ellipse's aspect ratio would scale proportionally, but squared;
should the effect take this into account?


also, the effects seem to use a single CPU core; are upgrades to parallel processing on the roadmap?

I lost the code that generated the heatmap (histo spectro.jpg),
but the vertical axis is audio frequency,
the horizontal axis is the spectrogram's intensity,
and the heatmap's brightness is the number of pixels with those properties

(due to sampling problems, those artifact vertical lines are visible)
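Since the original code is lost, here is a rough Python reconstruction of what it might have looked like (the bin count and dB conversion are guesses): for every spectrogram pixel, count its frequency against its intensity.

```python
import numpy as np
from scipy.signal import spectrogram

def intensity_histogram(x, fs, nperseg=1024, n_levels=256):
    f, t, Sxx = spectrogram(x, fs, nperseg=nperseg)
    Sxx_db = 10 * np.log10(Sxx + 1e-12)
    freq_of_pixel = np.repeat(f, Sxx_db.shape[1])   # frequency of every pixel (vertical axis)
    level_of_pixel = Sxx_db.ravel()                 # intensity of every pixel (horizontal axis)
    counts, f_edges, l_edges = np.histogram2d(
        freq_of_pixel, level_of_pixel, bins=[len(f), n_levels])
    return counts, f_edges, l_edges   # counts[i, j] = number of pixels at that freq/intensity
```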

if not via programming: wikipedia.org/wiki/Portable_pixmap
(allows a rough export to spreadsheets, and use of COUNTIFs)

it's essentially an image waveform monitor (in the video sense, not the audio one; different technical vocabulary)
trac.ffmpeg.org/wiki/WaveformMonitor#Lowpassfilter
userbase.kde.org/Kdenlive/Manual/View_Menu/Waveform
helpx.adobe.com/premiere-pro/using/using-waveform-monitors-vectorscope.html#yc_waveform
docs.blender.org/manual/en/latest/editors/image/scopes.html?highlight=luma%20waveform