I think that identifying the clicks would be the hardest part. Can your phoneme analyser help with that? Speech segmentation (not recognition!)
There are several easy techniques for reducing mouth smack (and other) clicks.
One approach is to use a low-pass filter. To avoid creating new clicks at the start and end of the repair, there needs to be a short transition period (a crossfade) between the original and the repaired section. Just placing the start and end at zero crossing points is not sufficient, because the filtered audio will generally not be at zero at those same points.
Commented example:
; sweep filter frequency range in Hz
(setq nyqf (/ *sound-srate* 2)) ; Nyquist frequency
(setq lowf 2000) ; lowest frequency of the sweep
(setq iter 4) ; number of passes through the low-pass filter
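; one cycle of a sine wave the same duration as the selection.
; The 90 degree phase offset makes the cycle start and end at its peak,
; so the filter is most open at the edges of the selection.
(setf sine (osc (hz-to-step (/ (get-duration 1)))
                1 *sine-table* 90))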
; make amplitude 0 to 1
(setf sine (mult 0.5 (sum 1 sine)))
; make amplitude lowf to nyqf
(setf frq-env (sum lowf (mult sine (- nyqf lowf))))
;; Envelopes to make a very short crossfade
;; (5% of the selection length at each end).
(control-srate-abs *sound-srate*
  (progn
    (setf filt-env (pwlv 0 0.05 1 0.95 1 1 0))
    (setf fade-env (pwlv 1 0.05 0 0.95 0 1 1))))
(setf s1 s)
;; Filter s1
(dotimes (i iter)
  (setf s1 (lp s1 frq-env)))
;; Crossfade
(sum (mult s fade-env)
     (mult s1 filt-env))
Another approach, specifically for clicks during “silence”, is to patch over the click with a bit of “room tone”, which we discussed in these topics:
https://forum.audacityteam.org/t/alternate-equal-length-paste-command-resolved/28836/1
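To make the "room tone" idea a bit more concrete, here is a rough sketch of patching a region with an equal-length piece of room tone, using the same kind of short crossfade at each edge as in the filter example. It is written in Python on plain lists of samples rather than as a Nyquist plug-in, purely to illustrate the idea. The function name `patch_with_room_tone`, its parameters and the default fade length are all made up for this illustration and are not part of Audacity or of the linked topic.

```python
def patch_with_room_tone(samples, start, end, room_tone, fade=64):
    """Return a copy of `samples` with samples[start:end] patched by room tone.

    Hypothetical helper for illustration only.
    `room_tone` must hold at least (end - start) samples of clean background
    noise. `fade` is the crossfade length, in samples, at each edge.
    """
    length = end - start
    if length <= 0:
        raise ValueError("end must be greater than start")
    if len(room_tone) < length:
        raise ValueError("room tone is shorter than the region to patch")
    fade = min(fade, length // 2)  # the two fades must not overlap

    out = list(samples)
    for i in range(length):
        # w is the weight of the room tone:
        # it ramps 0 -> 1 at the start and 1 -> 0 at the end of the patch.
        if i < fade:
            w = i / fade
        elif i >= length - fade:
            w = (length - 1 - i) / fade
        else:
            w = 1.0
        out[start + i] = (1.0 - w) * samples[start + i] + w * room_tone[i]
    return out


# Example: a one-sample click at index 50 in otherwise silent audio.
audio = [0.0] * 100
audio[50] = 1.0
tone = [0.0] * 20  # stand-in for real recorded room tone
repaired = patch_with_room_tone(audio, 40, 60, tone, fade=4)
```

Because the weight is zero on the first and last sample of the patch, the repaired region joins the untouched audio without a jump, which is the same reason for using the crossfade envelopes in the filter approach.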