Make an audio file containing a few seconds of silence. Audacity can generate one for you (Generate > Silence) and export it as a WAV file.
Call it Silence.wav or whatever you want.
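If you would rather not open Audacity, the silence file can also be written from a script. This is only a rough sketch using Python's standard wave module, not part of the original recipe; the function name write_silence and the defaults (2 seconds, 44100 Hz, stereo, 16-bit) are my own assumptions, so adjust them to match your audio.

```python
import wave

def write_silence(path, seconds=2.0, sample_rate=44100, channels=2):
    """Write a 16-bit PCM WAV file of pure silence; return the frame count.

    The defaults here are assumptions, not values from the original answer.
    """
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        # All-zero samples are silence in signed 16-bit PCM.
        wav.writeframes(b"\x00" * (n_frames * channels * 2))
    return n_frames

write_silence("Silence.wav", seconds=2.0)
```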
Make a text file listing the files you wish to concatenate, one per line, in the order you want them joined. Between each pair of files, add the silence audio file. Like this:
Contents of Filelist.txt
file 'input1.mp4'
file 'Silence.wav'
file 'input2.mp4'
file 'Silence.wav'
file 'input3.mp4'
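With more than a handful of inputs, typing the list by hand gets tedious. A small script along these lines could generate it; build_filelist is a hypothetical helper name, and the sketch assumes the filenames contain no single quotes (those would need escaping).

```python
def build_filelist(inputs, silence="Silence.wav"):
    """Return list-file text with the silence file between each pair of inputs."""
    lines = []
    for i, name in enumerate(inputs):
        if i > 0:
            # Silence goes between inputs only, not before the first or after the last.
            lines.append(f"file '{silence}'")
        lines.append(f"file '{name}'")
    return "\n".join(lines) + "\n"

inputs = ["input1.mp4", "input2.mp4", "input3.mp4"]
with open("Filelist.txt", "w", encoding="utf-8") as f:
    f.write(build_filelist(inputs))
```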
Run FFmpeg with its concat demuxer, using the Filelist.txt file as the input and whatever filename you want as the output. FFmpeg will make one big audio file for you.
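For reference, here is roughly what that FFmpeg call might look like, wrapped in Python so it can be scripted. The output name output.wav and the helper names are assumptions. The -vn flag drops any video stream so that only audio ends up in the result. One of the left-out details: the concat demuxer generally expects every listed file to have the same codec and stream parameters, so mixed inputs may first need converting to a common format.

```python
import subprocess

def concat_command(filelist="Filelist.txt", output="output.wav"):
    """Build the FFmpeg argument list for the concat demuxer (names are assumptions)."""
    return [
        "ffmpeg",
        "-f", "concat",   # read the input as a concat list file
        "-safe", "0",     # only needed when the list has absolute or unusual paths
        "-i", filelist,
        "-vn",            # drop video so the result is audio only
        output,
    ]

def run_concat(filelist="Filelist.txt", output="output.wav"):
    """Run FFmpeg; raises CalledProcessError if FFmpeg reports a failure."""
    subprocess.run(concat_command(filelist, output), check=True)

# Print the command rather than running it, so it can be inspected first.
print(" ".join(concat_command()))
# prints: ffmpeg -f concat -safe 0 -i Filelist.txt -vn output.wav
```

Call run_concat() once the list file and all the files it names are in the current directory.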
I’ve left out a lot of details; this is just to demonstrate the concept.