Synthesizing voices from voice clips

Windows 10, Audacity version 3.0.2

This should be the correct section for this question, but if it isn’t I can move it elsewhere.

I’d like to know whether it is possible, with or without plugins, to take various voice samples of a character (or of multiple characters) and use Audacity to synthesize full voices, so that when writing dialogue I could make a character say whatever I want with correct inflection and phonetics.

As an example of what I want to do: I have multiple voice clips of two video game characters saying various things. I want to feed all the clips in to be analyzed/learned, so that voices can be created from them and I can write dialogue and have the characters interact realistically, without the painstaking process of going through every individual vocal clip and trying to form every word/expression by hand. For instance, this very generic conversation:

“Hello George, are things going well?”
“Hey Charles, yea everything is going well, though I’m a bit tired lately.”

Side note: if such a thing is possible (though it seems unlikely), I also wonder whether the TTS/dialogue could be written inside Audacity to test this, perhaps with a setting to alter emotion/inflection, so that if a character suddenly goes from neutral or happy to sounding angry, that is possible by simply changing the emotion in the voice instead of needing a whole new voice per emotion.

LyreBird … https://youtu.be/VnFC-s2nOtI

I have tried Lyrebird before. The issue, as far as I could find, is that it doesn’t work with audio clips: it has to hear the voice for up to a minute, and many of my sound clips don’t reach that length.

Come back and ask again in a couple of years’ time. Technology may have moved on by then.

The main issue with Lyrebird is that it requires at least a minute of continuous speech, and you have to say what’s in its training prompt for the voice to work. If you have 100+ three-to-five-second voice clips of a character saying “Hello” or “alright”, you would have to use Audacity to splice a bunch of the audio together by hand just to make a voice that could work with Lyrebird, and then play it back into a microphone (as far as I can tell it can’t take already-recorded audio).

If it didn’t take so long to splice the audio together, I wouldn’t need Lyrebird in the first place. That’s why I figured someone might have come up with some sort of workflow or plugin for Audacity that could help splice the audio into reasonably cohesive speech, or at least take the various phonetics from whatever audio I run through it and save them as a file, so it’s easier to splice and then use in Lyrebird (a rough sketch of that splicing step is below).
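For the splicing step, a small script could at least batch-join the short clips into one long recording to feed Lyrebird. Below is a minimal sketch, assuming the clips are WAV files collected in a folder; the clips/ folder name, the pause length, and the choice of the pydub library are all just assumptions, not a tested workflow:

# join_clips.py - concatenate short voice clips into one long file
# Assumes: pip install pydub (plus ffmpeg if you later add non-WAV formats)
import os
from pydub import AudioSegment

CLIP_DIR = "clips"   # placeholder: folder holding the short .wav clips
PAUSE_MS = 300       # short silence between clips so words don't run together

combined = AudioSegment.empty()
pause = AudioSegment.silent(duration=PAUSE_MS)

for name in sorted(os.listdir(CLIP_DIR)):
    if name.lower().endswith(".wav"):
        combined += AudioSegment.from_wav(os.path.join(CLIP_DIR, name)) + pause

combined.export("combined.wav", format="wav")
print("combined length:", len(combined) / 1000.0, "seconds")  # len() is in ms

That would at least produce a minute-plus file without hand-splicing in Audacity, though it obviously won’t turn disjointed clips into cohesive speech on its own.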

I was hoping that a voice could be re-synthesized by taking Audacity spectra from a clip of the voice you want to create and from the voice you have, finding the difference, and applying it to the voice you have with Effect > Filter Curve EQ. I wrote a Python program to do that and create the Filter Curve EQ import .txt file with the equalizer changes you need (below). I applied the filter and it seems to behave as intended: it does change the sound, but the result is nowhere near what is needed, and it doesn’t really work even at Lyrebird’s level. Any ideas?

Python Program:

spectrum1.txt is the reference - the voice you want

spectrum2.txt is the voice you have - the one you want to apply the filter to

def read_spectrum(filename):
    # Each exported spectrum file has a header row, then "frequency level" pairs.
    freqs = []
    levels = []
    with open(filename, "r") as spec:
        for line in spec.readlines()[1:]:
            data = line.split()
            freqs.append(data[0])   # frequency in Hz (kept as text)
            levels.append(data[1])  # level in dB
    return freqs, levels

# spectrum1.txt: reference spectrum (the voice you want).
# spectrum2.txt: spectrum of the voice you have.
f1, v1 = read_spectrum("spectrum1.txt")
f2, v2 = read_spectrum("spectrum2.txt")

with open("filterdiff.txt", "w") as filt:
    filt.write("FilterCurve:")
    # Frequency points, taken from the spectrum of the voice you have.
    for i in range(len(f2)):
        filt.write("f" + str(i) + "=" + '"' + f2[i] + '" ')
    # Gain at each frequency: reference level minus source level, in dB.
    for i in range(len(f2)):
        vdiff = float(v1[i]) - float(v2[i])
        filt.write("v" + str(i) + "=" + '"' + str(vdiff) + '" ')
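For reference, the generated filterdiff.txt is a single line starting with FilterCurve:, followed by the frequency and gain pairs, something like this (values purely illustrative):

FilterCurve:f0="21.5" f1="43.1" f2="64.6" v0="-2.5" v1="3.1" v2="0.8"

which should then load through the Filter Curve EQ effect’s preset import, i.e. the same import .txt route described above. One limitation worth noting: a single static EQ curve can only match the long-term average spectrum of the two voices, not pitch, timing, or formant movement, which is probably why the result is nowhere near a real voice conversion.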