How about AI voice generation using the already existing voices?

STRUNK · October 25, 2024

I have installed F5-TTS with pinokio: https://pinokio.computer/
It's remarkable how fast this model is and how easy to use. Just find some longer voice clip and use it as reference audio.
Under Multi-Speech you can upload more voice clips and give them tags (Speech Type Names), or use this for different voices altogether; then simply start the sentence with {your tag}, {your tag2}, etc.
I took 4 clips from the moor, tagged them soft, normal, angry and shout, then made them say the same line:

It works very fast and the quality is very much the same as what I got out of tortoise-TTS, but it seems not as good at keeping the character of the voice.
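
To make the tag idea concrete, here is a minimal sketch of how a tagged script could be split into per-voice segments. This is not F5-TTS's own parser (its internals are not shown in this thread); the function name `split_tagged_script`, the default tag, and the sample line are all made up for illustration. The only thing taken from the post is the convention of starting a sentence with `{your tag}`.

```python
import re

# Matches a tag written in curly braces, e.g. {soft} or {angry}.
TAG_PATTERN = re.compile(r"\{([^{}]+)\}")


def split_tagged_script(script, default_tag="normal"):
    """Split a script into (tag, text) pairs.

    Text before the first tag is assigned to default_tag (an assumption
    made for this sketch). Empty segments are dropped.
    """
    segments = []
    current_tag = default_tag
    position = 0
    for match in TAG_PATTERN.finditer(script):
        # Everything since the previous tag belongs to the previous tag.
        text = script[position:match.start()].strip()
        if text:
            segments.append((current_tag, text))
        current_tag = match.group(1).strip()
        position = match.end()
    # Whatever follows the last tag belongs to that tag.
    tail = script[position:].strip()
    if tail:
        segments.append((current_tag, tail))
    return segments


if __name__ == "__main__":
    for tag, text in split_tagged_script("{soft} Who goes there? {shout} Show yourself!"):
        print(tag, "->", text)
```

Each pair could then be matched to the reference clip uploaded under the same tag.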

JackFarmer · October 25, 2024

2 hours ago, STRUNK said:

I have installed F5-TTS with pinokio: https://pinokio.computer/
It's remarkable how fast this model is and how easy to use. Just find some longer voice clip and use it as reference audio.
Under Multi-Speech you can upload more voice clips and give them tags (Speech Type Names), or use this for different voices altogether; then simply start the sentence with {your tag}, {your tag2}, etc.
I took 4 clips from the moor, tagged them soft, normal, angry and shout, then made them say the same line:

It works very fast and the quality is very much the same as what I got out of tortoise-TTS, but it seems not as good at keeping the character of the voice.

Yeah, the first three are very close to the original voice actor.

STRUNK · October 25, 2024

18 minutes ago, JackFarmer said:

Yeah, the first three are very close to the original voice actor.

Here is a demo page where you can test it yourself: https://huggingface.co/spaces/mrfakename/E2-F5-TTS
I don't know if it takes .ogg directly, but it does take .wav. Batch converting .ogg files to .wav with VLC player works very well.
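
For anyone who prefers a script over the VLC dialog, a rough alternative is to call ffmpeg from Python. This swaps in a different tool from the one used above and assumes `ffmpeg` is installed and on the PATH; the function names and the `clips` folder are placeholders made up for this sketch.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(ogg_path):
    """Return the ffmpeg argument list that converts one .ogg file to .wav."""
    source = Path(ogg_path)
    target = source.with_suffix(".wav")
    # ffmpeg picks the output format from the target file extension.
    return ["ffmpeg", "-i", str(source), str(target)]


def convert_folder(folder):
    """Convert every .ogg file in folder and return the new .wav paths.

    Files that already have a .wav next to them are skipped, so nothing
    is overwritten. Assumes ffmpeg is installed and on the PATH.
    """
    converted = []
    for ogg_file in sorted(Path(folder).glob("*.ogg")):
        wav_file = ogg_file.with_suffix(".wav")
        if wav_file.exists():
            continue
        subprocess.run(build_ffmpeg_command(ogg_file), check=True)
        converted.append(wav_file)
    return converted


if __name__ == "__main__":
    # "clips" is a hypothetical folder name; point it at your own .ogg files.
    for wav in convert_folder("clips"):
        print("created", wav)
```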

Edited October 25, 2024 by STRUNK

JackFarmer · October 25, 2024

43 minutes ago, STRUNK said:

Here is a demo page where you can test it yourself: https://huggingface.co/spaces/mrfakename/E2-F5-TTS
I don't know if it takes .ogg directly, but it does take .wav. Batch converting .ogg files to .wav with VLC player works very well.

Very impressive, though sometimes the results sound as if somebody is talking into a tube.

STRUNK · October 25, 2024

22 minutes ago, JackFarmer said:

Very impressive, though sometimes the results sound as if somebody is talking into a tube.

Yes, the results can vary a lot, but this F5 model is so fast that I think it might still be a lot quicker to generate the same thing many times and finally pick one that is good enough than to train voice models in tortoise-TTS and still have to generate multiple times to get a good result.

After what I have been trying for almost a week now, TTS AI could be fun to use in some cases, preferably with background audio in the scene, to have NPCs react to specific situations, but not for real dialogue or cutscenes (yet).

Edited October 25, 2024 by STRUNK

JackFarmer · October 25, 2024

5 minutes ago, STRUNK said:

Yes, the results can vary a lot, but this F5 model is so fast that I think it might still be a lot quicker to generate the same thing many times and finally pick one that is good enough than to train voice models in tortoise-TTS and still have to generate multiple times to get a good result.

After what I have been trying for almost a week now, TTS AI could be fun to use in some cases, preferably with background audio in the scene, to have NPCs react to specific situations, but not for real dialogue or cutscenes (yet).

Yeah, I think this is also very good for tentative voice-over when voice talents are late with their contributions, so you have at least something for mapping or beta testing.

STRUNK · October 25, 2024

F5-TTS does take .ogg. It doesn't work with drag and drop, but it does work when you click upload and select the .ogg file.

Edited October 25, 2024 by STRUNK

STRUNK · October 25, 2024

I picked up a conversation between some NPCs regarding our goings-on on this forum!

Edited October 25, 2024 by STRUNK

STRUNK · October 25, 2024

For the sake of comparison, the same clip with background audio:

It sounds robotic .. that keeps being true.

Edited October 25, 2024 by STRUNK

JackFarmer · October 30, 2024

On 10/25/2024 at 10:17 PM, STRUNK said:

For the sake of comparison, the same clip with background audio:

It sounds robotic .. that keeps being true.

Hm... you really think it sounds robotic? I have a similar feeling as I already had with the earlier samples you posted; if I hadn't known, I would probably have said it was real.

STRUNK · October 30, 2024

4 hours ago, JackFarmer said:

Hm... you really think it sounds robotic? I have a similar feeling as I already had with the earlier samples you posted; if I hadn't known, I would probably have said it was real.

Yes I still do, but .. I know they are AI generated of course, and I have also edited audio and vocals so much in my life that I maybe listen to audio in a different way.
Nonetheless, even this "home made" TTS that can run on your own PC/Mac can be useful in some cases, and will only get better.
And if you can have the NPCs say really funny things in a mission, I guess no one will care that it was generated : P

Edited October 30, 2024 by STRUNK

JackFarmer · October 30, 2024

3 hours ago, STRUNK said:

Yes I still do, but .. I know they are AI generated of course, and I have also edited audio and vocals so much in my life that I maybe listen to audio in a different way.
Nonetheless, even this "home made" TTS that can run on your own PC/Mac can be useful in some cases, and will only get better.
And if you can have the NPCs say really funny things in a mission, I guess no one will care that it was generated : P

I think this is an idea worth further discussion, though I'd prefer (to avoid bad blood, plus I think the existing sets do not really need new barks) that we create entirely new vocal sets with it, perhaps using voice samples from volunteer forum members that sound reasonably believable. It just occurred to me: what actually happens if we use English voice samples with a German or Dutch accent? Would the results then also have the accent?

datiswous · October 30, 2024

I was actually already thinking about trying voicing for TDM, but maybe cloning my voice is more interesting.

JackFarmer · October 31, 2024

8 hours ago, datiswous said:

I was actually already thinking about trying voicing for TDM, but maybe cloning my voice is more interesting.

Could you provide sound samples for us? A thievish voice would be cool.

@STRUNK I'd like to do a briefing video for an existing mission. You could then process datiswous' samples with the already existing briefing text and I would create a briefing video with sound fx and images.

That would be a good test, I think.

@thebigh: What about you? I recall you had an interesting voice, yet bad recording equipment when you provided me with samples back during development of TBM?

Edited October 31, 2024 by JackFarmer

STRUNK · October 31, 2024

10 hours ago, JackFarmer said:

I'd like to do a briefing video for an existing mission. You then could process datiswous' samples with the already existing briefing text and I would create a briefing video with sound fx and images.

That would be a good test, I think.

Ok, I'd like to try that : )

Sotha · November 6, 2024

This is very interesting!

So... Do we have text-to-speech synthesis for existing TDM characters available already?

It would make sense if someone proficient sets that up and makes that available for mappers.

If yes, I will add custom conversation to my WIP mission for sure.

STRUNK · November 7, 2024

On 11/6/2024 at 12:29 PM, Sotha said:

This is very interesting!

So... Do we have text-to-speech synthesis for existing TDM characters available already?

It would make sense if someone proficient sets that up and makes that available for mappers.

If yes, I will add custom conversation to my WIP mission for sure.

Well .. making "models" for 1 NPC, split into normal, soft and loud, in tortoise-TTS takes me 3 days and produces three 1.6 GB models. It's just too much. Besides that, tortoise-TTS with all the NPC voice models could be hosted, for instance, on Discord, but one would need a dedicated high-spec computer running 24/7 to make it available for everyone at any time.
So that is kinda not realistic : P

A friend of mine has made a Discord bot that runs LLMs from her own PC, and she can also run Stable Diffusion/SDXL and probably also TTS. I think she wouldn't mind sharing her Discord bot/app code if someone is really interested in using it for this purpose.

Edited November 7, 2024 by STRUNK

Sotha · November 7, 2024

2 hours ago, STRUNK said:

Well .. making "models" for 1 NPC, split into normal, soft and loud, in tortoise-TTS takes me 3 days and produces three 1.6 GB models. It's just too much. Besides that, tortoise-TTS with all the NPC voice models could be hosted, for instance, on Discord, but one would need a dedicated high-spec computer running 24/7 to make it available for everyone at any time.
So that is kinda not realistic : P

A friend of mine has made a Discord bot that runs LLMs from her own PC, and she can also run Stable Diffusion/SDXL and probably also TTS. I think she wouldn't mind sharing her Discord bot/app code if someone is really interested in using it for this purpose.

Ok, from a mapper's point of view it would be either:

1) contact an AI voice cloning specialist and send them the lines the mapper would like the AI to say. The AI specialist then gets the AI to read the lines and sends the audio files back to the mapper, or

2) contact a voice actor and send them the lines the mapper would like them to say. The voice actor then performs the lines and sends the audio files back to the mapper.

From a mapper's point of view there is not much difference. But granted, the original TDM character voice actors can easily be unavailable, whereas AI voice cloning could immortalise the characters.

datiswous · November 8, 2024

On 11/6/2024 at 12:29 PM, Sotha said:

So... Do we have text-to-speech synthesis for existing TDM characters available already?

I think you would have to ask the original voice actors for their approval to use their voices, and many will probably say no.

peter_spy · November 8, 2024

There was a discussion about this issue on TTLG at some point; this post articulates my thoughts on it better than I ever could myself:

JackFarmer · November 22, 2024

On 10/31/2024 at 9:14 AM, JackFarmer said:

Could you provide sound samples for us? A thievish voice would be cool.

@STRUNK I'd like to do a briefing video for an existing mission. You could then process datiswous' samples with the already existing briefing text and I would create a briefing video with sound fx and images.

That would be a good test, I think.

@thebigh: What about you? I recall you had an interesting voice, yet bad recording equipment when you provided me with samples back during development of TBM?

@thebigh

@datiswous

Are you interested? If so, then please be so kind and provide us with voice samples of yours.

datiswous · November 22, 2024

Sorry, not at the moment. I'll try voicing someday, but not now.

thebigh · November 22, 2024

Maybe some day, but not right now. My recording equipment is still trash, and also I'm not sure how I feel about the use of AI for this kind of thing.
