The Dark Mod Forums

How about AI voice generation using the already existing voices?


STRUNK


I have installed F5-TTS with pinokio: https://pinokio.computer/
It's remarkable how fast this model is and how easy it is to use. Just find a longer voice clip and use it as reference audio.
Under Multi-Speech you can upload more voice clips and give them tags/Speech Type Names, or use this for different voices altogether, then simply start the sentence with {your tag}, {your tag2}, etc.
I took 4 clips from the moor, tagged them soft, normal, angry and shout, and had them all say the same line:

 

 

It works very fast and the quality is much the same as what I got out of tortoise-TTS, but it seems not as good at keeping the character of the voice.
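For anyone who wants to prepare the Multi-Speech text outside the UI, here is a minimal sketch of the tagging described above: the same line repeated once per tag, each copy starting with {tag}. The helper name `tagged_script` and the example sentence are made up for illustration, and the exact tag syntax the F5-TTS interface expects is assumed from the description in this post, so check it against the UI before relying on it.

```python
def tagged_script(line: str, tags: list[str]) -> str:
    """Return one copy of `line` per tag, each prefixed with {tag}."""
    # Assumption: the UI reads a leading "{tag}" as the Speech Type Name.
    return "\n".join(f"{{{tag}}} {line}" for tag in tags)


# Tag names are the ones from the post; the sentence is only a placeholder.
script = tagged_script("Who goes there?", ["soft", "normal", "angry", "shout"])
print(script)
```

The printed text can then be pasted into the Multi-Speech text box, provided the four reference clips have been uploaded under matching names.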


2 hours ago, STRUNK said:

I have installed F5-TTS with pinokio: https://pinokio.computer/
It's remarkable how fast this model is and how easy it is to use. Just find a longer voice clip and use it as reference audio.
Under Multi-Speech you can upload more voice clips and give them tags/Speech Type Names, or use this for different voices altogether, then simply start the sentence with {your tag}, {your tag2}, etc.
I took 4 clips from the moor, tagged them soft, normal, angry and shout, and had them all say the same line:

It works very fast and the quality is much the same as what I got out of tortoise-TTS, but it seems not as good at keeping the character of the voice.

Yeah, the first three are very close to the original voice actor.


18 minutes ago, JackFarmer said:

Yeah, the first three are very close to the original voice actor.

Here is a demo page where you can test it yourself: https://huggingface.co/spaces/mrfakename/E2-F5-TTS
I don't know if it takes .ogg directly, but it does take .wav. (Batch) converting .ogg files to .wav with VLC player works very well.
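If you would rather script the batch conversion than click through VLC, something like the following could work. Note that this swaps in ffmpeg instead of VLC and assumes ffmpeg is installed and on your PATH; the function names are only for illustration.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(src: Path, dst_dir: Path) -> list[str]:
    """Build the ffmpeg call that converts one file to WAV inside dst_dir."""
    dst = dst_dir / (src.stem + ".wav")
    # -y overwrites an existing output file; the .wav extension selects the format.
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert_all(src_dir: Path, dst_dir: Path) -> list[Path]:
    """Convert every .ogg file in src_dir and return the paths of the new .wav files."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(src_dir.glob("*.ogg")):
        cmd = ffmpeg_command(src, dst_dir)
        subprocess.run(cmd, check=True)  # raises CalledProcessError if ffmpeg fails
        converted.append(Path(cmd[-1]))
    return converted
```

Calling `convert_all(Path("voices_ogg"), Path("voices_wav"))` (both folder names are placeholders) would then write one .wav per .ogg clip, ready to upload as reference audio.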

Edited by STRUNK

43 minutes ago, STRUNK said:

Here is a demo page where you can test it yourself: https://huggingface.co/spaces/mrfakename/E2-F5-TTS
I don't know if it takes .ogg directly, but it does take .wav. (Batch) converting .ogg files to .wav with VLC player works very well.

Very impressive, though sometimes the results sound as if somebody is talking into a tube.


22 minutes ago, JackFarmer said:

Very impressive, though sometimes the results sound as if somebody is talking into a tube.

Yes, the results can vary a lot, but this F5 model is so fast that I think it might still be a lot faster to generate the same thing many times and finally pick one that is good enough than to train voice models in tortoise-TTS and still have to generate multiple times to get a good result.

After what I have been trying for almost a week now, I think TTS AI could be fun to use in some cases, preferably with background audio in the scene, to have NPCs react to specific situations, but not for real dialogue or cutscenes (yet).

Edited by STRUNK

5 minutes ago, STRUNK said:

Yes, the results can vary a lot, but this F5 model is so fast that I think it might still be a lot faster to generate the same thing many times and finally pick one that is good enough than to train voice models in tortoise-TTS and still have to generate multiple times to get a good result.

After what I have been trying for almost a week now, I think TTS AI could be fun to use in some cases, preferably with background audio in the scene, to have NPCs react to specific situations, but not for real dialogue or cutscenes (yet).

Yeah, I think this is also very good for tentative voice-over when voice talents are late with their contributions, so that there is at least something for mapping or beta testing.

 


On 10/25/2024 at 10:17 PM, STRUNK said:

For the sake of comparison, the same clip with background audio:

 


It sounds robotic... that remains true.

Hm... you really think it sounds robotic? I have the same feeling I already had with the earlier samples you posted: if I hadn't known, I would probably have said it was real.


4 hours ago, JackFarmer said:

Hm... you really think it sounds robotic? I have the same feeling I already had with the earlier samples you posted: if I hadn't known, I would probably have said it was real.

Yes, I still do, but... I know they are AI-generated of course, and I have also edited audio and vocals so much in my life that maybe I listen to audio in a different way.
Nonetheless, even this "home made" TTS that can run on your own PC/Mac can be useful in some cases, and will only get better.
And if you can have the NPCs say really funny things in a mission, I guess no one will care that it was generated : P

 

Edited by STRUNK

3 hours ago, STRUNK said:

Yes, I still do, but... I know they are AI-generated of course, and I have also edited audio and vocals so much in my life that maybe I listen to audio in a different way.
Nonetheless, even this "home made" TTS that can run on your own PC/Mac can be useful in some cases, and will only get better.
And if you can have the NPCs say really funny things in a mission, I guess no one will care that it was generated : P

 

I think this is an idea worth further discussion, though I'd prefer (to avoid bad blood, plus I think the existing sets do not really need new barks) that we create entirely new vocal sets with it, perhaps using voice samples from volunteer forum members that sound reasonably believable. It just occurred to me: what actually happens if we use English voice samples with a German or Dutch accent? Would the results then also have the accent?  😆

 

 


8 hours ago, datiswous said:

I was actually already thinking about trying voicing for tdm, but maybe cloning my voice is more interesting.

Could you provide sound samples for us? A thievish voice would be cool.

@STRUNK I'd like to do a briefing video for an existing mission. You could then process datiswous' samples with the already existing briefing text, and I would create a briefing video with sound FX and images.

That would be a good test, I think.

@thebigh: What about you? I recall you had an interesting voice, yet bad recording equipment when you provided me with samples back during development of TBM?

Edited by JackFarmer

10 hours ago, JackFarmer said:

I'd like to do a briefing video for an existing mission. You could then process datiswous' samples with the already existing briefing text, and I would create a briefing video with sound FX and images.

That would be a good test, I think.

Ok, I'd like to try that : )


This is very interesting!

So... do we have text-to-speech synthesis for existing TDM characters available already?

It would make sense if someone proficient set that up and made it available for mappers.

If yes, I will add custom conversations to my WIP mission for sure.

Clipper

-The mapper's best friend.


On 11/6/2024 at 12:29 PM, Sotha said:

This is very interesting!

So... do we have text-to-speech synthesis for existing TDM characters available already?

It would make sense if someone proficient set that up and made it available for mappers.

If yes, I will add custom conversations to my WIP mission for sure.

Well... making "models" for one NPC, split into normal, soft and loud, in tortoise-TTS takes me 3 days and produces three 1.6 GB models. It's just too much. Besides that, tortoise-TTS with all the NPC voice models could be hosted on Discord, for instance, but one would need a dedicated high-spec computer running 24/7 to make it available for everyone at any time.
So that is kinda not realistic : P

A friend of mine has made a Discord bot that runs LLMs from her own PC, and she can also run Stable Diffusion/SDXL and probably also TTS. I think she wouldn't mind sharing her Discord bot/app code if someone is really interested in using it for this purpose.

Edited by STRUNK

2 hours ago, STRUNK said:

Well... making "models" for one NPC, split into normal, soft and loud, in tortoise-TTS takes me 3 days and produces three 1.6 GB models. It's just too much. Besides that, tortoise-TTS with all the NPC voice models could be hosted on Discord, for instance, but one would need a dedicated high-spec computer running 24/7 to make it available for everyone at any time.
So that is kinda not realistic : P

A friend of mine has made a Discord bot that runs LLMs from her own PC, and she can also run Stable Diffusion/SDXL and probably also TTS. I think she wouldn't mind sharing her Discord bot/app code if someone is really interested in using it for this purpose.

Ok, from a mapper's point of view it would be either:

1) contact an AI voice cloning specialist and send them the lines the mapper would like the AI to say. The AI specialist then gets the AI to read the lines and sends the audio files back to the mapper, or

2) contact a voice actor and send them the lines the mapper would like them to perform. The voice actor then performs the lines and sends the audio files back to the mapper.

From a mapper's point of view there is not much difference. But granted, the original TDM character voice actors can easily become unavailable, whereas AI voice cloning could immortalise the characters.

