How about AI voice generation using the already existing voices?

STRUNK · October 14, 2024

I think it would be quite handy to have an AI that can generate dialog in the voices that are already included, and even in those of the various voice actors, if they are OK with that, and have it run locally on your own computer.

Any thoughts on that?

datiswous · October 15, 2024

Yeah I thought about that too. I think it can also be used for your own voice, to speed up voicing.

I looked for an open-source local one, but haven't found one yet.

chakkman · October 15, 2024

I think some of the TDM and Thief missions already have AI generated voices. Not a big fan, TBH. Still sounds artificial.

Edited October 15, 2024 by chakkman

nbohr1more · October 15, 2024

See:

datiswous · October 15, 2024

This seems pretty good:

https://huggingface.co/spaces/collabora/WhisperSpeech

Edited October 15, 2024 by datiswous

STRUNK · October 15, 2024

4 hours ago, datiswous said:

This seems pretty good:

https://huggingface.co/spaces/collabora/WhisperSpeech

I was looking at this one yesterday:
https://github.com/neonbjb/tortoise-tts

6 hours ago, nbohr1more said:

See:

Thanx.

So I see voice actors don't like it, and neither do some other people.

The idea came to me when I locked a guard out on a balcony and thought: it would be nice if he were yelling something about being stuck on the balcony in his normal guard voice.
Maybe I/we just have to try out how good speech generation is these days.
But ... is there any copyright on these in-game NPC voices?

Goldwell · October 16, 2024

I experimented with using AI voices as a placeholder before I asked human actors to do the lines. There are still morality and copyright questions regarding AI voice usage, but that aside, I just find the technology isn't there yet.

Here is a comparison from a Builder sermon scene I have in Shadows of Northdale Act 3.

This is an AI voice I used for the scene originally as a placeholder

And this was the final human voice acted version

In my opinion the human version is infinitely better than the AI generated one. The human voice sounds much more natural, plus there are nuances and inflections the voice actor can bring to the script which can change it in ways that an AI voice can't. Additionally, some voice actors I've worked with will go off script or offer alternative versions, some of which end up being used (a couple of improv lines that AndrosTheOxen used for the shopkeeper in Noble Affairs were featured in the final game).

datiswous · October 16, 2024

Well, if you have voice actors available then the choice is easy. But I think it could be used to generate alternative voices, or when you don't have the time to contact voice actors, or maybe just for some testing purposes. Also, if you want to have voices in different languages, it could be helpful.

34 minutes ago, Goldwell said:

This is an AI voice I used for the scene originally as a placeholder

This is pretty good I have to say. What software/site did you use?

Edited October 16, 2024 by datiswous

Goldwell · October 16, 2024

10 minutes ago, datiswous said:

This is pretty good I have to say. What software/site did you use?

https://elevenlabs.io/

datiswous · October 16, 2024

15 hours ago, STRUNK said:

I was looking at this one yesterday:
https://github.com/neonbjb/tortoise-tts

Yeah I saw it, but the demo page is not able to generate a voice from an audio file. The WhisperSpeech demo does have this ability. I tested this and it does generate a pretty realistic voice, but it's not really a copy of the voice you supply. Also, if the supplied voice has a heavy accent it makes mistakes.

Edited October 16, 2024 by datiswous

chakkman · October 16, 2024

To be fair, the A.I. voices are better than I expected, but, I agree with Goldwell that humans still have the edge. Expectedly.

If that's not an option, then A.I. voices surely will do. I like that the timbre of the voice is really close to the Hammerites Stephen Russell voiced. I guess you took that as a sample?

Edited October 16, 2024 by chakkman

STRUNK · October 16, 2024

6 hours ago, datiswous said:

Yeah I saw it, but the demo page is not able to generate a voice from an audio file. The WhisperSpeech demo does have this ability. I tested this and it does generate a pretty realistic voice, but it's not really a copy of the voice you supply. Also, if the supplied voice has a heavy accent it makes mistakes.

I tried that yesterday and I thought it sounded pretty bad : P
The thing is, to be really useful there must also be a way to steer the speech output and to train voice sets for the different NPCs: bored, angry, alarmed, etc. I guess it would be like training LoRAs for image generation ...
I'll try to install and use tortoise TTS this weekend.

STRUNK · October 19, 2024

I got this tortoise TTS up and running after some hassle and made a model for Builder1.
Builder1 has just 4 audio files and I ran 500 epochs on it, which might be way too much for such a small sample size, and the model is about 1.6 GB, but it seems to work quite nicely.
Some outputs sounded a bit strange, and for most of them I had to cut off the start of the audio file because there was some garbage.
I still have to play around with the settings, but for now it's looking (sounding) quite nice.

To install it I followed this tutorial, though some things differ a bit when you install version 3 that came after this tutorial:

He is using some other programs to remove background noise and to prepare the audio, but the voices in TDM are already clean so there is no need for that. What you will have to do is convert all the OGG files to WAV (batch convert with VLC player) so that tortoise TTS can handle them.
You also need an Nvidia graphics card.
That said, on my RTX 5000 laptop GPU it all takes a lot of time ...
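
For anyone who would rather script the OGG to WAV step than click through VLC, here is a minimal Python sketch. It is not what was used in the post above: it assumes ffmpeg is installed and on the PATH, and the folder names in the usage note are made up. It only mirrors the folder layout and calls ffmpeg once per file with default settings.

```python
import shutil
import subprocess
from pathlib import Path


def plan_conversions(src_dir, dst_dir):
    """Pair every .ogg under src_dir with a .wav path under dst_dir, keeping subfolders."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    pairs = []
    for ogg in sorted(src_dir.rglob("*.ogg")):
        wav = dst_dir / ogg.relative_to(src_dir).with_suffix(".wav")
        pairs.append((ogg, wav))
    return pairs


def ffmpeg_command(ogg, wav):
    """Build the ffmpeg call for one file."""
    # -y overwrites an existing output file; ffmpeg infers WAV from the extension.
    return ["ffmpeg", "-y", "-i", str(ogg), str(wav)]


def convert_all(src_dir, dst_dir):
    """Convert every .ogg found under src_dir, stopping on the first ffmpeg error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH (assumed to be installed)")
    for ogg, wav in plan_conversions(src_dir, dst_dir):
        wav.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(ffmpeg_command(ogg, wav), check=True)
```

Usage would be something like `convert_all("tdm_sounds_ogg", "tdm_sounds_wav")`, where both folder names are placeholders for wherever the extracted voice files live.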

datiswous · October 19, 2024

6 hours ago, STRUNK said:

To install it I followed this tutorial,

Yeah that's for Windows though, but nice to see you get it working

STRUNK · October 20, 2024

20 hours ago, datiswous said:

Yeah that's for Windows though, but nice to see you get it working

I think it can run on macOS and Linux, but that might not be as simple as the Windows install:
https://github.com/neonbjb/tortoise-tts
https://git.ecker.tech/lightmare/tortoise-tts

Edited October 20, 2024 by STRUNK

datiswous · October 20, 2024

Actually, I see that the ai-voice-cloning repo the video is about also lists a Linux install.

I have to try Docker sometime.

Edited October 20, 2024 by datiswous

Zerush · October 21, 2024

There are currently very realistic AI TTS tools out there that can also be used, so voice actors are no longer needed and it's possible to add more random dialog.

STRUNK · October 23, 2024

On 10/20/2024 at 7:44 PM, datiswous said:

Actually, I see that the ai-voice-cloning repo the video is about also lists a Linux install.

I have to try Docker sometime.

There is a new TTS : https://github.com/SWivid/F5-TTS
Should be a lot better with "emotion".

STRUNK · October 24, 2024

The cloning of the characteristics of the voice works quite nicely.
I selected sets of audio clips by how "loud" the speech is: Loud, Normal and Soft (speaking up, speaking normally, speaking softly).
I haven't figured out how to get the best audio quality without weird "audio artifacts".
But as the sample in this post demonstrates, the moor certainly sounds like the moor:
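
A rough way to pre-sort clips into Loud, Normal and Soft sets, for anyone who wants to try the same grouping, is to measure the RMS level of each WAV file. The sketch below is only an illustration: it assumes 16-bit PCM WAV files, the two dB thresholds are made-up starting points, and signal level is only a stand-in for how the line was actually spoken, so the result still needs checking by ear.

```python
import math
import struct
import wave


def rms_dbfs(path):
    """RMS level of a 16-bit PCM WAV file, in dB relative to full scale."""
    with wave.open(str(path), "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        frames = w.readframes(w.getnframes())
    count = len(frames) // 2
    if count == 0:
        return float("-inf")
    # WAV PCM data is little-endian; all channels are pooled together.
    samples = struct.unpack("<%dh" % count, frames[: count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count)
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


def bucket(level_db, soft_below=-30.0, loud_above=-18.0):
    """Map a level to a set name; both thresholds are assumed values to tune."""
    if level_db < soft_below:
        return "Soft"
    if level_db > loud_above:
        return "Loud"
    return "Normal"
```

Calling `bucket(rms_dbfs("some_clip.wav"))` on each converted file gives a first-pass split that can then be corrected by hand.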

datiswous · October 25, 2024

That's pretty amazing. So this is not related to tortoise-tts?

STRUNK · October 25, 2024

1 hour ago, datiswous said:

That's pretty amazing. So this is not related to tortoise-tts?

F5-TTS is a different, non-autoregressive model.
The moor voice was done with tortoise-TTS.

ChronA · October 25, 2024

It's pretty remarkable what's possible these days. Maybe large voice samples are no longer even needed.

If they are though, and anyone is looking for samples to train new characters, I suggest considering LibriVox recordings. Since these readings are all in the public domain, the legal and ethical case for using them to create derivative works with AI is much less fraught. LibriVox even says so itself: https://wiki.librivox.org/index.php?title=LibriVox_and_Artificial_Intelligence_(AI)

In particular, Frankenstein, or the Modern Prometheus (version 3), narrated by Caden Vaughn Clegg, is excellent. I could imagine a voice and style of intonation like Clegg's main reading pattern fitting really well in The Dark Mod. His monster voice is not bad either. Another interesting one is Greg Bryant in Paradise Lost. His performances could make for a good player character voice. He has a similar gruffness to Stephen Russell as Garrett, but with a different overtone. Lastly, Cori Samuel gives some really good performances that could be suitable for young women characters, especially those of noble or plucky-roguish backgrounds.

Edited October 25, 2024 by ChronA
