Check this reddit post - "AI Speech Synthesis for FMs"

Hooded Lantern · January 30, 2023

"AI Speech Synthesis for FMs" by u/that1sluttycelebrity

datiswous · January 30, 2023

I think it might be illegal to do that, though, unless you ask the person who created the original voice.

nbohr1more · January 30, 2023

While using this to add professional voice-actors to missions is likely illegal, it may be worthwhile to see whether our own voice-actors might be OK with FM authors using this if they are not available to offer their services.

For example, ERH+ has a WIP mission "Seed of the Loadstar" where he is using text-to-speech for the cinematic sequence. Rather than waiting for voice-actors to have time to assist, it might be easier to use this AI tool, even if just for beta-testing the concept to see how it sounds in mission.

@New Horizon @redleaf @Norbert @AndrosTheOxen @Deadlove @Lux @Mollyness

@Goldwell @Goldchocobo @Mortem Desino @ocn @BrokenArts @Narrator @Noelker

@V-Man339 would anyone in the voice-acting group allow mission authors or the TDM team to use AI to generate new vocal lines for missions or new AI characters (etc)?

Does anyone strongly object to any AI usage even for beta-testing?

If we get some responses we can add the voice actors' stances to the wiki:

https://wiki.thedarkmod.com/index.php?title=Voice_actors

ChronA · January 30, 2023

To my knowledge, the legality and ethics of AI generated derivative works have yet to be determined. Until then we should be cautious about assuming an IP maximalist perspective will prevail. In almost all creative fields, copyright terms already extend far beyond what is natural or healthy for maximizing creative output, and copyright is frequently overextended to monopolize ideas when it is meant to cover only expressions. IP holders don't need the help.

It's important to remember, despite what some irresponsible commentators have asserted, that generative AI does not store or reproduce copies of the original (training) works OR even their constituent components. Rather it works by modulating a random input seed into a completely novel product that imitates its inspiration by sharing as many salient features of the relevant training works as the AI can recognize and match. This is no different from a human voice actor doing an imitation of Stephen Russell, and if the law you are proposing were to be applied consistently, both would be equally illegal.

Of course, courts and legislatures may decide that applying the rules consistently between humans and computers is not what's best for society, but until then let's not jump the gun.

Stephen Russell's vocal performances in the Thief games don't belong to Stephen Russell. He sold them to Looking Glass Studios, who then gifted them to the public by publishing the games onto the open marketplace, retaining only the copyright over the artistic expressions distributed in the game--for a limited time, as codified in the law. As the law currently stands, we as the public retain the absolute right to produce derivative imitations with the qualities that shaped those artistic expressions, even if we use generative AI models to do so. So long as we don't 1:1 copy the actual expression or its separable expressive components, which generative AI does not, it is all fair use. (And IMO it should remain so, even if that hurts the revenue stream of a few performers.)

demagogue · January 30, 2023

Yes, if we did this we'd just have our own voice actors contribute their voice. The old way was concatenation. You have the voice actors say literally every possible phoneme and transition in English, and if possible in multiple ways apiece, and then the program knits them together. I think I read that can take more than 6 hours of recording. But I believe newer systems can take a good stretch of recorded speech from a person and generate the phonemes itself. That would be a great project for us if someone wants to take it on.

There may also be some open source voice models out there at this point, but you'd have to make very sure they're consistent with our CC license.
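To make the concatenation idea above concrete, here is a toy sketch in Python. It assumes the recordings have already been cut into one snippet per phoneme-to-phoneme transition; the names `crossfade_join`, `synthesize` and `bank` are made up for illustration, and the samples are plain lists of floats rather than real audio. Real concatenative systems also choose between multiple takes and smooth pitch across seams, which this leaves out.

```python
from typing import Dict, List, Tuple

Unit = List[float]  # one recorded snippet as raw samples (illustrative only)


def crossfade_join(a: Unit, b: Unit, overlap: int) -> Unit:
    """Join two snippets, linearly crossfading `overlap` samples to hide the seam."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    start = len(a) - overlap
    blended = []
    for i in range(overlap):
        weight_b = (i + 1) / (overlap + 1)  # ramps up towards snippet b
        blended.append(a[start + i] * (1 - weight_b) + b[i] * weight_b)
    return a[:start] + blended + b[overlap:]


def synthesize(
    phonemes: List[str],
    bank: Dict[Tuple[str, str], Unit],
    overlap: int = 2,
) -> Unit:
    """Knit together the recorded transition for each adjacent phoneme pair."""
    out: Unit = []
    for pair in zip(phonemes, phonemes[1:]):
        if pair not in bank:
            # This is why the speaker has to record every possible transition.
            raise KeyError(f"no recording for transition {pair}")
        out = crossfade_join(out, bank[pair], overlap)
    return out
```

A missing transition raises an error instead of being guessed, which is the reason the recording sessions for this approach get so long.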

kin · January 30, 2023

What if this AI software is used with existing forum member voice actors and then modified or adjusted to the extent that their voice is similar or even identical to Stephen Russell's?

How can that be illegal?

edit: Ok I think I just repeated with fewer words what others already said.

Edited January 30, 2023 by kin

jaxa · January 30, 2023

I was looking at ElevenLabs earlier. I think the quality is very good in some cases, but falls flat in others. Also, it's not open source so it could be yanked or paywalled at any moment.

4 hours ago, ChronA said:

As the law currently stands, we as the public retain the absolute right to produce derivative imitations with the qualities that shaped those artistic expressions, even if we use generative AI models to do so. So long as we don't 1:1 copy the actual expression or its separable expressive components, which generative AI does not, it is all fair use. (And IMO it should remain so, even if that hurts the revenue stream of a few performers.)

There are additional rights such as personality rights that may provide an avenue for living persons or estates to legally attack commercial or fan projects. They aren't universally recognized but could muddy the waters. Obviously, we will see attempts to expand aspects of intellectual property rights as a response to AI in the near future.

1 hour ago, kin said:

What if this AI software is used with existing forum member voice actors and then modified or adjusted to the extent that their voice is similar or even identical to Stephen Russell's?

How can that be illegal?

Remember that one guy with a pretty good imitation of Stephen Russell's Garrett who ended up voicing a few Thief fan missions? What happens if you use his voice samples to train the AI?

JackFarmer · January 30, 2023

Knowing that it was artificially generated, I immediately thought that it did not sound real. However, if I had not known, I would most probably have thought it was real...

Narrator · February 3, 2023

Being one of the voices of TDM, I certainly cannot say anything about the legal implications that such AI-created content might have. But being replaced by a machine would definitely feel somewhat awkward for me.

Turn-around time was never an issue with me, as far as I remember. I mostly deliver within 2-4 days of being asked to do some lines. All I can say is, I would love to be asked whether I can record the requested lines personally before someone just uses some AI model to fulfill his/her needs (which I cannot prevent anyone from doing). But, you know, it's actually fun for me to record this stuff. And I would feel sad if an AI deprived me of this pleasure!

Narrator

jaxa · February 3, 2023

And the real one:

Morrowind will be fully voiced in no time.

Edited February 3, 2023 by jaxa

JackFarmer · February 3, 2023

As someone who likes nothing better than writing dialogs, briefings and texts for audio logs for missions, from my point of view I can only agree with @Narrator

I love giving my texts to real people and giving them directions here and there (quite often I don't even have to do that, because the guys already recognize from the text what I want from them).

So in the future, even if it would be legally okay, I will prefer to continue working with real people... Cyberline Systems can keep its malicious technology, because I know where this will lead, at the latest when Arnold is at my door!

demagogue · February 3, 2023

The value is that authors can make their own dialog instantly, listen to it, change it instantly, and go through 20 iterations in a half hour, and do it all night long. In particular, you can keep doing takes of the same line until it gets the prosody how you like it.

You also only need about 1 minute of a sound clip to make a perfect rendition of a person's voice.

And it doesn't even have to really be a good voice actor. You can use your own voice or family members, etc. The system makes it sound good as far as voice acting. Using a real person is great, but it can't really compete.

kin · February 4, 2023

I imagine the day that you can feed the existing fan mission database to AI and get back random missions with the choice to adjust its features and at the end do some hand polishing.

jaxa · February 4, 2023

44 minutes ago, kin said:

I imagine the day that you can feed the existing fan mission database to AI and get back random missions with the choice to adjust its features and at the end do some hand polishing.

I'd say that's difficult, very difficult, but not an impossible scenario.

We have seen some experimentation with procedural generation. Creating random objectives and story within a predefined city/map might also be possible.

kin · February 5, 2023

On 2/4/2023 at 8:16 AM, jaxa said:

I'd say that's difficult, very difficult, but not an impossible scenario.

We have seen some experimentation with procedural generation. Creating random objectives and story within a predefined city/map might also be possible.

It depends how it would be used.

By having AI randomly build (in an architectural manner at least) a mission, even coarsely, authors could save a lot of time and focus on refining it.

I am thinking it could very well be used as an inspiration. Kind of like adopting an abandoned project.

Also it could draw more authors into the editing field, since it would be a lot more interesting and less time consuming to modify a mission rather than build one from scratch.

Hell, I would try that for sure if it was real.

Edited February 5, 2023 by kin

Oktokolo · February 5, 2023

On 1/30/2023 at 11:35 AM, Hooded Lantern said:

"AI Speech Synthesis for FMs" by u/that1sluttycelebrity

The voice itself is pretty good. But the sentence pacing just doesn't exist. As a result, the voice has no emotion - it sounds "dead". That said, an emotionless dead-sounding voice would obviously be perfect for characters like Dagoth Ur...

jaxa · February 5, 2023

2 hours ago, Oktokolo said:

The voice itself is pretty good. But the sentence pacing just doesn't exist. As a result, the voice has no emotion - it sounds "dead". That said, an emotionless dead-sounding voice would obviously be perfect for characters like Dagoth Ur...

IMO, the ElevenLabs AI is attempting sentence pacing and emotion, but it often gets them wrong, and the result sounds uncanny when it does.

To fix it probably requires adding other tools like markup to give more manual control over the model. Similar to how Stable Diffusion has tools like inpainting and negative prompts that can give a skilled user incredible flexibility.

Another idea would be to allow users to record the lines so they can get the pacing and emphasis just right, and use that as input. So you read the script exactly the way you want, and then the transformation happens and you have Barack Obama talking instead. This would be the equivalent of Stable Diffusion's image-to-image generation.

That method would actually give existing voice actors a competitive advantage when using the software, because they know how to speak with precision.

Just like Stable Diffusion, the user who puts in 5 hours of work is going to get a better result than someone typing prompts for 5 minutes. If you are sufficiently motivated to make a very convincing deepfake, you'll put in the hours.

Oktokolo · February 6, 2023

16 hours ago, jaxa said:

To fix it probably requires adding other tools like markup to give more manual control over the model.

Humans use semantic information to modulate pacing and pitch over sentences and even entire paragraphs.
Having a language model like ChatGPT detect the semantic "features" of the text and feeding them as additional input into the speech synthesis model might reduce the amount of markup significantly, or even eliminate the need for it in the common case where the speaker's emotional state is rather neutral and the meaning of the message matches the actual text.

16 hours ago, jaxa said:

Another idea would be to allow users to record the lines so they can get the pacing and emphasis just right, and use that as input.

That might be a pretty intuitive way to provide additional emotional context that can't be derived from the text alone - like the state of the speaker (exhausted, happy...) or subtext (sarcastic, ironic, bragging, threatening).
But just slapping an emoticon in front of some parts of the text might also work well enough when combined with a language model trained to detect them.

I'm excited to see which path speech synthesis will take. I'm pretty sure results will become indistinguishable from professional voice-acting in the next few years.

jaxa · February 6, 2023

1 hour ago, Oktokolo said:

Humans use semantic information to modulate pacing and pitch over sentences and even entire paragraphs.
Having a language model like ChatGPT detect the semantic "features" of the text and feeding them as additional input into the speech synthesis model might reduce the amount of markup significantly, or even eliminate the need for it in the common case where the speaker's emotional state is rather neutral and the meaning of the message matches the actual text.

Could be. There will always be some edge cases where you want finer control.

https://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language
https://en.wikipedia.org/wiki/Java_Speech_Markup_Language
https://en.wikipedia.org/wiki/SABLE

Some of these seem to have been abandoned due to lack of interest. I think interest in the topic just exploded.
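For anyone curious what that kind of markup looks like in practice, here is a minimal sketch in Python that builds an SSML snippet with the standard library. The element names (`speak`, `prosody`, `emphasis`, `break`) and their attributes come from the W3C SSML spec; the helper name `build_ssml`, the line of dialogue and the specific attribute values are made up for illustration. Whether a given synthesis engine honors each tag is something you would have to check per engine.

```python
import xml.etree.ElementTree as ET


def build_ssml(lead: str, stressed: str, rest: str, after_pause: str) -> str:
    """Return SSML for a slow, lowered delivery with one stressed word and a pause.

    Hypothetical helper for illustration; not part of any TTS library.
    A strictly conforming document would also declare the SSML XML namespace.
    """
    speak = ET.Element("speak", {"version": "1.0"})

    # Slow the whole line down and drop the pitch by two semitones.
    prosody = ET.SubElement(speak, "prosody", {"rate": "slow", "pitch": "-2st"})
    prosody.text = lead

    # Put strong emphasis on a single word.
    emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
    emphasis.text = stressed
    emphasis.tail = rest

    # Insert an explicit pause before the final sentence.
    pause = ET.SubElement(prosody, "break", {"time": "600ms"})
    pause.tail = after_pause

    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("I told you, ", "never", " come back here.", " Now leave."))
```

This is the manual-control end of the spectrum: every pause and stress is spelled out by hand, which is exactly the part a language model could potentially fill in for the common cases.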
