The Dark Mod Forums


Posted

While using this to add professional voice-actors to missions is likely illegal, it may be worthwhile to see whether our own voice-actors might be OK with FM authors using this if they are not available to offer their services.

For example, ERH+ has a WIP mission "Seed of the Loadstar" where he is using text-to-speech for the cinematic sequence. Rather than waiting for voice-actors to have time to assist, it might be easier to use this AI tool, even if just for beta-testing the concept to see how it sounds in mission.

@New Horizon @redleaf @Norbert @AndrosTheOxen @Deadlove @Lux @Mollyness

@Goldwell @Goldchocobo @Mortem Desino @ocn @BrokenArts @Narrator @Noelker

@V-Man339 Would anyone in the voice-acting group allow mission authors or the TDM team to use AI to generate new vocal lines for missions or new AI characters (etc.)?

Does anyone strongly object to any AI usage even for beta-testing?

If we get some responses we can add the voice-actors' stances to the wiki:

https://wiki.thedarkmod.com/index.php?title=Voice_actors


Posted

To my knowledge, the legality and ethics of AI-generated derivative works have yet to be determined. Until then we should be cautious about assuming an IP maximalist perspective will prevail. In almost all creative fields, copyright terms already extend far beyond what is natural or healthy for maximizing creative output, and copyright is frequently overextended to monopolize ideas when it is meant to cover only expressions. IP holders don't need the help.

It's important to remember, despite what some irresponsible commentators have asserted, that generative AI does not store or reproduce copies of the original (training) works OR even their constituent components. Rather it works by modulating a random input seed into a completely novel product that imitates its inspiration by sharing as many salient features of the relevant training works as the AI can recognize and match. This is no different from a human voice actor doing an imitation of Stephen Russell, and if the law you are proposing were to be applied consistently, both would be equally illegal.

Of course, courts and legislatures may decide that applying the rules consistently between humans and computers is not what's best for society, but until then let's not jump the gun. 

Stephen Russell's vocal performances in the Thief games don't belong to Stephen Russell. He sold them to Looking Glass Studios, who then gifted them to the public by publishing the games onto the open marketplace, retaining only the copyright over the artistic expressions distributed in the game--for a limited time, as codified in the law. As the law currently stands, we as the public retain the absolute right to produce derivative imitations with the qualities that shaped those artistic expressions, even if we use generative AI models to do so. So long as we don't 1:1 copy the actual expression or its separable expressive components, which generative AI does not, it is all fair use. (And IMO it should remain so, even if that hurts the revenue stream of a few performers.)

Posted

Yes, if we did this we'd just have our own voice actors contribute their voice. The old way was concatenation. You have the voice actors say literally every possible phoneme and transition in English, and if possible in multiple ways apiece, and then the program knits them together. I think I read that can take more than 6 hours of recording. But I believe newer systems can take a good stretch of recorded speech from a person and generate the phonemes itself. That would be a great project for us if someone wants to take it on.

There may also be some open source voice models out there at this point, but you'd have to make very sure they're consistent with our CC license.
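As a rough illustration of the concatenation approach described above, here is a minimal sketch in Python. Everything in it is an assumption for demonstration: the `inventory` of phoneme-pair units is filled with synthetic sine bursts rather than real recordings, and the labels, sample rate and overlap length are made up. A real system would also select among several recorded variants of each unit.

```python
import math

def tone(freq_hz, n_samples, rate=16000):
    """Stand-in for a recorded unit: a short sine burst (hypothetical data)."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n_samples)]

def crossfade_concat(units, overlap):
    """Join audio units, linearly crossfading `overlap` samples at each seam."""
    out = list(units[0])
    for unit in units[1:]:
        if overlap > len(out) or overlap > len(unit):
            raise ValueError("overlap longer than a unit")
        start = len(out) - overlap
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # fade-in weight for the incoming unit
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[overlap:])
    return out

# Hypothetical unit inventory keyed by phoneme-pair labels.
inventory = {
    "h-e": tone(220, 800),
    "e-l": tone(330, 800),
    "l-o": tone(440, 800),
}

# "Knit" three units together with an 80-sample crossfade at each seam.
speech = crossfade_concat([inventory[k] for k in ("h-e", "e-l", "l-o")], overlap=80)
```

The crossfade is only there to hide the seams; most of the work in a real concatenative system goes into recording and labelling the units, which is why the recording sessions run so long.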

  • Like 1

What do you see when you turn out the light? I can't tell you but I know that it's mine.

Posted (edited)

What if this AI software is used with existing forum member voice actors and then modified or adjusted to the extent that their voice is similar or even identical to Stephen Russell's?

How can that be illegal?

edit: Ok I think I just repeated with fewer words what others already said.

Edited by kin
Posted

I was looking at ElevenLabs earlier. I think the quality is very good in some cases, but falls flat in others.  Also, it's not open source so it could be yanked or paywalled at any moment.

4 hours ago, ChronA said:

As the law currently stands, we as the public retain the absolute right to produce derivative imitations with the qualities that shaped those artistic expressions, even if we use generative AI models to do so. So long as we don't 1:1 copy the actual expression or its separable expressive components, which generative AI does not, it is all fair use. (And IMO it should remain so, even if that hurts the revenue stream of a few performers.) 

There are additional rights such as personality rights that may provide an avenue for living persons or estates to legally attack commercial or fan projects. They aren't universally recognized but could muddy the waters. Obviously, we will see attempts to expand aspects of intellectual property rights as a response to AI in the near future.

1 hour ago, kin said:

What if this AI software is used with existing forum member voice actors and then modified or adjusted to the extent that their voice is similar or even identical to Stephen Russell's?

How can that be illegal?

Remember that one guy with a pretty good imitation of Stephen Russell's Garrett who ended up voicing a few Thief fan missions? What happens if you use his voice samples to train the AI?

  • Like 1
Posted

Knowing that it was artificially generated, I immediately said that it is not real. However, if I had not known, I would most probably have thought it was real...

Posted

As someone who likes nothing better than writing dialogs, briefings and texts for audio logs for missions, from my point of view I can only agree with @Narrator

I love giving my texts to real people and giving them directions here and there (quite often I don't even have to do that, because the guys already recognize from the text what I want from them).

So in the future, even if it would be legally okay, I will prefer to continue working with real people... Cyberdyne Systems can keep its malicious technology, because I know where this will lead, at the latest when Arnold is at my door!

 

  • Like 1
Posted

The value is that authors can make their own dialog instantly, listen to it, change it instantly, and go through 20 iterations in a half hour, and do it all night long.  In particular, you can keep doing takes of the same line until it gets the prosody how you like it.

You also only need about 1 minute of a sound clip to make a perfect rendition of a person's voice.

And it doesn't even have to really be a good voice actor. You can use your own voice or family members, etc. The system makes it sound good as far as voice acting. Using a real person is great, but it can't really compete.

  • Like 4


Posted

I imagine the day that you can feed the existing fan mission database to AI and get back random missions with the choice to adjust its features and at the end do some hand polishing.

 

Posted
44 minutes ago, kin said:

I imagine the day that you can feed the existing fan mission database to AI and get back random missions with the choice to adjust its features and at the end do some hand polishing.

 

I'd say that's difficult, very difficult, but not an impossible scenario.

We have seen some experimentation with procedural generation. Creating random objectives and story within a predefined city/map might also be possible.

Posted (edited)
On 2/4/2023 at 8:16 AM, jaxa said:

I'd say that's difficult, very difficult, but not an impossible scenario.

We have seen some experimentation with procedural generation. Creating random objectives and story within a predefined city/map might also be possible.

It depends how it would be used.

By having AI randomly build a mission (in an architectural manner at least), even coarsely, authors could save a lot of time and focus on refining it.

I am thinking it could very well be used as an inspiration. Kind of like adopting an abandoned project.

Also it could draw more authors into the editing field, since it would be a lot more interesting and less time consuming to modify a mission rather than build one from scratch.

Hell, I would try that for sure if it was real.

Edited by kin
Posted
On 1/30/2023 at 11:35 AM, Hooded Lantern said:

"AI Speech Synthesis for FMs" by u/that1sluttycelebrity

The voice itself is pretty good. But the sentence pacing just doesn't exist. As a result, the voice has no emotion - it sounds "dead". That said, an emotionless dead-sounding voice would obviously be perfect for characters like Dagoth Ur...

  • Like 1
Posted
2 hours ago, Oktokolo said:

The voice itself is pretty good. But the sentence pacing just doesn't exist. As a result, the voice has no emotion - it sounds "dead". That said, an emotionless dead-sounding voice would obviously be perfect for characters like Dagoth Ur...

IMO, the ElevenLabs AI is attempting sentence pacing and emotion, just getting it wrong often and it sounds uncanny when wrong.

To fix it probably requires adding other tools like markup to give more manual control over the model. Similar to how Stable Diffusion has tools like inpainting and negative prompts that can give a skilled user incredible flexibility.

Another idea would be to allow users to record the lines so they can get the pacing and emphasis just right, and use that as input. So you read the script exactly the way you want, and then the transformation happens and you have Barack Obama talking instead. This would be the equivalent of Stable Diffusion's image-to-image generation.

That method would actually give existing voice actors a competitive advantage when using the software, because they know how to speak with precision.

Just like Stable Diffusion, the user who puts in 5 hours of work is going to get a better result than someone typing prompts for 5 minutes. If you are sufficiently motivated to make a very convincing deepfake, you'll put in the hours.

  • Like 1
Posted
16 hours ago, jaxa said:

To fix it probably requires adding other tools like markup to give more manual control over the model.

Humans use semantic information to modulate pacing and pitch over sentences and even entire paragraphs.
Having a language model like ChatGPT detect the semantic "features" of the text and feeding them as additional input into the speech synthesis model might reduce the amount of markup significantly, or even eliminate the need for it in the common case where the speaker's emotional state is rather neutral and the meaning of the message matches the actual text.

16 hours ago, jaxa said:

Another idea would be to allow users to record the lines so they can get the pacing and emphasis just right, and use that as input.

That might be a pretty intuitive way to provide additional emotional context that can't be derived from the text alone - like the state of the speaker (exhausted, happy...) or subtext (sarcastic, ironic, bragging, threatening).
But just slapping an emoticon in front of some parts of the text might also work well enough when combined with a language model trained to detect them.

I'm excited to see which path speech synthesis will take. I'm pretty sure results will become indistinguishable from professional voice-acting in the next few years.
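To make the emoticon idea a bit more concrete, here is a small sketch of the preprocessing side only: splitting a script line into segments labelled with a speaking style, which could then be handed to a synthesis model as extra input. The emoticon-to-style table and the `tag_styles` helper are hypothetical, and no existing speech engine is assumed to accept this format.

```python
import re

# Hypothetical emoticon-to-style mapping; a real system would define or learn its own.
STYLES = {":)": "happy", ":(": "sad", ";)": "sarcastic", ">:(": "threatening"}

# Try longer emoticons first so ">:(" is not mistaken for ":(".
_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(STYLES, key=len, reverse=True))
)

def tag_styles(line, default="neutral"):
    """Split a script line into (style, text) pairs at each emoticon marker."""
    segments = []
    style, pos = default, 0
    for match in _PATTERN.finditer(line):
        text = line[pos:match.start()].strip()
        if text:
            segments.append((style, text))
        # The emoticon sets the style for the text that follows it.
        style, pos = STYLES[match.group()], match.end()
    tail = line[pos:].strip()
    if tail:
        segments.append((style, tail))
    return segments

tagged = tag_styles(
    "Evening, sir. ;) Lovely night for a stroll. >:( Now hand over the purse."
)
```

Text before the first marker falls back to the neutral default, which matches the common case described above where no extra hint is needed.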

  • Like 1
Posted
1 hour ago, Oktokolo said:

Humans use semantic information to modulate pacing and pitch over sentences and even entire paragraphs.
Having a language model like ChatGPT detect the semantic "features" of the text and feeding them as additional input into the speech synthesis model might reduce the amount of markup significantly, or even eliminate the need for it in the common case where the speaker's emotional state is rather neutral and the meaning of the message matches the actual text.

Could be. There will always be some edge cases where you want finer control.

https://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language
https://en.wikipedia.org/wiki/Java_Speech_Markup_Language
https://en.wikipedia.org/wiki/SABLE

Some of these seem to have been abandoned due to lack of interest. I think interest in the topic just exploded.
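For anyone curious what that finer control looks like, here is a minimal sketch that builds an SSML document with Python's standard library. The `prosody`, `emphasis`, `break` and `s` elements come from the SSML standard; the `build_ssml` helper, its option names and the sample lines are made up for illustration, and whether a given voice engine (ElevenLabs included) accepts SSML input is not something this sketch assumes.

```python
import xml.etree.ElementTree as ET

def build_ssml(segments):
    """Build an SSML string from (text, options) pairs.

    Optional keys in `options` (names chosen for this sketch):
    "rate" / "pitch" -> <prosody>, "emphasis" -> <emphasis level=...>,
    "pause_after" -> <break time=...>.
    """
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": "en-US"})
    for text, opts in segments:
        sentence = ET.SubElement(speak, "s")
        target = sentence
        prosody = {k: opts[k] for k in ("rate", "pitch") if k in opts}
        if prosody:
            target = ET.SubElement(target, "prosody", prosody)
        if "emphasis" in opts:
            target = ET.SubElement(target, "emphasis", {"level": opts["emphasis"]})
        target.text = text
        if "pause_after" in opts:
            # A pause placed after the spoken text, inside the same sentence.
            ET.SubElement(sentence, "break", {"time": opts["pause_after"]})
    return ET.tostring(speak, encoding="unicode")

script = [
    ("Who goes there?", {"pitch": "+10%", "emphasis": "strong", "pause_after": "600ms"}),
    ("Must have been rats.", {"rate": "slow"}),
]
ssml = build_ssml(script)
print(ssml)
```

This is the manual end of the spectrum: every pause and pitch change is spelled out by the author, which is exactly the workload a language model front end would try to reduce.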

  • Like 1
