My position: get over it, voice actors. The writing has been on the wall since Vocaloid was released.

I think we've seen games shipping with 10 to 30 gigabytes of voice data at this point. You could imagine some game using procedurally generated text (like GPT-3 used to fuel NPC chatbots), team-written scripts, or crowdsourced scripts to get to the equivalent of 1 terabyte or more of voiced lines. Powerful 8-core CPUs are becoming the minimum standard for gaming, and between the CPU and GPU there will be more than enough computational resources available to synthesize voice lines and other sounds, like object collisions, in real time. Assuming 500,000 pages per gigabyte and 1.5 minutes to voice a page, 1 gigabyte of scripts yields about 1.4 years of continuous text-to-speech.
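For anyone who wants to check that last figure, here is a quick back-of-the-envelope calculation. The two inputs are the numbers from the paragraph above; the function name and the 365.25-day year are just choices made for this sketch.

```python
# Rough check of the "1.4 years per gigabyte" estimate.
PAGES_PER_GIGABYTE = 500_000  # figure assumed in the post
MINUTES_PER_PAGE = 1.5        # figure assumed in the post


def speech_years_per_gigabyte(pages_per_gb=PAGES_PER_GIGABYTE,
                              minutes_per_page=MINUTES_PER_PAGE):
    """Return years of nonstop speech produced by one gigabyte of script."""
    total_minutes = pages_per_gb * minutes_per_page  # 750,000 minutes
    total_hours = total_minutes / 60                 # 12,500 hours
    total_days = total_hours / 24                    # about 520.8 days
    return total_days / 365.25                       # 365.25-day year (assumption)


years = speech_years_per_gigabyte()
print(f"{years:.2f} years of continuous speech per gigabyte of script")
```

That works out to roughly 1.4 years, so the estimate holds under those assumptions. Change either input and the result scales linearly.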

The legal issues are legitimate. I have no doubt that a court would side with voice actors who are having their voices "stolen", citing personality rights. At the same time, companies could figure out how to mix the voice samples of hundreds of real people together in training data, and adjust parameters to come up with an effectively unlimited number of voices that don't match any one real person and can be used without paying anyone. Meanwhile, fan efforts can rip off real voices from Hollywood actors and voice actors, or specific voice performances like Stephen Russell as Garrett. It's not worth it to sue them, and they can come together pseudonymously and distribute code using torrent sites if needed.



Winter, meanwhile, stresses the importance of the act of breathing in voice acting. “Breath is so key to expressing ourselves, especially through voice,” she says. “If your AI voice doesn’t breathe, it’s never going to carry the emotional weight that a human’s performance can.”

Ultimately, for Mitchells and other voice actors, it’s the diminishment of their craft that feels unforgivable. “Actors love to act,” he says. “That's why they sacrifice so much to do it as a job. It is creatively fulfilling and when a character ends up with a fanbase behind it, it’s the most rewarding experience.

“Now, imagine becoming a character loved by many but you didn't do a single thing to contribute towards that role,” Mitchells adds. “Zero creativity from the actor. Zero fulfillment. Zero art.”


In some cases, an amateur will do the voice acting and then a different voice style will be pasted over the original recording. We could also see pure text-to-speech with a markup language to add emphasis, vocal cadence, etc. In either case, an algorithm can definitely transfer or fake the "breathing" and pauses. The result is also likely to be considered art. Ignoring the fact that an "invisible sculpture" can be considered art, there will likely be a lot of creativity, or at least fine-tuning, when writing a script, working with an AI that generates scripts, and perfecting the voices.
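To make the markup idea concrete: SSML, the W3C's Speech Synthesis Markup Language, is one existing example of this kind of markup, though nothing here says games would use it specifically. The sketch below builds a single annotated line using Python's standard XML library. The function name `build_line`, the sample dialogue, and the default pause and rate values are all made up for illustration, and a full SSML document would also need version and namespace attributes on the root element.

```python
import xml.etree.ElementTree as ET


def build_line(text_before, emphasized, text_after, pause_ms=400, rate="slow"):
    """Build an SSML fragment: one stressed word followed by a pause.

    Hypothetical helper for illustration only. It uses the standard SSML
    elements <speak>, <prosody>, <emphasis> and <break>.
    """
    speak = ET.Element("speak")
    # <prosody> controls cadence; "slow" is one of SSML's named rates.
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    prosody.text = text_before
    # <emphasis> marks the word the synthesizer should stress.
    emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
    emphasis.text = emphasized
    # <break> inserts a timed pause, a crude stand-in for taking a breath.
    pause = ET.SubElement(prosody, "break", {"time": f"{pause_ms}ms"})
    pause.tail = text_after
    return ET.tostring(speak, encoding="unicode")


print(build_line("I told you ", "never", " to come back."))
```

A writer or director tweaking `level`, `rate`, and pause lengths line by line is the sort of fine-tuning work that would go into perfecting a synthetic performance.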

