Automatic subtitles generation

datiswous · January 25, 2023

Edit: In post 5 I discovered Whisper, which does this task MUCH better, so don't use VOSK. Some of the info up to post 5 is still relevant for subtitle editing in Kdenlive in general.

I previously posted about this in a status update. To make it easier to find in the future, I'm also posting the info in this topic.

I recently figured out how to make subtitles work for missions by following this wiki guide: https://wiki.thedarkmod.com/index.php?title=Subtitles

You can type the subtitle text manually into the .subs or .srt files (in a text editor) or use a video editor for that (recommended for .srt). Some advanced editors, including the free and open source multiplatform (Windows, Linux and Intel Mac) Kdenlive editor, can also auto-generate the subtitle text for you from the audio or video file. You can then export it to an .srt file that works directly in TDM. If you want to use the .subs files for shorter sentences, you can just copy the text from the .srt files.

In Kdenlive you can install speech-to-text libraries from VOSK. For this to work you have to download and install Python. The installation and usage process is shown in the following video (6.5 minutes):

Spoiler

To sum it up:

First-time configuration:

Install Python. On Windows, during setup you have to select Advanced Options and tick Add Python to environment variables (super important!).
In Kdenlive, go to the Settings menu and click Configure Kdenlive.
In that configuration window, click Speech to text in the left menu.
There, click the link to download speech models.
On the website ( https://alphacephei.com/vosk/models ), click a model download link but keep the mouse button pressed and drag the link into the Configure Kdenlive window. Kdenlive then asks whether to install the model from that URL. vosk-model-en-us-0.22-lgraph is probably decent for most use cases, but you can install and test them all.

To use it:

First, load an audio or video file by dragging it onto one of the audio or video bars at the bottom (video: V1, V2; audio: A1, A2).
Click menu Project > Subtitles > Edit Subtitle tool. An extra Subtitles bar appears at the top.
Now select the audio or video file in its bar (it is selected when it is outlined with an orange border) and then click menu Project > Subtitles > Speech recognition.
In the Speech recognition dialog, select the correct language model and choose the option Selected clip.
After generation, you can preview the generated subtitles in the top-right window. Make sure playback starts from the beginning. With an audio file, you see a black background with the subtitles on top.
Now you can tweak the timing and edit the text directly in the Subtitles bar. This takes up the most time. Unfortunately the generation is not flawless, so you have to correct some words.

Tweaking the subtitles for Requiem took me hours, because I wanted them to line up differently. The subtitles are usually not generated as full sentences, which looks sloppy. If you want to add subtitles quickly without spending much time on it, it can be done this way. If you want to do it right, it still takes a lot of time in my experience.

Exporting to .srt is shown in the following video:

Spoiler

Although it's actually just one step:

Click on menu Project > Subtitles > Export subtitle file.

Alternatively, you can just save the Kdenlive project; the srt is then exported as well, and every save updates the srt file.

I might create a wiki article about it later.

Kdenlive edit window:

Edited January 9 by datiswous

datiswous · January 25, 2023

I wonder what other peeps' workflows are for generating subtitles for missions. I saw subtitles in A new Job, St. Lucia and Written in Stone.

Edited January 25, 2023 by datiswous

datiswous · January 27, 2023

The subtitle generation doesn't take different voices into account, so different voices often end up in the same subtitle line and you have to divide them by editing. This can be seen, for example, in the FM Hair in the Snare, where the mission intro has two voices.

Edited January 9 by datiswous

datiswous · March 8, 2023

I just found that if you save in Kdenlive as a Kdenlive project, the subtitle srt track is automatically exported as well, and every time you save again the srt file is updated. So manual exporting isn't actually needed. The srt is saved as filename.kdenlive.srt.

datiswous · March 10, 2023

Automatic subtitles generation with Whisper

I found a far better alternative that auto-generates almost perfect srt files: Whisper.

https://openai.com/research/whisper

https://github.com/openai/whisper

For example, I did a test with Simeon3.ogg, a 44-second voice file from the FM A house of locked secrets, using this command in a terminal:

whisper Simeon3.ogg --model small.en

After a very short time (possibly because it uses NVIDIA CUDA, I'm not sure), it creates a bunch of output files, including an srt file with these contents (be warned that you will read the contents of a mission's audio file, which could spoil something):

Spoiler

1
00:00:00,000 --> 00:00:07,000
When our time in this world is at an end, our body returns to the earth and our soul goes to the builder's side.

2
00:00:07,000 --> 00:00:15,000
Sometimes the body refuses to embrace the earth, these become the undead, to be purified with the hammer and holy water.

3
00:00:15,000 --> 00:00:21,000
However, sometimes it is not the body, but the spirit that stays in this world after death.

4
00:00:21,000 --> 00:00:26,000
These spirits are known by many names, ghosts, shades, phantoms.

5
00:00:27,000 --> 00:00:34,000
Some are benign and some are harmless, but others can wreak terrible havoc on the world of the living.

6
00:00:34,000 --> 00:00:40,000
As it is our duty to repel the undead, so it is also our duty to repel these spirits.

7
00:00:40,000 --> 00:00:44,000
The holy symbol you have picked up shall enable you to do so.

This is almost exactly how it is supposed to be. Not only does it recognize the speech almost perfectly, it also creates full sentences with punctuation. This is far better than the VOSK method in Kdenlive, whose output I had to edit afterwards.

After that I load it in Kdenlive together with the sound file and make a couple of easy corrections to the flow. Basically, you have to make sure that the subtitle sentences also stop during the gaps in the audio. See the example below:

Spoiler

This is probably a 10x faster workflow.

Edit: You can also list multiple files in one command. For example (from the folder with the voice files):

whisper Carlotta1.ogg Carlotta2.ogg Carlotta3.ogg -o ./../../subtitles --model medium.en -f srt

This command generates the subtitles from these 3 files and saves them in the subtitles folder in (only) srt format.
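Instead of typing every file name, you can let the shell expand a pattern such as `*.ogg` into the file list. Below is a minimal sketch, assuming the `whisper` command-line tool from the openai/whisper repository is on your PATH and reusing the flags from the command above. `transcribe_all` is a hypothetical helper name, and the `WHISPER` variable only exists so you can set it to `echo` for a dry run.

```shell
# Hypothetical helper (POSIX sh): transcribe every .ogg file in the current
# folder into one output folder as .srt, with the flags used above.
transcribe_all() {
    outdir="${1:-./subtitles}"      # where the .srt files go
    model="${2:-medium.en}"         # Whisper model name
    mkdir -p "$outdir" || return 1
    set -- ./*.ogg                  # the shell expands the file list
    if [ ! -e "$1" ]; then          # no match: the pattern stays literal
        echo "no .ogg files in $(pwd)" >&2
        return 1
    fi
    # WHISPER defaults to the real CLI; WHISPER=echo gives a dry run.
    "${WHISPER:-whisper}" "$@" -o "$outdir" --model "$model" -f srt
}
```

From the folder with the voice files, `transcribe_all ./../../subtitles medium.en` would then do the same as the command above, but for all .ogg files in that folder.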

If you want inline subtitles (in the .subs file), copy the text over from the srt files. Alternatively, you can just use srt for all voice files.
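Copying the text over by hand means skipping the cue numbers and timing lines. A small awk filter can print only the spoken text. This is a sketch assuming the usual srt layout (number line, timing line containing `-->`, text lines, blank line); `srt_text` is a hypothetical name, and it does not write the .subs syntax for you (see the wiki guide for that). A subtitle line consisting only of digits would also be skipped.

```shell
# Hypothetical helper (POSIX sh + awk): print only the subtitle text of one
# or more .srt files, one output line per text line.
srt_text() {
    awk '
        { sub(/\r$/, "") }      # tolerate Windows (CRLF) line endings
        /^[0-9]+$/ { next }     # cue number
        /-->/      { next }     # timing line
        NF == 0    { next }     # blank separator line
        { print }               # anything else is subtitle text
    ' "$@"
}
```

For example, `srt_text subtitles/Simeon3.srt` would print the seven sentences from the srt example above without numbers or timestamps.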

Edit 2

Currently I use this workflow:

1. In the folder with the voice audio files, I create a folder named subtitles.
2. Then I open a terminal window in the voice folder.
3. Then I run the following command in the terminal:

whisper Carlotta1.ogg Carlotta2.ogg Carlotta3.ogg -o ./subtitles --model medium.en -f srt

4. Afterwards I move the subtitles folder to another folder with voice files and repeat steps 2 and 3, or, if I'm done, I move the folder to the root folder of the mission.
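The moving of the subtitles folder in this workflow can also be scripted: point Whisper's `-o` at one subtitles folder and loop over the voice folders. This is a sketch, not a tested part of the workflow. It assumes the openai/whisper CLI is on your PATH, keeps the `medium.en` model from the command above, and uses hypothetical helper names (`transcribe_dir`, `transcribe_dirs`); `WHISPER=echo` gives a dry run.

```shell
# Hypothetical helpers (POSIX sh): transcribe several voice folders straight
# into ONE subtitles folder, so it no longer has to be moved by hand.

# transcribe_dir VOICEDIR OUTDIR: one Whisper run per folder, so the model
# is loaded once for all of that folder's .ogg files.
transcribe_dir() {
    out="$2"
    set -- "$1"/*.ogg
    [ -e "$1" ] || return 0          # folder has no .ogg files: skip it
    "${WHISPER:-whisper}" "$@" -o "$out" --model medium.en -f srt
}

# transcribe_dirs OUTDIR VOICEDIR...
transcribe_dirs() {
    outdir="$1"; shift
    mkdir -p "$outdir" || return 1
    for dir in "$@"; do
        transcribe_dir "$dir" "$outdir" || return 1
    done
}
```

A call could look like `transcribe_dirs ./subtitles sound/voices/carlotta sound/voices/simeon`, run from the mission root; these folder names are only placeholders for wherever your voice files live.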

Edited January 9 by datiswous

datiswous · March 10, 2023

Btw, Whisper can apparently also translate directly, which could be useful later if TDM supports multi-language subtitles.

Edit: This doesn't work for En -> other languages: Whisper only translates into English.

Edit 2: Apparently Whisper can transcribe other languages directly, so if you made speech files in a different language, it would generate subtitles in that language. Maybe I could test it with translated missions, if any exist.
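The Whisper CLI has `--language` and `--task` options; `--task translate` produces English text, and the `.en` models (such as `medium.en`) only handle English. The sketch below picks a model accordingly. It is an assumption-laden illustration: `subtitle_lang` is a hypothetical helper, the choice of the multilingual `medium` model is arbitrary, and `WHISPER=echo` gives a dry run.

```shell
# Hypothetical helper (POSIX sh): subtitle one voice file in its own
# language, or translate it to English with TASK=translate.
# Usage: subtitle_lang FILE LANG [transcribe|translate]
subtitle_lang() {
    file="$1"; lang="$2"; task="${3:-transcribe}"
    model=medium                           # multilingual model
    if [ "$task" = transcribe ]; then
        case "$lang" in
            en|English) model=medium.en ;; # English-only model for English
        esac
    fi
    "${WHISPER:-whisper}" "$file" --model "$model" --language "$lang" \
        --task "$task" -f srt -o ./subtitles
}
```

For a (hypothetical) German recording you would run `subtitle_lang Simeon3_de.ogg de` for German subtitles, or `subtitle_lang Simeon3_de.ogg de translate` for English ones.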

Edited May 8, 2023 by datiswous

datiswous · March 10, 2023

Edit: See this post for better solutions for Windows.

I tried to install Whisper on a Win10 PC but couldn't get it to install. I found this Whisper GUI software, which works just as well:

https://github.com/chidiwilliams/buzz

After the subtitles are created, you have to double-click each item and export it to srt.

Spoiler


Edit: Hmm, I seem to get different results.. Oh well.

Edited January 9 by datiswous

datiswous · April 8, 2023

On 3/10/2023 at 2:36 AM, datiswous said:
Edit: You can also list all the files in the command. For example (from dir with voice files):
whisper Carlotta1.ogg Carlotta2.ogg Carlotta3.ogg -o ./../../subtitles --model medium.en -f srt
This command generates the subtitles from these 3 files and saves them in the subtitles folder in (only) srt format.

To get a horizontal list of files in Linux, navigate to the folder with the sound files in a maximized terminal window and type:

dir --format=horizontal

You then see all the sound files listed horizontally. Type whisper and paste that list of files after it. I usually compose the full command line in a text editor first before pasting it into a terminal window.
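If you compose the command in a text editor anyway, the shell can print the whole line for you. A minimal sketch with a hypothetical helper name; the output flags are copied from the command quoted above, and file names containing spaces would need quoting before you paste the line back.

```shell
# Hypothetical helper (POSIX sh): print a complete whisper command line for
# all .ogg files in the current folder, ready to paste into a text editor.
whisper_cmdline() {
    printf 'whisper'
    for f in *.ogg; do
        [ -e "$f" ] || continue      # no match: the pattern stays literal
        printf ' %s' "$f"
    done
    printf ' -o ./subtitles --model medium.en -f srt\n'
}
```

In a folder containing Carlotta1.ogg, Carlotta2.ogg and Carlotta3.ogg, this prints the same command as in the workflow above.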

Edited January 9 by datiswous

datiswous · January 9

Subtitle Edit ( https://www.nikse.dk/subtitleedit ) now has support for subtitle extraction via Whisper.

See https://www.nikse.dk/subtitleedit/help#audio_to_text . This works well on Windows. In the extraction window you can download all the needed extra dependencies the first time you use it. After generating the srt files, you can use the editor to tweak them, or move to a separate editor of your choice (including text editors).

Apparently it also works under Linux:

https://www.nikse.dk/subtitleedit/help#linux

If it doesn't, see the info above on using the command line in Linux.

Kdenlive ( https://kdenlive.org ) now also has Whisper subtitle extraction built in. This works well in Windows, but I couldn't get it working in Linux.

Go to Settings > Configure Kdenlive, then to the Speech to text section. At the top of the window, select the Whisper option. Then you have to install some extra components by clicking an install button (this currently doesn't work in Linux). Extraction on the CPU is considered slow, but in a test with a large speech file on an 8th-generation i3 processor I found it not so bad. AFAIK you can only do this one file at a time, so it's not as fast.

Edited January 9 by datiswous

datiswous · January 9

---

Edited January 9 by datiswous