Automatic subtitle generation with Whisper

I found a far better alternative for automatically generating almost perfect srt files: Whisper.

https://openai.com/research/whisper
https://github.com/openai/whisper

For example, I ran a test with the file Simeon3.ogg, a 44-second voice file from the fm A house of locked secrets, using this command in a terminal:

whisper Simeon3.ogg --model small.en

After a very short time (possibly because it uses nvidia cuda, I'm not sure) it creates a bunch of ex