SpeechRecognizer & Sound: Transcribe

Performs speech recognition on the selected Sound using the selected SpeechRecognizer, and writes the recognized text to the Info window.

The sound is automatically resampled to 16 kHz (the sampling frequency expected by whisper.cpp) if its original sampling frequency differs.
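
As a rough illustration of this step (not the resampling routine actually used here), the Python sketch below converts a list of samples to 16 kHz only when the source rate differs. The function name resample_linear and its parameters are hypothetical, and plain linear interpolation is a deliberate simplification: a production resampler would also low-pass filter the signal before downsampling to avoid aliasing.

```python
def resample_linear(samples, source_rate, target_rate=16000):
    """Return samples converted to target_rate by linear interpolation.

    Hypothetical helper for illustration only; it applies no anti-aliasing filter.
    """
    # Nothing to do when the sound already has the expected sampling frequency.
    if source_rate == target_rate:
        return list(samples)
    if not samples:
        return []
    # Number of output samples that covers the same duration.
    n_out = max(1, round(len(samples) * target_rate / source_rate))
    step = source_rate / target_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last interpolation interval: hold the final sample.
            out.append(float(samples[-1]))
        else:
            frac = pos - left
            out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out
```

For example, one second of audio at 44.1 kHz (44100 samples) yields 16000 samples, while a sound that is already at 16 kHz is returned unchanged.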

Recognition uses the Silero VAD (voice activity detection) model built into whisper.cpp to skip non-speech parts of the audio, which generally improves both speed and accuracy.
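
To make the idea of skipping non-speech audio concrete, the Python sketch below uses a simple energy threshold. This is a stand-in technique, not Silero VAD, which is a neural model; the function names, the 30 ms frame length and the threshold value are all assumptions chosen for illustration.

```python
import math


def speech_frames(samples, sample_rate=16000, frame_ms=30, threshold=0.01):
    """Flag each full frame as speech (True) when its RMS reaches the threshold.

    Energy-based stand-in for a real VAD; all parameter values are illustrative.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    flags = []
    # Only complete frames are examined; a trailing partial frame is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        flags.append(rms >= threshold)
    return flags


def drop_silence(samples, sample_rate=16000, frame_ms=30, threshold=0.01):
    """Return only the samples that belong to frames flagged as speech."""
    frame_len = int(sample_rate * frame_ms / 1000)
    kept = []
    flags = speech_frames(samples, sample_rate, frame_ms, threshold)
    for index, is_speech in enumerate(flags):
        if is_speech:
            kept.extend(samples[index * frame_len:(index + 1) * frame_len])
    return kept
```

Passing only the retained frames to the recognizer shortens the input, which is where the speed gain comes from. A trained VAD model is far more robust to background noise than this fixed threshold.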

The result is a flat text string containing the full transcription.


© Anastasia Shchupak 2026-03-15