Very nice! Will take a look. One thing I really like about llama.cpp and whisper.cpp is that there's no Python: much easier to get working and keep working. The Python-based LLM engines I tried in the past often ended up breaking something else in my environment. Also, both llama.cpp and whisper.cpp ship nice web servers.
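For instance, a minimal sketch of poking at llama.cpp's built-in server from Python (this assumes you've started `llama-server` yourself on the default port 8080 with some model loaded; the prompt is just an example):

```python
# Minimal sketch: querying llama.cpp's HTTP server from Python.
# Assumes the server was started with something like:
#   llama-server -m model.gguf --port 8080
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": "The easiest way to transcribe audio locally is", "n_predict": 64},
)
resp.raise_for_status()
print(resp.json()["content"])  # generated text comes back in the "content" field
```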
You should look into other forks that are way faster, such as WhisperX or faster-whisper:

https://github.com/m-bain/whisperX (uses faster-whisper underneath)
https://github.com/SYSTRAN/faster-whisper (Faster Whisper transcription with CTranslate2)
I run the first one as a public service on my GPU.
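If you want a feel for the API, here's a minimal sketch of faster-whisper usage (WhisperX builds on it underneath); it assumes `pip install faster-whisper`, a CUDA GPU, and an `audio.mp3` on disk:

```python
# Minimal sketch: transcribing a file with faster-whisper.
# Assumes: pip install faster-whisper, a CUDA GPU, and audio.mp3 in the cwd.
from faster_whisper import WhisperModel

# float16 on GPU; without one, use device="cpu", compute_type="int8" instead.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# transcribe() returns a lazy generator of segments plus detection info.
segments, info = model.transcribe("audio.mp3", beam_size=5)
print("Detected language: %s (p=%.2f)" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```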