Performance of the OpenVINO Whisper transcription models

I performed a study that compares four transcription models (base, small, medium and large-v3). My findings are:

  • Base model is fastest but also least accurate.
  • Small model offers the best balance: slightly better accuracy than large-v3 while being significantly faster.
  • Medium and large-v3 demonstrate diminishing returns: higher processing time but only modest accuracy gains.
  • Certain audio tracks proved challenging for all models.
  • Combining models (e.g., small for most tracks, large-v3 for specific cases) could optimise accuracy. This approach could be implemented as an enhancement in Audacity.
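
As a rough illustration of the combined approach, here is a minimal Python sketch. It is not taken from the study or from Audacity's OpenVINO plugin. The `transcribe` callable, the confidence score it returns, and the 0.6 threshold are all hypothetical placeholders; how a "specific case" is actually detected would depend on what the transcription backend exposes.

```python
from typing import Callable, Tuple

# Hypothetical backend signature (an assumption, not a real API):
# takes (model_name, audio_path) and returns (text, confidence in 0..1).
Transcriber = Callable[[str, str], Tuple[str, float]]


def transcribe_with_fallback(
    audio_path: str,
    transcribe: Transcriber,
    default_model: str = "small",
    fallback_model: str = "large-v3",
    min_confidence: float = 0.6,  # illustrative threshold, not a measured value
) -> Tuple[str, str]:
    """Return (model_used, text), retrying with the larger model if needed."""
    # First pass with the faster model, which handles most tracks.
    text, confidence = transcribe(default_model, audio_path)
    if confidence >= min_confidence:
        return default_model, text

    # Low confidence: treat the track as a difficult case and retry.
    text, _ = transcribe(fallback_model, audio_path)
    return fallback_model, text
```

With this shape, only the tracks that the small model struggles with pay the extra processing time of large-v3.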

The detailed study and its data are available at https://www.alanbonnici.com/2025/04/comparing-audacitys-openvino-whisper.html.
