February 27, 2025 · 6 min read

Elevenlabs releases Scribe: the new leading automatic speech recognition model beating OpenAI Whisper V3 and Google Gemini

In an unexpected move that's set to redefine the landscape of Automatic Speech Recognition (ASR), Elevenlabs has introduced Scribe V1. Elevenlabs is primarily known for its cutting-edge text-to-speech (TTS) technology, and it has now turned its attention to ASR. The result is a closed-source model that outperforms open-source competitors such as OpenAI Whisper V3 as well as closed-source alternatives like Google’s Gemini Flash model and Deepgram Nova-3.

Picture: Overview of the capabilities of the new Scribe model by Elevenlabs.

At Scribewave, we continually seek out and integrate the best transcription technologies into our privacy-friendly, all-in-one speech-to-text (STT) solution. We meticulously monitor the latest advancements in speech-to-text APIs to ensure that our users have access to the best the industry has to offer at any moment. When Elevenlabs released Scribe earlier this morning, we swiftly recognized its potential and implemented it on our platform on the very same day, ensuring we stay at the forefront of technological innovation.

However, integrating Scribe came with its own challenges. One notable limitation is that, out of the box, Elevenlabs’ speaker recognition (diarization) only works for audio files up to 8 minutes long. I addressed this constraint overnight, with loads of coffee and chocolate, so that our users can enjoy seamless, high-quality transcriptions without the hassle of file length or format restrictions. Thanks to these enhancements, Scribewave now supports audio and video files of up to an impressive 5 hours without compromising on transcript quality.
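Scribewave's own implementation isn't described here, but a common way to work around a per-request length limit is to split a long recording into overlapping windows and transcribe each one separately. The minimal Python sketch below only plans those windows. It assumes the 8-minute limit mentioned above and a hypothetical 15-second overlap; `plan_windows` and `Window` are illustrative names, and matching speaker labels across windows is a separate step that is not shown.

```python
from dataclasses import dataclass

# Assumption taken from the post: diarization only works for files up to 8 minutes.
MAX_DIARIZATION_SECONDS = 8 * 60


@dataclass(frozen=True)
class Window:
    start: float  # seconds from the beginning of the file (inclusive)
    end: float    # seconds from the beginning of the file (exclusive)


def plan_windows(duration: float,
                 max_len: float = MAX_DIARIZATION_SECONDS,
                 overlap: float = 15.0) -> list[Window]:
    """Split [0, duration) into windows no longer than max_len seconds.

    Consecutive windows share `overlap` seconds of audio, so a later
    (hypothetical) step can match speaker labels between neighbouring chunks.
    """
    if duration <= 0:
        return []
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    step = max_len - overlap
    windows: list[Window] = []
    start = 0.0
    while True:
        end = min(start + max_len, duration)
        windows.append(Window(start, end))
        if end >= duration:
            break
        start += step
    return windows
```

For a 5-hour file, `plan_windows(5 * 3600)` returns 39 windows of at most 8 minutes each. A longer overlap gives the speaker-matching step more shared audio to work with, at the cost of transcribing some audio twice.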

With this post, I am pleased to announce that enterprise customers can now gain beta access to this revolutionary model. Moreover, we are already rolling out access to Scribe to all our users.

Scribe's Impressive Performance: Topping Benchmarks and Vibe Checks

Elevenlabs' Scribe V1 isn't just hype: it's backed by solid evidence. The model has taken the top spot in independent benchmarks such as the one by Artificial Analysis and Mozilla's Common Voice benchmark, confirming its accuracy and reliability. These findings match our own internal tests at Scribewave, in which Scribe outperformed the alternatives in most cases.

But what really sets Scribe apart is how well it performs in real-world scenarios. Although I have worked with tons of different speech recognition models, I am really amazed by its accuracy. Several of the files I tested were transcribed perfectly, with 100% accuracy in Dutch, English, Italian, and French. This level of precision is a game-changer in how we use and understand spoken language. Over the next few days, I'll be running more tests on our implementation of Scribe to ensure it continues to meet our high standards.

Other early adopters online seem to agree that Scribe passes the infamous “vibe check”, a good sign for its real-world usability. One Twitter user even showed off an impeccable transcription of the world’s fastest speaker, and others have praised its multilingual capabilities.

Picture: ElevenLabs Scribe beats Deepgram and AssemblyAI in terms of word error rate (WER) (Source: Artificial Analysis: https://artificialanalysis.ai/speech-to-text)
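WER counts the word substitutions S, deletions D and insertions I needed to turn a model's transcript into the reference transcript, divided by the number of words N in the reference: WER = (S + D + I) / N, so lower is better. The minimal Python sketch below computes it with a word-level edit distance. It only lowercases and splits on whitespace, while published benchmarks typically apply their own text normalization (punctuation, numbers, casing), so its numbers are not directly comparable to the scores in the chart.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix handled so far
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` returns 1/6 (about 0.167): one deleted word out of six reference words. WER can exceed 1.0 when the hypothesis contains many inserted words.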

Analysis: benefits of Elevenlabs Scribe model

Scribe V1 offers a range of impressive advantages:

Exceptional Accuracy: Scribe is highly accurate both in recognizing the correct words and in correctly labeling who is speaking, a task known as "diarization." This makes it a reliable choice for transcribing conversations and meetings.
Seamless Code Switching: Scribe effortlessly handles switching between different languages within the same file, making it ideal for multilingual environments.
Extensive Language Support: With support for 99 languages, Scribe can be used almost anywhere in the world.
Improvements in Under-Served Languages: Scribe has made significant improvements in languages that were previously under-served, such as Serbian, Cantonese, and Malayalam. This opens up new possibilities for users in these regions.
Audio Event Detection: One of Scribe's standout features is its ability to detect audio events like music and laughter. This means it understands not just speech, but the entire audio context, making transcriptions more comprehensive and useful.
Speedy Transcriptions: Scribe is incredibly fast. It can transcribe a minute of audio in just a few seconds, saving you time and increasing efficiency.
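To make diarization and audio event detection more concrete, here is a minimal Python sketch that turns a diarized word list into readable speaker turns, optionally keeping audio event tags such as "(laughter)". The `Word` structure, its `speaker` labels and the `"audio_event"` kind are hypothetical stand-ins, not Scribe's actual response format; check the ElevenLabs API reference for the real field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    speaker: str        # hypothetical diarization label, e.g. "speaker_0"
    kind: str = "word"  # assumed values: "word" or "audio_event"


def to_turns(words: list[Word], include_events: bool = True) -> list[tuple[str, str]]:
    """Merge consecutive words from the same speaker into (speaker, text) turns."""
    turns: list[tuple[str, str]] = []
    for w in words:
        if w.kind == "audio_event" and not include_events:
            continue  # skip tags like "(laughter)" for a speech-only transcript
        if turns and turns[-1][0] == w.speaker:
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {w.text}")
        else:
            turns.append((w.speaker, w.text))
    return turns


if __name__ == "__main__":
    words = [
        Word("Hello", "speaker_0"),
        Word("there.", "speaker_0"),
        Word("(laughter)", "speaker_0", "audio_event"),
        Word("Hi!", "speaker_1"),
    ]
    for speaker, text in to_turns(words):
        print(f"{speaker}: {text}")
```

Running it prints `speaker_0: Hello there. (laughter)` followed by `speaker_1: Hi!`. Passing `include_events=False` yields a speech-only transcript instead.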

Picture: Scribe’s performance on different languages compared to Gemini, Whisper and Deepgram on the Common Voice benchmark

Not yet perfect: limitations of the model

Although Scribe V1 offers numerous benefits, I spotted a few limitations to take into account as well:

Closed-Source Model: Scribe is a closed-source model, which means you cannot run it on your local machine. This might be a consideration for those who prefer open-source solutions.
Audio Quality Sensitivity: Although Scribe is highly accurate with good-quality audio, its performance can deteriorate in more challenging conditions, for example when people interrupt each other or when a speaker is far from the microphone.
API Limitations: The API has some technical limitations. It currently only accepts uploaded files and does not support file URLs. Additionally, diarization is only supported for files shorter than 8 minutes. However, Scribewave lets you transcribe files of up to 3 hours with this model, making it much more useful in real-life contexts such as meetings.
Data Privacy: Out of the box, data you send to the Elevenlabs model may be used for further training. For advanced privacy options, you would need a Service Level Agreement (SLA) with the company. Alternatively, you can rely on a service like Scribewave to ensure your data privacy.
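Because the API accepts uploaded file data rather than URLs, a client has to send the audio bytes itself, typically as a multipart form upload. The Python sketch below builds such a request with the standard library but does not send it. The endpoint URL, the `xi-api-key` header and the `model_id`, `diarize` and `file` form fields reflect our understanding of the ElevenLabs speech-to-text API at the time of writing; treat them as assumptions and check them against the official API reference.

```python
import urllib.request
import uuid

# Assumed endpoint; verify against the ElevenLabs API reference before use.
STT_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def build_stt_request(audio: bytes, filename: str, api_key: str,
                      diarize: bool = True) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data upload of raw audio bytes."""
    boundary = uuid.uuid4().hex
    # Assumed form field names and model identifier.
    fields = {"model_id": "scribe_v1", "diarize": "true" if diarize else "false"}

    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    # The audio itself travels in the request body, not as a URL.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        STT_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "xi-api-key": api_key,  # assumed authentication header
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending it would then be a matter of `urllib.request.urlopen(req)`, with the transcript expected back as JSON. Note that this sketch reads the whole file into memory; very long recordings would still run into the 8-minute diarization limit described above.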

These limitations are important to consider, but with the right adjustments and support from services like Scribewave, Scribe V1 remains a powerful tool for speech recognition.

Conclusion

Scribe V1 undeniably sets a new benchmark for automatic speech-to-text conversion. Its unparalleled accuracy, extensive language support, and innovative features make it a game-changer for anyone who relies on precise transcriptions. Journalists, researchers, and podcasters can now process interviews, focus groups, and podcasts with greater ease and confidence.

Curious about how Scribe V1 performs in your language? You can explore its capabilities directly in the ElevenLabs playground. If you need support for audio longer than 8 minutes or are collaborating with a team, Scribewave’s free trial offers an enhanced experience. Visit Scribewave to see how this innovative technology can streamline your workflow and elevate the quality of your projects.

About the author

Ulysse Maes

In a world where Ulysse can't out-flex The Rock or out-charm Timothée Chalamet, he triumphs as the mastermind behind Scribewave, fiercely defending his throne as the king of nerds in beautiful Antwerp, Belgium.
