Scribe v1 Speech to Text
The world's most accurate speech-to-text model with 96.7% accuracy in English. Outperforms Whisper v3, Gemini 2.0 Flash, and Deepgram Nova-3 across 99 languages.
Key Features
96.7% Accuracy
Industry-leading accuracy with a word error rate of just 3.3% in English, outperforming all major competitors
99 Languages
Transcribe speech in 99 languages with automatic language detection and code-switching support
32-Speaker Diarization
Identify and label up to 32 different speakers in a single recording with pinpoint accuracy
Word-Level Timestamps
Get precise word-level timestamps for perfect synchronization and subtitle generation
Audio Event Tagging
Automatically tag non-verbal sounds such as (laughter), (applause), and (footsteps) for richer context
Code Switching
Seamlessly handle switching between different languages within the same audio file
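As a rough illustration of how speaker diarization and word-level timestamps fit together, the sketch below merges consecutive words from the same speaker into turns. The `Word` structure and its field names (`text`, `start`, `end`, `speaker`) are assumptions made for this example, not the documented Scribe v1 output schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str      # transcribed token
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # diarization label (hypothetical field name)


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def group_into_turns(words: List[Word]) -> List[Turn]:
    """Merge consecutive words spoken by the same speaker into one turn."""
    turns: List[Turn] = []
    for w in words:
        if turns and turns[-1].speaker == w.speaker:
            # Same speaker keeps talking: extend the current turn.
            last = turns[-1]
            last.end = w.end
            last.text += " " + w.text
        else:
            # Speaker changed (or first word): open a new turn.
            turns.append(Turn(w.speaker, w.start, w.end, w.text))
    return turns
```

Each resulting turn carries the speaker label plus the start time of its first word and the end time of its last word, which is usually enough to render a readable meeting transcript.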
How to Use
Transcribe audio with world-class accuracy in three steps
Upload Audio
Upload your audio or video file in any major format, up to 3 GB in size
Configure Options
Enable speaker diarization, timestamps, and audio event tagging as needed
Get Results
Receive structured transcription with speaker labels, timestamps, and event tags
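The "Configure Options" step above can be sketched as a small helper that turns the three toggles into request fields. The field names (`model_id`, `diarize`, `timestamps_granularity`, `tag_audio_events`, `num_speakers`) are assumptions modeled on ElevenLabs' public speech-to-text API; the platform described on this page may name them differently. The helper only builds the fields and does not send anything.

```python
from typing import Dict, Optional

MAX_SPEAKERS = 32  # upper bound on diarized speakers stated on this page


def build_options(diarize: bool = False,
                  word_timestamps: bool = True,
                  tag_audio_events: bool = False,
                  num_speakers: Optional[int] = None) -> Dict[str, str]:
    """Build form fields for a transcription request (field names are assumed)."""
    if num_speakers is not None and not 1 <= num_speakers <= MAX_SPEAKERS:
        raise ValueError(f"num_speakers must be between 1 and {MAX_SPEAKERS}")

    fields = {
        "model_id": "scribe_v1",
        "diarize": str(diarize).lower(),
        "timestamps_granularity": "word" if word_timestamps else "none",
        "tag_audio_events": str(tag_audio_events).lower(),
    }
    if num_speakers is not None:
        # Optional hint; omitted when the caller does not know the speaker count.
        fields["num_speakers"] = str(num_speakers)
    return fields
```

For a four-person meeting, `build_options(diarize=True, num_speakers=4)` would produce fields that could be sent alongside the uploaded file in a multipart form request.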
Technical Specifications
Use Cases
Enterprise Meetings
Transcribe complex multi-speaker meetings with accurate speaker identification for up to 32 participants
Global Content
Process multilingual content with code-switching support for international teams and audiences
Media Production
Generate precise subtitles with word-level timestamps and audio event markers
Podcast Transcription
Create searchable transcripts with speaker labels for podcast archives and SEO
Research & Analysis
Transcribe interviews and focus groups with high accuracy for qualitative research
Accessibility
Generate accurate captions for deaf and hard-of-hearing audiences across 99 languages
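For the subtitle and caption use cases above, word-level timestamps can be folded into SubRip (SRT) cues. This is a minimal sketch, assuming each word arrives as a `(text, start_seconds, end_seconds)` tuple; that tuple shape and the fixed `max_words` chunking are choices made for the example, not part of Scribe v1's output format.

```python
from typing import List, Tuple

TimedWord = Tuple[str, float, float]  # (text, start seconds, end seconds)


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[TimedWord], max_words: int = 7) -> str:
    """Group words into cues of at most max_words and emit SRT text."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_timestamp(chunk[0][1])   # start of the first word
        end = format_timestamp(chunk[-1][2])    # end of the last word
        text = " ".join(w[0] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

A production subtitle tool would typically also split cues at sentence boundaries and cap line length, but the timing logic stays the same.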
Model Comparison
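Feature: Scribe v1 / compared model
English accuracy: 96.7% / ~94%
Languages: 99 / 97
Speaker diarization: Up to 32 / Not built-in
(row label not preserved): Yes / No
(row label not preserved): Yes / Limited
Maximum file size: 3 GB / Varies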
Frequently Asked Questions
Find answers to common questions about this model
What is Scribe v1?
Scribe v1 is ElevenLabs' state-of-the-art automatic speech recognition (ASR) model. It achieves 96.7% accuracy in English and consistently outperforms leading models such as OpenAI Whisper v3, Google Gemini 2.0 Flash, and Deepgram Nova-3 across 99 languages.
How accurate is Scribe v1?
Scribe v1 achieves a word error rate (WER) of just 3.3% in English and 1.3% in Italian on the FLEURS benchmark. Since accuracy is 100% minus WER, this corresponds to approximately 96.7% accuracy in English and 98.7% in Italian, making it the most accurate publicly available ASR model.
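To make the relationship between WER and accuracy concrete, here is a minimal sketch of the standard WER calculation: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It splits on whitespace only; the text normalization used in the FLEURS evaluation is not described on this page, so real benchmark numbers would depend on details this example leaves out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def accuracy_from_wer(wer: float) -> float:
    """Accuracy as quoted on this page: 1 minus WER."""
    return 1.0 - wer
```

One wrong word in a five-word reference gives a WER of 0.2 (80% accuracy); a WER of 0.033 corresponds to the 96.7% figure quoted for English.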
How many languages does Scribe v1 support?
Scribe v1 supports 99 languages with automatic language detection. It also handles code-switching, meaning it can accurately transcribe audio that switches between languages within the same recording.
How many speakers can Scribe v1 identify?
Scribe v1 can identify and label up to 32 different speakers in a single recording, making it well suited to complex meetings, panel discussions, and multi-participant conversations.
What is audio event tagging?
Audio event tagging automatically detects and labels non-verbal sounds in your transcription, such as (laughter), (applause), (footsteps), or (music). This adds valuable context that pure speech transcription misses.
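If event tags appear inline in the transcript text in parentheses, as in the examples above, they can be separated from the spoken words with a short helper. This sketch assumes that inline, parenthesized format; it is not a documented output contract, and any genuinely spoken text that happens to be transcribed inside parentheses would be misclassified as an event.

```python
import re
from typing import List, Tuple

# Matches a parenthesized tag such as "(laughter)" with no nested parentheses.
EVENT_PATTERN = re.compile(r"\(([^()]+)\)")


def split_events(transcript: str) -> Tuple[str, List[str]]:
    """Return (speech-only text, list of event tags in order of appearance)."""
    events = EVENT_PATTERN.findall(transcript)
    speech = EVENT_PATTERN.sub(" ", transcript)
    speech = " ".join(speech.split())  # collapse the gaps left by removed tags
    return speech, events
```

Keeping the events as a separate list makes it easy to, for example, count laughter in a podcast episode while still producing a clean text transcript.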
What file formats and sizes are supported?
Scribe v1 supports all major audio and video formats, including MP3, WAV, AAC, M4A, OGG, MP4, WebM, and more. The maximum file size is 3 GB.
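A client-side check before upload might look like the following sketch. The extension set covers only the formats named above (the page says "and more", so the real list is longer), and treating "3 GB" as 3 × 1024³ bytes is an assumption, since the page does not say whether it means decimal or binary gigabytes.

```python
import os

# Only the formats explicitly named on this page; the full list is not given.
KNOWN_EXTENSIONS = {".mp3", ".wav", ".aac", ".m4a", ".ogg", ".mp4", ".webm"}

# Assumption: "3 GB" interpreted as 3 GiB.
MAX_BYTES = 3 * 1024 ** 3


def check_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file looks unsuitable for upload."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in KNOWN_EXTENSIONS:
        raise ValueError(f"unrecognized file extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
```

In practice the size would come from `os.path.getsize(path)`, and a failed extension check might be downgraded to a warning given that the list here is incomplete.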
How does Scribe v1 compare to Whisper?
In benchmark tests (FLEURS and Common Voice), Scribe v1 consistently outperforms OpenAI Whisper Large v3 across all 99 supported languages, with particularly significant gains in accuracy and speaker diarization.
Can I use the transcriptions commercially?
Yes. All transcriptions generated through our platform can be used for commercial purposes, including business meetings, podcasts, video subtitles, and content production, without any additional licensing fees.
Experience World-Class Transcription
Try Scribe v1 and discover why it's the most accurate speech-to-text model available