ElevenLabs Scribe (Speech to Text)
"Scribe v2 is the most accurate Speech to Text model" offering "real-time Speech to Text in under 150 ms" across "90+ languages." [1]
ElevenLabs Scribe is a REST speech-to-text API supporting batch and real-time transcription across 90+ languages, with sub-150 ms latency for streaming use cases. It covers speaker diarization, word and character timestamps, entity detection and redaction, multichannel processing, and keyterm prompting, making it suitable for podcasts, video captioning, meeting documentation, and AI agent integrations. Pay-as-you-go pricing starts at $0.22 per hour of audio, with a free tier of 4.5 hours of batch transcription per month, self-serve signup, and an enterprise plan. The service lists SOC 2 Type II, ISO 27001, and PCI DSS attestations along with HIPAA and GDPR support, and ships SDKs for Python, Node.js, Swift, Kotlin, and Flutter.
Best for / Avoid if
Best for:
- Prototypes and side projects: free to start, no sales call
- Regulated or enterprise workloads: compliance attestations and an enterprise plan
- AI agents and automation: an agent-ready surface (MCP / llms.txt)
Pricing & procurement
- Pricing model
- Hybrid (base + usage) [2]
- Published pricing
- ✓ Yes [3]
- Free tier
- ✓ Yes [4]
- Free tier details
- Free plan includes 4 hours 30 minutes/month of Scribe v1/v2 transcription and 2 hours 30 minutes/month of Scribe v2 Realtime transcription at no cost (recurring monthly allowance, shared with other platform features via the 10,000 credit pool).
- Self-serve signup
- ✓ Yes
- Requires sales call
- ✗ No
- Enterprise plan
- ✓ Yes [5]
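The free-tier allowance above is described as a shared 10,000-credit pool. The sketch below shows one way to reason about mixing batch and realtime usage within that pool. It assumes the full pool buys either 4.5 hours of batch or 2.5 hours of realtime audio and that consumption is linear. The profile does not confirm either assumption, and the function names are hypothetical.

```python
# Assumption (not confirmed by the profile): the whole 10,000-credit pool
# buys either 4.5 h of batch or 2.5 h of realtime audio, scaling linearly.
FREE_POOL_CREDITS = 10_000
BATCH_HOURS_PER_POOL = 4.5
REALTIME_HOURS_PER_POOL = 2.5


def credits_used(batch_hours: float, realtime_hours: float) -> float:
    """Estimate credits consumed by a mix of batch and realtime audio."""
    batch = batch_hours / BATCH_HOURS_PER_POOL * FREE_POOL_CREDITS
    realtime = realtime_hours / REALTIME_HOURS_PER_POOL * FREE_POOL_CREDITS
    return batch + realtime


def fits_free_tier(batch_hours: float, realtime_hours: float) -> bool:
    """True if the estimated usage stays within the monthly pool."""
    return credits_used(batch_hours, realtime_hours) <= FREE_POOL_CREDITS
```

Under these assumptions, using half of each allowance (2.25 hours of batch plus 1.25 hours of realtime) would exhaust the pool, before counting any other platform feature that draws on the same credits.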
| Plan | Item | Per | Amount | Source |
|---|---|---|---|---|
| Pay As You Go | Scribe v1/v2 batch transcription | hour of audio | $0.22 | source |
| Pay As You Go | Scribe v2 Realtime transcription | hour of audio | $0.39 | source |
| Pay As You Go | Entity detection add-on | hour of audio | $0.07 | source |
| Pay As You Go | Keyterm prompting add-on | hour of audio | $0.05 | source |
| Free | Monthly plan fee | month | $0 | source |
| Free | Scribe v1/v2 included transcription | 4.5 hours included/month | $0 | source |
| Free | Scribe v2 Realtime included transcription | 2.5 hours included/month | $0 | source |
| Starter | Monthly plan fee | month | $6 | source |
| Starter | Scribe v1/v2 included transcription | 4.5 hours included/month | $0 | source |
| Starter | Scribe v2 Realtime included transcription | 2.5 hours included/month | $0 | source |
| Creator | Monthly plan fee | month | $22 | source |
| Creator | Scribe v1/v2 included transcription | 27 hours included/month | $0 | source |
| Creator | Scribe v2 Realtime included transcription | 15 hours included/month | $0 | source |
| Pro | Monthly plan fee | month | $99 | source |
| Pro | Scribe v1/v2 included transcription | 100 hours included/month | $0 | source |
| Pro | Scribe v2 Realtime included transcription | 56 hours included/month | $0 | source |
| Scale | Monthly plan fee | month | $299 | source |
| Scale | Scribe v1/v2 included transcription | 450 hours included/month | $0 | source |
| Scale | Scribe v2 Realtime included transcription | 254 hours included/month | $0 | source |
| Business | Monthly plan fee | month | $990 | source |
| Business | Scribe v1/v2 included transcription | 1359 hours included/month | $0 | source |
| Business | Scribe v2 Realtime included transcription | 767 hours included/month | $0 | source |
| Enterprise | Scribe v1/v2 included transcription | 4500 hours included/month (example volume) | $0 | source |
| Enterprise | Scribe v2 Realtime included transcription | 2538 hours included/month (example volume) | $0 | source |
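To make the pay-as-you-go rows concrete, here is a minimal cost estimator built from the table's listed rates. It assumes add-ons are billed per hour of audio on top of the base rate and can be combined freely with either mode. The table does not spell this out, and the function and dictionary names are illustrative only.

```python
from decimal import Decimal

# Rates copied from the Pay As You Go rows of the table (USD per hour of audio).
RATES_USD_PER_HOUR = {
    "batch": Decimal("0.22"),
    "realtime": Decimal("0.39"),
}
ADDONS_USD_PER_HOUR = {
    "entity_detection": Decimal("0.07"),
    "keyterm_prompting": Decimal("0.05"),
}


def estimate_cost(hours, mode="batch", addons=()):
    """Estimate pay-as-you-go cost in USD, rounded to cents.

    Assumption: each add-on is charged per hour of audio in addition
    to the base transcription rate.
    """
    rate = RATES_USD_PER_HOUR[mode]
    for addon in addons:
        rate += ADDONS_USD_PER_HOUR[addon]
    return (Decimal(str(hours)) * rate).quantize(Decimal("0.01"))
```

For example, `estimate_cost(100)` covers 100 hours of batch audio at the base rate, and passing `addons=["entity_detection", "keyterm_prompting"]` adds both surcharges. Plan-included hours are not modeled here.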
Capabilities
- Supported actions
- transcribe_batch, transcribe_streaming, speaker_diarization, language_detection, word_timestamps, character_timestamps, entity_detection, entity_redaction, keyterm_prompting, dynamic_audio_tagging, no_verbatim_mode, multichannel_processing, webhook_delivery, voice_activity_detection, manual_commit_control
- Regions
- US, EU, India, Singapore [6]
- Languages
- Afrikaans, Amharic, Arabic, Armenian, Assamese, Asturian, Azerbaijani, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Cantonese, Catalan, Central Kurdish, Chichewa, Chinese (Mandarin), Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, Fulah, Galician, Ganda, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kabuverdianu, Kannada, Kazakh, Khmer, Korean, Kyrgyz, Lao, Latvian, Lingala, Lithuanian, Luo, Luxembourgish, Macedonian, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Northern Sotho, Norwegian, Occitan, Oriya, Pashto, Pedi, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Shona, Sindhi, Slovak, Slovenian, Somali, Spanish, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Umbundu, Urdu, Uzbek, Vietnamese, Welsh, Wolof, Xhosa, Zulu [7]
- Input types
- audio/aac, audio/aiff, audio/ogg, audio/mpeg (MP3), audio/opus, audio/wav, audio/flac, audio/m4a, audio/webm, video/mp4, video/avi, video/mkv, video/quicktime (MOV), video/wmv, video/x-flv, video/mpeg, video/3gpp, file upload (up to 3 GB), cloud storage URL (up to 2 GB), YouTube URL, TikTok URL, WebSocket PCM stream (8–48 kHz), WebSocket μ-law stream (ulaw_8000) [8]
- Output types
- JSON (word-level timestamps, speaker IDs, confidence scores), plain text, SRT, DOCX, HTML, PDF, segmented JSON, partial_transcript (streaming), committed_transcript (streaming), committed_transcript_with_timestamps (streaming)
- Webhooks
- ✓ Yes [9]
- Sandbox / test mode
- ✗ No [10]
- SDK languages
- Python, Node.js, Swift, Kotlin, Flutter [11]
- MCP server
- ✓ Yes [12]
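The streaming output types listed above (partial_transcript, committed_transcript, committed_transcript_with_timestamps) suggest a client that shows provisional text and then locks it in once committed. The sketch below illustrates that pattern. The event shape (a dict with `message_type` and `text` keys) and the rule that each partial replaces the previous one are assumptions rather than details taken from this profile.

```python
class TranscriptAssembler:
    """Accumulates streaming transcript events into display text.

    Assumed event shape: {"message_type": <str>, "text": <str>}.
    """

    COMMITTED_TYPES = (
        "committed_transcript",
        "committed_transcript_with_timestamps",
    )

    def __init__(self):
        self.committed = []  # finalized segments, in arrival order
        self.partial = ""    # latest provisional text, replaced on each update

    def handle(self, event: dict) -> None:
        kind = event.get("message_type")
        text = event.get("text", "")
        if kind == "partial_transcript":
            self.partial = text
        elif kind in self.COMMITTED_TYPES:
            self.committed.append(text)
            self.partial = ""  # the provisional text is now superseded
        # Any other message type is ignored in this sketch.

    def current_text(self) -> str:
        parts = self.committed + ([self.partial] if self.partial else [])
        return " ".join(p.strip() for p in parts if p.strip())
```

A caller would feed each decoded WebSocket message to `handle` and re-render `current_text()` after every event.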
Trust & compliance
- SOC 2
- SOC 2 Type II [13]
- HIPAA
- ✓ Yes [14]
- GDPR
- ✓ Yes [15]
- ISO 27001
- ✓ Yes [16]
- PCI DSS
- ✓ Yes [17]
- Published SLA
- ✗ No [18]
- Rate limits
- Concurrency for Scribe v1/v2 batch: min(4, round_up(audio_duration_secs / 480)). Files over 8 minutes are chunked into 4 parallel segments. Scribe v2 Realtime: 30+ concurrent streams on Business plans; enterprise plans include elevated limits. Response headers expose current-concurrent-requests and maximum-concurrent-requests. HTTP 429 is returned on rate_limit_exceeded or concurrent_limit_exceeded. [19]
- Known restrictions
- Maximum file size: 3 GB (file upload) or 2 GB (cloud storage URL); maximum audio duration: 10 hours (standard) or 1 hour (multichannel); minimum audio duration: 100 ms; maximum channels in multichannel mode: 5; maximum speakers for diarization: 32; maximum keyterms: 1,000 per request (batch) or 50 (realtime); keyterm length: under 50 characters and at most 5 words (batch), or up to 20 characters (realtime); Scribe v1 is deprecated, with removal on July 9, 2026; Zero Retention Mode (enable_logging=false) is enterprise-only; data residency (EU, India, Singapore) is enterprise-only; HIPAA support requires a BAA with ElevenLabs Sales and Zero Retention Mode enabled; speaker diarization is not available on Scribe v2 Realtime; dual channel is not supported on Scribe v2 Realtime; entity detection, entity redaction, and speaker role detection incur additional cost. [20]
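The batch concurrency formula and several of the documented limits can be checked client-side before a file is submitted. The sketch below implements the formula exactly as written in the rate-limits entry and validates a few of the batch restrictions. The function names and the choice of which limits to check are illustrative, and file-size checks are left out because the profile does not say whether "GB" is decimal or binary.

```python
import math


def batch_concurrency_slots(audio_duration_secs: float) -> int:
    """min(4, round_up(audio_duration_secs / 480)), per the rate-limits entry."""
    return min(4, math.ceil(audio_duration_secs / 480))


def check_batch_request(duration_secs: float, channels: int = 1, keyterms=()):
    """Return a list of violated limits (an empty list means none found)."""
    problems = []
    # 10 hours for standard audio, 1 hour in multichannel mode.
    max_duration = 3600 if channels > 1 else 36000
    if duration_secs < 0.1:
        problems.append("audio is shorter than 100 ms")
    if duration_secs > max_duration:
        problems.append(f"audio is longer than {max_duration} s")
    if channels > 5:
        problems.append("more than 5 channels")
    if len(keyterms) > 1000:
        problems.append("more than 1,000 keyterms")
    for term in keyterms:
        # Batch keyterms: under 50 characters and at most 5 words.
        if len(term) >= 50 or len(term.split()) > 5:
            problems.append(f"keyterm too long: {term!r}")
    return problems
```

By this formula, a 60-second clip occupies one concurrency slot and a one-hour file occupies four. A client could compare that figure with the maximum-concurrent-requests response header before queuing more work.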
Developer surface
Integration
- API style
- rest
- Base URL
- https://api.elevenlabs.io
- Version
- v1
- Versioning
- url
- Stability
- ga
- Auth methods
- api_key
- Idempotency keys
- ✗ No
- Error format
- vendor-specific
- Webhook signing
- hmac
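The integration fields above (URL-based versioning, API-key auth, HMAC-signed webhooks) translate into roughly the following client code. This is a sketch under stated assumptions: the `xi-api-key` header name, the `speech-to-text` path in the usage note, and a hex-encoded HMAC-SHA256 over the raw request body are not given in this profile and should be checked against the vendor documentation. Nothing here sends a network request.

```python
import hashlib
import hmac
import urllib.request

BASE_URL = "https://api.elevenlabs.io"  # from the profile
API_VERSION = "v1"                      # version is carried in the URL path


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request.

    Assumption: the API key travels in an "xi-api-key" header.
    """
    url = f"{BASE_URL}/{API_VERSION}/{path.lstrip('/')}"
    return urllib.request.Request(
        url, headers={"xi-api-key": api_key}, method="POST"
    )


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check an HMAC signature in constant time.

    Assumption: the signature is hex-encoded HMAC-SHA256 of the raw body;
    the real scheme may also sign a timestamp or use another encoding.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```

For instance, `build_request("speech-to-text", key)` would target `https://api.elevenlabs.io/v1/speech-to-text`, assuming that path is correct. Since the profile reports no idempotency-key support, callers that retry after an HTTP 429 may want their own de-duplication.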
Adoption & maturity
- Launched
- 2025-02-26
- GA
- 2025-02-26
- Notable customers
- Revolut, Klarna, Washington Post, Deutsche Telekom, HarperCollins
Other Speech-to-Text & Transcription APIs
Azure AI Speech to Text
"Azure Speech in Foundry Tools provides speech to text, text to speech, and other capabilities through a Microsoft Foundry resource. You can transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and conduct live AI voice conversations."
Amazon Transcribe
"Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or to add speech-to-text capabilities to any application."
Google Cloud Speech-to-Text
"Accurate voice typing and transcription powered by Gemini."
IBM watsonx Speech to Text
"IBM Watson® Speech to Text technology enables fast and accurate speech transcription in multiple languages for a variety of use cases, including but not limited to customer self-service, agent assistance and speech analytics."
AssemblyAI
"Voice AI infrastructure for developers building products that transcribe, understand, and act on speech."
Speechmatics
"Low-latency speech-to-text for multilingual, multi-speaker conversations."
References
- [1] Description: elevenlabs.io · elevenlabs.io
- [2] Pricing model: elevenlabs.io · elevenlabs.io
- [3] Published pricing: elevenlabs.io
- [4] Free tier: elevenlabs.io · elevenlabs.io
- [5] Enterprise plan: elevenlabs.io
- [6] Regions: elevenlabs.io
- [7] Languages: elevenlabs.io · elevenlabs.io
- [8] Input types: elevenlabs.io
- [9] Webhooks: elevenlabs.io
- [10] Sandbox: elevenlabs.io
- [11] SDK languages: elevenlabs.io
- [12] MCP server: elevenlabs.io
- [13] SOC 2: compliance.elevenlabs.io · elevenlabs.io
- [14] HIPAA: elevenlabs.io · elevenlabs.io
- [15] GDPR: elevenlabs.io
- [16] ISO 27001: compliance.elevenlabs.io · elevenlabs.io
- [17] PCI DSS: compliance.elevenlabs.io · elevenlabs.io
- [18] Published SLA: elevenlabs.io
- [19] Rate limits: elevenlabs.io
- [20] Known restrictions: elevenlabs.io · elevenlabs.io
Change history
- 2026-06-21 Capabilities: {} → {"pii_redaction":true,"real_time_streaming":true,"speaker_diarization":true}
- 2026-06-21 Summary Md: (none) → ElevenLabs Scribe is a REST speech-to-text API supporting batch and real-time t…
- 2026-06-21 Score Agent Friendliness: (none) → 65
- 2026-06-21 Score Pricing Transparency: (none) → 100
- 2026-06-21 Score Setup Speed: (none) → 85
- 2026-06-21 Score Docs Quality: (none) → 55
- 2026-06-21 Score Procurement Friction: (none) → 100
- 2026-06-21 Score Trust Readiness: (none) → 80
- 2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
- 2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
- 2026-06-21 Rendering: (none) → static
- 2026-06-21 Llms Txt URL: (none) → https://elevenlabs.io/llms.txt
- 2026-06-21 Has Structured Data: (none) → Yes
- 2026-06-21 Robots Allows Agents: (none) → Yes
- 2026-06-21 API Reference URL: (none) → https://elevenlabs.io/api
- 2026-06-21 Status Page URL: (none) → https://status.elevenlabs.io
- 2026-06-21 Changelog URL: (none) → https://elevenlabs.io/changelog
- 2026-06-21 Docs URL: (none) → https://elevenlabs.io/docs/overview/intro
- 2026-06-21 Llms Txt Present: (none) → Yes
- 2026-06-21 Free Tier Details: set to Free plan includes 4 hours 30 minutes/month of Scribe v1/v2 transcription and 2…
- 2026-06-21 Self Serve Signup: set to Yes
- 2026-06-21 Requires Sales Call: set to No
- 2026-06-21 Enterprise Plan Available: set to Yes
- 2026-06-21 SOC 2: set to type_2
- 2026-06-21 HIPAA: set to Yes
- 2026-06-21 GDPR: set to Yes
- 2026-06-21 ISO 27001: set to Yes
- 2026-06-21 PCI DSS: set to Yes
- 2026-06-21 SLA Published: set to No
- 2026-06-21 Data Retention Policy URL: set to https://elevenlabs.io/dpa
- 2026-06-21 Documented Rate Limits: set to Concurrency for Scribe v1/v2 batch: min(4, round_up(audio_duration_secs/480)). …
- 2026-06-21 Source Confidence: set to high
- 2026-06-21 Extractor: set to claude-subagent:sonnet
- 2026-06-21 Last Verified At: set to 2026-06-21T00:00:00.000Z
- 2026-06-21 Known Restrictions: set to Maximum file size: 3 GB (file upload) or 2 GB (cloud storage URL), Maximum audi…
- 2026-06-21 Auth Methods: set to api_key
- 2026-06-21 Auth Docs URL: set to https://elevenlabs.io/docs/api-reference/introduction
- 2026-06-21 API Style: set to rest
- 2026-06-21 Base URL: set to https://api.elevenlabs.io
- 2026-06-21 API Version: set to v1
- 2026-06-21 Versioning Scheme: set to url
- 2026-06-21 Stability: set to ga
- 2026-06-21 Deprecation Policy URL: set to https://elevenlabs.io/docs/developers/best-practices/breaking-changes-policy
- 2026-06-21 Quickstart URL: set to https://elevenlabs.io/docs/eleven-api/guides/cookbooks/speech-to-text
- 2026-06-21 Idempotency Supported: set to No
- 2026-06-21 Error Format: set to vendor-specific
- 2026-06-21 Webhook Signing: set to hmac
- 2026-06-21 Slug: set to elevenlabs-scribe
- 2026-06-21 Requires Verification: set to No
- 2026-06-21 Starting Price Usd: set to 0.22
Suggest an edit / leave a review
Leave a review or comment
curl -X POST https://apio.sh/api/feedback/elevenlabs-scribe \
-H 'Content-Type: application/json' \
-d '{"kind":"review","rating":5,"body":"Your experience with this API…"}'
Suggest a correction to a field (cite a source)
curl -X POST https://apio.sh/api/suggest/elevenlabs-scribe/FIELD \
-H 'Content-Type: application/json' \
-d '{"value":"corrected value","citations":[{"url":"https://source.example/page","excerpt":"supporting quote"}],"note":"what changed and why"}'