ElevenLabs Scribe (Speech to Text)

"Scribe v2 is the most accurate Speech to Text model" offering "real-time Speech to Text in under 150 ms" across "90+ languages." [1]

elevenlabs.io/speech-to-text · By ElevenLabs · Last verified 2026-06-21 · Source confidence: high

ElevenLabs Scribe is a REST speech-to-text API supporting batch and real-time transcription across 90+ languages, with sub-150 ms latency for streaming use cases. It covers speaker diarization, word and character timestamps, entity detection and redaction, multichannel processing, and keyterm prompting, making it suitable for podcasts, video captioning, meeting documentation, and AI agent integrations. Pay-as-you-go pricing starts at $0.22 per hour of audio, with a free tier of 4.5 hours per month, self-serve signup, and an enterprise plan available. The profile lists SOC 2 Type II, HIPAA, GDPR, ISO 27001, and PCI DSS compliance, and the service ships SDKs for Python, Node.js, Swift, Kotlin, and Flutter.

Best for / Avoid if

Best for:
  • Prototypes and side projects - free to start, no sales call
  • Regulated or enterprise workloads - compliance attestations and an enterprise plan
  • AI agents and automation - an agent-ready surface (MCP / llms.txt)

Pricing & procurement

Pricing model
Hybrid (base + usage) [2]
Published pricing
Yes [3]
Free tier
Yes [4]
Free tier details
Free plan includes 4 hours 30 minutes/month of Scribe v1/v2 transcription and 2 hours 30 minutes/month of Scribe v2 Realtime transcription at no cost (recurring monthly allowance, shared with other platform features via the 10,000 credit pool).
Self-serve signup
Yes
Requires sales call
No
Enterprise plan
Yes [5]
Published prices
Plan | Item | Per / included volume | Amount
Pay As You Go | Scribe v1/v2 batch transcription | hour of audio | $0.22
Pay As You Go | Scribe v2 Realtime transcription | hour of audio | $0.39
Pay As You Go | Entity detection add-on | hour of audio | $0.07
Pay As You Go | Keyterm prompting add-on | hour of audio | $0.05
Free | Monthly plan fee | month | $0
Free | Scribe v1/v2 included transcription | 4 hours 30 minutes/month included | $0
Free | Scribe v2 Realtime included transcription | 2 hours 30 minutes/month included | $0
Starter | Monthly plan fee | month | $6
Starter | Scribe v1/v2 included transcription | 4.5 hours included/month | $0
Starter | Scribe v2 Realtime included transcription | 2.5 hours included/month | $0
Creator | Monthly plan fee | month | $22
Creator | Scribe v1/v2 included transcription | 27 hours included/month | $0
Creator | Scribe v2 Realtime included transcription | 15 hours included/month | $0
Pro | Monthly plan fee | month | $99
Pro | Scribe v1/v2 included transcription | 100 hours included/month | $0
Pro | Scribe v2 Realtime included transcription | 56 hours included/month | $0
Scale | Monthly plan fee | month | $299
Scale | Scribe v1/v2 included transcription | 450 hours included/month | $0
Scale | Scribe v2 Realtime included transcription | 254 hours included/month | $0
Business | Monthly plan fee | month | $990
Business | Scribe v1/v2 included transcription | 1359 hours included/month | $0
Business | Scribe v2 Realtime included transcription | 767 hours included/month | $0
Enterprise | Scribe v1/v2 included transcription | 4500 hours included/month (example volume) | $0
Enterprise | Scribe v2 Realtime included transcription | 2538 hours included/month (example volume) | $0
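
As a rough illustration of how the pay-as-you-go rates above combine, the following sketch estimates a bill from hours of audio. The rates are copied from the published prices; the function name `estimate_cost` and the assumption that add-ons are simply added to the base hourly rate (for either mode) are ours and are not confirmed by the source.

```python
from decimal import Decimal

# Pay-as-you-go rates in USD per hour of audio, copied from the price list.
BASE_RATES = {
    "batch": Decimal("0.22"),     # Scribe v1/v2 batch transcription
    "realtime": Decimal("0.39"),  # Scribe v2 Realtime transcription
}
ADDON_RATES = {
    "entity_detection": Decimal("0.07"),
    "keyterm_prompting": Decimal("0.05"),
}


def estimate_cost(hours, mode="batch", addons=()):
    """Estimate USD cost for `hours` of audio.

    Assumption (not stated in the profile): each add-on is billed per hour
    of audio on top of the base rate, with no minimums or rounding rules.
    """
    per_hour = BASE_RATES[mode] + sum((ADDON_RATES[a] for a in addons), Decimal("0"))
    return (Decimal(str(hours)) * per_hour).quantize(Decimal("0.01"))


print(estimate_cost(100))                                                   # 22.00
print(estimate_cost(100, addons=("entity_detection", "keyterm_prompting")))  # 34.00
print(estimate_cost(10, mode="realtime"))                                   # 3.90
```

Plan allowances and the shared credit pool are not modeled here; the sketch covers metered usage only.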

Capabilities

  • Real-time streaming
  • Speaker diarization
  • PII redaction
Supported actions
transcribe_batch, transcribe_streaming, speaker_diarization, language_detection, word_timestamps, character_timestamps, entity_detection, entity_redaction, keyterm_prompting, dynamic_audio_tagging, no_verbatim_mode, multichannel_processing, webhook_delivery, voice_activity_detection, manual_commit_control
Regions
US, EU, India, Singapore [6]
Languages
Afrikaans, Amharic, Arabic, Armenian, Assamese, Asturian, Azerbaijani, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Cantonese, Catalan, Central Kurdish, Chichewa, Chinese (Mandarin), Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, Fulah, Galician, Ganda, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kabuverdianu, Kannada, Kazakh, Khmer, Korean, Kyrgyz, Lao, Latvian, Lingala, Lithuanian, Luo, Luxembourgish, Macedonian, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Northern Sotho, Norwegian, Occitan, Oriya, Pashto, Pedi, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Shona, Sindhi, Slovak, Slovenian, Somali, Spanish, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Umbundu, Urdu, Uzbek, Vietnamese, Welsh, Wolof, Xhosa, Zulu [7]
Input types
audio/aac, audio/aiff, audio/ogg, audio/mpeg (MP3), audio/opus, audio/wav, audio/flac, audio/m4a, audio/webm, video/mp4, video/avi, video/mkv, video/quicktime (MOV), video/wmv, video/x-flv, video/mpeg, video/3gpp, file upload (up to 3 GB), cloud storage URL (up to 2 GB), YouTube URL, TikTok URL, WebSocket PCM stream (8–48 kHz), WebSocket μ-law stream (ulaw_8000) [8]
Output types
JSON (word-level timestamps, speaker IDs, confidence scores), plain text, SRT, DOCX, HTML, PDF, segmented JSON, partial_transcript (streaming), committed_transcript (streaming), committed_transcript_with_timestamps (streaming)
Webhooks
Yes [9]
Sandbox / test mode
No [10]
SDK languages
Python, Node.js, Swift, Kotlin, Flutter [11]
MCP server
Yes [12]
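
The streaming output types listed above (partial_transcript and the committed variants) suggest a common client pattern: show interim text as it arrives and replace it once a segment is committed. The sketch below illustrates that pattern only. The JSON field names `message_type` and `text`, and the assumption that exactly one committed variant arrives per segment, are hypothetical and should be checked against the vendor's WebSocket reference.

```python
import json

# Output type names taken from the profile's "Output types" field.
COMMITTED_TYPES = ("committed_transcript", "committed_transcript_with_timestamps")


class TranscriptBuffer:
    """Accumulates committed segments and tracks the latest interim text."""

    def __init__(self):
        self.committed = []  # finalized segments, in arrival order
        self.partial = ""    # most recent interim text, replaced on each update

    def handle(self, raw_message):
        msg = json.loads(raw_message)
        kind = msg.get("message_type")  # hypothetical field name
        text = msg.get("text", "")      # hypothetical field name
        if kind == "partial_transcript":
            self.partial = text
        elif kind in COMMITTED_TYPES:
            self.committed.append(text)
            self.partial = ""  # the interim text is superseded by the commit
        # Any other message kind is ignored in this sketch.

    def display(self):
        parts = self.committed + ([self.partial] if self.partial else [])
        return " ".join(parts)
```

A caller would pass each WebSocket text frame to `handle` and re-render `display()` after every message.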

Trust & compliance

SOC 2
SOC 2 Type II [13]
HIPAA
Yes [14]
GDPR
Yes [15]
ISO 27001
Yes [16]
PCI DSS
Yes [17]
Published SLA
No [18]
Rate limits
Concurrency for Scribe v1/v2 batch: min(4, round_up(audio_duration_secs/480)). Files over 8 minutes are chunked into 4 parallel segments. Scribe v2 Realtime: 30+ concurrent streams on Business plans; enterprise plans include elevated limits. Response headers expose current-concurrent-requests and maximum-concurrent-requests. HTTP 429 is returned on rate_limit_exceeded or concurrent_limit_exceeded. [19]
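
The batch concurrency formula and the two response headers can be expressed as small helpers. This is a minimal sketch that implements the formula exactly as written in the rate-limit field. The function names are ours, and reading the headers from a plain dict with case-insensitive keys is an assumption about how a client would receive them.

```python
import math


def batch_concurrency(audio_duration_secs):
    """Concurrency for one batch request: min(4, round_up(duration / 480))."""
    return min(4, math.ceil(audio_duration_secs / 480))


def remaining_slots(headers):
    """Free concurrency slots according to the documented response headers."""
    lowered = {k.lower(): v for k, v in headers.items()}  # HTTP header names are case-insensitive
    current = int(lowered["current-concurrent-requests"])
    maximum = int(lowered["maximum-concurrent-requests"])
    return max(0, maximum - current)


print(batch_concurrency(60))    # 1
print(batch_concurrency(481))   # 2
print(batch_concurrency(3600))  # 4
print(remaining_slots({"Current-Concurrent-Requests": "3",
                       "Maximum-Concurrent-Requests": "10"}))  # 7
```

A client could check `remaining_slots` before submitting more work and back off when it reaches zero or when a 429 is returned.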
Known restrictions
  • Maximum file size: 3 GB (file upload) or 2 GB (cloud storage URL)
  • Maximum audio duration: 10 hours (standard), 1 hour (multichannel)
  • Minimum audio duration: 100 ms
  • Maximum channels in multichannel mode: 5
  • Maximum speakers for diarization: 32
  • Maximum keyterms: 1,000 per request (batch); 50 keyterms (realtime)
  • Keyterm max length: under 50 characters, max 5 words (batch); up to 20 characters (realtime)
  • Scribe v1 deprecated, removal July 9 2026
  • Zero Retention Mode (enable_logging=false) is enterprise-only
  • Data residency (EU, India, Singapore) is enterprise-only feature
  • HIPAA support requires BAA with ElevenLabs Sales and Zero Retention Mode enabled
  • Speaker diarization not available on Scribe v2 Realtime
  • Dual channel not supported on Scribe v2 Realtime
  • Entity detection and redaction incur additional cost; speaker role detection also incurs additional cost [20]
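
Several of the batch limits above can be checked client-side before uploading. The following is a sketch of such a pre-flight check, not part of any ElevenLabs SDK. The function name, the use of binary gigabytes, and treating any request with more than one channel as "multichannel" are assumptions made for illustration.

```python
GB = 1024 ** 3  # assumption: the profile does not say whether GB is binary or decimal

MAX_UPLOAD_BYTES = 3 * GB   # file upload
MAX_URL_BYTES = 2 * GB      # cloud storage URL
MAX_SECS_STANDARD = 10 * 3600
MAX_SECS_MULTICHANNEL = 1 * 3600
MIN_SECS = 0.1              # 100 ms
MAX_CHANNELS = 5
MAX_KEYTERMS_BATCH = 1000


def check_batch_request(file_size_bytes, duration_secs, keyterms=(),
                        channels=1, from_url=False):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    size_limit = MAX_URL_BYTES if from_url else MAX_UPLOAD_BYTES
    if file_size_bytes > size_limit:
        problems.append("file exceeds the size limit")
    # Assumption: more than one channel means multichannel mode is in use.
    max_secs = MAX_SECS_MULTICHANNEL if channels > 1 else MAX_SECS_STANDARD
    if duration_secs < MIN_SECS:
        problems.append("audio is shorter than 100 ms")
    if duration_secs > max_secs:
        problems.append("audio exceeds the duration limit")
    if channels > MAX_CHANNELS:
        problems.append("more than 5 channels")
    if len(keyterms) > MAX_KEYTERMS_BATCH:
        problems.append("more than 1,000 keyterms")
    for term in keyterms:
        # Batch keyterms: under 50 characters and at most 5 words.
        if len(term) >= 50 or len(term.split()) > 5:
            problems.append(f"keyterm out of bounds: {term!r}")
    return problems


print(check_batch_request(10 * 1024 ** 2, 600, keyterms=["ElevenLabs Scribe"]))  # []
```

The realtime limits (50 keyterms, 20 characters each) differ and would need their own check.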

Developer surface

Docs rendering: static · llms.txt present

Integration

API style
rest
Base URL
https://api.elevenlabs.io
Version
v1
Versioning
url
Stability
ga
Auth methods
api_key
Idempotency keys
No
Error format
vendor-specific
Webhook signing
hmac
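
The profile records only that webhooks are signed with HMAC. The sketch below shows the general shape of HMAC verification so a receiver can reject tampered or replayed deliveries. The hash function (SHA-256), the signed payload layout (`timestamp.body`), the hex encoding, and the five-minute tolerance are all assumptions for illustration. The actual scheme must be taken from the vendor's webhook documentation.

```python
import hashlib
import hmac
import time


def sign_payload(secret, timestamp, body):
    """Compute a hex HMAC-SHA256 over "<timestamp>.<body>" (assumed layout)."""
    signed = f"{timestamp}.".encode("utf-8") + body
    return hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()


def verify_webhook(secret, timestamp, body, signature, tolerance_secs=300, now=None):
    """Return True if the signature matches and the timestamp is recent."""
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance_secs:
        return False  # too old or too far in the future: possible replay
    expected = sign_payload(secret, timestamp, body)
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, signature)
```

Verification should run against the raw request bytes, before any JSON parsing, since re-serializing the body can change it.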

SDKs

  • Python elevenlabs · repo
  • Node.js @elevenlabs/elevenlabs-js · repo
  • Swift · repo
  • Kotlin io.elevenlabs:elevenlabs-android · repo
  • Flutter elevenlabs_agents · repo
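
For callers not using one of the SDKs above, the integration fields (REST, base URL, URL versioning, API-key auth) imply a request shaped roughly as follows. This sketch only constructs the request and does not send it. The path `/speech-to-text` and the header name `xi-api-key` are not stated in this profile and are assumptions to verify against the API reference.

```python
import urllib.request

BASE_URL = "https://api.elevenlabs.io"  # from the profile
API_VERSION = "v1"                      # from the profile; versioning is in the URL


def build_request(api_key, path="/speech-to-text", header_name="xi-api-key"):
    """Build (but do not send) a POST request carrying the API key.

    `path` and `header_name` are assumed values, not taken from the profile.
    """
    url = f"{BASE_URL}/{API_VERSION}{path}"
    return urllib.request.Request(url, method="POST", headers={header_name: api_key})


req = build_request("YOUR_API_KEY")
print(req.get_method(), req.full_url)  # POST https://api.elevenlabs.io/v1/speech-to-text
```

A real call would also attach the audio as a multipart form body along with a model identifier, which this sketch leaves out.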

Adoption & maturity

Launched
2025-02-26
GA
2025-02-26
Notable customers
Revolut, Klarna, Washington Post, Deutsche Telekom, HarperCollins

Other Speech-to-Text & Transcription APIs

  • Azure AI Speech to Text

    "Azure Speech in Foundry Tools provides speech to text, text to speech, and other capabilities through a Microsoft Foundry resource. You can transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and conduct live AI voice conversations."

    Usage · free tier · public pricing · self-serve

  • Amazon Transcribe

    "Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or to add speech-to-text capabilities to any application."

    Usage · free tier · public pricing · self-serve

  • Google Cloud Speech-to-Text

    "Accurate voice typing and transcription powered by Gemini."

    Usage · free tier · public pricing · self-serve

  • IBM watsonx Speech to Text

    "IBM Watson® Speech to Text technology enables fast and accurate speech transcription in multiple languages for a variety of use cases, including but not limited to customer self-service, agent assistance and speech analytics."

    Usage · free tier · public pricing · self-serve

  • AssemblyAI

    "Voice AI infrastructure for developers building products that transcribe, understand, and act on speech."

    Usage · public pricing · self-serve

  • Speechmatics

    "Low-latency speech-to-text for multilingual, multi-speaker conversations."

    Usage · free tier · public pricing · self-serve


References

Each field above carries a numbered source, listed here.

  1. Description: elevenlabs.io · elevenlabs.io
  2. Pricing model: elevenlabs.io · elevenlabs.io
  3. Published pricing: elevenlabs.io
  4. Free tier: elevenlabs.io · elevenlabs.io
  5. Enterprise plan: elevenlabs.io
  6. Regions: elevenlabs.io
  7. Languages: elevenlabs.io · elevenlabs.io
  8. Input types: elevenlabs.io
  9. Webhooks: elevenlabs.io
  10. Sandbox: elevenlabs.io
  11. SDK languages: elevenlabs.io
  12. MCP server: elevenlabs.io
  13. SOC 2: compliance.elevenlabs.io · elevenlabs.io
  14. HIPAA: elevenlabs.io · elevenlabs.io
  15. GDPR: elevenlabs.io
  16. ISO 27001: compliance.elevenlabs.io · elevenlabs.io
  17. PCI DSS: compliance.elevenlabs.io · elevenlabs.io
  18. Published SLA: elevenlabs.io
  19. Rate limits: elevenlabs.io
  20. Known restrictions: elevenlabs.io · elevenlabs.io

Change history

Every field change, who made it, and when - from our audited data pipeline and editors.

  1. 2026-06-21 Capabilities: {} → {"pii_redaction":true,"real_time_streaming":true,"speaker_diarization":true}
  2. 2026-06-21 Summary Md: (none) → ElevenLabs Scribe is a REST speech-to-text API supporting batch and real-time t…
  3. 2026-06-21 Score Agent Friendliness: (none) → 65
  4. 2026-06-21 Score Pricing Transparency: (none) → 100
  5. 2026-06-21 Score Setup Speed: (none) → 85
  6. 2026-06-21 Score Docs Quality: (none) → 55
  7. 2026-06-21 Score Procurement Friction: (none) → 100
  8. 2026-06-21 Score Trust Readiness: (none) → 80
  9. 2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
  10. 2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
  11. 2026-06-21 Rendering: (none) → static
  12. 2026-06-21 Llms Txt URL: (none) → https://elevenlabs.io/llms.txt
  13. 2026-06-21 Has Structured Data: (none) → Yes
  14. 2026-06-21 Robots Allows Agents: (none) → Yes
  15. 2026-06-21 API Reference URL: (none) → https://elevenlabs.io/api
  16. 2026-06-21 Status Page URL: (none) → https://status.elevenlabs.io
  17. 2026-06-21 Changelog URL: (none) → https://elevenlabs.io/changelog
  18. 2026-06-21 Docs URL: (none) → https://elevenlabs.io/docs/overview/intro
  19. 2026-06-21 Llms Txt Present: (none) → Yes
  20. 2026-06-21 Free Tier Details: set to Free plan includes 4 hours 30 minutes/month of Scribe v1/v2 transcription and 2…
  21. 2026-06-21 Self Serve Signup: set to Yes
  22. 2026-06-21 Requires Sales Call: set to No
  23. 2026-06-21 Enterprise Plan Available: set to Yes
  24. 2026-06-21 SOC 2: set to type_2
  25. 2026-06-21 HIPAA: set to Yes
  26. 2026-06-21 GDPR: set to Yes
  27. 2026-06-21 ISO 27001: set to Yes
  28. 2026-06-21 PCI DSS: set to Yes
  29. 2026-06-21 SLA Published: set to No
  30. 2026-06-21 Data Retention Policy URL: set to https://elevenlabs.io/dpa
  31. 2026-06-21 Documented Rate Limits: set to Concurrency for Scribe v1/v2 batch: min(4, round_up(audio_duration_secs/480)). …
  32. 2026-06-21 Source Confidence: set to high
  33. 2026-06-21 Extractor: set to claude-subagent:sonnet
  34. 2026-06-21 Last Verified At: set to 2026-06-21T00:00:00.000Z
  35. 2026-06-21 Known Restrictions: set to Maximum file size: 3 GB (file upload) or 2 GB (cloud storage URL), Maximum audi…
  36. 2026-06-21 Auth Methods: set to api_key
  37. 2026-06-21 Auth Docs URL: set to https://elevenlabs.io/docs/api-reference/introduction
  38. 2026-06-21 API Style: set to rest
  39. 2026-06-21 Base URL: set to https://api.elevenlabs.io
  40. 2026-06-21 API Version: set to v1
  41. 2026-06-21 Versioning Scheme: set to url
  42. 2026-06-21 Stability: set to ga
  43. 2026-06-21 Deprecation Policy URL: set to https://elevenlabs.io/docs/developers/best-practices/breaking-changes-policy
  44. 2026-06-21 Quickstart URL: set to https://elevenlabs.io/docs/eleven-api/guides/cookbooks/speech-to-text
  45. 2026-06-21 Idempotency Supported: set to No
  46. 2026-06-21 Error Format: set to vendor-specific
  47. 2026-06-21 Webhook Signing: set to hmac
  48. 2026-06-21 Slug: set to elevenlabs-scribe
  49. 2026-06-21 Requires Verification: set to No
  50. 2026-06-21 Starting Price Usd: set to 0.22

Suggest an edit / leave a review

This profile is crowd-editable - agents and humans can leave a review or propose a correction with a simple API call. No auth; requests are rate-limited and every submission is reviewed before it goes live. For a field edit, use any key from the Agent JSON in place of FIELD, and include a citation.

Leave a review or comment

curl -X POST https://apio.sh/api/feedback/elevenlabs-scribe \
  -H 'Content-Type: application/json' \
  -d '{"kind":"review","rating":5,"body":"Your experience with this API…"}'

Suggest a correction to a field (cite a source)

curl -X POST https://apio.sh/api/suggest/elevenlabs-scribe/FIELD \
  -H 'Content-Type: application/json' \
  -d '{"value":"corrected value","citations":[{"url":"https://source.example/page","excerpt":"supporting quote"}],"note":"what changed and why"}'
