Speechmatics

"Low-latency speech-to-text for multilingual, multi-speaker conversations." [1]

Speech-to-Text & Transcription APIs

www.speechmatics.com · By Speechmatics · Last verified 2026-06-21 · Source confidence: high

Speechmatics is a speech-to-text API for batch and real-time transcription, served from EU, US, and Australia regions. Capabilities include speaker diarization, language detection, translation, summarization, and audio event detection, which suits contact-center, legal, medical, and broadcast use cases. Pricing starts at $0.0022 per minute, with a free tier of 3,000 minutes per month and self-serve signup; enterprise plans add dedicated regional endpoints. The API is REST-based, with SDKs for Python, Node.js, .NET, and Rust. The profile lists SOC 2 Type II, HIPAA, GDPR, and ISO 27001 compliance.

Best for / Avoid if

Best for: Prototypes and side projects - free to start, no sales call; Regulated or enterprise workloads - compliance attestations and an enterprise plan; Teams needing broad API coverage out of the box

Pricing & procurement

Pricing model: Usage-based [2]
Published pricing: Yes [3]
Free tier: Yes [4]
Free tier details: Free 3,000 minutes (50 hours) of STT per month; no credit card required. Free tier caps: 2 concurrent real-time sessions, 10 batch hours/month. (TTS free tier: 1 million characters/month - excluded from STT pricing scope.) [5]
Self-serve signup: Yes
Requires sales call: No
Enterprise plan: Yes [6]

Published prices
Plan	Item	Per	Amount
Free	Speech-to-text transcription (batch and real-time, Standard model)	3,000 minutes (50 hours) per month included	$0
Pro	Batch transcription, Standard model	hour of audio	$0.80
Pro	Batch transcription, Enhanced model	hour of audio	$1.04
Pro	Real-time (streaming) transcription, Standard model	hour of audio	$1.04
Pro	Real-time (streaming) transcription, Enhanced model	hour of audio	$1.35
Pro	Volume discount on batch or real-time transcription above 500 hours/month	hour of audio (applied automatically above 500 hours/month per STT type)	20%
Enterprise	Batch or real-time transcription	custom (contact sales; additional volume discounts from 24,000 hours/year)	-
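
As a rough illustration of how the Pro rates and the volume discount combine, here is a minimal sketch of a monthly cost estimate. The function name and structure are hypothetical. It assumes the 20% discount is marginal (applied only to hours beyond the first 500 for a given STT type) and ignores the free-tier allowance; the published table does not spell out either point, so treat the result as an estimate only.

```python
# Pro-plan hourly rates in USD, copied from the published price table.
PRO_RATES = {
    ("batch", "standard"): 0.80,
    ("batch", "enhanced"): 1.04,
    ("realtime", "standard"): 1.04,
    ("realtime", "enhanced"): 1.35,
}
DISCOUNT_THRESHOLD_HOURS = 500  # discount applies above this many hours/month
DISCOUNT = 0.20


def estimate_pro_cost(hours: float, mode: str, model: str) -> float:
    """Estimate one month's Pro cost in USD for a single STT type.

    Assumption: the discount is marginal, i.e. only the hours beyond
    the threshold are billed at the reduced rate.
    """
    rate = PRO_RATES[(mode, model)]
    full_price_hours = min(hours, DISCOUNT_THRESHOLD_HOURS)
    discounted_hours = max(hours - DISCOUNT_THRESHOLD_HOURS, 0)
    total = full_price_hours * rate + discounted_hours * rate * (1 - DISCOUNT)
    return round(total, 2)
```

Under these assumptions, 600 hours of Standard batch audio would come to 500 × $0.80 + 100 × $0.64 = $464.00.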

Capabilities

Real-time streaming
Speaker diarization
Speech translation
Medical transcription

Supported actions: transcribe_batch, transcribe_streaming, speaker_diarization, speaker_identification, language_detection, word_timestamps, translation, summarization, sentiment_analysis, chapter_generation, custom_dictionary, audio_event_detection, text_to_speech, voice_agents_flow
Regions: EU (eu1.asr.api.speechmatics.com), EU2 - enterprise only (eu2.asr.api.speechmatics.com), US (us1.asr.api.speechmatics.com), US2 - enterprise only (us2.asr.api.speechmatics.com), Australia (au1.asr.api.speechmatics.com) [7]
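
The region list can be captured as a small lookup table. The sketch below is illustrative only: the helper name is hypothetical, and it assumes each regional host serves the same /v2 path as the base URL listed under Integration, which this profile does not state explicitly.

```python
# Region code -> (hostname, enterprise_only), taken from the Regions field.
REGION_HOSTS = {
    "eu1": ("eu1.asr.api.speechmatics.com", False),
    "eu2": ("eu2.asr.api.speechmatics.com", True),
    "us1": ("us1.asr.api.speechmatics.com", False),
    "us2": ("us2.asr.api.speechmatics.com", True),
    "au1": ("au1.asr.api.speechmatics.com", False),
}


def batch_base_url(region: str, enterprise: bool = False) -> str:
    """Return a base URL for the given region.

    Assumption: regional hosts expose the API under /v2.
    Raises ValueError for enterprise-only regions on non-enterprise plans.
    """
    host, enterprise_only = REGION_HOSTS[region]
    if enterprise_only and not enterprise:
        raise ValueError(f"region {region!r} is listed as enterprise only")
    return f"https://{host}/v2"
```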
Languages: Arabic, Bashkir, Basque, Belarusian, Bengali, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Interlingua, Irish, Italian, Japanese, Korean, Latvian, Lithuanian, Malay, Maltese, Mandarin, Marathi, Mongolian, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Slovakian, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Uyghur, Vietnamese, Welsh [8]
Input types: audio/wav, audio/mp3, audio/aac, audio/ogg, audio/mpeg, audio/amr, audio/m4a, video/mp4, audio/flac, PCM_S16LE (real-time WebSocket streaming) [9]
Output types: JSON (json-v2), Plain text (.txt), SRT subtitles, Word-level timestamps, Speaker diarization output, Alignment output (word_start_and_end, one_per_line), Translation output, Sentiment analysis, Summarization, Chapter markers
Webhooks: Yes [10]
Sandbox / test mode: No [11]
SDK languages: Python, Node.js, .NET, Rust [12]
MCP server: No

Trust & compliance

SOC 2: SOC 2 Type II [13]
HIPAA: Yes [14]
GDPR: Yes [15]
ISO 27001: Yes [16]
PCI DSS: No [17]
Published SLA: No [18]
Rate limits: Batch: 10 new jobs/second (POST); 50 job status requests/second (GET); 20,000 concurrent jobs max. Free tier: 2 concurrent real-time sessions, 10 hours/month batch. Paid tier: 50 concurrent real-time sessions, 6,000 hours/month. File size limit: <1 GB per direct upload. Max session duration: 48 hours. Max 50 speaker identifiers. Translation: max 5 target languages per request. [19]
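
A client can stay under the 10 new jobs/second limit by throttling itself before each POST. The following is a minimal sliding-window sketch, not part of any Speechmatics SDK; the class name and the injectable clock are hypothetical choices made so the logic can be checked without real waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `window` seconds."""

    def __init__(self, max_calls: int = 10, window: float = 1.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self._stamps = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.window - (now - self._stamps[0])

    def record(self) -> None:
        """Note that a call is being made now."""
        self._stamps.append(self.clock())

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            time.sleep(delay)
        self.record()
```

Calling `acquire()` before each job submission keeps a single process within the limit; the separate 50 requests/second limit on status GETs could use a second instance with `max_calls=50`.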
Known restrictions: Batch input is limited to wav, mp3, aac, ogg, mpeg, amr, m4a, mp4, and flac; raw audio without an embedded codec cannot be processed in batch mode; direct uploads in the request body must be under 1 GB (larger files must be submitted by URL); batch jobs accept at most 2 hours of audio; audio files and transcripts are deleted after 7 days; the Melia 1 model is available in EU1 and US1 only; the Pro tier is capped at 6,000 hours/month (Enterprise for higher volumes); at most 50 speaker identifiers across all speakers; translation accepts at most 5 target languages per request; real-time sessions auto-terminate after 48 hours, after 1 hour of no audio, or after 3 minutes of no activity/pings. [20]
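
Several of these restrictions can be checked locally before a batch job is submitted. The sketch below is a hypothetical pre-flight helper: it assumes the file extension is a reasonable proxy for the container format and reads "1 GB" as 10**9 bytes, neither of which is confirmed by this profile.

```python
from pathlib import PurePath

# Batch formats listed in the restrictions above.
ALLOWED_EXTENSIONS = {"wav", "mp3", "aac", "ogg", "mpeg", "amr", "m4a", "mp4", "flac"}
MAX_DIRECT_UPLOAD_BYTES = 1_000_000_000  # assumption: 1 GB = 10**9 bytes
MAX_BATCH_DURATION_SECONDS = 2 * 60 * 60  # 2 hours


def preflight_batch(filename: str, size_bytes: int, duration_seconds: float) -> list:
    """Return a list of human-readable problems; empty means no known issue."""
    problems = []
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported batch format: {ext or '(none)'}")
    if size_bytes >= MAX_DIRECT_UPLOAD_BYTES:
        problems.append("too large for direct upload; submit by URL instead")
    if duration_seconds > MAX_BATCH_DURATION_SECONDS:
        problems.append("audio exceeds the 2-hour batch limit")
    return problems
```

An empty result only means none of these three checks failed; the service may still reject a file for other reasons.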

Developer surface

Docs rendering: static

Integration

API style: rest
Base URL: https://asr.api.speechmatics.com/v2
Version: v2
Versioning: url
Stability: ga
Auth methods: api_key, jwt
Error format: vendor-specific
Rate limit: 10 / second
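
To make the integration fields concrete, here is a minimal sketch that builds (but does not send) a job-status request against the base URL. The `/jobs/{job_id}` path and the `Authorization: Bearer` header scheme are assumptions not stated in this profile, and the function name is hypothetical; check the vendor's authentication docs before relying on either.

```python
import urllib.request

BASE_URL = "https://asr.api.speechmatics.com/v2"  # from the Base URL field


def build_job_status_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for one batch job's status.

    Assumptions: jobs live under /jobs/{job_id} and the API key is
    passed as a Bearer token.
    """
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; keeping construction separate makes the URL and headers easy to inspect first.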

SDKs

Python speechmatics-batch
Python speechmatics-rt
Python speechmatics-voice
Python speechmatics-tts
Node.js @speechmatics/batch-client
Node.js @speechmatics/real-time-client
Node.js @speechmatics/flow-client
.NET
Rust

Adoption & maturity

Launched: 2006-01-01
Notable customers: what3words, 3Play Media, Veritone, Deloitte UK, Vonage

Other Speech-to-Text & Transcription APIs

ElevenLabs Scribe (Speech to Text)
"Scribe v2 is the most accurate Speech to Text model" offering "real-time Speech to Text in under 150 ms" across "90+ languages."
Hybrid · free tier · public pricing · self-serve
Azure AI Speech to Text
"Azure Speech in Foundry Tools provides speech to text, text to speech, and other capabilities through a Microsoft Foundry resource. You can transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and conduct live AI voice conversations."
Usage · free tier · public pricing · self-serve
Amazon Transcribe
"Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or to add speech-to-text capabilities to any application."
Usage · free tier · public pricing · self-serve
Google Cloud Speech-to-Text
"Accurate voice typing and transcription powered by Gemini."
Usage · free tier · public pricing · self-serve
IBM watsonx Speech to Text
"IBM Watson® Speech to Text technology enables fast and accurate speech transcription in multiple languages for a variety of use cases, including but not limited to customer self-service, agent assistance and speech analytics."
Usage · free tier · public pricing · self-serve
AssemblyAI
"Voice AI infrastructure for developers building products that transcribe, understand, and act on speech."
Usage · public pricing · self-serve

References

Each field above carries a numbered source.

[1] Description: speechmatics.com
[2] Pricing model: speechmatics.com
[3] Published pricing: speechmatics.com
[4] Free tier: speechmatics.com
[5] Free tier details: speechmatics.com · docs.speechmatics.com
[6] Enterprise plan: speechmatics.com
[7] Regions: docs.speechmatics.com
[8] Languages: speechmatics.com
[9] Input types: docs.speechmatics.com
[10] Webhooks: docs.speechmatics.com
[11] Sandbox: speechmatics.com
[12] SDK languages: docs.speechmatics.com
[13] SOC 2: speechmatics.com
[14] HIPAA: speechmatics.com
[15] GDPR: speechmatics.com
[16] ISO 27001: speechmatics.com
[17] PCI DSS: speechmatics.com
[18] Published SLA: speechmatics.com
[19] Rate limits: docs.speechmatics.com
[20] Known restrictions: docs.speechmatics.com · docs.speechmatics.com

Change history

Every field change, who made it, and when - from our audited data pipeline and editors.

2026-06-21 Capabilities: {} → {"medical":true,"translation":true,"real_time_streaming":true,"speaker_diarizat…
2026-06-21 Summary Md: (none) → Speechmatics is a speech-to-text API supporting batch and real-time transcripti…
2026-06-21 Score Docs Quality: (none) → 15
2026-06-21 Score Procurement Friction: (none) → 100
2026-06-21 Score Trust Readiness: (none) → 70
2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
2026-06-21 Score Agent Friendliness: (none) → 30
2026-06-21 Score Pricing Transparency: (none) → 100
2026-06-21 Score Setup Speed: (none) → 85
2026-06-21 Llms Txt Present: (none) → No
2026-06-21 Has Structured Data: (none) → Yes
2026-06-21 Robots Allows Agents: (none) → Yes
2026-06-21 Status Page URL: (none) → https://status.speechmatics.com
2026-06-21 Docs URL: (none) → https://docs.speechmatics.com/
2026-06-21 Rendering: (none) → static
2026-06-21 Pricing Model: set to usage_based
2026-06-21 Has Published Pricing: set to Yes
2026-06-21 Free Tier Available: set to Yes
2026-06-21 Error Format: set to vendor-specific
2026-06-21 Free Tier Details: set to Free 3,000 minutes (50 hours) of STT per month; no credit card required. Free t…
2026-06-21 Self Serve Signup: set to Yes
2026-06-21 Requires Sales Call: set to No
2026-06-21 Enterprise Plan Available: set to Yes
2026-06-21 SOC 2: set to type_2
2026-06-21 HIPAA: set to Yes
2026-06-21 GDPR: set to Yes
2026-06-21 ISO 27001: set to Yes
2026-06-21 PCI DSS: set to No
2026-06-21 SLA Published: set to No
2026-06-21 Data Retention Policy URL: set to https://www.speechmatics.com/legal/privacy-policy
2026-06-21 Documented Rate Limits: set to Batch: 10 new jobs/second (POST); 50 job status requests/second (GET); 20,000 c…
2026-06-21 Rate Limit Requests: set to 10
2026-06-21 Rate Limit Window: set to second
2026-06-21 Known Restrictions: set to Supported batch input formats are exhaustive: wav, mp3, aac, ogg, mpeg, amr, m4…
2026-06-21 Auth Methods: set to api_key, jwt
2026-06-21 Auth Docs URL: set to https://docs.speechmatics.com/get-started/authentication
2026-06-21 API Style: set to rest
2026-06-21 Base URL: set to https://asr.api.speechmatics.com/v2
2026-06-21 API Version: set to v2
2026-06-21 Versioning Scheme: set to url
2026-06-21 Stability: set to ga
2026-06-21 Quickstart URL: set to https://docs.speechmatics.com/get-started/quickstart
2026-06-21 Slug: set to speechmatics
2026-06-21 Requires Verification: set to No
2026-06-21 Starting Price Usd: set to 0.0022
2026-06-21 Price Basis: set to minute
2026-06-21 Free Tier Limit: set to 3000 minutes (50 hours) per month
2026-06-21 Launched At: set to 2006-01-01
2026-06-21 Notable Customers: set to what3words, 3Play Media, Veritone, Deloitte UK, Vonage

Suggest an edit / leave a review

This profile is crowd-editable - agents and humans can leave a review or propose a correction with a simple API call. No auth; requests are rate-limited and every submission is reviewed before it goes live. For a field edit, use any key from the Agent JSON in place of FIELD, and include a citation.

Leave a review or comment

curl -X POST https://apio.sh/api/feedback/speechmatics \
  -H 'Content-Type: application/json' \
  -d '{"kind":"review","rating":5,"body":"Your experience with this API…"}'

Suggest a correction to a field (cite a source)

curl -X POST https://apio.sh/api/suggest/speechmatics/FIELD \
  -H 'Content-Type: application/json' \
  -d '{"value":"corrected value","citations":[{"url":"https://source.example/page","excerpt":"supporting quote"}],"note":"what changed and why"}'
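
For callers who prefer Python over curl, the correction request above can be assembled as follows. This is a sketch mirroring the curl payload; the function name is hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request


def build_suggestion_request(slug: str, field: str, value: str,
                             source_url: str, excerpt: str, note: str) -> urllib.request.Request:
    """Build the POST request for a field correction with one citation."""
    payload = {
        "value": value,
        "citations": [{"url": source_url, "excerpt": excerpt}],
        "note": note,
    }
    return urllib.request.Request(
        f"https://apio.sh/api/suggest/{slug}/{field}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

As with the curl form, replace FIELD with a key from the Agent JSON, for example `build_suggestion_request("speechmatics", "FIELD", "corrected value", "https://source.example/page", "supporting quote", "what changed and why")`.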
