Speechmatics
"Low-latency speech-to-text for multilingual, multi-speaker conversations." [1]
Speechmatics is a speech-to-text API that supports batch and real-time transcription in EU, US, and Australia regions. Capabilities include speaker diarization, language detection, translation, summarization, and audio event detection, which suits it to contact center, legal, medical, and broadcast use cases. Pricing starts at $0.0022 per minute, with a free tier of 3,000 minutes per month and self-serve signup; enterprise plans add dedicated regional endpoints. The API is REST-based with SDKs for Python, Node.js, .NET, and Rust. The profile lists SOC 2 Type II, HIPAA, GDPR, and ISO 27001 under trust and compliance.
Best for / Avoid if
Best for:
- Prototypes and side projects: free to start, no sales call
- Regulated or enterprise workloads: compliance attestations and an enterprise plan
- Teams needing broad API coverage out of the box
Pricing & procurement
- Pricing model: Usage-based [2]
- Published pricing: ✓ Yes [3]
- Free tier: ✓ Yes [4]
- Free tier details: Free 3,000 minutes (50 hours) of STT per month; no credit card required. Free tier caps: 2 concurrent real-time sessions, 10 batch hours/month. (TTS free tier: 1 million characters/month - excluded from STT pricing scope.) [5]
- Self-serve signup: ✓ Yes
- Requires sales call: ✗ No
- Enterprise plan: ✓ Yes [6]
| Plan | Item | Per | Amount | Source |
|---|---|---|---|---|
| Free | Speech-to-text transcription (batch and real-time, Standard model) | 3,000 minutes (50 hours) per month included | $0 | source |
| Pro | Batch transcription, Standard model | hour of audio | $0.80 | source |
| Pro | Batch transcription, Enhanced model | hour of audio | $1.04 | source |
| Pro | Real-time (streaming) transcription, Standard model | hour of audio | $1.04 | source |
| Pro | Real-time (streaming) transcription, Enhanced model | hour of audio | $1.35 | source |
| Pro | Volume discount on batch or real-time transcription above 500 hours/month | hour of audio (applied automatically above 500 hours/month per STT type) | 20% | source |
| Enterprise | Batch or real-time transcription | custom (contact sales; additional volume discounts from 24,000 hours/year) | - | source |
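The Pro rates and the 500-hour discount threshold in the table above lend themselves to a quick estimate. The following is a minimal sketch, not an official calculator. It assumes the 20% discount applies only to the hours above 500 in a month (the table does not say whether it applies to all hours once the threshold is crossed), and it ignores the free tier allowance. The function name `estimate_monthly_cost` and the `mode`/`model` keys are hypothetical.

```python
# Pro-plan hourly rates copied from the pricing table (USD per hour of audio).
PRO_RATES = {
    ("batch", "standard"): 0.80,
    ("batch", "enhanced"): 1.04,
    ("realtime", "standard"): 1.04,
    ("realtime", "enhanced"): 1.35,
}

DISCOUNT_THRESHOLD_HOURS = 500  # per STT type, per month
DISCOUNT_RATE = 0.20


def estimate_monthly_cost(mode: str, model: str, hours: float) -> float:
    """Estimate one month's Pro-plan cost in USD for a single STT type.

    Assumption: only the hours above the threshold are discounted.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    rate = PRO_RATES[(mode, model)]
    full_price_hours = min(hours, DISCOUNT_THRESHOLD_HOURS)
    discounted_hours = max(hours - DISCOUNT_THRESHOLD_HOURS, 0.0)
    cost = full_price_hours * rate
    cost += discounted_hours * rate * (1 - DISCOUNT_RATE)
    return round(cost, 2)


if __name__ == "__main__":
    print(estimate_monthly_cost("batch", "standard", 600))
```

Under that assumption, 600 hours of Standard batch audio is 500 hours at full price plus 100 hours at a 20% discount. If the discount turns out to apply to every hour once the threshold is crossed, only the two `*_hours` lines need to change.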
Capabilities
- Supported actions: transcribe_batch, transcribe_streaming, speaker_diarization, speaker_identification, language_detection, word_timestamps, translation, summarization, sentiment_analysis, chapter_generation, custom_dictionary, audio_event_detection, text_to_speech, voice_agents_flow
- Regions: EU (eu1.asr.api.speechmatics.com), EU2 - enterprise only (eu2.asr.api.speechmatics.com), US (us1.asr.api.speechmatics.com), US2 - enterprise only (us2.asr.api.speechmatics.com), Australia (au1.asr.api.speechmatics.com) [7]
- Languages: Arabic, Bashkir, Basque, Belarusian, Bengali, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Interlingua, Irish, Italian, Japanese, Korean, Latvian, Lithuanian, Malay, Maltese, Mandarin, Marathi, Mongolian, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Slovakian, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Uyghur, Vietnamese, Welsh [8]
- Input types: audio/wav, audio/mp3, audio/aac, audio/ogg, audio/mpeg, audio/amr, audio/m4a, video/mp4, audio/flac, PCM_S16LE (real-time WebSocket streaming) [9]
- Output types: JSON (json-v2), Plain text (.txt), SRT subtitles, Word-level timestamps, Speaker diarization output, Alignment output (word_start_and_end, one_per_line), Translation output, Sentiment analysis, Summarization, Chapter markers
- Webhooks: ✓ Yes [10]
- Sandbox / test mode: ✗ No [11]
- SDK languages: Python, Node.js, .NET, Rust [12]
- MCP server: ✗ No
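The regional hostnames listed under Regions can be turned into a small lookup so that a client picks its endpoint explicitly. This is a rough sketch only: the hostnames come from the list above, but the `/v2` path on regional hosts is an assumption carried over from the profile's generic base URL, and the helper name `batch_base_url` and its `enterprise` flag are hypothetical.

```python
# Hostnames as listed in the Regions entry of this profile.
REGION_HOSTS = {
    "eu1": "eu1.asr.api.speechmatics.com",
    "eu2": "eu2.asr.api.speechmatics.com",  # enterprise only
    "us1": "us1.asr.api.speechmatics.com",
    "us2": "us2.asr.api.speechmatics.com",  # enterprise only
    "au1": "au1.asr.api.speechmatics.com",
}
ENTERPRISE_ONLY = {"eu2", "us2"}


def batch_base_url(region: str, enterprise: bool = False) -> str:
    """Return an HTTPS base URL for a region code such as "eu1".

    Assumption: regional hosts use the same /v2 path as the generic base URL.
    """
    key = region.lower()
    if key not in REGION_HOSTS:
        raise ValueError(f"unknown region: {region}")
    if key in ENTERPRISE_ONLY and not enterprise:
        raise PermissionError(f"{key} is listed as enterprise only")
    return f"https://{REGION_HOSTS[key]}/v2"


if __name__ == "__main__":
    print(batch_base_url("eu1"))
```

Rejecting enterprise-only regions locally is a design choice that surfaces a misconfiguration before any request is sent. The API itself remains the authority on which accounts may use which endpoint.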
Trust & compliance
- SOC 2: Type II [13]
- HIPAA: ✓ Yes [14]
- GDPR: ✓ Yes [15]
- ISO 27001: ✓ Yes [16]
- PCI DSS: ✗ No [17]
- Published SLA: ✗ No [18]
- Rate limits: Batch: 10 new jobs/second (POST); 50 job status requests/second (GET); 20,000 concurrent jobs max. Free tier: 2 concurrent real-time sessions, 10 hours/month batch. Paid tier: 50 concurrent real-time sessions, 6,000 hours/month. File size limit: <1 GB per direct upload. Max session duration: 48 hours. Max 50 speaker identifiers. Translation: max 5 target languages per request. [19]
- Known restrictions: Supported batch input formats are exhaustive (wav, mp3, aac, ogg, mpeg, amr, m4a, mp4, flac); raw audio formats without an embedded codec cannot be processed in batch mode; files submitted directly in the request body must be under 1 GB (larger files must use a URL); maximum audio duration for batch jobs is 2 hours; audio files and transcripts are deleted after 7 days; the Melia 1 model is available in the EU1 and US1 regions only; the Pro tier is capped at 6,000 hours/month, with Enterprise for higher volumes; maximum of 50 speaker identifiers across all speakers; translation accepts a maximum of 5 target languages per request; real-time sessions auto-terminate after 48 hours, after 1 hour of no audio, or after 3 minutes of no activity/pings [20]
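Several of the restrictions above can be checked locally before a batch job is submitted. The sketch below is illustrative only. It assumes "1 GB" means 10^9 bytes and that the caller already knows the audio duration; the function name `check_batch_job` and its parameters are hypothetical and not part of any Speechmatics SDK.

```python
import os

# Limits copied from the rate limits and known restrictions in this profile.
BATCH_EXTENSIONS = {"wav", "mp3", "aac", "ogg", "mpeg", "amr", "m4a", "mp4", "flac"}
MAX_DIRECT_UPLOAD_BYTES = 1_000_000_000  # "less than 1 GB"; decimal GB assumed
MAX_BATCH_DURATION_SECONDS = 2 * 60 * 60
MAX_TRANSLATION_TARGETS = 5


def check_batch_job(filename, size_bytes, duration_seconds, translation_targets=()):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in BATCH_EXTENSIONS:
        problems.append(f"unsupported batch format: {extension or 'none'}")
    if size_bytes >= MAX_DIRECT_UPLOAD_BYTES:
        problems.append("file is 1 GB or larger: submit by URL, not direct upload")
    if duration_seconds > MAX_BATCH_DURATION_SECONDS:
        problems.append("audio is longer than the 2-hour batch limit")
    if len(translation_targets) > MAX_TRANSLATION_TARGETS:
        problems.append("more than 5 translation target languages requested")
    return problems


if __name__ == "__main__":
    for problem in check_batch_job("meeting.raw", 2_000_000_000, 3 * 3600):
        print(problem)
```

Returning every problem at once, rather than raising on the first, lets a caller report all issues for a file in a single pass. The extension check is only a proxy for the real container format, so the service may still reject a file that passes here.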
Developer surface
Integration
- API style: REST
- Base URL: https://asr.api.speechmatics.com/v2
- Version: v2
- Versioning: URL
- Stability: GA
- Auth methods: API key, JWT
- Error format: vendor-specific
- Rate limit: 10/second
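A client that submits many jobs may want to stay under the listed limit of 10 requests per second without relying on server-side rejections. The following is one possible client-side approach, a sliding-window limiter, offered as a sketch rather than anything the vendor prescribes. The class name `SlidingWindowLimiter` is hypothetical, the clock and sleep functions are injectable so the logic can be exercised without waiting, and the class is not thread-safe.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls acquisitions in any rolling window of period seconds."""

    def __init__(self, max_calls=10, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # timestamps of recent acquisitions, oldest first

    def acquire(self) -> float:
        """Block until a call is allowed and return the seconds spent waiting."""
        waited = 0.0
        while True:
            now = self._clock()
            # Forget acquisitions that have left the rolling window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return waited
            # Window is full: wait until the oldest acquisition expires.
            delay = self.period - (now - self._stamps[0])
            self._sleep(delay)
            waited += delay


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    for job_number in range(12):
        limiter.acquire()  # a real client would send its request after this returns
        print("slot granted for job", job_number)
```

Calling `acquire()` immediately before each job-creation request keeps a single-process client within the window. A client that spans several processes or hosts would need shared state, which this sketch does not attempt.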
Adoption & maturity
- Launched: 2006-01-01
- Notable customers: what3words, 3Play Media, Veritone, Deloitte UK, Vonage
Other Speech-to-Text & Transcription APIs
ElevenLabs Scribe (Speech to Text)
"Scribe v2 is the most accurate Speech to Text model" offering "real-time Speech to Text in under 150 ms" across "90+ languages."
Azure AI Speech to Text
"Azure Speech in Foundry Tools provides speech to text, text to speech, and other capabilities through a Microsoft Foundry resource. You can transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and conduct live AI voice conversations."
Amazon Transcribe
"Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or to add speech-to-text capabilities to any application."
Google Cloud Speech-to-Text
"Accurate voice typing and transcription powered by Gemini."
IBM watsonx Speech to Text
"IBM Watson® Speech to Text technology enables fast and accurate speech transcription in multiple languages for a variety of use cases, including but not limited to customer self-service, agent assistance and speech analytics."
AssemblyAI
"Voice AI infrastructure for developers building products that transcribe, understand, and act on speech."
References
- [1] Description: speechmatics.com
- [2] Pricing model: speechmatics.com
- [3] Published pricing: speechmatics.com
- [4] Free tier: speechmatics.com
- [5] Free tier details: speechmatics.com · docs.speechmatics.com
- [6] Enterprise plan: speechmatics.com
- [7] Regions: docs.speechmatics.com
- [8] Languages: speechmatics.com
- [9] Input types: docs.speechmatics.com
- [10] Webhooks: docs.speechmatics.com
- [11] Sandbox: speechmatics.com
- [12] SDK languages: docs.speechmatics.com
- [13] SOC 2: speechmatics.com
- [14] HIPAA: speechmatics.com
- [15] GDPR: speechmatics.com
- [16] ISO 27001: speechmatics.com
- [17] PCI DSS: speechmatics.com
- [18] Published SLA: speechmatics.com
- [19] Rate limits: docs.speechmatics.com
- [20] Known restrictions: docs.speechmatics.com · docs.speechmatics.com
Change history
- 2026-06-21 Capabilities: {} → {"medical":true,"translation":true,"real_time_streaming":true,"speaker_diarizat…
- 2026-06-21 Summary Md: (none) → Speechmatics is a speech-to-text API supporting batch and real-time transcripti…
- 2026-06-21 Score Docs Quality: (none) → 15
- 2026-06-21 Score Procurement Friction: (none) → 100
- 2026-06-21 Score Trust Readiness: (none) → 70
- 2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
- 2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
- 2026-06-21 Score Agent Friendliness: (none) → 30
- 2026-06-21 Score Pricing Transparency: (none) → 100
- 2026-06-21 Score Setup Speed: (none) → 85
- 2026-06-21 Llms Txt Present: (none) → No
- 2026-06-21 Has Structured Data: (none) → Yes
- 2026-06-21 Robots Allows Agents: (none) → Yes
- 2026-06-21 Status Page URL: (none) → https://status.speechmatics.com
- 2026-06-21 Docs URL: (none) → https://docs.speechmatics.com/
- 2026-06-21 Rendering: (none) → static
- 2026-06-21 Pricing Model: set to usage_based
- 2026-06-21 Has Published Pricing: set to Yes
- 2026-06-21 Free Tier Available: set to Yes
- 2026-06-21 Error Format: set to vendor-specific
- 2026-06-21 Free Tier Details: set to Free 3,000 minutes (50 hours) of STT per month; no credit card required. Free t…
- 2026-06-21 Self Serve Signup: set to Yes
- 2026-06-21 Requires Sales Call: set to No
- 2026-06-21 Enterprise Plan Available: set to Yes
- 2026-06-21 SOC 2: set to type_2
- 2026-06-21 HIPAA: set to Yes
- 2026-06-21 GDPR: set to Yes
- 2026-06-21 ISO 27001: set to Yes
- 2026-06-21 PCI DSS: set to No
- 2026-06-21 SLA Published: set to No
- 2026-06-21 Data Retention Policy URL: set to https://www.speechmatics.com/legal/privacy-policy
- 2026-06-21 Documented Rate Limits: set to Batch: 10 new jobs/second (POST); 50 job status requests/second (GET); 20,000 c…
- 2026-06-21 Rate Limit Requests: set to 10
- 2026-06-21 Rate Limit Window: set to second
- 2026-06-21 Known Restrictions: set to Supported batch input formats are exhaustive: wav, mp3, aac, ogg, mpeg, amr, m4…
- 2026-06-21 Auth Methods: set to api_key, jwt
- 2026-06-21 Auth Docs URL: set to https://docs.speechmatics.com/get-started/authentication
- 2026-06-21 API Style: set to rest
- 2026-06-21 Base URL: set to https://asr.api.speechmatics.com/v2
- 2026-06-21 API Version: set to v2
- 2026-06-21 Versioning Scheme: set to url
- 2026-06-21 Stability: set to ga
- 2026-06-21 Quickstart URL: set to https://docs.speechmatics.com/get-started/quickstart
- 2026-06-21 Slug: set to speechmatics
- 2026-06-21 Requires Verification: set to No
- 2026-06-21 Starting Price Usd: set to 0.0022
- 2026-06-21 Price Basis: set to minute
- 2026-06-21 Free Tier Limit: set to 3000 minutes (50 hours) per month
- 2026-06-21 Launched At: set to 2006-01-01
- 2026-06-21 Notable Customers: set to what3words, 3Play Media, Veritone, Deloitte UK, Vonage