Amazon Transcribe
"Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or to add speech-to-text capabilities to any application." [1]
Amazon Transcribe is an automatic speech recognition service from AWS that converts audio to text through batch jobs or real-time streaming, with support for speaker diarization, custom vocabularies, custom language models, and multi-language identification. It targets applications including contact center analytics, clinical documentation (through the dedicated Transcribe Medical variant), accessibility captioning, and toxic content detection in gaming. Pay-as-you-go pricing starts at $0.006 per minute for batch transcription, with a free tier of 60 minutes per month for the first 12 months. The service is HIPAA-eligible, SOC 2 Type II certified, ISO 27001 and PCI DSS compliant, available in 25 AWS regions including GovCloud, and provides SDKs for Python, JavaScript, Java, Go, C++, Ruby, and PHP.
Best for / Avoid if
Best for: Prototypes and side projects - free to start, no sales call; Regulated or enterprise workloads - compliance attestations and an enterprise plan; Teams needing broad API coverage out of the box
Pricing & procurement
- Pricing model
- Usage-based [2]
- Published pricing
- ✓ Yes [3]
- Free tier
- ✓ Yes [4]
- Free tier details
- 60 minutes per month for the first 12 months after account creation, shared across Amazon Transcribe standard, Call Analytics, and Transcribe Medical. Unused minutes do not roll over. [5]
- Self-serve signup
- ✓ Yes
- Requires sales call
- ✗ No
- Enterprise plan
- ✓ Yes [6]
| Plan | Item | Per | Amount | Source |
|---|---|---|---|---|
| Free Tier | Standard transcription (batch or streaming) | 60 minutes per month for first 12 months | $0 | source |
| Standard - Tier 1 | Batch transcription | minute (first 250,000 minutes/month) | $0.006 | source |
| Standard - Tier 2 | Batch transcription | minute (next 750,000 minutes/month) | $0.0042 | source |
| Standard - Tier 3 | Batch transcription | minute (over 1,000,000 minutes/month) | $0.0029 | source |
| Standard - Tier 1 | Streaming transcription | minute (first 250,000 minutes/month) | $0.01 | source |
| Standard - Tier 2 | Streaming transcription | minute (next 750,000 minutes/month) | $0.0062 | source |
| Standard - Tier 3 | Streaming transcription | minute (over 1,000,000 minutes/month) | $0.0042 | source |
| Custom Language Model - Tier 1 | CLM batch transcription | minute (first 250,000 minutes/month) | $0.007 | source |
| Custom Language Model - Tier 2 | CLM batch transcription | minute (next 750,000 minutes/month) | $0.0043 | source |
| Custom Language Model - Tier 3 | CLM batch transcription | minute (over 1,000,000 minutes/month) | $0.0031 | source |
| Custom Language Model - Tier 1 | CLM streaming transcription | minute (first 250,000 minutes/month) | $0.012 | source |
| Custom Language Model - Tier 2 | CLM streaming transcription | minute (next 750,000 minutes/month) | $0.0074 | source |
| Custom Language Model - Tier 3 | CLM streaming transcription | minute (over 1,000,000 minutes/month) | $0.0052 | source |
| Add-on - Tier 1 | Automatic content redaction (PII) - batch | minute (first 250,000 minutes/month) | $0.0024 | source |
| Add-on - Tier 2 | Automatic content redaction (PII) - batch | minute (next 750,000 minutes/month) | $0.0015 | source |
| Add-on - Tier 3 | Automatic content redaction (PII) - batch | minute (over 1,000,000 minutes/month) | $0.001 | source |
| Add-on - Tier 1 | Automatic content redaction (PII) - streaming | minute (first 250,000 minutes/month) | $0.003 | source |
| Add-on - Tier 2 | Automatic content redaction (PII) - streaming | minute (next 750,000 minutes/month) | $0.0019 | source |
| Add-on - Tier 3 | Automatic content redaction (PII) - streaming | minute (over 1,000,000 minutes/month) | $0.0013 | source |
| Add-on - Tier 1 | Toxicity detection - batch | minute (first 250,000 minutes/month) | $0.002 | source |
| Add-on - Tier 2 | Toxicity detection - batch | minute (next 750,000 minutes/month) | $0.0012 | source |
| Add-on - Tier 3 | Toxicity detection - batch | minute (over 1,000,000 minutes/month) | $0.0009 | source |
| Call Analytics - Tier 1 | Post-call analytics | minute (first 250,000 minutes/month) | $0.03 | source |
| Call Analytics - Tier 2 | Post-call analytics | minute (next 750,000 minutes/month) | $0.0186 | source |
| Call Analytics - Tier 3 | Post-call analytics | minute (next 4,000,000 minutes/month) | $0.0138 | source |
| Call Analytics - Tier 1 | Real-time call analytics | minute (first 250,000 minutes/month) | $0.0375 | source |
| Call Analytics - Tier 2 | Real-time call analytics | minute (next 750,000 minutes/month) | $0.0233 | source |
| Call Analytics - Tier 3 | Real-time call analytics | minute (next 4,000,000 minutes/month) | $0.0173 | source |
| Add-on - Tier 1 | Generative call summarization | minute (first 250,000 minutes/month) | $0.0024 | source |
| Add-on - Tier 2 | Generative call summarization | minute (next 750,000 minutes/month) | $0.0015 | source |
| Add-on - Tier 3 | Generative call summarization | minute (next 4,000,000 minutes/month) | $0.0011 | source |
| Transcribe Medical | Medical batch transcription | minute | $0.075 | source |
| Transcribe Medical | Medical streaming transcription | minute | $0.075 | source |
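The tiered rates in the table can be turned into a monthly estimate. The sketch below assumes the tiers are graduated (each rate applies only to the minutes that fall inside its tier) and ignores the free tier; both are assumptions to check against the published pricing page. The function name `monthly_cost` and the tier constant are illustrative, not part of any AWS SDK.

```python
from decimal import Decimal

# Standard batch tiers copied from the table above: (minutes in tier, USD per minute).
# None marks the open-ended final tier.
STANDARD_BATCH_TIERS = [
    (250_000, Decimal("0.006")),
    (750_000, Decimal("0.0042")),
    (None, Decimal("0.0029")),
]


def monthly_cost(minutes, tiers=STANDARD_BATCH_TIERS):
    """Return the graduated monthly cost in USD for `minutes` of audio.

    Assumes graduated tiers and no free-tier credit (hypothetical helper).
    """
    remaining = Decimal(minutes)
    total = Decimal("0")
    for size, rate in tiers:
        if remaining <= 0:
            break
        # Consume either the whole tier or whatever minutes are left.
        used = remaining if size is None else min(remaining, Decimal(size))
        total += used * rate
        remaining -= used
    return total


if __name__ == "__main__":
    for m in (100_000, 300_000, 1_200_000):
        print(m, monthly_cost(m))
```

Passing a different tier list (for example the streaming or Call Analytics rows) reuses the same arithmetic.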
Capabilities
- Supported actions
- transcribe_batch, transcribe_streaming, speaker_diarization, language_detection, multi_language_identification, word_timestamps, confidence_scores, custom_vocabulary, custom_language_models, vocabulary_filtering, automatic_punctuation, channel_identification, pii_redaction, pii_identification, subtitles_generation, alternative_transcriptions, call_analytics_batch, call_analytics_streaming, sentiment_analysis, call_summarization, issue_detection, call_categorization, medical_transcription, phi_identification, job_queueing, content_redaction_audio [7]
- Regions
- US East (N. Virginia) us-east-1, US East (Ohio) us-east-2, US West (N. California) us-west-1, US West (Oregon) us-west-2, Africa (Cape Town) af-south-1, Asia Pacific (Hong Kong) ap-east-1, Asia Pacific (Mumbai) ap-south-1, Asia Pacific (Seoul) ap-northeast-2, Asia Pacific (Singapore) ap-southeast-1, Asia Pacific (Sydney) ap-southeast-2, Asia Pacific (Tokyo) ap-northeast-1, Asia Pacific (Malaysia) ap-southeast-5, Asia Pacific (Thailand) ap-southeast-7, Canada (Central) ca-central-1, Europe (Frankfurt) eu-central-1, Europe (Ireland) eu-west-1, Europe (London) eu-west-2, Europe (Paris) eu-west-3, Europe (Stockholm) eu-north-1, Europe (Zurich) eu-central-2, Middle East (Bahrain) me-south-1, Mexico (Central) mx-central-1, South America (São Paulo) sa-east-1, AWS GovCloud (US-East) us-gov-east-1, AWS GovCloud (US-West) us-gov-west-1 [8]
- Languages
- Abkhaz (ab-GE), Afrikaans (af-ZA), Albanian (sq-AL), Amharic (am-ET), Arabic Gulf (ar-AE), Arabic Modern Standard (ar-SA), Armenian (hy-AM), Asturian (ast-ES), Azerbaijani (az-AZ), Bashkir (ba-RU), Basque (eu-ES), Belarusian (be-BY), Bengali (bn-IN), Bosnian (bs-BA), Bulgarian (bg-BG), Burmese (my-MM), Catalan (ca-ES), Central Kurdish Iran (ckb-IR), Central Kurdish Iraq (ckb-IQ), Chinese Cantonese (zh-HK), Chinese Simplified (zh-CN), Chinese Traditional (zh-TW), Croatian (hr-HR), Czech (cs-CZ), Danish (da-DK), Dutch (nl-NL), English Australian (en-AU), English British (en-GB), English Indian (en-IN), English Irish (en-IE), English New Zealand (en-NZ), English Scottish (en-AB), English South African (en-ZA), English US (en-US), English Welsh (en-WL), Estonian (et-EE), Farsi (fa-IR), Farsi Afghan (fa-AF), Finnish (fi-FI), French (fr-FR), French Canadian (fr-CA), Galician (gl-ES), Georgian (ka-GE), German (de-DE), German Swiss (de-CH), Greek (el-GR), Gujarati (gu-IN), Haitian Creole (ht-HT), Hausa (ha-NG), Hebrew (he-IL), Hindi Indian (hi-IN), Hungarian (hu-HU), Icelandic (is-IS), Indonesian (id-ID), Italian (it-IT), Japanese (ja-JP), Javanese (jv-ID), Kabyle (kab-DZ), Kannada (kn-IN), Kazakh (kk-KZ), Khmer (km-KH), Kinyarwanda (rw-RW), Korean (ko-KR), Kyrgyz (ky-KG), Latvian (lv-LV), Lithuanian (lt-LT), Luganda (lg-IN), Macedonian (mk-MK), Malay (ms-MY), Malayalam (ml-IN), Maltese (mt-MT), Marathi (mr-IN), Meadow Mari (mhr-RU), Mongolian (mn-MN), Nepali (ne-NP), Norwegian Bokmål (no-NO), Odia/Oriya (or-IN), Pashto (ps-AF), Polish (pl-PL), Portuguese (pt-PT), Portuguese Brazilian (pt-BR), Punjabi (pa-IN), Romanian (ro-RO), Russian (ru-RU), Serbian (sr-RS), Sinhala (si-LK), Slovak (sk-SK), Slovenian (sl-SI), Somali (so-SO), Spanish (es-ES), Spanish Mexican (es-MX), Spanish US (es-US), Sundanese (su-ID), Swahili Kenya (sw-KE), Swahili Burundi (sw-BI), Swahili Rwanda (sw-RW), Swahili Tanzania (sw-TZ), Swahili Uganda (sw-UG), Swedish (sv-SE), Tagalog/Filipino (tl-PH), Tamil (ta-IN), Tatar (tt-RU), Telugu (te-IN), Thai (th-TH), Turkish (tr-TR), Ukrainian (uk-UA), Uyghur (ug-CN), Uzbek (uz-UZ), Vietnamese (vi-VN), Welsh (cy-WL), Wolof (wo-SN), Zulu (zu-ZA) [9]
- Input types
- audio file via Amazon S3 (batch), media stream via HTTP/2 (streaming), media stream via WebSocket (streaming), FLAC (recommended lossless), WAV with PCM 16-bit encoding (recommended lossless), single-channel audio, dual-channel audio, sample rates 8,000 Hz to 48,000 Hz
- Output types
- JSON transcript with full text, word-level timestamps (start time, end time), confidence scores per word, speaker-labeled transcript (diarization), channel-identified transcript, SRT/VTT subtitles (batch), redacted transcript (PII removed), call analytics JSON with sentiment and categories
- Webhooks
- ✗ No [10]
- Sandbox / test mode
- ✗ No [11]
- SDK languages
- Python (batch), Python (streaming), JavaScript/Node.js (streaming), Java V2 (streaming), C++ (streaming), Ruby V3, PHP V3, Go [12]
- MCP server
- ✗ No
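The output types listed above (full text, word-level timestamps, per-word confidence) arrive together in the JSON transcript. The sketch below pulls words and flags low-confidence ones. The sample document is hand-written for illustration, and its field names (`results.items`, `alternatives`, `start_time`, and so on) are assumptions modeled on commonly seen Transcribe output; verify them against a real job result before relying on them.

```python
import json

# Hand-written sample, NOT real service output; field names are assumptions.
SAMPLE = """
{
  "jobName": "example-job",
  "results": {
    "transcripts": [{"transcript": "Hello world."}],
    "items": [
      {"type": "pronunciation", "start_time": "0.04", "end_time": "0.41",
       "alternatives": [{"confidence": "0.998", "content": "Hello"}]},
      {"type": "pronunciation", "start_time": "0.41", "end_time": "0.92",
       "alternatives": [{"confidence": "0.87", "content": "world"}]},
      {"type": "punctuation",
       "alternatives": [{"confidence": "0.0", "content": "."}]}
    ]
  }
}
"""


def extract_words(transcript_json):
    """Return spoken words with start/end times (seconds) and confidence."""
    doc = json.loads(transcript_json)
    words = []
    for item in doc["results"]["items"]:
        if item.get("type") != "pronunciation":
            continue  # punctuation items in the sample carry no timestamps
        best = item["alternatives"][0]
        words.append({
            "text": best["content"],
            "start": float(item["start_time"]),
            "end": float(item["end_time"]),
            "confidence": float(best["confidence"]),
        })
    return words


def low_confidence(words, threshold=0.9):
    """Return the text of words whose confidence falls below `threshold`."""
    return [w["text"] for w in words if w["confidence"] < threshold]


if __name__ == "__main__":
    parsed = extract_words(SAMPLE)
    print(parsed)
    print(low_confidence(parsed))
```

The 0.9 threshold is arbitrary; a review workflow would tune it against its own audio.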
Trust & compliance
- SOC 2
- SOC 2 Type II [13]
- HIPAA
- ✓ Yes [14]
- GDPR
- ✓ Yes [15]
- ISO 27001
- ✓ Yes [16]
- PCI DSS
- ✓ Yes [17]
- Published SLA
- ✓ Yes [18]
- Rate limits
- Concurrent transcription jobs: 250 (adjustable). Concurrent streams (HTTP/2 + WebSocket): 25 (adjustable). StartTranscriptionJob: 25 TPS (adjustable). StartStreamTranscription: 25 TPS (adjustable). Maximum audio file length: 28,800 seconds (8 hours). Maximum audio file size: 2 GB. Minimum audio file duration: 500 milliseconds. Job records retained: 90 days. [19]
- Known restrictions
- Maximum audio file length: 28,800 seconds (8 hours) for standard batch; maximum audio file size: 2 GB; maximum audio file length for Medical batch: 14,400 seconds (4 hours); maximum audio file length for Call Analytics batch: 14,400 seconds (4 hours); streaming sessions limited to 4 hours per open connection; media with more than two channels is not currently supported; Amazon Transcribe Medical is only available in US English; automatic content redaction removes PII from transcripts only, not from source audio files; custom language model training limited to 5 concurrent jobs and 10 models per account by default; billing in one-second increments with a 15-second minimum per request [20]
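A pre-flight check against the standard batch limits above can catch oversized files before upload. This is a minimal sketch: the helper names are hypothetical, "2 GB" is interpreted as 2 GiB, and billed time is assumed to round up to the next whole second, none of which the source confirms.

```python
import math

MAX_BATCH_SECONDS = 28_800        # 8 hours, standard batch
MAX_FILE_BYTES = 2 * 1024 ** 3    # "2 GB" read as 2 GiB (assumption)
MIN_SECONDS = 0.5                 # 500 milliseconds
MIN_BILLED_SECONDS = 15           # 15-second minimum per request


def check_batch_limits(duration_seconds, size_bytes):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if duration_seconds > MAX_BATCH_SECONDS:
        problems.append("audio longer than 8 hours")
    if duration_seconds < MIN_SECONDS:
        problems.append("audio shorter than 500 ms")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file larger than 2 GB")
    return problems


def billed_seconds(duration_seconds):
    """Estimate billed seconds: whole seconds, rounded up (assumption), minimum 15."""
    return max(MIN_BILLED_SECONDS, math.ceil(duration_seconds))


if __name__ == "__main__":
    print(check_batch_limits(30_000, 10 ** 9))
    print(billed_seconds(4.2), billed_seconds(61.3))
```

Medical and Call Analytics batch jobs would need a 14,400-second ceiling instead of `MAX_BATCH_SECONDS`.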
Developer surface
Integration
- API style
- rest
- Base URL
- https://transcribe.{region}.amazonaws.com
- Versioning
- none
- Stability
- ga
- Auth methods
- hmac_signature (AWS Signature Version 4)
- Error format
- vendor-specific
- Rate limit
- 25 / second
- Python (batch): boto3
- Python (streaming): amazon-transcribe
- JavaScript/Node.js (streaming): @aws-sdk/client-transcribe-streaming
- Java V2 (streaming): software.amazon.awssdk:transcribestreaming
- C++ (streaming): aws-cpp-sdk-transcribestreaming
- Ruby V3: aws-sdk-transcribestreamingservice
- PHP V3: aws/aws-sdk-php
- Go: github.com/aws/aws-sdk-go-v2/service/transcribestreaming
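To show how the base URL template and the Python (batch) SDK fit together, here is a minimal sketch of starting a batch job with boto3. The helper names, job name, and S3 URI are hypothetical; the request keys follow boto3's `start_transcription_job` parameters. The SDK handles request signing, so credentials are assumed to be configured in the environment. `start_job` is shown for completeness and is not exercised here because it needs an AWS account.

```python
BASE_URL_TEMPLATE = "https://transcribe.{region}.amazonaws.com"


def endpoint_for(region):
    """Fill the regional base URL template from the profile above."""
    return BASE_URL_TEMPLATE.format(region=region)


def build_job_request(job_name, media_uri, language_code="en-US", max_speakers=None):
    """Build keyword arguments for boto3's start_transcription_job."""
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},  # audio must already be in S3
        "LanguageCode": language_code,
    }
    if max_speakers is not None:
        # Speaker diarization: both settings are sent together.
        request["Settings"] = {
            "ShowSpeakerLabels": True,
            "MaxSpeakerLabels": max_speakers,
        }
    return request


def start_job(region, request):
    """Submit the job; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helpers above work without the SDK

    client = boto3.client("transcribe", region_name=region)
    return client.start_transcription_job(**request)


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    req = build_job_request("demo-job", "s3://example-bucket/call.wav", max_speakers=2)
    print(endpoint_for("us-east-1"))
    print(req)
```

Keeping request construction separate from the network call makes the parameters easy to inspect or unit test without touching AWS.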
Adoption & maturity
- Launched
- 2017-11-29
- GA
- 2018-04-04
Other Speech-to-Text & Transcription APIs
ElevenLabs Scribe (Speech to Text)
"Scribe v2 is the most accurate Speech to Text model" offering "real-time Speech to Text in under 150 ms" across "90+ languages."
Azure AI Speech to Text
"Azure Speech in Foundry Tools provides speech to text, text to speech, and other capabilities through a Microsoft Foundry resource. You can transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and conduct live AI voice conversations."
Google Cloud Speech-to-Text
"Accurate voice typing and transcription powered by Gemini."
IBM watsonx Speech to Text
"IBM Watson® Speech to Text technology enables fast and accurate speech transcription in multiple languages for a variety of use cases, including but not limited to customer self-service, agent assistance and speech analytics."
AssemblyAI
"Voice AI infrastructure for developers building products that transcribe, understand, and act on speech."
Speechmatics
"Low-latency speech-to-text for multilingual, multi-speaker conversations."
References
- [1] Description: docs.aws.amazon.com · aws.amazon.com
- [2] Pricing model: aws.amazon.com · docs.aws.amazon.com
- [3] Published pricing: aws.amazon.com
- [4] Free tier: aws.amazon.com · aws.amazon.com
- [5] Free tier details: aws.amazon.com
- [6] Enterprise plan: aws.amazon.com
- [7] Supported actions: docs.aws.amazon.com
- [8] Regions: docs.aws.amazon.com
- [9] Languages: docs.aws.amazon.com
- [10] Webhooks: docs.aws.amazon.com
- [11] Sandbox: docs.aws.amazon.com
- [12] SDK languages: docs.aws.amazon.com
- [13] SOC 2: aws.amazon.com
- [14] HIPAA: docs.aws.amazon.com · aws.amazon.com
- [15] GDPR: aws.amazon.com
- [16] ISO 27001: aws.amazon.com
- [17] PCI DSS: aws.amazon.com
- [18] Published SLA: aws.amazon.com
- [19] Rate limits: docs.aws.amazon.com
- [20] Known restrictions: docs.aws.amazon.com · docs.aws.amazon.com
Change history
- 2026-06-21 Capabilities: {} → {"medical":true,"pii_redaction":true,"real_time_streaming":true,"speaker_diariz…
- 2026-06-21 Summary Md: (none) → Amazon Transcribe is an automatic speech recognition service from AWS that conv…
- 2026-06-21 Score Setup Speed: (none) → 85
- 2026-06-21 Score Pricing Transparency: (none) → 100
- 2026-06-21 Score Docs Quality: (none) → 15
- 2026-06-21 Score Procurement Friction: (none) → 100
- 2026-06-21 Score Trust Readiness: (none) → 100
- 2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
- 2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
- 2026-06-21 Score Agent Friendliness: (none) → 30
- 2026-06-21 Has Structured Data: (none) → Yes
- 2026-06-21 Status Page URL: (none) → https://status.aws.amazon.com
- 2026-06-21 Docs URL: (none) → https://docs.aws.amazon.com/
- 2026-06-21 Llms Txt Present: (none) → No
- 2026-06-21 Rendering: (none) → static
- 2026-06-21 Robots Allows Agents: (none) → Yes
- 2026-06-21 SDK Packages: set to Python (batch), Python (streaming), JavaScript/Node.js (streaming), Java V2 (st…
- 2026-06-21 MCP Server Available: set to No
- 2026-06-21 Pricing Model: set to usage_based
- 2026-06-21 Has Published Pricing: set to Yes
- 2026-06-21 Free Tier Available: set to Yes
- 2026-06-21 Free Tier Details: set to 60 minutes per month for the first 12 months after account creation, shared acr…
- 2026-06-21 Self Serve Signup: set to Yes
- 2026-06-21 Requires Sales Call: set to No
- 2026-06-21 Enterprise Plan Available: set to Yes
- 2026-06-21 SOC 2: set to type_2
- 2026-06-21 HIPAA: set to Yes
- 2026-06-21 GDPR: set to Yes
- 2026-06-21 ISO 27001: set to Yes
- 2026-06-21 PCI DSS: set to Yes
- 2026-06-21 SLA Published: set to Yes
- 2026-06-21 SLA URL: set to https://aws.amazon.com/ai/services/language-sla/
- 2026-06-21 Data Retention Policy URL: set to https://docs.aws.amazon.com/transcribe/latest/dg/opt-out.html
- 2026-06-21 Documented Rate Limits: set to Concurrent transcription jobs: 250 (adjustable). Concurrent streams (HTTP/2 + W…
- 2026-06-21 Rate Limit Requests: set to 25
- 2026-06-21 Rate Limit Window: set to second
- 2026-06-21 Auth Methods: set to hmac_signature
- 2026-06-21 Auth Docs URL: set to https://docs.aws.amazon.com/transcribe/latest/dg/security-iam.html
- 2026-06-21 API Style: set to rest
- 2026-06-21 Base URL: set to https://transcribe.{region}.amazonaws.com
- 2026-06-21 Stability: set to ga
- 2026-06-21 Quickstart URL: set to https://docs.aws.amazon.com/transcribe/latest/dg/getting-started.html
- 2026-06-21 Error Format: set to vendor-specific
- 2026-06-21 Requires Verification: set to No
- 2026-06-21 Starting Price Usd: set to 0.006
- 2026-06-21 Price Basis: set to minute
- 2026-06-21 Free Tier Limit: set to 60 minutes/month for 12 months
- 2026-06-21 Launched At: set to 2017-11-29
- 2026-06-21 GA Date: set to 2018-04-04
- 2026-06-21 Notable Customers: set to (none)