AssemblyAI

"Voice AI infrastructure for developers building products that transcribe, understand, and act on speech." [1]

Speech-to-Text & Transcription APIs

www.assemblyai.com · By AssemblyAI · Agent JSON · Last verified 2026-06-21 · Source confidence: high

AssemblyAI is a voice AI platform that provides speech-to-text transcription, speaker diarization, and audio intelligence features through a REST API, aimed at developers building products on top of speech data. Pricing is usage-based, starting at $0.0025 per minute ($0.15 per hour of audio on the Universal-2 model). New accounts get a one-time $50 credit with no credit card required, and enterprise plans are available. The service lists SOC 2 Type II, HIPAA, GDPR, ISO 27001, and PCI DSS compliance, and data can be processed in the US or the EU. Customers include Zoom, Spotify, and Dovetail, and SDKs are actively maintained for Python and Node.js.

Best for / Avoid if

Best for: Regulated or enterprise workloads (compliance attestations and an enterprise plan); AI agents and automation (an agent-ready surface with an MCP server and llms.txt); teams needing broad API coverage out of the box

Avoid if: You need an ongoing free tier; the only free allowance is a one-time $50 signup credit

Pricing & procurement

Pricing model: Usage-based [2]
Published pricing: Yes [3]
Free tier: No [4]
Free tier details: $50 one-time credit on signup, no credit card required (not a recurring free tier)
Self-serve signup: Yes [5]
Requires sales call: No
Enterprise plan: Yes [6]

Published prices
Plan	Item	Unit	Price
Pay As You Go	Batch transcription (Universal-3 Pro model)	hour of audio	$0.21
Pay As You Go	Batch transcription (Universal-2 model)	hour of audio	$0.15
Pay As You Go	Streaming transcription (Universal-3 Pro Streaming / u3-rt-pro)	hour of audio	$0.45
Pay As You Go	Streaming transcription (Universal-Streaming English)	hour of audio	$0.15
Pay As You Go	Streaming transcription (Universal-Streaming Multilingual)	hour of audio	$0.15
Pay As You Go	Speaker Identification add-on	hour of audio	$0.02
Pay As You Go	Translation add-on	hour of audio	$0.06
Pay As You Go	Custom Formatting add-on	hour of audio	$0.03
Pay As You Go	Entity Detection add-on	hour of audio	$0.08
Pay As You Go	Sentiment Analysis add-on	hour of audio	$0.02
Pay As You Go	Key Phrases / Auto Highlights add-on	hour of audio	$0.01
Pay As You Go	Topic Detection (IAB) add-on	hour of audio	$0.15
Pay As You Go	Medical Mode add-on	hour of audio	$0.15
Pay As You Go	Speaker Diarization (async standard) add-on	hour of audio	$0.02
Pay As You Go	Speaker Diarization (async experimental) add-on	hour of audio	$0.065
Pay As You Go	Speaker Diarization (streaming) add-on	hour of audio	$0.12
Pay As You Go	Keyterms Prompting add-on	hour of audio	$0.05
Pay As You Go	General Prompting (Beta) add-on (U3 Pro only)	hour of audio	$0.05
Pay As You Go	Voice Focus add-on (U3 Pro Streaming only)	hour of audio	$0.10
Pay As You Go	Profanity Filtering add-on	hour of audio	$0.01
Pay As You Go	PII Audio Redaction add-on	hour of audio	$0.05
Pay As You Go	PII Text Redaction add-on	hour of audio	$0.08
Pay As You Go	Content Moderation add-on	hour of audio	$0.15
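The table quotes rates per hour of audio, while the summary's headline figure is per minute: $0.15 per hour ÷ 60 = $0.0025 per minute, the Universal-2 batch rate. Assuming add-ons are billed on top of the model rate on the same per-hour basis, a job costs (model rate + sum of add-on rates) × hours of audio. A minimal bash sketch of that arithmetic; estimate_cost and credit_hours are hypothetical helpers, the rates are copied by hand from the table and may have changed, and volume or enterprise discounts are ignored.

```bash
#!/usr/bin/env bash
# Hypothetical cost estimator built on the Pay As You Go rates in the table
# above (USD per hour of audio). Rates are copied by hand and may be stale.

# usage: estimate_cost MINUTES MODEL_RATE [ADDON_RATE ...]
# Adds the model rate and every add-on rate, scales by hours of audio,
# and prints the total in USD to four decimal places.
estimate_cost() {
  local minutes=$1
  shift
  # awk runs only a BEGIN block, so the remaining arguments are read from
  # ARGV as rates instead of being opened as input files.
  awk -v m="$minutes" 'BEGIN {
    total = 0
    for (i = 1; i < ARGC; i++) total += ARGV[i]
    printf "%.4f\n", total * m / 60
  }' "$@"
}

# usage: credit_hours CREDIT_USD RATE_PER_HOUR
# Hours of audio a one-time credit covers at a single hourly rate.
credit_hours() {
  awk -v c="$1" -v r="$2" 'BEGIN { printf "%.1f\n", c / r }'
}
```

For example, 90 minutes of Universal-2 batch audio with async standard diarization is estimate_cost 90 0.15 0.02, which prints 0.2550, and credit_hours 50 0.15 prints 333.3, roughly the hours of Universal-2 audio the $50 signup credit covers.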

Capabilities

Real-time streaming
Speaker diarization
Speech translation
Medical transcription
PII redaction
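For batch jobs, several of these capabilities are switched on per request with boolean fields in the JSON body sent to POST /v2/transcript. The bash sketch below assembles such a body. The field names used in the example (speaker_labels, sentiment_analysis) are our reading of AssemblyAI's API reference and should be checked against it, and build_transcript_body is a hypothetical helper. It only emits true/false flags: options that need extra values, such as PII redaction policies, are not covered, and the audio URL is inserted without JSON escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a POST /v2/transcript body that switches on
# boolean feature flags. Field names are assumptions to verify in the docs.
# The audio URL is inserted as-is, so it must not contain double quotes.

# usage: build_transcript_body AUDIO_URL [FLAG ...]
build_transcript_body() {
  local audio_url=$1 flag body
  shift
  body=$(printf '{"audio_url":"%s"' "$audio_url")
  for flag in "$@"; do
    # Each flag becomes "flag":true, e.g. speaker_labels for diarization.
    body+=$(printf ',"%s":true' "$flag")
  done
  printf '%s}\n' "$body"
}

# Example: diarization plus sentiment analysis on a hosted file.
#   build_transcript_body https://example.com/call.mp3 speaker_labels sentiment_analysis
```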

Supported actions: transcribe_batch, transcribe_streaming, speaker_diarization, language_detection, word_timestamps, sentiment_analysis, entity_detection, topic_detection, pii_redaction, pii_audio_redaction, profanity_filtering, content_moderation, multichannel_transcription, custom_vocabulary, keyterms_prompting, webhook_notifications, medical_mode, automatic_punctuation, code_switching_detection, translation, voice_focus_noise_reduction, general_prompting [7]
Source excerpt [7] (assemblyai.com/docs/api-reference/transcripts/submit): “Diarization: Speaker labeling with configurable speaker count | Language Detection: Auto-detection with confidence thresholds | Sentiment Analysis: Per-utterance emotional tone classification | Entity Detection: Identifies locations, person names, organizations | Topic Detection: IAB category classification | Word Timestamps: Granular timing for each word with confidence scores | PII Redaction: Text and audio redaction”
Regions: United States, European Union (Dublin, Ireland) [8]
Languages: English, Spanish, French, German, Indonesian, Italian, Japanese, Dutch, Polish, Portuguese, Russian, Swedish, Turkish, Ukrainian, Catalan, Arabic, Azerbaijani, Bulgarian, Bosnian, Mandarin Chinese, Czech, Danish, Greek, Estonian, Finnish, Galician, Hebrew, Hindi, Croatian, Hungarian, Korean, Macedonian, Malay, Norwegian, Romanian, Slovak, Swiss German, Tagalog, Thai, Urdu, Vietnamese, Afrikaans, Belarusian, Welsh, Persian, Armenian, Icelandic, Kazakh, Lithuanian, Latvian, Maori, Marathi, Slovenian, Swahili, Tamil, Amharic, Assamese, Bengali, Gujarati, Hausa, Javanese, Georgian, Khmer, Kannada, Luxembourgish, Lingala, Lao, Malayalam, Mongolian, Maltese, Burmese, Nepali, Occitan, Punjabi, Pashto, Sindhi, Shona, Somali, Serbian, Telugu, Tajik, Uzbek, Yoruba [9]
Input types: audio/mpeg (MP3), audio/wav (WAV), audio/aac (AAC), audio/flac (FLAC), audio/ogg (OGG, OGA, MOGG), audio/opus (OPUS), audio/aiff (AIF, AIFF), audio/alac (ALAC), audio/amr (AMR), audio/mp4 (M4A, M4B, M4P, M4R), audio/ac3 (AC3), audio/ape (APE), audio/dss (DSS), audio/flv (FLV), audio/wma (WMA), audio/wv (WV), audio/qcp (QCP), audio/tta (TTA), audio/voc (VOC), video/mp4 (MP4, M4V), video/webm (WEBM), video/quicktime (MOV), video/ts (TS, MTS, M2TS), video/mp2 (MP2), video/mxf (MXF), file URL, local file upload, WebSocket (streaming), live audio stream [10]
Output types: JSON transcript with word-level timestamps, confidence scores, speaker-labeled utterances, sentiment analysis results, entity detection results, topic/IAB category results, SRT captions, VTT captions, redacted audio (beep or silence)
Webhooks: Yes [11]
Sandbox / test mode: No [12]
SDK languages: Python (3.8+, installed with pip install assemblyai) and Node.js (18+, installed with npm install assemblyai) are actively maintained. The Java and Go SDKs were discontinued in April 2025, and the Go repository was archived on April 1, 2025 and is now read-only; the C#/.NET and Ruby SDKs were discontinued at the same time [13]
MCP server: Yes [14]
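The Input types, Webhooks, and Output types fields above map onto a short batch workflow: upload a local file or pass a hosted URL, submit a job (optionally with a webhook), wait for it to finish, then pull captions. A minimal bash sketch, assuming ASSEMBLYAI_API_KEY is exported and jq is installed. The endpoints (/v2/upload, /v2/transcript, /v2/transcript/{id}/srt and /vtt) and fields (upload_url, id, status, webhook_url, transcript_id) reflect AssemblyAI's v2 documentation as we understand it and should be checked there; all function names are hypothetical. Because there is no sandbox mode, every job submitted this way is billed.

```bash
#!/usr/bin/env bash
# Sketch of the batch flow: upload, submit, wait, fetch captions.
# Assumes ASSEMBLYAI_API_KEY is exported and jq is installed; endpoint
# and field names follow AssemblyAI's v2 docs as we understand them.
AAI="https://api.assemblyai.com/v2"

# Upload a local file (2.2 GB max) and print the returned upload_url.
aai_upload() {
  curl -sS -X POST "$AAI/upload" \
    -H "authorization: $ASSEMBLYAI_API_KEY" \
    --data-binary @"$1" | jq -r '.upload_url'
}

# Submit AUDIO_URL, optionally with a WEBHOOK_URL; print the job id.
aai_submit() {
  local body
  if [[ -n ${2:-} ]]; then
    body=$(jq -n --arg a "$1" --arg w "$2" '{audio_url: $a, webhook_url: $w}')
  else
    body=$(jq -n --arg a "$1" '{audio_url: $a}')
  fi
  curl -sS -X POST "$AAI/transcript" \
    -H "authorization: $ASSEMBLYAI_API_KEY" \
    -H 'content-type: application/json' \
    -d "$body" | jq -r '.id'
}

# Poll a job every 5 seconds until it completes or errors; print the status.
aai_wait() {
  local status
  while :; do
    status=$(curl -sS "$AAI/transcript/$1" \
      -H "authorization: $ASSEMBLYAI_API_KEY" | jq -r '.status')
    case $status in
      completed|error) echo "$status"; return 0 ;;
      queued|processing) sleep 5 ;;
      *) echo "unexpected status: $status" >&2; return 1 ;;
    esac
  done
}

# Download captions for a finished job; FORMAT is srt or vtt.
aai_captions() {
  curl -sS "$AAI/transcript/$1/$2" -H "authorization: $ASSEMBLYAI_API_KEY"
}

# A webhook receiver gets a short JSON notice rather than the transcript
# (assumed shape: {"transcript_id": "...", "status": "completed"}); pull out the id.
webhook_transcript_id() {
  printf '%s' "$1" |
    sed -n 's/.*"transcript_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example:
#   id=$(aai_submit "$(aai_upload meeting.mp3)")
#   [[ $(aai_wait "$id") == completed ]] && aai_captions "$id" srt > meeting.srt
```

Polling is the simplest option for scripts. When a webhook_url is supplied, the aai_wait loop can be dropped: the receiver extracts the id from the notification and fetches the transcript itself.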

Trust & compliance

SOC 2: SOC 2 Type II [15]
HIPAA: Yes [16]
GDPR: Yes [17]
ISO 27001: Yes [18]
PCI DSS: Yes [19]
Published SLA: Yes [20]
Rate limits: Streaming allows 5 new streams per minute on the free tier and 100 new streams per minute on pay-as-you-go. The /v2/transcript endpoint accepts files up to 5 GB and up to 10 hours of audio, and local uploads via /v2/upload are capped at 2.2 GB. The general API rate limit is 20,000 requests per 5 minutes. [21]
Known restrictions: 5 GB maximum file size on the transcription endpoint; 10-hour maximum audio duration; 2.2 GB maximum local file upload; streaming concurrency of 5 new streams/minute on the free tier and 100 new streams/minute on pay-as-you-go; Java, C#/.NET, Go, and Ruby SDKs discontinued in April 2025, leaving only the Python and JavaScript SDKs actively maintained; Summarization and Auto Chapters deprecated (migrate to LLM Gateway)
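The size and duration limits are easiest to enforce before any upload starts, since a rejected multi-gigabyte upload wastes time and bandwidth. A minimal bash sketch; preflight is a hypothetical helper, and it assumes decimal gigabytes (10^9 bytes) because the profile does not say whether GB means GiB.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check against the limits listed above. Assumes
# decimal gigabytes (10^9 bytes); the docs may mean GiB, so leave headroom.
GB=$((1000 * 1000 * 1000))
MAX_SECONDS=$((10 * 3600))   # 10 hours of audio

# usage: preflight SIZE_BYTES DURATION_SECONDS IS_LOCAL(yes|no)
# Prints "ok" or a reason, and returns non-zero on rejection.
preflight() {
  local size=$1 duration=$2 is_local=$3
  if (( duration > MAX_SECONDS )); then
    echo "reject: longer than 10 hours"; return 1
  fi
  # Compare size*10 against 22*GB to test "over 2.2 GB" in integer arithmetic.
  if [[ $is_local == yes ]] && (( size * 10 > 22 * GB )); then
    # /v2/upload caps local files at 2.2 GB; larger files need a hosted URL.
    echo "reject: over 2.2 GB upload limit, pass a hosted URL instead"; return 1
  fi
  if (( size > 5 * GB )); then
    echo "reject: over 5 GB file limit"; return 1
  fi
  echo ok
}
```

The caller supplies the byte size (for example from stat -c %s with GNU coreutils, or stat -f %z on macOS) and the duration rounded to whole seconds, for instance as reported by ffprobe.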

Developer surface

Docs rendering: static · llms.txt present

Integration

API style: rest
Base URL: https://api.assemblyai.com
Version: v2
Versioning: url
Stability: ga
Auth methods: api_key
Idempotency keys: No
Error format: vendor-specific
Rate limit: 20,000 requests per 5 minutes
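Two of these fields shape how a client should behave. Versioning lives in the URL path, and there are no idempotency keys, so a POST that times out cannot be safely replayed. A minimal bash sketch under those assumptions; aai_url, aai_avg_rps, aai_get, and aai_post_json are hypothetical names, and sending the raw key in the authorization header mirrors AssemblyAI's quickstart examples.

```bash
#!/usr/bin/env bash
# Hypothetical helpers derived from the integration fields above.
# Assumes ASSEMBLYAI_API_KEY holds the API key; it is sent as-is in the
# authorization header, as in AssemblyAI's quickstart examples.
AAI_BASE_URL="https://api.assemblyai.com"
AAI_VERSION="v2"

# URL-path versioning: every endpoint is BASE_URL/VERSION/PATH.
aai_url() {
  printf '%s/%s/%s\n' "$AAI_BASE_URL" "$AAI_VERSION" "${1#/}"
}

# 20,000 requests per 5 minutes averages about 66.7 requests per second.
aai_avg_rps() {
  awk 'BEGIN { printf "%.1f\n", 20000 / (5 * 60) }'
}

# Reads are safe to retry: curl --retry treats timeouts and HTTP 408, 429,
# 500, 502, 503 and 504 as transient and backs off between attempts.
aai_get() {
  curl -sS --retry 5 "$(aai_url "$1")" -H "authorization: $ASSEMBLYAI_API_KEY"
}

# Writes are sent once: with no idempotency keys, a retried POST /v2/transcript
# could queue (and bill) the same audio twice.
aai_post_json() {
  curl -sS -X POST "$(aai_url "$1")" \
    -H "authorization: $ASSEMBLYAI_API_KEY" \
    -H 'content-type: application/json' \
    -d "$2"
}
```

Whether the 5-minute window is fixed or sliding is not stated here. A client that paces itself at or below the average rate stays within the limit either way.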

SDKs

Python: assemblyai (actively maintained)
Node.js: assemblyai (actively maintained)
C#/.NET: AssemblyAI (discontinued April 2025)
Java: com.assemblyai:assemblyai-java (discontinued April 2025)

Adoption & maturity

Launched: 2017
Notable customers: Zoom, Spotify, Veed, CallRail, Dovetail, Calabrio, Kapwing, Jiminny, Grain, Supernormal

Other Speech-to-Text & Transcription APIs

ElevenLabs Scribe (Speech to Text)
"Scribe v2 is the most accurate Speech to Text model" offering "real-time Speech to Text in under 150 ms" across "90+ languages."
Hybrid · free tier · public pricing · self-serve
Azure AI Speech to Text
"Azure Speech in Foundry Tools provides speech to text, text to speech, and other capabilities through a Microsoft Foundry resource. You can transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and conduct live AI voice conversations."
Usage · free tier · public pricing · self-serve
Amazon Transcribe
"Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or to add speech-to-text capabilities to any application."
Usage · free tier · public pricing · self-serve
Google Cloud Speech-to-Text
"Accurate voice typing and transcription powered by Gemini."
Usage · free tier · public pricing · self-serve
IBM watsonx Speech to Text
"IBM Watson® Speech to Text technology enables fast and accurate speech transcription in multiple languages for a variety of use cases, including but not limited to customer self-service, agent assistance and speech analytics."
Usage · free tier · public pricing · self-serve
Speechmatics
"Low-latency speech-to-text for multilingual, multi-speaker conversations."
Usage · free tier · public pricing · self-serve


References

Each field above carries a numbered source, listed below.

[1] Description: assemblyai.com
[2] Pricing model: assemblyai.com
[3] Published pricing: assemblyai.com
[4] Free tier: assemblyai.com
[5] Self-serve signup: assemblyai.com
[6] Enterprise plan: assemblyai.com
[7] Supported actions: assemblyai.com
[8] Regions: assemblyai.com
[9] Languages: assemblyai.com · assemblyai.com
[10] Input types: support.assemblyai.com
[11] Webhooks: assemblyai.com
[12] Sandbox: assemblyai.com
[13] SDK languages: github.com · assemblyai.com · github.com
[14] MCP server: assemblyai.com · assemblyai.com
[15] SOC 2: assemblyai.com
[16] HIPAA: assemblyai.com
[17] GDPR: assemblyai.com · assemblyai.com
[18] ISO 27001: assemblyai.com
[19] PCI DSS: assemblyai.com
[20] Published SLA: assemblyai.com
[21] Rate limits: assemblyai.com · assemblyai.com

Change history

Every field change, who made it, and when - from our audited data pipeline and editors.

2026-06-21 Capabilities: {} → {"medical":true,"translation":true,"pii_redaction":true,"real_time_streaming":t…
2026-06-21 Summary Md: (none) → AssemblyAI is a voice AI platform providing speech-to-text transcription, speak…
2026-06-21 Score Pricing Transparency: (none) → 85
2026-06-21 Score Setup Speed: (none) → 60
2026-06-21 Score Docs Quality: (none) → 75
2026-06-21 Score Procurement Friction: (none) → 85
2026-06-21 Score Trust Readiness: (none) → 100
2026-06-21 Best For: (none) → Regulated or enterprise workloads - compliance attestations and an enterprise p…
2026-06-21 Avoid If: (none) → You want to try it free before paying
2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
2026-06-21 Score Agent Friendliness: (none) → 70
2026-06-21 Llms Txt URL: (none) → https://www.assemblyai.com/llms.txt
2026-06-21 Llms Txt Present: (none) → Yes
2026-06-21 Rendering: (none) → static
2026-06-21 Has Structured Data: (none) → No
2026-06-21 Robots Allows Agents: (none) → Yes
2026-06-21 Openapi Spec URL: (none) → https://www.assemblyai.com/openapi.json
2026-06-21 API Reference URL: (none) → https://www.assemblyai.com/docs
2026-06-21 Status Page URL: (none) → https://status.assemblyai.com
2026-06-21 Changelog URL: (none) → https://www.assemblyai.com/changelog
2026-06-21 Docs URL: (none) → https://www.assemblyai.com/docs
2026-06-21 Requires Sales Call: set to No
2026-06-21 Enterprise Plan Available: set to Yes
2026-06-21 SOC 2: set to type_2
2026-06-21 HIPAA: set to Yes
2026-06-21 GDPR: set to Yes
2026-06-21 ISO 27001: set to Yes
2026-06-21 PCI DSS: set to Yes
2026-06-21 SLA Published: set to Yes
2026-06-21 SLA URL: set to https://www.assemblyai.com/security
2026-06-21 Data Retention Policy URL: set to https://www.assemblyai.com/legal/privacy-policy
2026-06-21 Documented Rate Limits: set to Free tier: 5 new streams per minute (streaming); Pay-as-you-go: 100 new streams…
2026-06-21 Rate Limit Requests: set to 20000
2026-06-21 Rate Limit Window: set to 5 minutes
2026-06-21 Known Restrictions: set to Maximum file size for transcription endpoint: 5GB, Maximum audio duration: 10 h…
2026-06-21 Auth Methods: set to api_key
2026-06-21 Auth Docs URL: set to https://www.assemblyai.com/docs/api-reference/overview
2026-06-21 API Style: set to rest
2026-06-21 Base URL: set to https://api.assemblyai.com
2026-06-21 API Version: set to v2
2026-06-21 Versioning Scheme: set to url
2026-06-21 Stability: set to ga
2026-06-21 Deprecation Policy URL: set to https://www.assemblyai.com/changelog
2026-06-21 MCP URL: set to https://assemblyai.com/docs/mcp
2026-06-21 Quickstart URL: set to https://www.assemblyai.com/docs/getting-started
2026-06-21 Idempotency Supported: set to No
2026-06-21 Error Format: set to vendor-specific
2026-06-21 Webhook Events URL: set to https://www.assemblyai.com/docs/getting-started/webhooks
2026-06-21 Requires Verification: set to No
2026-06-21 Starting Price Usd: set to 0.0025

Suggest an edit / leave a review

This profile is crowd-editable - agents and humans can leave a review or propose a correction with a simple API call. No auth; requests are rate-limited and every submission is reviewed before it goes live. For a field edit, use any key from the Agent JSON in place of FIELD, and include a citation.

Leave a review or comment

curl -X POST https://apio.sh/api/feedback/assemblyai \
  -H 'Content-Type: application/json' \
  -d '{"kind":"review","rating":5,"body":"Your experience with this API…"}'

Suggest a correction to a field (cite a source)

curl -X POST https://apio.sh/api/suggest/assemblyai/FIELD \
  -H 'Content-Type: application/json' \
  -d '{"value":"corrected value","citations":[{"url":"https://source.example/page","excerpt":"supporting quote"}],"note":"what changed and why"}'
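Hand-written JSON in the correction command above breaks as soon as a value or excerpt contains a double quote or backslash. A minimal bash sketch that escapes each field before posting; json_escape and suggest_edit are hypothetical helper names. Only backslashes and double quotes are escaped, not newlines or other control characters, so each value must stay on one line.

```bash
#!/usr/bin/env bash
# Minimal sketch: escape a string for use inside a JSON string literal.
# Handles backslashes and double quotes only; newlines and other control
# characters are NOT escaped, so keep values on one line.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# usage: suggest_edit FIELD VALUE SOURCE_URL EXCERPT NOTE
# Wraps the correction endpoint shown above; FIELD is any Agent JSON key.
suggest_edit() {
  local body
  body=$(printf '{"value":"%s","citations":[{"url":"%s","excerpt":"%s"}],"note":"%s"}' \
    "$(json_escape "$2")" "$(json_escape "$3")" \
    "$(json_escape "$4")" "$(json_escape "$5")")
  curl -sS -X POST "https://apio.sh/api/suggest/assemblyai/$1" \
    -H 'Content-Type: application/json' \
    -d "$body"
}
```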
