Cartesia (Sonic)

"The fastest and most natural text to speech model" [1]

Text-to-Speech APIs

cartesia.ai · By Cartesia · Last verified 2026-06-21 · Source confidence: high

Cartesia's Sonic API is a text-to-speech service built for low-latency voice applications such as conversational AI agents, customer support, dubbing, and audiobook narration, with a reported first-audio-byte latency of 90ms on Sonic 3.5. Paid pricing starts at $50 per million characters (the Pro plan: $5/month for 100,000 credits at about 1 credit per character), the free tier covers 20,000 characters per month, and self-serve signup is available without a sales call. The API supports REST and WebSocket streaming, offers instant voice cloning on all plans, and can be deployed in cloud regions, on-premises, or on-device. Cartesia reports a SOC 2 Type II attestation along with HIPAA, GDPR, and PCI DSS compliance, and counts Quora, Cresta, and Rasa among its customers.

Best for / Avoid if

Best for: Prototypes and side projects - free to start, no sales call; Regulated or enterprise workloads - compliance attestations and an enterprise plan; AI agents and automation - an agent-ready surface (official MCP server; no llms.txt detected)

Pricing & procurement

Pricing model: Hybrid (base + usage) [2]
Published pricing: Yes
Free tier: Yes [3]
Free tier details: Free plan at $0/month includes 20,000 credits/month (approximately 20,000 characters of TTS at 1 credit per character). Includes instant voice cloning. Does not include professional voice cloning or commercial use license. [4]
Self-serve signup: Yes
Requires sales call: No
Enterprise plan: Yes [5]

Published prices
Plan	Item	Per	Amount	Source
Free	Monthly subscription	month	$0	source
Free	TTS credits included	20,000 credits/month	$0	source
Pro	Monthly subscription	month	$5	source
Pro	TTS credits included	100,000 credits/month	$5	source
Startup	Monthly subscription	month	$49	source
Startup	TTS credits included	1,250,000 credits/month	$49	source
Scale	Monthly subscription	month	$299	source
Scale	TTS credits included	8,000,000 credits/month	$299	source
	Standard Sonic TTS — credit consumption rate	~1 credit per character	-	source
	TTS with Professional voice clone — credit consumption rate	~1.5 credits per character	-	source
Startup	Professional voice clone fine-tuning (one-time per clone)	1,000,000 credits per fine-tune	-	source
	Voice localization (one-time per voice)	225 credits per voice	-	source
	Voice changer — credit consumption rate	15 credits per second of audio	-	source
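
The plan table and consumption rates above reduce to simple arithmetic. The following is a minimal sketch, assuming the listed rates apply exactly (the table marks them as approximate) and that the whole monthly allowance is used; overage pricing is not listed in the table and is ignored. All function and variable names are illustrative.

```python
# Plan data copied from the published-prices table: (USD per month, credits per month).
PLANS = {
    "Free": (0, 20_000),
    "Pro": (5, 100_000),
    "Startup": (49, 1_250_000),
    "Scale": (299, 8_000_000),
}

# Credit consumption rates from the same table (listed there as approximate).
CREDITS_PER_CHAR = {"standard": 1.0, "professional_clone": 1.5}
VOICE_CHANGER_CREDITS_PER_SECOND = 15


def usd_per_million_chars(plan: str, voice: str = "standard") -> float:
    """Effective price per million characters if the full allowance is used."""
    price, credits = PLANS[plan]
    chars = credits / CREDITS_PER_CHAR[voice]
    return price * 1_000_000 / chars


def tts_credits(text: str, voice: str = "standard") -> float:
    """Credits consumed by synthesizing `text` once."""
    return len(text) * CREDITS_PER_CHAR[voice]


def voice_changer_credits(seconds: float) -> float:
    """Credits consumed by the voice changer for `seconds` of audio."""
    return seconds * VOICE_CHANGER_CREDITS_PER_SECOND


def cheapest_plan(chars_per_month: int, voice: str = "standard"):
    """Cheapest listed plan whose allowance covers the volume, else None."""
    needed = chars_per_month * CREDITS_PER_CHAR[voice]
    for name, (_price, credits) in sorted(PLANS.items(), key=lambda kv: kv[1][0]):
        if credits >= needed:
            return name
    return None  # above the Scale allowance: an enterprise conversation
```

For example, `usd_per_million_chars("Pro")` gives 50.0, which matches the "$50 per million characters" starting price quoted in the summary, and `cheapest_plan(500_000)` returns "Startup".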

Capabilities

Real-time streaming
Voice cloning
Voice design
Multilingual voices
Word timestamps

Supported actions: synthesize_speech, streaming_tts, websocket_tts, instant_voice_cloning, professional_voice_cloning, word_timestamps, phoneme_timestamps, speech_to_speech, multilingual_synthesis, emotion_control, speed_control, volume_control, pronunciation_dictionaries, voice_design, voice_localization, voice_infill, voice_changer [6]
Regions: US, EU, global (cloud), on-premises, on-device
Languages (47 languages and regional variants): American English, British English, Australian English, French (France), French (Canada), German, Spanish (Castilian), Spanish (Mexican), Italian, Portuguese (Brazilian), Russian, Japanese, Korean, Mandarin Chinese, Dutch, Hindi, Bengali, Bulgarian, Croatian, Czech, Danish, Finnish, Georgian, Greek, Gujarati, Hebrew, Hungarian, Indonesian, Kannada, Malayalam, Marathi, Norwegian, Polish, Punjabi, Romanian, Slovak, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese, Emirati Arabic, Malay, Arabic [7]
Input types: plain text
Output types: pcm_f32le, pcm_s16le, pcm_mulaw, pcm_alaw, wav, mp3, streaming chunks (SSE), streaming chunks (WebSocket) [8]
Webhooks: No [9]
Sandbox / test mode: No
SDK languages: Python, JavaScript/TypeScript [10]
MCP server: Yes [11]. The official server, cartesia-mcp (github.com/cartesia-ai/cartesia-mcp), runs with uvx and exposes Cartesia TTS, STT, voice listing, voice cloning, audio infill, and pronunciation dictionaries to MCP clients such as Cursor, Claude Code, Claude Desktop, and OpenAI Agents (docs.cartesia.ai/integrations/mcp).
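
Several of the output types listed above are raw PCM encodings, which most players cannot open without a container. As a minimal sketch, assuming a mono `pcm_s16le` stream and a 44,100 Hz sample rate (the profile does not state sample rates, so use whatever rate was actually requested), raw bytes can be wrapped in a WAV header with Python's standard `wave` module. The function name is illustrative.

```python
import io
import wave


def pcm_s16le_to_wav(pcm: bytes, sample_rate: int = 44_100, channels: int = 1) -> bytes:
    """Wrap raw signed 16-bit little-endian PCM in a WAV container."""
    if len(pcm) % (2 * channels) != 0:
        raise ValueError("PCM length is not a whole number of 16-bit frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)  # assumption: must match the requested rate
        wav.writeframes(pcm)
    return buf.getvalue()
```

The same approach does not apply to `pcm_f32le`, `pcm_mulaw`, or `pcm_alaw`, since the `wave` module only writes integer PCM; those encodings would need converting first.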

Trust & compliance

SOC 2: SOC 2 Type II [12]
HIPAA: Yes [13]
GDPR: Yes [14]
ISO 27001: Unknown [15]
PCI DSS: Yes [16]
Published SLA: No [17]
Rate limits: TTS concurrent requests: Free=2, Pro=3, Startup=5, Scale=15, Enterprise=custom. WebSocket connections limited to 10x concurrency limit. Idle WebSocket connections closed after 5 minutes. Exceeding limits returns HTTP 429. First audio byte latency: 90ms (Sonic 3.5). [18]
Known restrictions: Voice cloning requires explicit consent from the voice owner per the Acceptable Use Policy; audio of deceased persons, political candidates, or others cannot be posted without express permission; no SSML support is documented; professional voice cloning is available on the Startup plan and above only, while instant voice cloning is available on all plans including Free; the Free plan does not include a commercial use license; no per-request character cap is documented ("infinite request length supported"); voice sample recordings may be retained by Cartesia solely for voice cloning functionality [19]
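
Because the limits above are expressed as concurrent requests and excess requests return HTTP 429, a client typically caps its own in-flight requests and retries on 429. The following is a minimal sketch of that pattern with `asyncio`; the per-plan numbers come from the rate-limit field, while the `RateLimited` exception, the backoff schedule, and the `run_limited` helper are assumptions for illustration, not documented Cartesia behavior.

```python
import asyncio
import random

# Concurrent-request limits per plan, from the rate-limit field above.
CONCURRENCY = {"Free": 2, "Pro": 3, "Startup": 5, "Scale": 15}


class RateLimited(Exception):
    """Raised by a caller-supplied job when the API answers HTTP 429."""


async def run_limited(jobs, plan="Free", max_retries=4, base_delay=0.5):
    """Run jobs with at most CONCURRENCY[plan] in flight, retrying on 429.

    Each job is a zero-argument callable returning a coroutine, so it can be
    invoked again after a rate-limit error.
    """
    gate = asyncio.Semaphore(CONCURRENCY[plan])

    async def one(job):
        async with gate:  # the slot is held during backoff as well
            for attempt in range(max_retries + 1):
                try:
                    return await job()
                except RateLimited:
                    if attempt == max_retries:
                        raise
                    # Exponential backoff with jitter: a client-side choice,
                    # not something the source documents.
                    delay = base_delay * 2 ** attempt * (1 + random.random())
                    await asyncio.sleep(delay)

    return await asyncio.gather(*(one(job) for job in jobs))
```

A job here would wrap one HTTP call and raise `RateLimited` when the response status is 429; results come back in the same order as the jobs.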

Developer surface

Docs rendering: static

Integration

API style: rest
Base URL: https://api.cartesia.ai
Version: 2026-03-01
Versioning: date
Stability: ga
Auth methods: api_key, jwt
Error format: vendor-specific
Rate limit: 2 concurrent requests (Free plan; paid plans allow more, see Rate limits above)
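
The integration fields above (REST, a dated version string, API-key auth) suggest a request shaped roughly like the sketch below. Only the base URL and the version value come from this profile. The `/tts/bytes` path, the `X-API-Key` and `Cartesia-Version` header names, and every body field are assumptions that should be checked against the vendor's API reference; `model_id` and `voice_id` are placeholders supplied by the caller. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.cartesia.ai"  # from the profile
API_VERSION = "2026-03-01"            # from the profile


def build_tts_request(api_key: str, transcript: str, voice_id: str, model_id: str):
    """Build (but do not send) a hypothetical text-to-speech request."""
    payload = {
        # Assumed body shape; field names are not stated in this profile.
        "model_id": model_id,
        "transcript": transcript,
        "voice": {"mode": "id", "id": voice_id},
        "output_format": {
            "container": "wav",
            "encoding": "pcm_s16le",  # one of the listed output types
            "sample_rate": 44_100,    # assumed value
        },
    }
    return urllib.request.Request(
        f"{BASE_URL}/tts/bytes",      # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": api_key,             # assumed header name
            "Cartesia-Version": API_VERSION,  # assumed header name
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate makes the assumed pieces easy to adjust once they are verified.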

SDKs

Python: cartesia
JavaScript/TypeScript: @cartesia/cartesia-js

Adoption & maturity

Launched: 2023-01-01
Notable customers: Quora, Cresta, Rasa

Other Text-to-Speech APIs

ElevenLabs Text to Speech
"Text to Speech with high quality, human-like AI voices"
Hybrid · free tier · public pricing · self-serve
Azure AI Text to Speech
"Text to speech enables your applications, tools, or devices to convert text into human like synthesized speech. The text to speech capability is also known as speech synthesis. Use human like standard voices out of the box, or create a custom voice that's unique to your product or brand."
Usage · free tier · public pricing · self-serve
Amazon Polly
"Amazon Polly is a cloud service that converts text into lifelike speech. You can use Amazon Polly to develop applications that increase engagement and accessibility."
Usage · free tier · public pricing · self-serve
Google Cloud Text-to-Speech
"Cloud Text-to-Speech converts text or Speech Synthesis Markup Language (SSML) input into audio data of natural human speech."
Usage · free tier · public pricing · self-serve
Murf AI
"Enterprise-grade AI voice generation with 150+ natural-sounding voices across 35 languages and 20+ speaking styles."
Usage · public pricing · self-serve
OpenAI Text to Speech (gpt-4o-mini-tts / tts-1)
"Transform text into lifelike spoken audio" - OpenAI's TTS service enabling blog narration, multilingual audio production, and realtime voice output via gpt-4o-mini-tts, tts-1, and tts-1-hd models.
Usage · public pricing · self-serve


References

Each field above carries a numbered source, listed here.

[1] Description: cartesia.ai · docs.cartesia.ai
[2] Pricing model: cartesia.ai
[3] Free tier: cartesia.ai
[4] Free tier details: cartesia.ai
[5] Enterprise plan: cartesia.ai
[6] Supported actions: docs.cartesia.ai · docs.cartesia.ai
[7] Languages: docs.cartesia.ai · cartesia.ai
[8] Output types: docs.cartesia.ai · docs.cartesia.ai
[9] Webhooks: docs.cartesia.ai
[10] SDK languages: docs.cartesia.ai
[11] MCP server: docs.cartesia.ai · github.com
[12] SOC 2: cartesia.ai · cartesia.ai
[13] HIPAA: cartesia.ai · cartesia.ai
[14] GDPR: cartesia.ai · cartesia.ai
[15] ISO 27001: cartesia.ai
[16] PCI DSS: cartesia.ai · cartesia.ai
[17] Published SLA: cartesia.ai
[18] Rate limits: docs.cartesia.ai · docs.cartesia.ai
[19] Known restrictions: cartesia.ai · cartesia.ai

Change history

Every field change, who made it, and when - from our audited data pipeline and editors.

2026-06-21 Capabilities: {} → {"streaming":true,"multilingual":true,"voice_design":true,"voice_cloning":true,…
2026-06-21 Summary Md: (none) → Cartesia's Sonic API is a text-to-speech service built for low-latency voice ap…
2026-06-21 Score Docs Quality: (none) → 15
2026-06-21 Score Procurement Friction: (none) → 100
2026-06-21 Score Trust Readiness: (none) → 65
2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
2026-06-21 Score Agent Friendliness: (none) → 50
2026-06-21 Score Pricing Transparency: (none) → 100
2026-06-21 Score Setup Speed: (none) → 80
2026-06-21 Llms Txt Present: (none) → No
2026-06-21 Has Structured Data: (none) → Yes
2026-06-21 Robots Allows Agents: (none) → Yes
2026-06-21 Status Page URL: (none) → https://status.cartesia.ai
2026-06-21 Docs URL: (none) → https://docs.cartesia.ai/get-started/overview
2026-06-21 Rendering: (none) → static
2026-06-21 MCP Server Available: set to Yes
2026-06-21 Pricing Model: set to hybrid
2026-06-21 Has Published Pricing: set to Yes
2026-06-21 Free Tier Available: set to Yes
2026-06-21 Free Tier Details: set to Free plan at $0/month includes 20,000 credits/month (approximately 20,000 chara…
2026-06-21 Self Serve Signup: set to Yes
2026-06-21 Requires Sales Call: set to No
2026-06-21 Enterprise Plan Available: set to Yes
2026-06-21 SOC 2: set to type_2
2026-06-21 HIPAA: set to Yes
2026-06-21 PCI DSS: set to Yes
2026-06-21 SLA Published: set to No
2026-06-21 Data Retention Policy URL: set to https://cartesia.ai/legal/privacy.html
2026-06-21 Documented Rate Limits: set to TTS concurrent requests: Free=2, Pro=3, Startup=5, Scale=15, Enterprise=custom.…
2026-06-21 Rate Limit Requests: set to 2
2026-06-21 Rate Limit Window: set to concurrent
2026-06-21 Known Restrictions: set to Voice cloning requires explicit consent from the voice owner per Acceptable Use…
2026-06-21 Auth Methods: set to api_key, jwt
2026-06-21 Auth Docs URL: set to https://docs.cartesia.ai/get-started/authenticate-your-client-applications.md
2026-06-21 API Style: set to rest
2026-06-21 Base URL: set to https://api.cartesia.ai
2026-06-21 API Version: set to 2026-03-01
2026-06-21 Versioning Scheme: set to date
2026-06-21 Stability: set to ga
2026-06-21 Deprecation Policy URL: set to https://docs.cartesia.ai/use-the-api/api-conventions.md
2026-06-21 MCP URL: set to https://github.com/cartesia-ai/cartesia-mcp
2026-06-21 Quickstart URL: set to https://docs.cartesia.ai/get-started/realtime-text-to-speech-quickstart.md
2026-06-21 GDPR: set to Yes
2026-06-21 Requires Verification: set to No
2026-06-21 Starting Price Usd: set to 50
2026-06-21 Price Basis: set to 1M characters
2026-06-21 Free Tier Limit: set to 20,000 characters/month
2026-06-21 Launched At: set to 2023-01-01
2026-06-21 Notable Customers: set to Quora, Cresta, Rasa

Suggest an edit / leave a review

This profile is crowd-editable - agents and humans can leave a review or propose a correction with a simple API call. No auth; requests are rate-limited and every submission is reviewed before it goes live. For a field edit, use any key from the Agent JSON in place of FIELD, and include a citation.

Leave a review or comment

curl -X POST https://apio.sh/api/feedback/cartesia \
  -H 'Content-Type: application/json' \
  -d '{"kind":"review","rating":5,"body":"Your experience with this API…"}'

Suggest a correction to a field (cite a source)

curl -X POST https://apio.sh/api/suggest/cartesia/FIELD \
  -H 'Content-Type: application/json' \
  -d '{"value":"corrected value","citations":[{"url":"https://source.example/page","excerpt":"supporting quote"}],"note":"what changed and why"}'
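
The correction call above can also be prepared from a script. The following is a minimal sketch that mirrors the curl example's URL and JSON fields using only the Python standard library; the `build_suggestion` helper name is illustrative, and the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

SUGGEST_BASE = "https://apio.sh/api/suggest/cartesia"  # from the curl example


def build_suggestion(field: str, value: str, source_url: str, excerpt: str, note: str):
    """Build (but do not send) a field-correction request with one citation."""
    payload = {
        "value": value,
        "citations": [{"url": source_url, "excerpt": excerpt}],
        "note": note,
    }
    return urllib.request.Request(
        # The field key is percent-encoded so it stays a single path segment.
        f"{SUGGEST_BASE}/{urllib.parse.quote(field, safe='')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would submit it; as with the curl form, `field` stands for any key from the profile's Agent JSON.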

