Google Cloud Text-to-Speech

"Cloud Text-to-Speech converts text or Speech Synthesis Markup Language (SSML) input into audio data of natural human speech." [1]

Text-to-Speech APIs

cloud.google.com/text-to-speech · By Google · Last verified 2026-06-21 · Source confidence: high

Google Cloud Text-to-Speech converts text or SSML input into natural-sounding audio, targeting voice agents, IVR systems, audiobook narration, accessibility tools, and real-time conversational AI. Its recurring free tier covers up to 4 million characters per month for Standard voices and 1 million per month for WaveNet, Neural2, Studio, and Chirp 3 HD voices; paid usage starts at $4 per million characters on a self-serve, usage-based model. The API ships SDKs for eight languages, supports streaming and long-form synthesis, and lists SOC 2 Type II and ISO 27001 attestations, HIPAA and GDPR coverage, and a published SLA.

Best for / Avoid if

Best for: Prototypes and side projects - free to start, no sales call; Regulated or enterprise workloads - compliance attestations and an enterprise plan; Teams needing broad API coverage out of the box

Pricing & procurement

Pricing model: Usage-based [2]
Published pricing: Yes
Free tier: Yes [3]
Free tier details: Recurring monthly free allowance: 4 million characters/month for Standard voices; 1 million characters/month for WaveNet voices; 1 million characters/month each for Neural2, Studio, and Chirp 3 HD voices. A separate one-time $300 trial credit (valid for 90 days) is also available but is not part of the recurring free tier. [4]
Source excerpt (cloud.google.com/text-to-speech/pricing): “The first 1 million characters for WaveNet voices are free each month, and for Standard (non-WaveNet) voices, the first 4 million characters are free each month.”
Source excerpt (costbench.com/software/ai-voice-tools/google-cloud-text-to-speech/free-plan/): “Standard Voices: 4M characters/month; Neural2 Voices: 1M characters/month; Studio Voices: 1M characters/month; Chirp 3 HD Voices: 1M characters/month”
Self-serve signup: Yes [5]
Requires sales call: No
Enterprise plan: Yes

Published prices
Plan	Item	Per	Amount (USD)
Pay As You Go	Speech synthesis - Standard voices (free tier)	4M characters/month	$0
Pay As You Go	Speech synthesis - Standard voices	1M characters	$4
Pay As You Go	Speech synthesis - WaveNet voices (free tier)	1M characters/month	$0
Pay As You Go	Speech synthesis - WaveNet voices	1M characters	$16
Pay As You Go	Speech synthesis - Neural2 voices (free tier)	1M characters/month	$0
Pay As You Go	Speech synthesis - Neural2 voices	1M characters	$16
Pay As You Go	Speech synthesis - Polyglot voices	1M characters	$16
Pay As You Go	Speech synthesis - Chirp 3 HD voices (free tier)	1M characters/month	$0
Pay As You Go	Speech synthesis - Chirp 3 HD voices	1M characters	$30
Pay As You Go	Speech synthesis - Studio voices (free tier)	1M characters/month	$0
Pay As You Go	Speech synthesis - Studio voices	1M characters	$160
Pay As You Go	Instant Custom Voice synthesis (Chirp 3)	1M characters	$60
Pay As You Go	Gemini 2.5 Flash TTS - input tokens	1M tokens	$0.30
Pay As You Go	Gemini 2.5 Flash TTS - audio output tokens	1M tokens	$2.50
Pay As You Go	Gemini 2.5 Pro TTS - input tokens	1M tokens	$1
Pay As You Go	Gemini 2.5 Pro TTS - audio output tokens	1M tokens	$20
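
The published prices can be turned into a rough monthly estimate. The sketch below is illustrative only: the function names are hypothetical, and it assumes that each voice type has its own monthly free allowance, that billing is linear per character or token beyond that allowance, and that Polyglot has no free allowance (the table lists none). Actual invoices may round or aggregate differently.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# USD per 1M characters, copied from the published prices table.
PRICE_PER_MILLION_CHARS = {
    "standard": Decimal("4"),
    "wavenet": Decimal("16"),
    "neural2": Decimal("16"),
    "polyglot": Decimal("16"),
    "chirp3_hd": Decimal("30"),
    "studio": Decimal("160"),
}

# Monthly free characters per voice type, from the same table.
# Assumption: voice types without a free-tier row get no allowance.
FREE_CHARS_PER_MONTH = {
    "standard": 4_000_000,
    "wavenet": 1_000_000,
    "neural2": 1_000_000,
    "chirp3_hd": 1_000_000,
    "studio": 1_000_000,
}

# USD per 1M tokens as (input, audio output), from the Gemini TTS rows.
GEMINI_PRICE_PER_MILLION_TOKENS = {
    "flash": (Decimal("0.3"), Decimal("2.5")),
    "pro": (Decimal("1"), Decimal("20")),
}


def monthly_cost_usd(voice_type: str, characters: int) -> Decimal:
    """Estimate one month's cost for a character-billed voice type."""
    price = PRICE_PER_MILLION_CHARS[voice_type]
    free = FREE_CHARS_PER_MONTH.get(voice_type, 0)
    billable = max(0, characters - free)
    return price * billable / MILLION


def gemini_tts_cost_usd(model: str, input_tokens: int, audio_tokens: int) -> Decimal:
    """Estimate cost for a token-billed Gemini TTS model ("flash" or "pro")."""
    input_price, audio_price = GEMINI_PRICE_PER_MILLION_TOKENS[model]
    return (input_price * input_tokens + audio_price * audio_tokens) / MILLION
```

Under these assumptions, 5 million Standard characters in a month leave 1 million billable characters, so `monthly_cost_usd("standard", 5_000_000)` works out to $4.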

Capabilities

Real-time streaming
Voice cloning
SSML control
Multilingual voices
Word timestamps

Supported actions: synthesize_speech, streaming_tts, long_audio_synthesis, ssml_support, instant_voice_cloning, multilingual_synthesis, multi_speaker_markup, word_timestamps, custom_pronunciations, speaking_rate_control, pitch_control, volume_control, list_voices, async_long_audio_synthesis, multi_speaker_audio [6]
Source excerpt (docs.cloud.google.com/text-to-speech/docs/reference/rest): “POST /v1/text:synthesize - Synchronous speech synthesis; POST /v1/{parent=projects/*/locations/*}:synthesizeLongAudio - Asynchronous long-form synthesis”
Source excerpt (docs.cloud.google.com/text-to-speech/docs/chirp3-instant-custom-voice): “Reference and consent audio files generate a 'voice cloning key'—a text string representing the voice data stored client-side.”
Source excerpt (docs.cloud.google.com/text-to-speech/docs/chirp3-hd): “Chirp 3: HD supports streaming synthesis through the streaming_synthesize() API method, enabling real-time audio generation as text is streamed in chunks.”
Regions: global, us (multi-region), eu (multi-region), us-central1, asia-northeast1 (Tokyo), australia-southeast1 (Sydney), asia-south1 (Mumbai), asia-southeast1 (Singapore), asia-northeast3 (Seoul), europe-west2 (London), europe-west3 (Frankfurt), europe-west4 (Netherlands), northamerica-northeast1 [7]
Languages: Afrikaans (af-ZA), Arabic (ar-XA), Basque (eu-ES), Bengali (bn-IN), Bulgarian (bg-BG), Catalan (ca-ES), Chinese - Hong Kong (yue-HK), Croatian (hr-HR), Czech (cs-CZ), Danish (da-DK), Dutch - Belgium (nl-BE), Dutch - Netherlands (nl-NL), English - Australia (en-AU), English - India (en-IN), English - UK (en-GB), English - US (en-US), Estonian (et-EE), Filipino (fil-PH), Finnish (fi-FI), French - Canada (fr-CA), French - France (fr-FR), Galician (gl-ES), German (de-DE), Greek (el-GR), Hebrew (he-IL), Hindi (hi-IN), Hungarian (hu-HU), Indonesian (id-ID), Italian (it-IT), Japanese (ja-JP), Korean (ko-KR), Latvian (lv-LV), Lithuanian (lt-LT), Mandarin Chinese (cmn-CN), Mandarin Chinese - Taiwan (cmn-TW), Norwegian (nb-NO), Polish (pl-PL), Portuguese - Brazil (pt-BR), Portuguese - Portugal (pt-PT), Punjabi - India (pa-IN, Preview), Romanian (ro-RO), Russian (ru-RU), Serbian (sr-RS), Slovak (sk-SK), Slovenian (sl-SI), Spanish - Spain (es-ES), Spanish - US (es-US), Swedish (sv-SE), Thai (th-TH), Turkish (tr-TR), Ukrainian (uk-UA), Vietnamese (vi-VN), 80+ languages and variants via Gemini-TTS models [8]
Input types: plain text, SSML [9]
Output types: MP3, LINEAR16 (WAV), OGG_OPUS, MULAW, ALAW, PCM, streaming audio chunks [10]
Webhooks: No
Sandbox / test mode: No [11]
SDK languages: Python, Node.js, Java, Go, C#, PHP, Ruby, C++ [12]
MCP server: No [13]
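
Since the API accepts SSML as well as plain text, a small helper can make the difference concrete. This is a minimal sketch, not part of any Google SDK: `build_ssml` is a hypothetical name, and it uses only the standard `<speak>`, `<break>`, and `<prosody>` SSML elements. Which elements a given voice or language honors varies, so treat the output as something to test against the voice you intend to use.

```python
from typing import Optional, Sequence
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    sentences: Sequence[str],
    pause_ms: int = 300,
    rate: Optional[str] = None,
) -> str:
    """Join sentences into one SSML document with a pause between them.

    Text is XML-escaped so characters such as & and < cannot break the markup.
    If `rate` is given (for example "slow"), the body is wrapped in <prosody>.
    """
    pause = f'<break time="{int(pause_ms)}ms"/>'
    body = pause.join(escape(sentence) for sentence in sentences)
    if rate is not None:
        body = f"<prosody rate={quoteattr(rate)}>{body}</prosody>"
    return f"<speak>{body}</speak>"
```

The resulting string would be sent as the SSML input of a synthesis request in place of plain text.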

Trust & compliance

SOC 2: SOC 2 Type II [14]
HIPAA: Yes [15]
GDPR: Yes [16]
ISO 27001: Yes [17]
PCI DSS: Unknown [18]
Published SLA: Yes [19]
Rate limits: Maximum request size: 5,000 bytes per request (cannot be increased). Requests per minute (RPM) by voice type: Chirp 3 HD: 200 RPM; Chirp Voice Cloning: 30 RPM; Neural2: 1,000 RPM; Polyglot: 1,000 RPM; Studio: 500 RPM; Long Audio Synthesis: 100 RPM; General (all other voices): 1,000 RPM. Concurrent streaming sessions: 100 per project. Voice cloning key generation: 10 requests per minute. Gemini-2.5-flash-tts: 150 QPM; Gemini-2.5-pro-tts: 125 QPM. [20]
Source excerpt (docs.cloud.google.com/text-to-speech/quotas): “Maximum request size: 5,000 bytes total per request. Chirp 3: 200 RPM; Neural 2: 1,000 RPM; Studio: 500 RPM; General (all other voices): 1,000 RPM; Concurrent streaming sessions: 100 per project; Voice cloning key generation: 10 requests per minute”
Source excerpt (docs.cloud.google.com/text-to-speech/quotas): “gemini-2.5-flash-tts: 150 QPM; gemini-2.5-pro-tts: 125 QPM”
Known restrictions: Cloud TTS does not support all SSML elements for all available languages; Instant Voice Cloning (Chirp 3) is currently restricted to allow-listed users (contact sales to request access); voice cloning requires a consent audio recording with the statement 'I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model.'; the maximum request size of 5,000 bytes cannot be increased; Chirp 3: HD voices are out of scope for regionalization and data residency despite being available in eu/us endpoints; reference audio for instant voice cloning must be at most 10 seconds long, single-channel, with minimal background noise. [21]
Source excerpt (docs.cloud.google.com/text-to-speech/docs/chirp3-instant-custom-voice): “Currently restricted to allow-listed users. Contact the sales team to request access. Users must record the required consent statement: 'I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model.'”
Source excerpt (docs.cloud.google.com/text-to-speech/docs/basics): “Cloud TTS does not support all SSML elements for all available languages.”
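
Because the 5,000-byte request limit cannot be raised, longer text has to be split client-side (or routed to long audio synthesis). The sketch below shows one possible approach: it splits on whitespace and measures UTF-8 bytes rather than characters. `chunk_text` is a hypothetical helper, and the 4,500-byte default is an assumed safety margin for the rest of the request body, not a documented figure. Note that it collapses runs of whitespace and is not SSML-aware.

```python
from typing import List


def chunk_text(text: str, max_bytes: int = 4500) -> List[str]:
    """Split text into pieces whose UTF-8 encoding is at most max_bytes.

    Splits at whitespace where possible; a single word longer than the
    budget is cut between characters (never inside a multi-byte character).
    """
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")

    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else f"{current} {word}"
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Start a new chunk with this word, cutting it up if it is too long.
        piece = ""
        for ch in word:
            if len((piece + ch).encode("utf-8")) > max_bytes:
                chunks.append(piece)
                piece = ch
            else:
                piece += ch
        current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then go out as its own request, so the per-minute quotas listed above apply to the number of chunks, not the number of source documents.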

Developer surface

Docs rendering: static

Integration

API style: rest
Base URL: https://texttospeech.googleapis.com
Version: v1
Versioning: url
Stability: ga
Auth methods: api_key, oauth2
Idempotency keys: No
Error format: vendor-specific
Rate limit: 1000 / minute
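
To make the integration fields concrete, the sketch below assembles a synchronous synthesis request against the listed base URL and the `POST /v1/text:synthesize` endpoint, and decodes the base64 `audioContent` field of the JSON response. It only builds the request and does not send it. The helper names are hypothetical, and passing the API key in the `X-Goog-Api-Key` header is an assumption about how the key is supplied; check the authentication docs for the method your project should use.

```python
import base64
import json
import urllib.request

BASE_URL = "https://texttospeech.googleapis.com"  # from the Integration fields


def build_synthesize_request(
    text: str,
    api_key: str,
    language_code: str = "en-US",
    audio_encoding: str = "MP3",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/text:synthesize."""
    body = {
        "input": {"text": text},
        # No voice name is given, so the service chooses one for the language.
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": audio_encoding},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/text:synthesize",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Goog-Api-Key": api_key,  # assumption: key sent as a header
        },
        method="POST",
    )


def decode_audio(response_json: str) -> bytes:
    """Extract raw audio bytes from a synthesize response body."""
    return base64.b64decode(json.loads(response_json)["audioContent"])
```

Sending the request with `urllib.request.urlopen(request)` and passing the response text to `decode_audio` would yield bytes that can be written to an `.mp3` file.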

SDKs

Python: google-cloud-texttospeech
Node.js: @google-cloud/text-to-speech
Java: com.google.cloud:google-cloud-texttospeech
Go: cloud.google.com/go/texttospeech/apiv1
C#: Google.Cloud.TextToSpeech.V1
PHP: google/cloud-text-to-speech
Ruby: google-cloud-text_to_speech
C++: (package name not listed)
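
As an illustration of the Python SDK listed above, the following sketch synthesizes a short string to an MP3 file with the `google-cloud-texttospeech` package. It assumes the package is installed and that Application Default Credentials are configured for a project with the API enabled; `synthesize_to_file` is a hypothetical wrapper name, not part of the SDK.

```python
def synthesize_to_file(text: str, out_path: str, language_code: str = "en-US") -> None:
    """Synthesize `text` and write the resulting MP3 bytes to `out_path`."""
    # Imported lazily so this module still loads where the SDK is not installed.
    from google.cloud import texttospeech

    # Assumption: credentials come from Application Default Credentials.
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(language_code=language_code),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as audio_file:
        audio_file.write(response.audio_content)
```

A call such as `synthesize_to_file("Hello from Cloud Text-to-Speech", "hello.mp3")` would count against the character quotas and request limits described earlier.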

Adoption & maturity

Launched: 2017-11-10
GA: 2018-08-28
Notable customers: Ingram Content Group

Other Text-to-Speech APIs

ElevenLabs Text to Speech
"Text to Speech with high quality, human-like AI voices"
Hybrid · free tier · public pricing · self-serve
Azure AI Text to Speech
"Text to speech enables your applications, tools, or devices to convert text into human like synthesized speech. The text to speech capability is also known as speech synthesis. Use human like standard voices out of the box, or create a custom voice that's unique to your product or brand."
Usage · free tier · public pricing · self-serve
Amazon Polly
"Amazon Polly is a cloud service that converts text into lifelike speech. You can use Amazon Polly to develop applications that increase engagement and accessibility."
Usage · free tier · public pricing · self-serve
Cartesia (Sonic)
"The fastest and most natural text to speech model"
Hybrid · free tier · public pricing · self-serve
Murf AI
"Enterprise-grade AI voice generation with 150+ natural-sounding voices across 35 languages and 20+ speaking styles."
Usage · public pricing · self-serve
OpenAI Text to Speech (gpt-4o-mini-tts / tts-1)
"Transform text into lifelike spoken audio" - OpenAI's TTS service enabling blog narration, multilingual audio production, and realtime voice output via gpt-4o-mini-tts, tts-1, and tts-1-hd models.
Usage · public pricing · self-serve

References

Each field above carries a numbered source.

[1] Description: docs.cloud.google.com
[2] Pricing model: cloud.google.com
[3] Free tier: cloud.google.com
[4] Free tier details: cloud.google.com · costbench.com
[5] Self-serve signup: docs.cloud.google.com
[6] Supported actions: docs.cloud.google.com · docs.cloud.google.com · docs.cloud.google.com
[7] Regions: docs.cloud.google.com
[8] Languages: docs.cloud.google.com · docs.cloud.google.com
[9] Input types: docs.cloud.google.com
[10] Output types: docs.cloud.google.com · docs.cloud.google.com
[11] Sandbox: cloud.google.com
[12] SDK languages: docs.cloud.google.com
[13] MCP server: docs.cloud.google.com
[14] SOC 2: cloud.google.com
[15] HIPAA: cloud.google.com
[16] GDPR: cloud.google.com
[17] ISO 27001: cloud.google.com
[18] PCI DSS: cloud.google.com
[19] Published SLA: cloud.google.com
[20] Rate limits: docs.cloud.google.com · docs.cloud.google.com
[21] Known restrictions: docs.cloud.google.com · docs.cloud.google.com

Change history

Every field change, who made it, and when - from our audited data pipeline and editors.

2026-06-21 Capabilities: {} → {"ssml":true,"streaming":true,"multilingual":true,"voice_cloning":true,"word_ti…
2026-06-21 Summary Md: (none) → Google Cloud Text-to-Speech converts text or SSML input into natural-sounding a…
2026-06-21 Score Docs Quality: (none) → 15
2026-06-21 Score Procurement Friction: (none) → 100
2026-06-21 Score Trust Readiness: (none) → 90
2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
2026-06-21 Score Agent Friendliness: (none) → 20
2026-06-21 Score Pricing Transparency: (none) → 100
2026-06-21 Score Setup Speed: (none) → 85
2026-06-21 Llms Txt Present: (none) → No
2026-06-21 Has Structured Data: (none) → No
2026-06-21 Robots Allows Agents: (none) → Yes
2026-06-21 Status Page URL: (none) → https://status.cloud.google.com
2026-06-21 Docs URL: (none) → https://docs.cloud.google.com/?hl=zh-tw
2026-06-21 Rendering: (none) → static
2026-06-21 SDK Packages: set to Python, Node.js, Java, Go, C#, PHP, Ruby, C++
2026-06-21 MCP Server Available: set to No
2026-06-21 Pricing Model: set to usage_based
2026-06-21 Has Published Pricing: set to Yes
2026-06-21 Free Tier Available: set to Yes
2026-06-21 Free Tier Details: set to Recurring monthly free allowance: 4 million characters/month for Standard voice…
2026-06-21 Self Serve Signup: set to Yes
2026-06-21 Requires Sales Call: set to No
2026-06-21 Enterprise Plan Available: set to Yes
2026-06-21 SOC 2: set to type_2
2026-06-21 HIPAA: set to Yes
2026-06-21 GDPR: set to Yes
2026-06-21 ISO 27001: set to Yes
2026-06-21 SLA Published: set to Yes
2026-06-21 SLA URL: set to https://cloud.google.com/text-to-speech/sla
2026-06-21 Data Retention Policy URL: set to https://cloud.google.com/terms/cloud-privacy-notice
2026-06-21 Documented Rate Limits: set to Maximum request size: 5,000 bytes per request (cannot be increased). Requests p…
2026-06-21 Rate Limit Requests: set to 1000
2026-06-21 Rate Limit Window: set to minute
2026-06-21 Auth Docs URL: set to https://docs.cloud.google.com/text-to-speech/docs/authentication
2026-06-21 API Style: set to rest
2026-06-21 Base URL: set to https://texttospeech.googleapis.com
2026-06-21 API Version: set to v1
2026-06-21 Versioning Scheme: set to url
2026-06-21 Stability: set to ga
2026-06-21 Deprecation Policy URL: set to https://cloud.google.com/terms/deprecation
2026-06-21 Quickstart URL: set to https://docs.cloud.google.com/text-to-speech/docs/create-audio-text-client-libr…
2026-06-21 Idempotency Supported: set to No
2026-06-21 Error Format: set to vendor-specific
2026-06-21 Slug: set to google-text-to-speech
2026-06-21 Starting Price Usd: set to 4
2026-06-21 Price Basis: set to 1M characters
2026-06-21 Free Tier Limit: set to 4 million characters/month (Standard voices); 1 million characters/month (WaveN…
2026-06-21 Launched At: set to 2017-11-10

Suggest an edit / leave a review

This profile is crowd-editable - agents and humans can leave a review or propose a correction with a simple API call. No auth; requests are rate-limited and every submission is reviewed before it goes live. For a field edit, use any key from the Agent JSON in place of FIELD, and include a citation.

Leave a review or comment

curl -X POST https://apio.sh/api/feedback/google-text-to-speech \
  -H 'Content-Type: application/json' \
  -d '{"kind":"review","rating":5,"body":"Your experience with this API…"}'

Suggest a correction to a field (cite a source)

curl -X POST https://apio.sh/api/suggest/google-text-to-speech/FIELD \
  -H 'Content-Type: application/json' \
  -d '{"value":"corrected value","citations":[{"url":"https://source.example/page","excerpt":"supporting quote"}],"note":"what changed and why"}'
