Google Cloud Text-to-Speech
"Cloud Text-to-Speech converts text or Speech Synthesis Markup Language (SSML) input into audio data of natural human speech." [1]
Google Cloud Text-to-Speech converts text or SSML input into natural-sounding audio, targeting voice agents, IVR systems, audiobook narration, accessibility tools, and real-time conversational AI. It offers a recurring free tier (up to 4 million characters per month for Standard voices, 1 million for WaveNet and Neural2), with paid usage starting at $4 per million characters on a self-serve, usage-based model. The API ships SDKs for eight languages, supports streaming and long-form synthesis, and lists SOC 2 Type II and ISO 27001 attestations, HIPAA and GDPR compliance, and a published SLA.
Best for / Avoid if
Best for: Prototypes and side projects - free to start, no sales call; Regulated or enterprise workloads - compliance attestations and an enterprise plan; Teams needing broad API coverage out of the box
Pricing & procurement
- Pricing model
- Usage-based [2]
- Published pricing
- ✓ Yes
- Free tier
- ✓ Yes [3]
- Free tier details
- Recurring monthly free allowance: 4 million characters/month for Standard voices; 1 million characters/month for WaveNet voices; 1 million characters/month for Neural2, Studio, and Chirp 3 HD voices. Separate one-time $300 trial credit (90 days) is also available but is not part of the recurring free tier. [4]
- Self-serve signup
- ✓ Yes [5]
- Requires sales call
- ✗ No
- Enterprise plan
- ✓ Yes
| Plan | Item | Per | Amount (USD) |
|---|---|---|---|
| Pay As You Go | Speech synthesis - Standard voices (free tier) | 4M characters/month | $0 |
| Pay As You Go | Speech synthesis - Standard voices | 1M characters | $4 |
| Pay As You Go | Speech synthesis - WaveNet voices (free tier) | 1M characters/month | $0 |
| Pay As You Go | Speech synthesis - WaveNet voices | 1M characters | $16 |
| Pay As You Go | Speech synthesis - Neural2 voices (free tier) | 1M characters/month | $0 |
| Pay As You Go | Speech synthesis - Neural2 voices | 1M characters | $16 |
| Pay As You Go | Speech synthesis - Polyglot voices | 1M characters | $16 |
| Pay As You Go | Speech synthesis - Chirp 3 HD voices (free tier) | 1M characters/month | $0 |
| Pay As You Go | Speech synthesis - Chirp 3 HD voices | 1M characters | $30 |
| Pay As You Go | Speech synthesis - Studio voices (free tier) | 1M characters/month | $0 |
| Pay As You Go | Speech synthesis - Studio voices | 1M characters | $160 |
| Pay As You Go | Instant Custom Voice synthesis (Chirp 3) | 1M characters | $60 |
| Pay As You Go | Gemini 2.5 Flash TTS - input tokens | 1M tokens | $0.30 |
| Pay As You Go | Gemini 2.5 Flash TTS - audio output tokens | 1M tokens | $2.50 |
| Pay As You Go | Gemini 2.5 Pro TTS - input tokens | 1M tokens | $1 |
| Pay As You Go | Gemini 2.5 Pro TTS - audio output tokens | 1M tokens | $20 |
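To make the character-based rows above concrete, here is a rough monthly cost estimate in Python. It is a sketch under assumptions, not Google's billing logic: it assumes the free allowance is subtracted per voice type before billing and that charges scale linearly per character. The `RATES` table and `monthly_cost` name are illustrative only, and Polyglot is assumed to have no free allowance because the table lists none.

```python
from decimal import Decimal

# (free characters per month, USD per 1M characters), copied from the pricing table.
# Assumption: Polyglot has no free allowance, since the table shows no free-tier row.
RATES = {
    "standard": (4_000_000, Decimal("4")),
    "wavenet": (1_000_000, Decimal("16")),
    "neural2": (1_000_000, Decimal("16")),
    "polyglot": (0, Decimal("16")),
    "chirp3_hd": (1_000_000, Decimal("30")),
    "studio": (1_000_000, Decimal("160")),
}


def monthly_cost(voice: str, characters: int) -> Decimal:
    """Estimate the monthly charge in USD for one voice type.

    Assumes the free allowance is deducted first and the remainder
    is billed proportionally at the per-million rate.
    """
    free_chars, usd_per_million = RATES[voice]
    billable = max(0, characters - free_chars)
    cost = Decimal(billable) * usd_per_million / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))
```

Under these assumptions, 10 million Standard characters in a month leave 6 million billable characters at $4 per million, while 500,000 WaveNet characters stay inside the free allowance.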
Capabilities
- Supported actions
- synthesize_speech, streaming_tts, long_audio_synthesis, ssml_support, instant_voice_cloning, multilingual_synthesis, multi_speaker_markup, word_timestamps, custom_pronunciations, speaking_rate_control, pitch_control, volume_control, list_voices, async_long_audio_synthesis, multi_speaker_audio [6]
- Regions
- global, us (multi-region), eu (multi-region), us-central1, asia-northeast1 (Tokyo), australia-southeast1 (Sydney), asia-south1 (Mumbai), asia-southeast1 (Singapore), asia-northeast3 (Seoul), europe-west2 (London), europe-west3 (Frankfurt), europe-west4 (Netherlands), northamerica-northeast1 [7]
- Languages
- Afrikaans (af-ZA), Arabic (ar-XA), Basque (eu-ES), Bengali (bn-IN), Bulgarian (bg-BG), Catalan (ca-ES), Chinese - Hong Kong (yue-HK), Croatian (hr-HR), Czech (cs-CZ), Danish (da-DK), Dutch - Belgium (nl-BE), Dutch - Netherlands (nl-NL), English - Australia (en-AU), English - India (en-IN), English - UK (en-GB), English - US (en-US), Estonian (et-EE), Filipino (fil-PH), Finnish (fi-FI), French - Canada (fr-CA), French - France (fr-FR), Galician (gl-ES), German (de-DE), Greek (el-GR), Hebrew (he-IL), Hindi (hi-IN), Hungarian (hu-HU), Indonesian (id-ID), Italian (it-IT), Japanese (ja-JP), Korean (ko-KR), Latvian (lv-LV), Lithuanian (lt-LT), Mandarin Chinese (cmn-CN), Mandarin Chinese - Taiwan (cmn-TW), Norwegian (nb-NO), Polish (pl-PL), Portuguese - Brazil (pt-BR), Portuguese - Portugal (pt-PT), Punjabi - India (pa-IN, Preview), Romanian (ro-RO), Russian (ru-RU), Serbian (sr-RS), Slovak (sk-SK), Slovenian (sl-SI), Spanish - Spain (es-ES), Spanish - US (es-US), Swedish (sv-SE), Thai (th-TH), Turkish (tr-TR), Ukrainian (uk-UA), Vietnamese (vi-VN), 80+ languages and variants via Gemini-TTS models [8]
- Input types
- plain text, SSML [9]
- Output types
- MP3, LINEAR16 (WAV), OGG_OPUS, MULAW, ALAW, PCM, streaming audio chunks [10]
- Webhooks
- ✗ No
- Sandbox / test mode
- ✗ No [11]
- SDK languages
- Python, Node.js, Java, Go, C#, PHP, Ruby, C++ [12]
- MCP server
- ✗ No [13]
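Since the service accepts SSML as well as plain text, input that contains characters such as `&` or `<` needs XML escaping before it is wrapped in markup. The Python sketch below shows one way to do that using the standard SSML elements `<speak>`, `<s>`, and `<break>`. The helper name `to_ssml` and its default pause are hypothetical, and not every SSML element is supported for every language, so treat this as a starting point.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences, pause_ms=300):
    """Wrap sentences in <speak>, escaping XML special characters.

    A <break> of pause_ms milliseconds is placed between sentences.
    """
    pause = f'<break time="{int(pause_ms)}ms"/>'
    body = pause.join(f"<s>{escape(sentence)}</s>" for sentence in sentences)
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(to_ssml(["Tom & Jerry", "a < b"], pause_ms=500))
```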
Trust & compliance
- SOC 2
- SOC 2 Type II [14]
- HIPAA
- ✓ Yes [15]
- GDPR
- ✓ Yes [16]
- ISO 27001
- ✓ Yes [17]
- PCI DSS
- – Unknown [18]
- Published SLA
- ✓ Yes [19]
- Rate limits
- Maximum request size: 5,000 bytes per request (cannot be increased). Requests per minute (RPM) by voice type: Chirp 3 HD: 200 RPM; Chirp Voice Cloning: 30 RPM; Neural2: 1,000 RPM; Polyglot: 1,000 RPM; Studio: 500 RPM; Long Audio Synthesis: 100 RPM; General (all other voices): 1,000 RPM. Concurrent streaming sessions: 100 per project. Voice cloning key generation: 10 requests per minute. Gemini-2.5-flash-tts: 150 QPM; Gemini-2.5-pro-tts: 125 QPM. [20]
- Known restrictions
- Cloud TTS does not support all SSML elements for all available languages. Instant Voice Cloning (Chirp 3) is currently restricted to allow-listed users; contact sales to request access. Voice cloning requires a consent audio recording containing the statement: 'I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model.' The maximum request size of 5,000 bytes cannot be increased. Chirp 3: HD voices are out of scope for regionalization and data residency despite being available in eu/us endpoints. Reference audio for instant voice cloning must be no longer than 10 seconds, single-channel, with minimal background noise. [21]
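Because the 5,000-byte request limit cannot be raised, longer text has to be split on the client before synthesis. The sketch below assumes the limit is measured on the UTF-8 encoded input text; that is an assumption, so leave headroom if markup or other request fields also count. `chunk_text` is a hypothetical helper, and it collapses runs of whitespace into single spaces.

```python
MAX_REQUEST_BYTES = 5000  # per-request limit quoted in the rate limits above


def chunk_text(text: str, max_bytes: int = MAX_REQUEST_BYTES) -> list[str]:
    """Split text into chunks whose UTF-8 size is at most max_bytes.

    Breaks at whitespace where possible; a single word longer than the
    limit is split on character boundaries so no code point is cut.
    """
    if max_bytes < 4:
        raise ValueError("max_bytes must fit at least one UTF-8 code point")
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        current = ""
        # Rebuild the word character by character, flushing when full.
        for ch in word:
            if current and len((current + ch).encode("utf-8")) > max_bytes:
                chunks.append(current)
                current = ch
            else:
                current += ch
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, subject to the per-minute quotas listed above. For very long inputs, the long audio synthesis action may be a better fit than client-side splitting.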
Developer surface
Integration
- API style
- rest
- Base URL
- https://texttospeech.googleapis.com
- Version
- v1
- Versioning
- url
- Stability
- ga
- Auth methods
- api_key, oauth2
- Idempotency keys
- ✗ No
- Error format
- vendor-specific
- Rate limit
- 1000 / minute
- Python
- google-cloud-texttospeech
- Node.js
- @google-cloud/text-to-speech
- Java
- com.google.cloud:google-cloud-texttospeech
- Go
- cloud.google.com/go/texttospeech/apiv1
- C#
- Google.Cloud.TextToSpeech.V1
- PHP
- google/cloud-text-to-speech
- Ruby
- google-cloud-text_to_speech
- C++
- (no package name listed)
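The Integration fields above (REST style, the base URL, `v1` in the URL path, API key auth) are enough to sketch a raw synthesis request without an SDK. The Python example below only builds the request and does not send it. The `text:synthesize` path, the JSON field names, and the `X-Goog-Api-Key` header follow Google's public REST reference as best understood here and should be verified against the current documentation; `build_synthesize_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://texttospeech.googleapis.com"
API_VERSION = "v1"  # the version is carried in the URL path


def build_synthesize_request(text, api_key, language_code="en-US", encoding="MP3"):
    """Build (but do not send) a POST request for speech synthesis."""
    body = {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": encoding},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{API_VERSION}/text:synthesize",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Goog-Api-Key": api_key},
        method="POST",
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as response:
#       payload = json.load(response)
# The response is expected to carry base64-encoded audio in "audioContent".
```

Keeping request construction separate from sending makes it easy to unit-test the payload and to add retries or quota handling around the network call.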
Adoption & maturity
- Launched
- 2017-11-10
- GA
- 2018-08-28
- Notable customers
- Ingram Content Group
Other Text-to-Speech APIs
ElevenLabs Text to Speech
"Text to Speech with high quality, human-like AI voices"
Azure AI Text to Speech
"Text to speech enables your applications, tools, or devices to convert text into human like synthesized speech. The text to speech capability is also known as speech synthesis. Use human like standard voices out of the box, or create a custom voice that's unique to your product or brand."
Amazon Polly
"Amazon Polly is a cloud service that converts text into lifelike speech. You can use Amazon Polly to develop applications that increase engagement and accessibility."
Cartesia (Sonic)
"The fastest and most natural text to speech model"
Murf AI
"Enterprise-grade AI voice generation with 150+ natural-sounding voices across 35 languages and 20+ speaking styles."
OpenAI Text to Speech (gpt-4o-mini-tts / tts-1)
"Transform text into lifelike spoken audio" - OpenAI's TTS service enabling blog narration, multilingual audio production, and realtime voice output via gpt-4o-mini-tts, tts-1, and tts-1-hd models.
References
- [1] Description: docs.cloud.google.com
- [2] Pricing model: cloud.google.com
- [3] Free tier: cloud.google.com
- [4] Free tier details: cloud.google.com · costbench.com
- [5] Self-serve signup: docs.cloud.google.com
- [6] Supported actions: docs.cloud.google.com · docs.cloud.google.com · docs.cloud.google.com
- [7] Regions: docs.cloud.google.com
- [8] Languages: docs.cloud.google.com · docs.cloud.google.com
- [9] Input types: docs.cloud.google.com
- [10] Output types: docs.cloud.google.com · docs.cloud.google.com
- [11] Sandbox: cloud.google.com
- [12] SDK languages: docs.cloud.google.com
- [13] MCP server: docs.cloud.google.com
- [14] SOC 2: cloud.google.com
- [15] HIPAA: cloud.google.com
- [16] GDPR: cloud.google.com
- [17] ISO 27001: cloud.google.com
- [18] PCI DSS: cloud.google.com
- [19] Published SLA: cloud.google.com
- [20] Rate limits: docs.cloud.google.com · docs.cloud.google.com
- [21] Known restrictions: docs.cloud.google.com · docs.cloud.google.com
Change history
- 2026-06-21 Capabilities: {} → {"ssml":true,"streaming":true,"multilingual":true,"voice_cloning":true,"word_ti…
- 2026-06-21 Summary Md: (none) → Google Cloud Text-to-Speech converts text or SSML input into natural-sounding a…
- 2026-06-21 Score Docs Quality: (none) → 15
- 2026-06-21 Score Procurement Friction: (none) → 100
- 2026-06-21 Score Trust Readiness: (none) → 90
- 2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
- 2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
- 2026-06-21 Score Agent Friendliness: (none) → 20
- 2026-06-21 Score Pricing Transparency: (none) → 100
- 2026-06-21 Score Setup Speed: (none) → 85
- 2026-06-21 Llms Txt Present: (none) → No
- 2026-06-21 Has Structured Data: (none) → No
- 2026-06-21 Robots Allows Agents: (none) → Yes
- 2026-06-21 Status Page URL: (none) → https://status.cloud.google.com
- 2026-06-21 Docs URL: (none) → https://docs.cloud.google.com/?hl=zh-tw
- 2026-06-21 Rendering: (none) → static
- 2026-06-21 SDK Packages: set to Python, Node.js, Java, Go, C#, PHP, Ruby, C++
- 2026-06-21 MCP Server Available: set to No
- 2026-06-21 Pricing Model: set to usage_based
- 2026-06-21 Has Published Pricing: set to Yes
- 2026-06-21 Free Tier Available: set to Yes
- 2026-06-21 Free Tier Details: set to Recurring monthly free allowance: 4 million characters/month for Standard voice…
- 2026-06-21 Self Serve Signup: set to Yes
- 2026-06-21 Requires Sales Call: set to No
- 2026-06-21 Enterprise Plan Available: set to Yes
- 2026-06-21 SOC 2: set to type_2
- 2026-06-21 HIPAA: set to Yes
- 2026-06-21 GDPR: set to Yes
- 2026-06-21 ISO 27001: set to Yes
- 2026-06-21 SLA Published: set to Yes
- 2026-06-21 SLA URL: set to https://cloud.google.com/text-to-speech/sla
- 2026-06-21 Data Retention Policy URL: set to https://cloud.google.com/terms/cloud-privacy-notice
- 2026-06-21 Documented Rate Limits: set to Maximum request size: 5,000 bytes per request (cannot be increased). Requests p…
- 2026-06-21 Rate Limit Requests: set to 1000
- 2026-06-21 Rate Limit Window: set to minute
- 2026-06-21 Auth Docs URL: set to https://docs.cloud.google.com/text-to-speech/docs/authentication
- 2026-06-21 API Style: set to rest
- 2026-06-21 Base URL: set to https://texttospeech.googleapis.com
- 2026-06-21 API Version: set to v1
- 2026-06-21 Versioning Scheme: set to url
- 2026-06-21 Stability: set to ga
- 2026-06-21 Deprecation Policy URL: set to https://cloud.google.com/terms/deprecation
- 2026-06-21 Quickstart URL: set to https://docs.cloud.google.com/text-to-speech/docs/create-audio-text-client-libr…
- 2026-06-21 Idempotency Supported: set to No
- 2026-06-21 Error Format: set to vendor-specific
- 2026-06-21 Slug: set to google-text-to-speech
- 2026-06-21 Starting Price Usd: set to 4
- 2026-06-21 Price Basis: set to 1M characters
- 2026-06-21 Free Tier Limit: set to 4 million characters/month (Standard voices); 1 million characters/month (WaveN…
- 2026-06-21 Launched At: set to 2017-11-10