Google Cloud Text-to-Speech

"Cloud Text-to-Speech converts text or Speech Synthesis Markup Language (SSML) input into audio data of natural human speech." [1]

cloud.google.com/text-to-speech · By Google · Last verified 2026-06-21 · Source confidence: high

Google Cloud Text-to-Speech converts text or SSML input into natural-sounding audio for voice agents, IVR systems, audiobook narration, accessibility tools, and real-time conversational AI. The recurring free tier covers up to 4 million characters per month for Standard voices and 1 million per month for WaveNet, Neural2, Studio, and Chirp 3 HD voices; beyond that, self-serve usage-based pricing starts at $4 per million characters. The API ships SDKs for eight languages, supports streaming and long-form synthesis, and lists SOC 2 Type II, HIPAA, GDPR, and ISO 27001 coverage along with a published SLA.

Best for / Avoid if

Best for:

  • Prototypes and side projects: free to start, no sales call
  • Regulated or enterprise workloads: compliance attestations and an enterprise plan
  • Teams needing broad API coverage out of the box

Pricing & procurement

Pricing model
Usage-based [2]
Published pricing
Yes
Free tier
Yes [3]
Free tier details
Recurring monthly free allowance: 4 million characters/month for Standard voices; 1 million characters/month for WaveNet voices; 1 million characters/month for Neural2, Studio, and Chirp 3 HD voices. Separate one-time $300 trial credit (90 days) is also available but is not part of the recurring free tier. [4]
Self-serve signup
Yes [5]
Requires sales call
No
Enterprise plan
Yes
Published prices
Plan | Item | Per | Amount | Source
Pay As You Go | Speech synthesis - Standard voices (free tier) | 4M characters/month | $0 | source
Pay As You Go | Speech synthesis - Standard voices | 1M characters | $4 | source
Pay As You Go | Speech synthesis - WaveNet voices (free tier) | 1M characters/month | $0 | source
Pay As You Go | Speech synthesis - WaveNet voices | 1M characters | $16 | source
Pay As You Go | Speech synthesis - Neural2 voices (free tier) | 1M characters/month | $0 | source
Pay As You Go | Speech synthesis - Neural2 voices | 1M characters | $16 | source
Pay As You Go | Speech synthesis - Polyglot voices | 1M characters | $16 | source
Pay As You Go | Speech synthesis - Chirp 3 HD voices (free tier) | 1M characters/month | $0 | source
Pay As You Go | Speech synthesis - Chirp 3 HD voices | 1M characters | $30 | source
Pay As You Go | Speech synthesis - Studio voices (free tier) | 1M characters/month | $0 | source
Pay As You Go | Speech synthesis - Studio voices | 1M characters | $160 | source
Pay As You Go | Instant Custom Voice synthesis (Chirp 3) | 1M characters | $60 | source
Pay As You Go | Gemini 2.5 Flash TTS - input tokens | 1M tokens | $0.30 | source
Pay As You Go | Gemini 2.5 Flash TTS - audio output tokens | 1M tokens | $2.50 | source
Pay As You Go | Gemini 2.5 Pro TTS - input tokens | 1M tokens | $1 | source
Pay As You Go | Gemini 2.5 Pro TTS - audio output tokens | 1M tokens | $20 | source
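
As a rough illustration of how the character-based prices and free allowances above combine, the sketch below estimates one month's bill for a single voice type. The dictionary keys and the function name are hypothetical, and it assumes each voice type has its own free allowance that is deducted before billing; the token-priced Gemini TTS models and Instant Custom Voice are left out. Check the vendor's pricing page before relying on the numbers.

```python
# Prices (USD per 1M characters) and monthly free allowances copied from
# the published-prices table. The dictionary keys are hypothetical labels.
PRICE_PER_MILLION = {
    "standard": 4.0,
    "wavenet": 16.0,
    "neural2": 16.0,
    "chirp3_hd": 30.0,
    "studio": 160.0,
}
FREE_CHARS_PER_MONTH = {
    "standard": 4_000_000,
    "wavenet": 1_000_000,
    "neural2": 1_000_000,
    "chirp3_hd": 1_000_000,
    "studio": 1_000_000,
}


def estimate_monthly_cost(voice_type: str, characters: int) -> float:
    """Estimate one month's cost in USD for a single voice type.

    Assumption: the free allowance applies per voice type and is
    subtracted from the month's character count before billing.
    """
    if characters < 0:
        raise ValueError("characters must be non-negative")
    billable = max(0, characters - FREE_CHARS_PER_MONTH[voice_type])
    return round(billable * PRICE_PER_MILLION[voice_type] / 1_000_000, 2)
```

For example, under these assumptions 2.5 million WaveNet characters in a month leave 1.5 million billable characters at $16 per million.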

Capabilities

  • Real-time streaming
  • Voice cloning
  • SSML control
  • Multilingual voices
  • Word timestamps
Supported actions
synthesize_speech, streaming_tts, long_audio_synthesis, ssml_support, instant_voice_cloning, multilingual_synthesis, multi_speaker_markup, word_timestamps, custom_pronunciations, speaking_rate_control, pitch_control, volume_control, list_voices, async_long_audio_synthesis, multi_speaker_audio [6]
Regions
global, us (multi-region), eu (multi-region), us-central1, asia-northeast1 (Tokyo), australia-southeast1 (Sydney), asia-south1 (Mumbai), asia-southeast1 (Singapore), asia-northeast3 (Seoul), europe-west2 (London), europe-west3 (Frankfurt), europe-west4 (Netherlands), northamerica-northeast1 [7]
Languages
Afrikaans (af-ZA), Arabic (ar-XA), Basque (eu-ES), Bengali (bn-IN), Bulgarian (bg-BG), Catalan (ca-ES), Chinese - Hong Kong (yue-HK), Croatian (hr-HR), Czech (cs-CZ), Danish (da-DK), Dutch - Belgium (nl-BE), Dutch - Netherlands (nl-NL), English - Australia (en-AU), English - India (en-IN), English - UK (en-GB), English - US (en-US), Estonian (et-EE), Filipino (fil-PH), Finnish (fi-FI), French - Canada (fr-CA), French - France (fr-FR), Galician (gl-ES), German (de-DE), Greek (el-GR), Hebrew (he-IL), Hindi (hi-IN), Hungarian (hu-HU), Indonesian (id-ID), Italian (it-IT), Japanese (ja-JP), Korean (ko-KR), Latvian (lv-LV), Lithuanian (lt-LT), Mandarin Chinese (cmn-CN), Mandarin Chinese - Taiwan (cmn-TW), Norwegian (nb-NO), Polish (pl-PL), Portuguese - Brazil (pt-BR), Portuguese - Portugal (pt-PT), Punjabi - India (pa-IN, Preview), Romanian (ro-RO), Russian (ru-RU), Serbian (sr-RS), Slovak (sk-SK), Slovenian (sl-SI), Spanish - Spain (es-ES), Spanish - US (es-US), Swedish (sv-SE), Thai (th-TH), Turkish (tr-TR), Ukrainian (uk-UA), Vietnamese (vi-VN), 80+ languages and variants via Gemini-TTS models [8]
Input types
plain text, SSML [9]
Output types
MP3, LINEAR16 (WAV), OGG_OPUS, MULAW, ALAW, PCM, streaming audio chunks [10]
Webhooks
No
Sandbox / test mode
No [11]
SDK languages
Python, Node.js, Java, Go, C#, PHP, Ruby, C++ [12]
MCP server
No [13]
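
Since the API accepts SSML as well as plain text, a small helper can show what an SSML payload might look like. This is a minimal sketch: the function name and defaults are hypothetical, and it uses only the standard `<speak>`, `<prosody>`, `<s>`, and `<break>` elements. As noted under Known restrictions, not every SSML element is supported for every language, so test the output against the voice you plan to use.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(sentences, pause_ms=400, rate="medium"):
    """Wrap sentences in <speak>, escaping text and pausing between them.

    Hypothetical helper: escapes &, <, > in the text so user input cannot
    break the markup, and inserts a <break> between consecutive sentences.
    """
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    pause = f'<break time="{int(pause_ms)}ms"/>'
    body = pause.join(f"<s>{escape(s)}</s>" for s in sentences)
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"
```

Escaping matters here because an unescaped `&` or `<` in the source text would make the SSML document malformed.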

Trust & compliance

SOC 2
SOC 2 Type II [14]
HIPAA
Yes [15]
GDPR
Yes [16]
ISO 27001
Yes [17]
PCI DSS
Unknown [18]
Published SLA
Yes [19]
Rate limits
  • Maximum request size: 5,000 bytes per request (cannot be increased)
  • Requests per minute (RPM) by voice type: Chirp 3 HD: 200 RPM; Chirp Voice Cloning: 30 RPM; Neural2: 1,000 RPM; Polyglot: 1,000 RPM; Studio: 500 RPM; Long Audio Synthesis: 100 RPM; General (all other voices): 1,000 RPM
  • Concurrent streaming sessions: 100 per project
  • Voice cloning key generation: 10 requests per minute
  • Gemini-2.5-flash-tts: 150 QPM; Gemini-2.5-pro-tts: 125 QPM [20]
Known restrictions
  • Cloud TTS does not support all SSML elements for all available languages.
  • Instant Voice Cloning (Chirp 3) is currently restricted to allow-listed users; contact sales to request access.
  • Voice cloning requires consent audio recording with the statement: 'I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model.'
  • Maximum request size of 5,000 bytes cannot be increased.
  • Chirp 3: HD voices are out of scope for regionalization and data residency despite being available in eu/us endpoints.
  • Reference audio for instant voice cloning must be up to 10 seconds, single-channel, with minimal background noise. [21]
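
Because the 5,000-byte request limit cannot be raised, longer text has to be split on the client before synthesis. The sketch below is one possible approach, not the vendor's method: it greedily packs whitespace-separated words into chunks and measures size in UTF-8 bytes. It assumes the limit applies to the encoded input text; the function name is hypothetical, and you may want a lower `max_bytes` to leave headroom for SSML markup.

```python
MAX_REQUEST_BYTES = 5000  # request size limit quoted in the rate limits above


def split_for_synthesis(text: str, max_bytes: int = MAX_REQUEST_BYTES) -> list[str]:
    """Split text into chunks of at most max_bytes UTF-8 bytes each.

    Words are never cut in half; runs of whitespace collapse to one space.
    Assumption: the limit is measured on the UTF-8 encoded input text.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        if len(word.encode("utf-8")) > max_bytes:
            raise ValueError("a single word exceeds the byte limit")
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate  # word still fits in the current chunk
        else:
            chunks.append(current)  # close the full chunk, start a new one
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Counting bytes rather than characters matters for non-ASCII text, where one character can occupy several bytes. Splitting on sentence boundaries instead of words would likely give more natural prosody at chunk edges.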

Developer surface

Docs rendering: static

Integration

API style
rest
Base URL
https://texttospeech.googleapis.com
Version
v1
Versioning
url
Stability
ga
Auth methods
api_key, oauth2
Idempotency keys
No
Error format
vendor-specific
Rate limit
1000 / minute
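
To make the integration fields above concrete, here is a hedged sketch of a REST call built with only the Python standard library. The base URL, version, and API-key auth come from this profile; the `text:synthesize` path, the request field names, the `X-Goog-Api-Key` header, and the base64 `audioContent` response field reflect Google's public REST reference as best understood here and should be checked against the current docs. The function names are hypothetical, and the request is only constructed, not sent.

```python
import base64
import json
import urllib.request

BASE_URL = "https://texttospeech.googleapis.com"  # from the Base URL field
API_VERSION = "v1"  # from the Version field


def build_synthesize_request(text, api_key, language_code="en-US",
                             encoding="MP3", ssml=False):
    """Build (but do not send) a POST request for speech synthesis."""
    body = {
        # The input is either plain text or SSML, never both.
        "input": {"ssml" if ssml else "text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": encoding},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{API_VERSION}/text:synthesize",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "X-Goog-Api-Key": api_key},
        method="POST",
    )


def decode_audio(response_json: str) -> bytes:
    """Extract raw audio bytes from a JSON response body.

    Assumption: the audio arrives base64-encoded in "audioContent".
    """
    return base64.b64decode(json.loads(response_json)["audioContent"])
```

Passing the result to `urllib.request.urlopen` would send it, which requires a valid API key on a project with the API enabled. Keeping construction separate from sending makes the payload easy to inspect and test offline.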

SDKs

  • Python google-cloud-texttospeech · repo
  • Node.js @google-cloud/text-to-speech · repo
  • Java com.google.cloud:google-cloud-texttospeech · repo
  • Go cloud.google.com/go/texttospeech/apiv1 · repo
  • C# Google.Cloud.TextToSpeech.V1 · repo
  • PHP google/cloud-text-to-speech · repo
  • Ruby google-cloud-text_to_speech · repo
  • C++ · repo

Adoption & maturity

Launched
2017-11-10
GA
2018-08-28
Notable customers
Ingram Content Group

Other Text-to-Speech APIs

  • ElevenLabs Text to Speech

    "Text to Speech with high quality, human-like AI voices"

    Hybrid · free tier · public pricing · self-serve

  • Azure AI Text to Speech

    "Text to speech enables your applications, tools, or devices to convert text into human like synthesized speech. The text to speech capability is also known as speech synthesis. Use human like standard voices out of the box, or create a custom voice that's unique to your product or brand."

    Usage · free tier · public pricing · self-serve

  • Amazon Polly

    "Amazon Polly is a cloud service that converts text into lifelike speech. You can use Amazon Polly to develop applications that increase engagement and accessibility."

    Usage · free tier · public pricing · self-serve

  • Cartesia (Sonic)

    "The fastest and most natural text to speech model"

    Hybrid · free tier · public pricing · self-serve

  • Murf AI

    "Enterprise-grade AI voice generation with 150+ natural-sounding voices across 35 languages and 20+ speaking styles."

    Usage · public pricing · self-serve

  • OpenAI Text to Speech (gpt-4o-mini-tts / tts-1)

    "Transform text into lifelike spoken audio" - OpenAI's TTS service enabling blog narration, multilingual audio production, and realtime voice output via gpt-4o-mini-tts, tts-1, and tts-1-hd models.

    Usage · public pricing · self-serve


References

Change history

Every field change, who made it, and when - from our audited data pipeline and editors.

  1. 2026-06-21 Capabilities: {} → {"ssml":true,"streaming":true,"multilingual":true,"voice_cloning":true,"word_ti…
  2. 2026-06-21 Summary Md: (none) → Google Cloud Text-to-Speech converts text or SSML input into natural-sounding a…
  3. 2026-06-21 Score Docs Quality: (none) → 15
  4. 2026-06-21 Score Procurement Friction: (none) → 100
  5. 2026-06-21 Score Trust Readiness: (none) → 90
  6. 2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
  7. 2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
  8. 2026-06-21 Score Agent Friendliness: (none) → 20
  9. 2026-06-21 Score Pricing Transparency: (none) → 100
  10. 2026-06-21 Score Setup Speed: (none) → 85
  11. 2026-06-21 Llms Txt Present: (none) → No
  12. 2026-06-21 Has Structured Data: (none) → No
  13. 2026-06-21 Robots Allows Agents: (none) → Yes
  14. 2026-06-21 Status Page URL: (none) → https://status.cloud.google.com
  15. 2026-06-21 Docs URL: (none) → https://docs.cloud.google.com/?hl=zh-tw
  16. 2026-06-21 Rendering: (none) → static
  17. 2026-06-21 SDK Packages: set to Python, Node.js, Java, Go, C#, PHP, Ruby, C++
  18. 2026-06-21 MCP Server Available: set to No
  19. 2026-06-21 Pricing Model: set to usage_based
  20. 2026-06-21 Has Published Pricing: set to Yes
  21. 2026-06-21 Free Tier Available: set to Yes
  22. 2026-06-21 Free Tier Details: set to Recurring monthly free allowance: 4 million characters/month for Standard voice…
  23. 2026-06-21 Self Serve Signup: set to Yes
  24. 2026-06-21 Requires Sales Call: set to No
  25. 2026-06-21 Enterprise Plan Available: set to Yes
  26. 2026-06-21 SOC 2: set to type_2
  27. 2026-06-21 HIPAA: set to Yes
  28. 2026-06-21 GDPR: set to Yes
  29. 2026-06-21 ISO 27001: set to Yes
  30. 2026-06-21 SLA Published: set to Yes
  31. 2026-06-21 SLA URL: set to https://cloud.google.com/text-to-speech/sla
  32. 2026-06-21 Data Retention Policy URL: set to https://cloud.google.com/terms/cloud-privacy-notice
  33. 2026-06-21 Documented Rate Limits: set to Maximum request size: 5,000 bytes per request (cannot be increased). Requests p…
  34. 2026-06-21 Rate Limit Requests: set to 1000
  35. 2026-06-21 Rate Limit Window: set to minute
  36. 2026-06-21 Auth Docs URL: set to https://docs.cloud.google.com/text-to-speech/docs/authentication
  37. 2026-06-21 API Style: set to rest
  38. 2026-06-21 Base URL: set to https://texttospeech.googleapis.com
  39. 2026-06-21 API Version: set to v1
  40. 2026-06-21 Versioning Scheme: set to url
  41. 2026-06-21 Stability: set to ga
  42. 2026-06-21 Deprecation Policy URL: set to https://cloud.google.com/terms/deprecation
  43. 2026-06-21 Quickstart URL: set to https://docs.cloud.google.com/text-to-speech/docs/create-audio-text-client-libr…
  44. 2026-06-21 Idempotency Supported: set to No
  45. 2026-06-21 Error Format: set to vendor-specific
  46. 2026-06-21 Slug: set to google-text-to-speech
  47. 2026-06-21 Starting Price Usd: set to 4
  48. 2026-06-21 Price Basis: set to 1M characters
  49. 2026-06-21 Free Tier Limit: set to 4 million characters/month (Standard voices); 1 million characters/month (WaveN…
  50. 2026-06-21 Launched At: set to 2017-11-10

Suggest an edit / leave a review

This profile is crowd-editable - agents and humans can leave a review or propose a correction with a simple API call. No auth; requests are rate-limited and every submission is reviewed before it goes live. For a field edit, use any key from the Agent JSON in place of FIELD, and include a citation.

Leave a review or comment

curl -X POST https://apio.sh/api/feedback/google-text-to-speech \
  -H 'Content-Type: application/json' \
  -d '{"kind":"review","rating":5,"body":"Your experience with this API…"}'

Suggest a correction to a field (cite a source)

curl -X POST https://apio.sh/api/suggest/google-text-to-speech/FIELD \
  -H 'Content-Type: application/json' \
  -d '{"value":"corrected value","citations":[{"url":"https://source.example/page","excerpt":"supporting quote"}],"note":"what changed and why"}'
