Azure AI Text to Speech

"Text to speech enables your applications, tools, or devices to convert text into human like synthesized speech. The text to speech capability is also known as speech synthesis. Use human like standard voices out of the box, or create a custom voice that's unique to your product or brand." [1]

azure.microsoft.com/en-us/products/ai-services/ai-speech · By Microsoft · Last verified 2026-06-21 · Source confidence: high

Azure AI Text to Speech is Microsoft's managed speech synthesis service, suited for voice agents, call center automation, audiobook narration, accessibility tools, and content creation. It offers over 30 deployment regions, a free tier of 500,000 characters per month, and usage-based pricing starting at $15 per million characters for standard voices. SDKs are available for Python, C#, JavaScript, Java, and Go, and the service carries SOC 2 Type II, HIPAA, GDPR, ISO 27001, and PCI DSS attestations. Custom and personal voice cloning are supported, though professional voice fine-tuning requires limited-access approval.

Best for / Avoid if

Best for:

  • Prototypes and side projects: free to start, no sales call
  • Regulated or enterprise workloads: compliance attestations and an enterprise plan
  • AI agents and automation: an agent-ready surface (MCP / llms.txt)

Pricing & procurement

Pricing model
Usage-based [2]
Published pricing
Yes [3]
Free tier
Yes [4]
Free tier details
Free (F0) tier: 0.5 million characters per month for neural text-to-speech (recurring monthly allowance, not a one-time trial). Free tier rate limits are not adjustable.
Self-serve signup
Yes
Requires sales call
No
Enterprise plan
Yes [5]
Published prices
Plan | Item | Per | Amount
Free (F0) | Speech synthesis - neural voices (recurring monthly allowance) | 0.5M characters per month | $0
Pay As You Go (S0) | Speech synthesis - neural voices (real-time & batch) | 1M characters | $16
Pay As You Go (S0) | Speech synthesis - neural HD voices (real-time & batch) | 1M characters | $22
Pay As You Go (S0) | Speech synthesis - neural voices (long audio creation) | 1M characters | $100
Pay As You Go (S0) | Speech synthesis - custom neural voice (real-time & batch) | 1M characters | $24
Pay As You Go (S0) | Speech synthesis - custom neural HD voice (real-time & batch) | 1M characters | $48
Pay As You Go (S0) | Speech synthesis - custom neural voice (long audio creation) | 1M characters | $100
Pay As You Go (S0) | Custom neural voice model training | compute hour | $52
Pay As You Go (S0) | Custom neural voice endpoint hosting | model per hour | $4.04
Pay As You Go (S0) | Personal voice synthesis | 1M characters | $24
Pay As You Go (S0) | Personal voice profile storage | 1,000 profiles per month | $600
Commitment - 80M characters/month | Speech synthesis - neural voices | month (80M characters included) | $1,024
Commitment - 400M characters/month | Speech synthesis - neural voices | month (400M characters included) | $4,160
Commitment - 2,000M characters/month | Speech synthesis - neural voices | month (2,000M characters included) | $16,000
Connected Container - 80M characters/month | Speech synthesis - neural voices | month (80M characters included) | $972.80
Connected Container - 400M characters/month | Speech synthesis - neural voices | month (400M characters included) | $3,952
Connected Container - 2,000M characters/month | Speech synthesis - neural voices | month (2,000M characters included) | $15,200
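
To make the published prices easier to compare, here is a minimal sketch of a monthly cost estimate for standard neural voices. The prices are copied from the rows above. The function names are hypothetical. Commitment tiers are only considered when usage stays within the included volume, because overage rates are not listed in this profile. The free tier is treated as a separate plan, not as a credit against paid usage.

```python
# Prices copied from the published price rows above (USD).
FREE_CHARS_PER_MONTH = 500_000          # Free (F0) monthly allowance
PAYG_PER_MILLION = 16.0                 # Pay As You Go (S0), neural voices
COMMITMENT_TIERS = {                    # included characters -> monthly price
    80_000_000: 1024.0,
    400_000_000: 4160.0,
    2_000_000_000: 16000.0,
}


def payg_cost(chars: int) -> float:
    """Pay-as-you-go cost in USD for one month of neural-voice synthesis."""
    return chars / 1_000_000 * PAYG_PER_MILLION


def cheapest_plan(chars: int) -> tuple[str, float]:
    """Return (plan label, monthly USD) for the cheapest plan covering `chars`.

    Assumption: a commitment tier is only a candidate when `chars` fits
    inside its included volume, since overage pricing is not listed above.
    """
    if chars <= FREE_CHARS_PER_MONTH:
        return ("Free (F0)", 0.0)
    options = [("Pay As You Go (S0)", payg_cost(chars))]
    for included, price in COMMITMENT_TIERS.items():
        if chars <= included:
            options.append((f"Commitment - {included // 1_000_000}M", price))
    # Pick the option with the lowest monthly price.
    return min(options, key=lambda option: option[1])


if __name__ == "__main__":
    for volume in (300_000, 10_000_000, 70_000_000, 300_000_000):
        print(volume, cheapest_plan(volume))
```

Under these figures the 80M commitment tier costs the same as pay-as-you-go at 64M characters per month (1,024 / 16), so it only pays off above that volume.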

Capabilities

  • Real-time streaming
  • Voice cloning
  • Voice design
  • SSML control
  • Multilingual voices
  • Word timestamps
Supported actions
synthesize_speech, streaming_tts, batch_synthesis, ssml_support, word_timestamps, viseme_generation, professional_voice_cloning, personal_voice_cloning, custom_voice_training, multilingual_synthesis, voice_design, audio_content_creation, text_to_speech_avatar, real_time_synthesis, async_long_audio_synthesis [6]
Regions
Australia East, Brazil South, Canada Central, Canada East, Central US, East Asia, East US, East US 2, France Central, Germany West Central, India Central, Italy North, Japan East, Japan West, Korea Central, North Central US, North Europe, Norway East, Qatar Central, South Africa North, South Central US, Southeast Asia, Sweden Central, Switzerland North, Switzerland West, UAE North, UK South, UK West, West Central US, West Europe, West US, West US 2, West US 3, US Gov Arizona, US Gov Virginia [7]
Languages
af-ZA (Afrikaans, South Africa), am-ET (Amharic, Ethiopia), ar-AE (Arabic, UAE), ar-BH (Arabic, Bahrain), ar-DZ (Arabic, Algeria), ar-EG (Arabic, Egypt), ar-IQ (Arabic, Iraq), ar-JO (Arabic, Jordan), ar-KW (Arabic, Kuwait), ar-LB (Arabic, Lebanon), ar-LY (Arabic, Libya), ar-MA (Arabic, Morocco), ar-OM (Arabic, Oman), ar-QA (Arabic, Qatar), ar-SA (Arabic, Saudi Arabia), ar-SY (Arabic, Syria), ar-TN (Arabic, Tunisia), ar-YE (Arabic, Yemen), as-IN (Assamese, India), az-AZ (Azerbaijani, Azerbaijan), bg-BG (Bulgarian, Bulgaria), bn-BD (Bangla, Bangladesh), bn-IN (Bengali, India), bs-BA (Bosnian, Bosnia and Herzegovina), ca-ES (Catalan), cs-CZ (Czech, Czechia), cy-GB (Welsh, United Kingdom), da-DK (Danish, Denmark), de-AT (German, Austria), de-CH (German, Switzerland), de-DE (German, Germany), el-GR (Greek, Greece), en-AU (English, Australia), en-CA (English, Canada), en-GB (English, United Kingdom), en-HK (English, Hong Kong SAR), en-IE (English, Ireland), en-IN (English, India), en-KE (English, Kenya), en-NG (English, Nigeria), en-NZ (English, New Zealand), en-PH (English, Philippines), en-SG (English, Singapore), en-TZ (English, Tanzania), en-US (English, United States), en-ZA (English, South Africa), es-AR (Spanish, Argentina), es-BO (Spanish, Bolivia), es-CL (Spanish, Chile), es-CO (Spanish, Colombia), es-CR (Spanish, Costa Rica), es-CU (Spanish, Cuba), es-DO (Spanish, Dominican Republic), es-EC (Spanish, Ecuador), es-ES (Spanish, Spain), es-GQ (Spanish, Equatorial Guinea), es-GT (Spanish, Guatemala), es-HN (Spanish, Honduras), es-MX (Spanish, Mexico), es-NI (Spanish, Nicaragua), es-PA (Spanish, Panama), es-PE (Spanish, Peru), es-PR (Spanish, Puerto Rico), es-PY (Spanish, Paraguay), es-SV (Spanish, El Salvador), es-US (Spanish, United States), es-UY (Spanish, Uruguay), es-VE (Spanish, Venezuela), et-EE (Estonian, Estonia), eu-ES (Basque), fa-IR (Persian, Iran), fi-FI (Finnish, Finland), fil-PH (Filipino, Philippines), fr-BE (French, Belgium), fr-CA (French, Canada), 
fr-CH (French, Switzerland), fr-FR (French, France), ga-IE (Irish, Ireland), gl-ES (Galician), gu-IN (Gujarati, India), he-IL (Hebrew, Israel), hi-IN (Hindi, India), hr-HR (Croatian, Croatia), hu-HU (Hungarian, Hungary), hy-AM (Armenian, Armenia), id-ID (Indonesian, Indonesia), is-IS (Icelandic, Iceland), it-IT (Italian, Italy), iu-CANS-CA (Inuktitut Syllabics, Canada), iu-LATN-CA (Inuktitut Latin, Canada), ja-JP (Japanese, Japan), jv-ID (Javanese, Indonesia), ka-GE (Georgian, Georgia), kk-KZ (Kazakh, Kazakhstan), km-KH (Khmer, Cambodia), kn-IN (Kannada, India), ko-KR (Korean, Korea), lo-LA (Lao, Laos), lt-LT (Lithuanian, Lithuania), lv-LV (Latvian, Latvia), mk-MK (Macedonian, North Macedonia), ml-IN (Malayalam, India), mn-MN (Mongolian, Mongolia), mr-IN (Marathi, India), ms-MY (Malay, Malaysia), mt-MT (Maltese, Malta), my-MM (Burmese, Myanmar), nb-NO (Norwegian Bokmål, Norway), ne-NP (Nepali, Nepal), nl-BE (Dutch, Belgium), nl-NL (Dutch, Netherlands), or-IN (Odia, India), pa-IN (Punjabi, India), pl-PL (Polish, Poland), ps-AF (Pashto, Afghanistan), pt-BR (Portuguese, Brazil), pt-PT (Portuguese, Portugal), ro-RO (Romanian, Romania), ru-RU (Russian, Russia), si-LK (Sinhala, Sri Lanka), sk-SK (Slovak, Slovakia), sl-SI (Slovenian, Slovenia), so-SO (Somali, Somalia), sq-AL (Albanian, Albania), sr-LATN-RS (Serbian Latin, Serbia), sr-RS (Serbian Cyrillic, Serbia), su-ID (Sundanese, Indonesia), sv-SE (Swedish, Sweden), sw-KE (Kiswahili, Kenya), sw-TZ (Kiswahili, Tanzania), ta-IN (Tamil, India), ta-LK (Tamil, Sri Lanka), ta-MY (Tamil, Malaysia), ta-SG (Tamil, Singapore), te-IN (Telugu, India), th-TH (Thai, Thailand), tr-TR (Turkish, Türkiye), uk-UA (Ukrainian, Ukraine), ur-IN (Urdu, India), ur-PK (Urdu, Pakistan), uz-UZ (Uzbek, Uzbekistan), vi-VN (Vietnamese, Vietnam), wuu-CN (Chinese Wu, Simplified), yue-CN (Chinese Cantonese, Simplified), zh-CN (Chinese Mandarin, Simplified), zh-HK (Chinese Cantonese, Traditional), zh-TW (Chinese Taiwanese Mandarin, Traditional) [8]
Input types
plain text, SSML (Speech Synthesis Markup Language)
Output types
mp3 (various bitrates), opus (ogg, webm containers), pcm (raw), wav (riff), alaw, mulaw, truesilk, g722, amr-wb [9]
Webhooks
No [10]
Sandbox / test mode
No
SDK languages
Python, C#, JavaScript, Java, Go [11]
MCP server
Yes [12]
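
Since the service accepts SSML as well as plain text, a small helper for wrapping text in a minimal SSML document may be useful. This is a sketch, not the vendor's reference code: the function name is hypothetical, and the default voice name `en-US-JennyNeural` is an assumption that does not come from this profile. Substitute any voice available in your region.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NAMESPACE = "http://www.w3.org/2001/10/synthesis"  # W3C SSML 1.0 namespace


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal <speak>/<voice> SSML document.

    The default voice name is an illustrative assumption. Text is XML-escaped
    so that characters such as & and < cannot break the markup.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NAMESPACE}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Rock & roll at <full> volume"))
```

Escaping matters here because user-supplied text containing `&` or `<` would otherwise produce malformed XML.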

Trust & compliance

SOC 2
SOC 2 Type II [13]
HIPAA
Yes [14]
GDPR
Yes [15]
ISO 27001
Yes [16]
PCI DSS
Yes [17]
Published SLA
Yes [18]
Rate limits
  • Free (F0): 20 transactions per 60 seconds (not adjustable)
  • Standard (S0): 200 transactions per second (TPS) default, adjustable up to 1,000 TPS upon request
  • Maximum audio length per request: 10 minutes
  • Maximum SSML message size per WebSocket turn: 64 KB
  • Maximum distinct voice/audio tags in SSML: 50
  • HD voice latency: less than 300 ms [19]
Known restrictions
  • Custom voice (professional voice fine-tuning) requires limited-access application approval
  • Chinese characters counted as two characters for billing, including kanji (Japanese), hanja (Korean), hanzi (other languages)
  • HD voices support only a subset of SSML elements (not full SSML)
  • Personal voice does not support BYOS (Bring Your Own Storage)
  • Dragon HD Flash voices only support zh-CN and en-US text
  • Real-time HD voice synthesis only (no batch synthesis for HD voices)
  • Maximum 10 minutes audio output per real-time synthesis request
  • Custom voice endpoint hosting billed separately per hour
  • Voice talent verbal consent recording required before custom voice training [20]
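
The double-counting rule for Chinese characters listed above affects cost estimates for CJK text. The sketch below approximates it by counting Han ideographs twice. It rests on two assumptions that this profile does not state: that Unicode block membership is a fair proxy for what the service treats as a Chinese character, and that every other character (including spaces and punctuation) is billed as one. The function names are hypothetical.

```python
def is_han_ideograph(ch: str) -> bool:
    """Rough test for Han ideographs (hanzi, kanji, hanja) by Unicode block."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF        # CJK Unified Ideographs
        or 0x3400 <= code <= 0x4DBF     # CJK Unified Ideographs Extension A
        or 0xF900 <= code <= 0xFAFF     # CJK Compatibility Ideographs
        or 0x20000 <= code <= 0x2FFFF   # Supplementary Ideographic Plane
    )


def estimate_billable_chars(text: str) -> int:
    """Estimate billed characters: Han ideographs count twice, all else once.

    This is an approximation for budgeting, not the service's own meter.
    """
    return sum(2 if is_han_ideograph(ch) else 1 for ch in text)


if __name__ == "__main__":
    for sample in ("Hello", "你好", "東京 Tokyo", "こんにちは"):
        print(repr(sample), estimate_billable_chars(sample))
```

Kana and hangul fall outside the listed ranges, so Japanese or Korean text is only double-counted for its kanji or hanja.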

Developer surface

Docs rendering: static · llms.txt present

Integration

API style
REST
Base URL
https://{region}.tts.speech.microsoft.com/cognitiveservices/v1
Version
2024-04-01
Versioning
url
Stability
GA
Auth methods
api_key, jwt
Error format
vendor-specific
Rate limit
200 / second
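
To stay under the limits in this profile (200 requests per second on the standard tier, 20 per 60 seconds on the free tier), a client can pace itself before calling the service. The following is a minimal sliding-window sketch. The class and helper names are hypothetical, and it makes no claim about how the service itself measures its windows.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any rolling `window_seconds` span."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock                 # injectable so tests can fake time
        self._sent: Deque[float] = deque()  # timestamps of recent calls

    def seconds_until_free(self) -> float:
        """Return 0.0 if a call may go out now, else the wait in seconds."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_calls:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def try_acquire(self) -> bool:
        """Record a call and return True if the limit allows it right now."""
        if self.seconds_until_free() > 0.0:
            return False
        self._sent.append(self._clock())
        return True


def standard_limiter() -> SlidingWindowLimiter:
    """Default S0 limit from this profile: 200 requests per second."""
    return SlidingWindowLimiter(max_calls=200, window_seconds=1.0)


def free_limiter() -> SlidingWindowLimiter:
    """F0 limit from this profile: 20 transactions per 60 seconds."""
    return SlidingWindowLimiter(max_calls=20, window_seconds=60.0)
```

A caller would check `try_acquire()` before each request and sleep for `seconds_until_free()` when it returns False.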

SDKs

  • Python: azure-cognitiveservices-speech
  • C#: Microsoft.CognitiveServices.Speech
  • JavaScript: microsoft-cognitiveservices-speech-sdk
  • Java: com.microsoft.cognitiveservices.speech:client-sdk
  • Go: github.com/Microsoft/cognitive-services-speech-sdk-go
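
Where none of the SDKs above fits, the REST surface described under Integration can be called directly. The sketch below only builds the request and does not send it. The base URL comes from this profile. The header names and the output format string are taken from Microsoft's REST documentation, not from this profile, so verify them against the current docs. The function name, the example region value, and the User-Agent string are hypothetical.

```python
import urllib.request


def build_tts_request(
    region: str,
    subscription_key: str,
    ssml: str,
    output_format: str = "audio-16khz-128kbitrate-mono-mp3",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for one synthesis call.

    Uses API-key auth. For token auth, replace the key header with
    "Authorization": "Bearer <token>".
    """
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,  # API-key auth
        "Content-Type": "application/ssml+xml",         # body is an SSML document
        "X-Microsoft-OutputFormat": output_format,      # requested audio encoding
        "User-Agent": "tts-profile-example",            # hypothetical client name
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )


if __name__ == "__main__":
    request = build_tts_request("westeurope", "YOUR_KEY", "<speak/>")
    print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send the call. A successful response body is expected to be the audio bytes in the requested format.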

Adoption & maturity

Launched
2018-09-24
GA
2018-11-01

Other Text-to-Speech APIs

  • ElevenLabs Text to Speech

    "Text to Speech with high quality, human-like AI voices"

    Hybrid · free tier · public pricing · self-serve

  • Amazon Polly

    "Amazon Polly is a cloud service that converts text into lifelike speech. You can use Amazon Polly to develop applications that increase engagement and accessibility."

    Usage · free tier · public pricing · self-serve

  • Google Cloud Text-to-Speech

    "Cloud Text-to-Speech converts text or Speech Synthesis Markup Language (SSML) input into audio data of natural human speech."

    Usage · free tier · public pricing · self-serve

  • Cartesia (Sonic)

    "The fastest and most natural text to speech model"

    Hybrid · free tier · public pricing · self-serve

  • Murf AI

    "Enterprise-grade AI voice generation with 150+ natural-sounding voices across 35 languages and 20+ speaking styles."

    Usage · public pricing · self-serve

  • OpenAI Text to Speech (gpt-4o-mini-tts / tts-1)

    "Transform text into lifelike spoken audio" - OpenAI's TTS service enabling blog narration, multilingual audio production, and realtime voice output via gpt-4o-mini-tts, tts-1, and tts-1-hd models.

    Usage · public pricing · self-serve


References

Change history

Every field change, who made it, and when - from our audited data pipeline and editors.

  1. 2026-06-21 Capabilities: {} → {"ssml":true,"streaming":true,"multilingual":true,"voice_design":true,"voice_cl…
  2. 2026-06-21 Summary Md: (none) → Azure AI Text to Speech is Microsoft's managed speech synthesis service, suited…
  3. 2026-06-21 Score Docs Quality: (none) → 25
  4. 2026-06-21 Score Procurement Friction: (none) → 100
  5. 2026-06-21 Score Trust Readiness: (none) → 100
  6. 2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
  7. 2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
  8. 2026-06-21 Score Agent Friendliness: (none) → 65
  9. 2026-06-21 Score Pricing Transparency: (none) → 100
  10. 2026-06-21 Score Setup Speed: (none) → 85
  11. 2026-06-21 Llms Txt Present: (none) → Yes
  12. 2026-06-21 Rendering: (none) → static
  13. 2026-06-21 Has Structured Data: (none) → Yes
  14. 2026-06-21 Robots Allows Agents: (none) → Yes
  15. 2026-06-21 Docs URL: (none) → https://azure.microsoft.com/en-us/resources/developers/
  16. 2026-06-21 Llms Txt URL: (none) → https://azure.microsoft.com/llms.txt
  17. 2026-06-21 Pricing Model: set to usage_based
  18. 2026-06-21 Has Published Pricing: set to Yes
  19. 2026-06-21 Free Tier Available: set to Yes
  20. 2026-06-21 Free Tier Details: set to Free (F0) tier: 0.5 million characters per month for neural text-to-speech (rec…
  21. 2026-06-21 Self Serve Signup: set to Yes
  22. 2026-06-21 Requires Sales Call: set to No
  23. 2026-06-21 Enterprise Plan Available: set to Yes
  24. 2026-06-21 SOC 2: set to type_2
  25. 2026-06-21 HIPAA: set to Yes
  26. 2026-06-21 GDPR: set to Yes
  27. 2026-06-21 ISO 27001: set to Yes
  28. 2026-06-21 PCI DSS: set to Yes
  29. 2026-06-21 SLA Published: set to Yes
  30. 2026-06-21 SLA URL: set to https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-…
  31. 2026-06-21 Data Retention Policy URL: set to https://learn.microsoft.com/en-us/azure/foundry/responsible-ai/speech-service/t…
  32. 2026-06-21 Documented Rate Limits: set to Free (F0): 20 transactions per 60 seconds (not adjustable). Standard (S0): 200 …
  33. 2026-06-21 Rate Limit Requests: set to 200
  34. 2026-06-21 Rate Limit Window: set to second
  35. 2026-06-21 Known Restrictions: set to Custom voice (professional voice fine-tuning) requires limited-access applicati…
  36. 2026-06-21 Auth Methods: set to api_key, jwt
  37. 2026-06-21 Auth Docs URL: set to https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-text-to…
  38. 2026-06-21 API Style: set to rest
  39. 2026-06-21 Base URL: set to https://{region}.tts.speech.microsoft.com/cognitiveservices/v1
  40. 2026-06-21 API Version: set to 2024-04-01
  41. 2026-06-21 Versioning Scheme: set to url
  42. 2026-06-21 Stability: set to ga
  43. 2026-06-21 MCP URL: set to https://github.com/microsoft/mcp/tree/main/servers/Azure.Mcp.Server
  44. 2026-06-21 Quickstart URL: set to https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-…
  45. 2026-06-21 Error Format: set to vendor-specific
  46. 2026-06-21 Slug: set to azure-text-to-speech
  47. 2026-06-21 Starting Price Usd: set to 15
  48. 2026-06-21 Price Basis: set to 1M characters
  49. 2026-06-21 Free Tier Limit: set to 500,000 characters/month
  50. 2026-06-21 Launched At: set to 2018-09-24

Suggest an edit / leave a review

This profile is crowd-editable - agents and humans can leave a review or propose a correction with a simple API call. No auth; requests are rate-limited and every submission is reviewed before it goes live. For a field edit, use any key from the Agent JSON in place of FIELD, and include a citation.

Leave a review or comment

curl -X POST https://apio.sh/api/feedback/azure-text-to-speech \
  -H 'Content-Type: application/json' \
  -d '{"kind":"review","rating":5,"body":"Your experience with this API…"}'

Suggest a correction to a field (cite a source)

curl -X POST https://apio.sh/api/suggest/azure-text-to-speech/FIELD \
  -H 'Content-Type: application/json' \
  -d '{"value":"corrected value","citations":[{"url":"https://source.example/page","excerpt":"supporting quote"}],"note":"what changed and why"}'
