Azure AI Text to Speech
"Text to speech enables your applications, tools, or devices to convert text into human like synthesized speech. The text to speech capability is also known as speech synthesis. Use human like standard voices out of the box, or create a custom voice that's unique to your product or brand." [1]
Azure AI Text to Speech is Microsoft's managed speech synthesis service, suited for voice agents, call center automation, audiobook narration, accessibility tools, and content creation. It offers over 30 deployment regions, a free tier of 500,000 characters per month, and usage-based pricing starting at $15 per million characters for standard voices. SDKs are available for Python, C#, JavaScript, Java, and Go, and the service carries SOC 2 Type 2, HIPAA, GDPR, ISO 27001, and PCI DSS certifications. Custom and personal voice cloning are supported, though professional voice fine-tuning requires limited-access approval.
Best for / Avoid if
Best for:
- Prototypes and side projects: free to start, no sales call
- Regulated or enterprise workloads: compliance attestations and an enterprise plan
- AI agents and automation: an agent-ready surface (MCP / llms.txt)
Pricing & procurement
- Pricing model
- Usage-based [2]
- Published pricing
- ✓ Yes [3]
- Free tier
- ✓ Yes [4]
- Free tier details
- Free (F0) tier: 0.5 million characters per month for neural text-to-speech (recurring monthly allowance, not a one-time trial). Free tier rate limits are not adjustable.
- Self-serve signup
- ✓ Yes
- Requires sales call
- ✗ No
- Enterprise plan
- ✓ Yes [5]
| Plan | Item | Per | Amount | Source |
|---|---|---|---|---|
| Free (F0) | Speech synthesis - neural voices (recurring monthly allowance) | 0.5M characters per month | $0 | source |
| Pay As You Go (S0) | Speech synthesis - neural voices (real-time & batch) | 1M characters | $16 | source |
| Pay As You Go (S0) | Speech synthesis - neural HD voices (real-time & batch) | 1M characters | $22 | source |
| Pay As You Go (S0) | Speech synthesis - neural voices (long audio creation) | 1M characters | $100 | source |
| Pay As You Go (S0) | Speech synthesis - custom neural voice (real-time & batch) | 1M characters | $24 | source |
| Pay As You Go (S0) | Speech synthesis - custom neural HD voice (real-time & batch) | 1M characters | $48 | source |
| Pay As You Go (S0) | Speech synthesis - custom neural voice (long audio creation) | 1M characters | $100 | source |
| Pay As You Go (S0) | Custom neural voice model training | compute hour | $52 | source |
| Pay As You Go (S0) | Custom neural voice endpoint hosting | model per hour | $4.04 | source |
| Pay As You Go (S0) | Personal voice synthesis | 1M characters | $24 | source |
| Pay As You Go (S0) | Personal voice profile storage | 1,000 profiles per month | $600 | source |
| Commitment - 80M characters/month | Speech synthesis - neural voices | month (80M characters included) | $1,024 | source |
| Commitment - 400M characters/month | Speech synthesis - neural voices | month (400M characters included) | $4,160 | source |
| Commitment - 2,000M characters/month | Speech synthesis - neural voices | month (2,000M characters included) | $16,000 | source |
| Connected Container - 80M characters/month | Speech synthesis - neural voices | month (80M characters included) | $972.80 | source |
| Connected Container - 400M characters/month | Speech synthesis - neural voices | month (400M characters included) | $3,952 | source |
| Connected Container - 2,000M characters/month | Speech synthesis - neural voices | month (2,000M characters included) | $15,200 | source |
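As a rough illustration of how the neural-voice rows above compare, the sketch below picks the cheapest listed plan for a given monthly character volume. The prices are copied from the table; the function names are hypothetical, and the sketch assumes a commitment tier is only considered when its included allowance covers the whole volume, because the table lists no overage rates.

```python
from decimal import Decimal

# Prices in USD, copied from the pricing table above (neural voices only).
PAYG_PER_MILLION = Decimal("16")       # Pay As You Go (S0), per 1M characters
FREE_CHARS_PER_MONTH = 500_000         # Free (F0) monthly allowance
COMMITMENT_TIERS = {                   # included characters -> monthly price
    80_000_000: Decimal("1024"),
    400_000_000: Decimal("4160"),
    2_000_000_000: Decimal("16000"),
}


def payg_cost(chars: int) -> Decimal:
    """Pay-as-you-go cost for `chars` neural-voice characters in one month."""
    return Decimal(chars) / Decimal(1_000_000) * PAYG_PER_MILLION


def cheapest_plan(chars: int) -> tuple[str, Decimal]:
    """Return (plan label, monthly cost) for the cheapest listed option.

    Assumption: overage pricing is not in the table, so a commitment tier
    is only a candidate when it fully covers the requested volume.
    """
    if chars <= FREE_CHARS_PER_MONTH:
        return ("Free (F0)", Decimal("0"))
    options = [("Pay As You Go (S0)", payg_cost(chars))]
    for included, price in COMMITMENT_TIERS.items():
        if chars <= included:
            label = f"Commitment - {included // 1_000_000:,}M characters/month"
            options.append((label, price))
    return min(options, key=lambda option: option[1])


if __name__ == "__main__":
    for volume in (300_000, 10_000_000, 70_000_000):
        print(volume, cheapest_plan(volume))
```

With these figures the 80M commitment tier becomes cheaper than pay-as-you-go once monthly volume passes 64M characters ($1,024 / $16 per million).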
Capabilities
- Supported actions
- synthesize_speech, streaming_tts, batch_synthesis, ssml_support, word_timestamps, viseme_generation, professional_voice_cloning, personal_voice_cloning, custom_voice_training, multilingual_synthesis, voice_design, audio_content_creation, text_to_speech_avatar, real_time_synthesis, async_long_audio_synthesis [6]
- Regions
- Australia East, Brazil South, Canada Central, Canada East, Central US, East Asia, East US, East US 2, France Central, Germany West Central, India Central, Italy North, Japan East, Japan West, Korea Central, North Central US, North Europe, Norway East, Qatar Central, South Africa North, South Central US, Southeast Asia, Sweden Central, Switzerland North, Switzerland West, UAE North, UK South, UK West, West Central US, West Europe, West US, West US 2, West US 3, US Gov Arizona, US Gov Virginia [7]
- Languages
- af-ZA (Afrikaans, South Africa), am-ET (Amharic, Ethiopia), ar-AE (Arabic, UAE), ar-BH (Arabic, Bahrain), ar-DZ (Arabic, Algeria), ar-EG (Arabic, Egypt), ar-IQ (Arabic, Iraq), ar-JO (Arabic, Jordan), ar-KW (Arabic, Kuwait), ar-LB (Arabic, Lebanon), ar-LY (Arabic, Libya), ar-MA (Arabic, Morocco), ar-OM (Arabic, Oman), ar-QA (Arabic, Qatar), ar-SA (Arabic, Saudi Arabia), ar-SY (Arabic, Syria), ar-TN (Arabic, Tunisia), ar-YE (Arabic, Yemen), as-IN (Assamese, India), az-AZ (Azerbaijani, Azerbaijan), bg-BG (Bulgarian, Bulgaria), bn-BD (Bangla, Bangladesh), bn-IN (Bengali, India), bs-BA (Bosnian, Bosnia and Herzegovina), ca-ES (Catalan), cs-CZ (Czech, Czechia), cy-GB (Welsh, United Kingdom), da-DK (Danish, Denmark), de-AT (German, Austria), de-CH (German, Switzerland), de-DE (German, Germany), el-GR (Greek, Greece), en-AU (English, Australia), en-CA (English, Canada), en-GB (English, United Kingdom), en-HK (English, Hong Kong SAR), en-IE (English, Ireland), en-IN (English, India), en-KE (English, Kenya), en-NG (English, Nigeria), en-NZ (English, New Zealand), en-PH (English, Philippines), en-SG (English, Singapore), en-TZ (English, Tanzania), en-US (English, United States), en-ZA (English, South Africa), es-AR (Spanish, Argentina), es-BO (Spanish, Bolivia), es-CL (Spanish, Chile), es-CO (Spanish, Colombia), es-CR (Spanish, Costa Rica), es-CU (Spanish, Cuba), es-DO (Spanish, Dominican Republic), es-EC (Spanish, Ecuador), es-ES (Spanish, Spain), es-GQ (Spanish, Equatorial Guinea), es-GT (Spanish, Guatemala), es-HN (Spanish, Honduras), es-MX (Spanish, Mexico), es-NI (Spanish, Nicaragua), es-PA (Spanish, Panama), es-PE (Spanish, Peru), es-PR (Spanish, Puerto Rico), es-PY (Spanish, Paraguay), es-SV (Spanish, El Salvador), es-US (Spanish, United States), es-UY (Spanish, Uruguay), es-VE (Spanish, Venezuela), et-EE (Estonian, Estonia), eu-ES (Basque), fa-IR (Persian, Iran), fi-FI (Finnish, Finland), fil-PH (Filipino, Philippines), fr-BE (French, Belgium), fr-CA (French, Canada), fr-CH (French, Switzerland), fr-FR (French, France), ga-IE (Irish, Ireland), gl-ES (Galician), gu-IN (Gujarati, India), he-IL (Hebrew, Israel), hi-IN (Hindi, India), hr-HR (Croatian, Croatia), hu-HU (Hungarian, Hungary), hy-AM (Armenian, Armenia), id-ID (Indonesian, Indonesia), is-IS (Icelandic, Iceland), it-IT (Italian, Italy), iu-CANS-CA (Inuktitut Syllabics, Canada), iu-LATN-CA (Inuktitut Latin, Canada), ja-JP (Japanese, Japan), jv-ID (Javanese, Indonesia), ka-GE (Georgian, Georgia), kk-KZ (Kazakh, Kazakhstan), km-KH (Khmer, Cambodia), kn-IN (Kannada, India), ko-KR (Korean, Korea), lo-LA (Lao, Laos), lt-LT (Lithuanian, Lithuania), lv-LV (Latvian, Latvia), mk-MK (Macedonian, North Macedonia), ml-IN (Malayalam, India), mn-MN (Mongolian, Mongolia), mr-IN (Marathi, India), ms-MY (Malay, Malaysia), mt-MT (Maltese, Malta), my-MM (Burmese, Myanmar), nb-NO (Norwegian Bokmål, Norway), ne-NP (Nepali, Nepal), nl-BE (Dutch, Belgium), nl-NL (Dutch, Netherlands), or-IN (Odia, India), pa-IN (Punjabi, India), pl-PL (Polish, Poland), ps-AF (Pashto, Afghanistan), pt-BR (Portuguese, Brazil), pt-PT (Portuguese, Portugal), ro-RO (Romanian, Romania), ru-RU (Russian, Russia), si-LK (Sinhala, Sri Lanka), sk-SK (Slovak, Slovakia), sl-SI (Slovenian, Slovenia), so-SO (Somali, Somalia), sq-AL (Albanian, Albania), sr-LATN-RS (Serbian Latin, Serbia), sr-RS (Serbian Cyrillic, Serbia), su-ID (Sundanese, Indonesia), sv-SE (Swedish, Sweden), sw-KE (Kiswahili, Kenya), sw-TZ (Kiswahili, Tanzania), ta-IN (Tamil, India), ta-LK (Tamil, Sri Lanka), ta-MY (Tamil, Malaysia), ta-SG (Tamil, Singapore), te-IN (Telugu, India), th-TH (Thai, Thailand), tr-TR (Turkish, Türkiye), uk-UA (Ukrainian, Ukraine), ur-IN (Urdu, India), ur-PK (Urdu, Pakistan), uz-UZ (Uzbek, Uzbekistan), vi-VN (Vietnamese, Vietnam), wuu-CN (Chinese Wu, Simplified), yue-CN (Chinese Cantonese, Simplified), zh-CN (Chinese Mandarin, Simplified), zh-HK (Chinese Cantonese, Traditional), zh-TW (Chinese Taiwanese Mandarin, Traditional) [8]
- Input types
- plain text, SSML (Speech Synthesis Markup Language)
- Output types
- mp3 (various bitrates), opus (ogg, webm containers), pcm (raw), wav (riff), alaw, mulaw, truesilk, g722, amr-wb [9]
- Webhooks
- ✗ No [10]
- Sandbox / test mode
- ✗ No
- SDK languages
- Python, C#, JavaScript, Java, Go [11]
- MCP server
- ✓ Yes [12]
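The input types listed above include SSML as well as plain text. A minimal sketch of wrapping plain text in an SSML document might look like the following. The helper name `build_ssml` is hypothetical, and the default voice name `en-US-JennyNeural` is only an illustrative assumption; check the service's voice list for names that are currently available.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a single-voice SSML document.

    `voice` and `lang` defaults are illustrative assumptions, not values
    taken from this profile.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


# Characters such as & and < must be escaped, otherwise the markup is invalid.
ssml = build_ssml("Tom & Jerry <live>")
root = ET.fromstring(ssml)  # raises ParseError if the document is not well-formed
print(ssml)
```

Escaping the text before embedding it keeps user-supplied input from breaking the markup or injecting extra SSML elements.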
Trust & compliance
- SOC 2
- SOC 2 Type II [13]
- HIPAA
- ✓ Yes [14]
- GDPR
- ✓ Yes [15]
- ISO 27001
- ✓ Yes [16]
- PCI DSS
- ✓ Yes [17]
- Published SLA
- ✓ Yes [18]
- Rate limits
- Free (F0): 20 transactions per 60 seconds (not adjustable). Standard (S0): 200 transactions per second (TPS) default, adjustable up to 1,000 TPS upon request. Maximum audio length per request: 10 minutes. Maximum SSML message size per WebSocket turn: 64 KB. Maximum distinct voice/audio tags in SSML: 50. HD voice latency: less than 300 ms. [19]
- Known restrictions
- Custom voice (professional voice fine-tuning) requires limited-access application approval; Chinese characters count as two characters for billing, including kanji (Japanese), hanja (Korean), and hanzi (other languages); HD voices support only a subset of SSML elements (not full SSML); personal voice does not support BYOS (Bring Your Own Storage); Dragon HD Flash voices support only zh-CN and en-US text; HD voices support real-time synthesis only (no batch synthesis); each real-time synthesis request returns at most 10 minutes of audio; custom voice endpoint hosting is billed separately per hour; a verbal consent recording from the voice talent is required before custom voice training [20]
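The double-counting rule for Chinese characters in the restrictions above affects cost estimates for CJK text. The sketch below estimates billable characters under that rule. It is only an approximation: the Unicode ranges chosen to stand for "Chinese characters" are an assumption, the helper names are hypothetical, and the service's exact counting rules (for example, how SSML markup is counted) are not covered here.

```python
# Unicode blocks treated as "Chinese characters" for this estimate.
# Assumption: the profile does not say which code points the service
# counts double, so these common ideograph blocks are a best guess.
CJK_RANGES = (
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
)


def is_cjk_ideograph(ch: str) -> bool:
    """True if the character falls in one of the assumed ideograph blocks."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in CJK_RANGES)


def billable_characters(text: str) -> int:
    """Estimate billable characters: ideographs count as 2, everything else as 1."""
    return sum(2 if is_cjk_ideograph(ch) else 1 for ch in text)


if __name__ == "__main__":
    for sample in ("hello", "你好", "東京 Tokyo"):
        print(repr(sample), billable_characters(sample))
```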
Developer surface
Integration
- API style
- rest
- Base URL
- https://{region}.tts.speech.microsoft.com/cognitiveservices/v1
- Version
- 2024-04-01
- Versioning
- url
- Stability
- ga
- Auth methods
- api_key, jwt
- Error format
- vendor-specific
- Rate limit
- 200 / second
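Putting the base URL and the `api_key` auth method above together, a request could be assembled roughly as follows. This sketch only builds the request object and does not send it. The header names (`Ocp-Apim-Subscription-Key`, `X-Microsoft-OutputFormat`), the content type, and the output format string follow the Speech REST documentation as best understood here and should be verified against it; the function name and the `User-Agent` value are hypothetical.

```python
import urllib.request


def build_tts_request(
    region: str,
    api_key: str,
    ssml: str,
    output_format: str = "audio-16khz-128kbitrate-mono-mp3",
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the regional TTS endpoint."""
    # Base URL pattern taken from the Integration fields above.
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        # Assumed header names; confirm against the REST reference.
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "tts-profile-example",  # hypothetical client name
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )


if __name__ == "__main__":
    request = build_tts_request("westus", "YOUR_KEY", "<speak/>")
    print(request.get_method(), request.full_url)
    # To actually synthesize audio (needs a valid key and complete SSML):
    # with urllib.request.urlopen(request) as response:
    #     audio_bytes = response.read()
```

Keeping request construction separate from sending makes it easy to add client-side pacing later so calls stay within the listed 200 requests per second.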
Adoption & maturity
- Launched
- 2018-09-24
- GA
- 2018-11-01
Other Text-to-Speech APIs
ElevenLabs Text to Speech
"Text to Speech with high quality, human-like AI voices"
Amazon Polly
"Amazon Polly is a cloud service that converts text into lifelike speech. You can use Amazon Polly to develop applications that increase engagement and accessibility."
Google Cloud Text-to-Speech
"Cloud Text-to-Speech converts text or Speech Synthesis Markup Language (SSML) input into audio data of natural human speech."
Cartesia (Sonic)
"The fastest and most natural text to speech model"
Murf AI
"Enterprise-grade AI voice generation with 150+ natural-sounding voices across 35 languages and 20+ speaking styles."
OpenAI Text to Speech (gpt-4o-mini-tts / tts-1)
"Transform text into lifelike spoken audio" - OpenAI's TTS service enabling blog narration, multilingual audio production, and realtime voice output via gpt-4o-mini-tts, tts-1, and tts-1-hd models.
References
- [1] Description: learn.microsoft.com
- [2] Pricing model: azure.microsoft.com · learn.microsoft.com
- [3] Published pricing: azure.microsoft.com
- [4] Free tier: speechify.com · learn.microsoft.com
- [5] Enterprise plan: learn.microsoft.com
- [6] Supported actions: learn.microsoft.com · learn.microsoft.com
- [7] Regions: learn.microsoft.com
- [8] Languages: learn.microsoft.com · learn.microsoft.com
- [9] Output types: learn.microsoft.com
- [10] Webhooks: learn.microsoft.com
- [11] SDK languages: learn.microsoft.com
- [12] MCP server: learn.microsoft.com
- [13] SOC 2: azure.microsoft.com
- [14] HIPAA: learn.microsoft.com
- [15] GDPR: learn.microsoft.com
- [16] ISO 27001: learn.microsoft.com
- [17] PCI DSS: learn.microsoft.com · learn.microsoft.com
- [18] Published SLA: azure.microsoft.com
- [19] Rate limits: learn.microsoft.com · learn.microsoft.com
- [20] Known restrictions: learn.microsoft.com · learn.microsoft.com
Change history
- 2026-06-21 Capabilities: {} → {"ssml":true,"streaming":true,"multilingual":true,"voice_design":true,"voice_cl…
- 2026-06-21 Summary Md: (none) → Azure AI Text to Speech is Microsoft's managed speech synthesis service, suited…
- 2026-06-21 Score Docs Quality: (none) → 25
- 2026-06-21 Score Procurement Friction: (none) → 100
- 2026-06-21 Score Trust Readiness: (none) → 100
- 2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
- 2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
- 2026-06-21 Score Agent Friendliness: (none) → 65
- 2026-06-21 Score Pricing Transparency: (none) → 100
- 2026-06-21 Score Setup Speed: (none) → 85
- 2026-06-21 Llms Txt Present: (none) → Yes
- 2026-06-21 Rendering: (none) → static
- 2026-06-21 Has Structured Data: (none) → Yes
- 2026-06-21 Robots Allows Agents: (none) → Yes
- 2026-06-21 Docs URL: (none) → https://azure.microsoft.com/en-us/resources/developers/
- 2026-06-21 Llms Txt URL: (none) → https://azure.microsoft.com/llms.txt
- 2026-06-21 Pricing Model: set to usage_based
- 2026-06-21 Has Published Pricing: set to Yes
- 2026-06-21 Free Tier Available: set to Yes
- 2026-06-21 Free Tier Details: set to Free (F0) tier: 0.5 million characters per month for neural text-to-speech (rec…
- 2026-06-21 Self Serve Signup: set to Yes
- 2026-06-21 Requires Sales Call: set to No
- 2026-06-21 Enterprise Plan Available: set to Yes
- 2026-06-21 SOC 2: set to type_2
- 2026-06-21 HIPAA: set to Yes
- 2026-06-21 GDPR: set to Yes
- 2026-06-21 ISO 27001: set to Yes
- 2026-06-21 PCI DSS: set to Yes
- 2026-06-21 SLA Published: set to Yes
- 2026-06-21 SLA URL: set to https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-…
- 2026-06-21 Data Retention Policy URL: set to https://learn.microsoft.com/en-us/azure/foundry/responsible-ai/speech-service/t…
- 2026-06-21 Documented Rate Limits: set to Free (F0): 20 transactions per 60 seconds (not adjustable). Standard (S0): 200 …
- 2026-06-21 Rate Limit Requests: set to 200
- 2026-06-21 Rate Limit Window: set to second
- 2026-06-21 Known Restrictions: set to Custom voice (professional voice fine-tuning) requires limited-access applicati…
- 2026-06-21 Auth Methods: set to api_key, jwt
- 2026-06-21 Auth Docs URL: set to https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-text-to…
- 2026-06-21 API Style: set to rest
- 2026-06-21 Base URL: set to https://{region}.tts.speech.microsoft.com/cognitiveservices/v1
- 2026-06-21 API Version: set to 2024-04-01
- 2026-06-21 Versioning Scheme: set to url
- 2026-06-21 Stability: set to ga
- 2026-06-21 MCP URL: set to https://github.com/microsoft/mcp/tree/main/servers/Azure.Mcp.Server
- 2026-06-21 Quickstart URL: set to https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-…
- 2026-06-21 Error Format: set to vendor-specific
- 2026-06-21 Slug: set to azure-text-to-speech
- 2026-06-21 Starting Price Usd: set to 15
- 2026-06-21 Price Basis: set to 1M characters
- 2026-06-21 Free Tier Limit: set to 500,000 characters/month
- 2026-06-21 Launched At: set to 2018-09-24