IBM watsonx Speech to Text
"IBM Watson® Speech to Text technology enables fast and accurate speech transcription in multiple languages for a variety of use cases, including but not limited to customer self-service, agent assistance and speech analytics." [1]
IBM watsonx Speech to Text is a REST API for fast, accurate transcription supporting batch, streaming, and WebSocket modes, aimed at customer self-service, call-center analytics, captioning, and accessibility applications. Pricing starts at $0.02 per minute, with a free tier of 500 minutes per month and no sales call required; the sales-provisioned Premium plan adds unlimited concurrent requests. The service is deployed in seven regions, has SDKs for Python, Node.js, Java, Swift, and Go, and lists SOC 2 Type II, HIPAA, GDPR, and ISO 27001 among its compliance attestations.
Best for / Avoid if
Best for: Prototypes and side projects - free to start, no sales call; Regulated or enterprise workloads - compliance attestations and an enterprise plan; Teams needing broad API coverage out of the box
Pricing & procurement
- Pricing model
- Usage-based [2]
- Published pricing
- ✓ Yes
- Free tier
- ✓ Yes [3]
- Free tier details
- Lite plan: 500 minutes per month at no cost (recurring monthly allowance). No customization access on Lite; the service instance is deleted after 30 days of inactivity. Plus plan (paid, no base fee): $0.02 USD/minute in the 1–999,999 minutes/month tier and $0.01 USD/minute in the 1,000,000+ minutes/month tier. Premium plan (requires sales contact): the first 150,000 minutes per month are included at no charge; pricing beyond that is not publicly disclosed.
- Self-serve signup
- ✓ Yes [4]
- Requires sales call
- ✗ No
- Enterprise plan
- ✓ Yes [5]
| Plan | Item | Per | Amount | Source |
|---|---|---|---|---|
| Lite | Speech recognition | 500 minutes per month (recurring free allowance) | $0 | source |
| Plus | Speech recognition | minute (1–999,999 minutes/month) | $0.02 | source |
| Plus | Speech recognition | minute (1,000,000+ minutes/month) | $0.01 | source |
| Premium | Speech recognition | month (first 150,000 minutes included; usage beyond that requires sales contact) | $0 | source |
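The Plus rows above can be turned into a rough monthly estimate. This is a minimal sketch, not IBM's billing logic: it assumes the two tiers are graduated (only minutes beyond 999,999 get the lower rate), which the table does not state explicitly. The function name `estimate_plus_cost` is hypothetical.

```python
from decimal import Decimal

# Rates copied from the Plus rows of the table above (USD per minute).
TIER_1_RATE = Decimal("0.02")   # 1-999,999 minutes/month tier
TIER_2_RATE = Decimal("0.01")   # 1,000,000+ minutes/month tier
TIER_1_LIMIT = 999_999          # last minute billed at the tier-1 rate


def estimate_plus_cost(minutes: int) -> Decimal:
    """Estimate one month's Plus plan charge in USD.

    Assumption: tiers are graduated, so only the minutes beyond
    TIER_1_LIMIT are billed at the lower rate.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    tier_1_minutes = min(minutes, TIER_1_LIMIT)
    tier_2_minutes = max(minutes - TIER_1_LIMIT, 0)
    return tier_1_minutes * TIER_1_RATE + tier_2_minutes * TIER_2_RATE
```

Under these assumptions, 10,000 minutes in a month comes to $200.00. Because audio is billed by the minute including silence, the input should be total audio duration rather than speech duration.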
Capabilities
- Supported actions
- transcribe_batch, transcribe_streaming, transcribe_websocket, transcribe_async_http, speaker_diarization, word_timestamps, interim_results, keyword_spotting, word_confidence_scores, smart_formatting, profanity_filtering, custom_language_model, custom_acoustic_model, language_identification, speech_activity_detection, transcript_enrichment, speech_begin_event_detection [6]
- Regions
- us-south (Dallas), us-east (Washington DC), eu-de (Frankfurt), eu-gb (London), au-syd (Sydney), jp-tok (Tokyo), kr-seo (Seoul) [7]
- Languages
- English (US), English (UK), English (Australian), English (Indian), French (France), French (Canadian), German, Spanish (Castilian), Spanish (Argentinian), Spanish (Chilean), Spanish (Colombian), Spanish (Mexican), Spanish (Peruvian), Brazilian Portuguese, Japanese, Italian, Dutch, Swedish, Arabic [8]
- Input types
- Audio formats: audio/wav, audio/mp3, audio/mpeg, audio/flac, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis, audio/l16, audio/alaw, audio/mulaw, audio/basic, audio/g729. Transports: WebSocket streaming, HTTP REST (batch), asynchronous HTTP with callback URL [9]
- Output types
- JSON transcript, word timestamps, word confidence scores, speaker labels (diarization), keyword spotting results, interim results, WebVTT captions (via IBM Video Streaming integration) [10]
- Webhooks
- ✓ Yes [11]
- Sandbox / test mode
- ✗ No [12]
- SDK languages
- Python, Node.js, Java, Swift, Go [13]
- MCP server
- ✗ No [14]
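To make the listed output types concrete, the sketch below flattens a JSON transcript into one record per word with its timestamps and speaker label. The sample payload is hand-written for illustration, and its field layout (`results`, `alternatives`, `timestamps` as `[word, start, end]` triples, top-level `speaker_labels`) is an assumption about the response shape that should be checked against the API reference. `words_with_speakers` is a hypothetical helper name.

```python
import json

# Hypothetical sample payload, hand-written for illustration. The field
# layout is an assumption about the v1 response shape, not captured output.
SAMPLE_RESPONSE = json.loads("""
{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "hello world ",
          "confidence": 0.94,
          "timestamps": [["hello", 0.0, 0.4], ["world", 0.5, 0.9]]
        }
      ]
    }
  ],
  "speaker_labels": [
    {"from": 0.0, "to": 0.4, "speaker": 0},
    {"from": 0.5, "to": 0.9, "speaker": 1}
  ]
}
""")


def words_with_speakers(response: dict) -> list[dict]:
    """Flatten final results into one record per word, tagging the speaker."""
    # Index speaker labels by their (start, end) span for direct lookup.
    speakers = {
        (label["from"], label["to"]): label["speaker"]
        for label in response.get("speaker_labels", [])
    }
    words = []
    for result in response.get("results", []):
        if not result.get("final", False):
            continue  # skip interim hypotheses
        best = result["alternatives"][0]
        for word, start, end in best.get("timestamps", []):
            words.append({
                "word": word,
                "start": start,
                "end": end,
                "speaker": speakers.get((start, end)),  # None if unlabeled
            })
    return words
```

Matching speakers by exact time span keeps the sketch short; a production consumer would likely match on overlapping intervals instead.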
Trust & compliance
- SOC 2
- SOC 2 Type II [15]
- HIPAA
- ✓ Yes [16]
- GDPR
- ✓ Yes [17]
- ISO 27001
- ✓ Yes [18]
- PCI DSS
- – Unknown [19]
- Published SLA
- ✓ Yes [20]
- Rate limits
- Lite plan: 500 minutes/month. Plus plan: maximum 100 concurrent transcription requests. Premium plan: unlimited concurrent transcription requests. No explicit per-request rate limit documented publicly beyond concurrency caps. [21]
- Known restrictions
- Lite plan services deleted after 30 days of inactivity, Lite plan has no access to customization (custom language/acoustic models), Standard plan no longer available for new purchases, Smart formatting limited to US English and Spanish, Profanity filter available for US English only, Speaker diarization language support varies by model generation, Audio billed by the minute including silence, Custom model training requires a paid plan (Plus or Premium), Premium plan requires direct IBM sales contact for provisioning [22]
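One way a client might stay under the Plus plan's concurrency cap is to bound its worker pool. The sketch below is an illustration under that assumption, not an official pattern: `transcribe_all` and the caller-supplied `transcribe` function are hypothetical, and the cap of 100 is taken from the rate-limit entry above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

# Plus plan cap quoted in the rate-limit entry above.
PLUS_MAX_CONCURRENT = 100


def transcribe_all(
    items: Iterable[str],
    transcribe: Callable[[str], str],
    max_concurrent: int = PLUS_MAX_CONCURRENT,
) -> list[str]:
    """Run `transcribe` over items with at most `max_concurrent` in flight.

    `transcribe` is a hypothetical caller-supplied function that sends one
    audio file to the service and returns its transcript.
    """
    # The pool size is the concurrency bound: no more than max_concurrent
    # calls to `transcribe` can be running at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(transcribe, items))
```

The bound only holds per process; several workers sharing one service instance would need to split the cap between them.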
Developer surface
Integration
- API style
- rest
- Base URL
- https://api.{location}.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}
- Version
- v1
- Versioning
- url
- Stability
- ga
- Auth methods
- api_key, oauth2
- Idempotency keys
- ✗ No
- Error format
- vendor-specific
- Webhook signing
- hmac_sha1
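The integration fields above can be combined into a request as follows. This is a minimal sketch that builds the request without sending it. The `/v1/recognize` path and HTTP Basic auth with the literal username `apikey` are assumptions not stated in this profile, and `build_recognize_request` is a hypothetical helper name.

```python
import base64
import urllib.parse
import urllib.request

# Base URL pattern from the Integration fields above.
BASE_URL_TEMPLATE = (
    "https://api.{location}.speech-to-text.watson.cloud.ibm.com"
    "/instances/{instance_id}"
)


def build_recognize_request(
    location: str,
    instance_id: str,
    api_key: str,
    audio: bytes,
    content_type: str = "audio/flac",
    **params: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for synchronous recognition.

    Assumptions: the `/v1/recognize` path, and HTTP Basic auth with the
    literal username `apikey`. Extra keyword arguments become query
    parameters, e.g. timestamps="true".
    """
    url = BASE_URL_TEMPLATE.format(location=location, instance_id=instance_id)
    url += "/v1/recognize"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    token = base64.b64encode(f"apikey:{api_key}".encode()).decode()
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": content_type,
        },
    )
```

A caller would pass one of the listed region codes such as `us-south` as `location`, then hand the result to `urllib.request.urlopen` to send it. The version appears in the path, matching the URL-based versioning noted above.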
Adoption & maturity
- Launched
- 2015-01-01
- GA
- 2024-08-23
- Notable customers
- Citibank, Bradesco, Humana
Other Speech-to-Text & Transcription APIs
ElevenLabs Scribe (Speech to Text)
"Scribe v2 is the most accurate Speech to Text model" offering "real-time Speech to Text in under 150 ms" across "90+ languages."
Azure AI Speech to Text
"Azure Speech in Foundry Tools provides speech to text, text to speech, and other capabilities through a Microsoft Foundry resource. You can transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and conduct live AI voice conversations."
Amazon Transcribe
"Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or to add speech-to-text capabilities to any application."
Google Cloud Speech-to-Text
"Accurate voice typing and transcription powered by Gemini."
AssemblyAI
"Voice AI infrastructure for developers building products that transcribe, understand, and act on speech."
Speechmatics
"Low-latency speech-to-text for multilingual, multi-speaker conversations."
References
- [1] Description: cloud.ibm.com · ibm.com
- [2] Pricing model: github.com · cloud.ibm.com
- [3] Free tier: github.com · cloud.ibm.com
- [4] Self-serve signup: cloud.ibm.com
- [5] Enterprise plan: github.com
- [6] Supported actions: cloud.ibm.com · cloud.ibm.com
- [7] Regions: cloud.ibm.com
- [8] Languages: cloud.ibm.com · github.com
- [9] Input types: github.com
- [10] Output types: cloud.ibm.com
- [11] Webhooks: cloud.ibm.com
- [12] Sandbox: cloud.ibm.com
- [13] SDK languages: cloud.ibm.com · watson-developer-cloud.github.io
- [14] MCP server: github.com
- [15] SOC 2: ibm.com · ibm.com
- [16] HIPAA: github.com · cloud.ibm.com
- [17] GDPR: cloud.ibm.com
- [18] ISO 27001: ibm.com · ibm.com
- [19] PCI DSS: ibm.com
- [20] Published SLA: cloud.ibm.com
- [21] Rate limits: github.com · cloud.ibm.com
- [22] Known restrictions: cloud.ibm.com · github.com
Change history
- 2026-06-21 Capabilities: {} → {"real_time_streaming":true,"speaker_diarization":true}
- 2026-06-21 Summary Md: (none) → IBM watsonx Speech to Text is a REST API for fast, accurate transcription suppo…
- 2026-06-21 Score Setup Speed: (none) → 85
- 2026-06-21 Score Pricing Transparency: (none) → 100
- 2026-06-21 Score Docs Quality: (none) → 35
- 2026-06-21 Score Procurement Friction: (none) → 100
- 2026-06-21 Score Trust Readiness: (none) → 90
- 2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
- 2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
- 2026-06-21 Score Agent Friendliness: (none) → 30
- 2026-06-21 Llms Txt Present: (none) → No
- 2026-06-21 Rendering: (none) → static
- 2026-06-21 Has Structured Data: (none) → Yes
- 2026-06-21 API Reference URL: (none) → https://cloud.ibm.com/apidocs/speech-to-text
- 2026-06-21 Docs URL: (none) → https://developer.ibm.com/
- 2026-06-21 Robots Allows Agents: (none) → Yes
- 2026-06-21 Pricing Model: set to usage_based
- 2026-06-21 Has Published Pricing: set to Yes
- 2026-06-21 Free Tier Available: set to Yes
- 2026-06-21 Free Tier Details: set to Lite plan: 500 minutes per month at no cost (recurring monthly allowance). No c…
- 2026-06-21 Self Serve Signup: set to Yes
- 2026-06-21 Requires Sales Call: set to No
- 2026-06-21 Enterprise Plan Available: set to Yes
- 2026-06-21 SOC 2: set to type_2
- 2026-06-21 HIPAA: set to Yes
- 2026-06-21 GDPR: set to Yes
- 2026-06-21 ISO 27001: set to Yes
- 2026-06-21 SLA Published: set to Yes
- 2026-06-21 SLA URL: set to https://cloud.ibm.com/docs/overview?topic=overview-slas
- 2026-06-21 Data Retention Policy URL: set to https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-information-secu…
- 2026-06-21 Documented Rate Limits: set to Lite plan: 500 minutes/month. Plus plan: maximum 100 concurrent transcription r…
- 2026-06-21 Known Restrictions: set to Lite plan services deleted after 30 days of inactivity, Lite plan has no access…
- 2026-06-21 Auth Methods: set to api_key, oauth2
- 2026-06-21 Auth Docs URL: set to https://cloud.ibm.com/docs/watson?topic=watson-iam
- 2026-06-21 API Style: set to rest
- 2026-06-21 Base URL: set to https://api.{location}.speech-to-text.watson.cloud.ibm.com/instances/{instance_…
- 2026-06-21 API Version: set to v1
- 2026-06-21 Versioning Scheme: set to url
- 2026-06-21 Stability: set to ga
- 2026-06-21 Slug: set to ibm-watson-speech-to-text
- 2026-06-21 Quickstart URL: set to https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-gettingStarted
- 2026-06-21 Idempotency Supported: set to No
- 2026-06-21 Error Format: set to vendor-specific
- 2026-06-21 Webhook Signing: set to hmac_sha1
- 2026-06-21 Webhook Events URL: set to https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async
- 2026-06-21 Requires Verification: set to No
- 2026-06-21 Starting Price Usd: set to 0.02
- 2026-06-21 Price Basis: set to minute
- 2026-06-21 Free Tier Limit: set to 500 minutes/month
- 2026-06-21 Launched At: set to 2015-01-01
Suggest an edit / leave a review
Leave a review or comment
curl -X POST https://apio.sh/api/feedback/ibm-watson-speech-to-text \
-H 'Content-Type: application/json' \
-d '{"kind":"review","rating":5,"body":"Your experience with this API…"}'
Suggest a correction to a field (cite a source)
curl -X POST https://apio.sh/api/suggest/ibm-watson-speech-to-text/FIELD \
-H 'Content-Type: application/json' \
-d '{"value":"corrected value","citations":[{"url":"https://source.example/page","excerpt":"supporting quote"}],"note":"what changed and why"}'