IBM watsonx Speech to Text

"IBM Watson® Speech to Text technology enables fast and accurate speech transcription in multiple languages for a variety of use cases, including but not limited to customer self-service, agent assistance and speech analytics." [1]

www.ibm.com/products/speech-to-text · By IBM · Last verified 2026-06-21 · Source confidence: high

IBM watsonx Speech to Text is a REST API for fast, accurate transcription supporting batch, streaming, and WebSocket modes, aimed at customer self-service, call-center analytics, captioning, and accessibility applications. The paid Plus plan starts at $0.02 per minute and the Lite plan includes 500 free minutes per month; neither requires a sales call. The Premium (enterprise) plan is provisioned through IBM sales and allows unlimited concurrent requests. The service is deployed in seven global regions, SDKs cover Python, Node.js, Java, Swift, and Go, and this profile lists SOC 2 Type II, HIPAA, GDPR, and ISO 27001 coverage.

Best for / Avoid if

Best for:

  • Prototypes and side projects - free to start, no sales call
  • Regulated or enterprise workloads - compliance attestations and an enterprise plan
  • Teams needing broad API coverage out of the box

Pricing & procurement

Pricing model
Usage-based [2]
Published pricing
Yes
Free tier
Yes [3]
Free tier details
Lite plan: 500 minutes per month at no cost (recurring monthly allowance). No customization access on Lite; Lite service instances are deleted after 30 days of inactivity. Plus plan (paid, no base fee): $0.02 USD per minute in the 1–999,999 minutes/month tier and $0.01 USD per minute in the 1,000,000+ minutes/month tier. Premium plan (requires sales contact): the first 150,000 minutes per month are included at no charge; pricing beyond that is not publicly disclosed.
Self-serve signup
Yes [4]
Requires sales call
No
Enterprise plan
Yes [5]
Published prices
Plan | Item | Per | Amount
Lite | Speech recognition | 500 minutes per month (recurring free allowance) | $0
Plus | Speech recognition | minute (1–999,999 minutes/month) | $0.02
Plus | Speech recognition | minute (1,000,000+ minutes/month) | $0.01
Premium | Speech recognition (first 150,000 minutes/month included) | month (first 150,000 minutes at no charge; beyond that requires sales contact) | $0
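
As a rough illustration of the Plus rates listed above, the following Python sketch estimates a monthly bill. It assumes the two tiers are graduated (only minutes beyond 999,999 are billed at the lower rate), which this profile does not confirm; `plus_monthly_cost` and the constant names are hypothetical.

```python
from decimal import Decimal

# Rates copied from the Plus rows above; the graduated-tier handling is an assumption.
PLUS_TIER_LIMIT = 999_999            # last minute billed at the higher rate
PLUS_RATE_TIER_1 = Decimal("0.02")   # USD per minute, minutes 1-999,999
PLUS_RATE_TIER_2 = Decimal("0.01")   # USD per minute, minutes 1,000,000+


def plus_monthly_cost(minutes: int) -> Decimal:
    """Estimate one month's Plus charge in USD, assuming graduated tiers."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    tier_1_minutes = min(minutes, PLUS_TIER_LIMIT)
    tier_2_minutes = max(minutes - PLUS_TIER_LIMIT, 0)
    return tier_1_minutes * PLUS_RATE_TIER_1 + tier_2_minutes * PLUS_RATE_TIER_2
```

Under that assumption, `plus_monthly_cost(10_000)` comes to 200.00 USD. Because audio is billed by the minute including silence, the input should be total audio minutes, not speech minutes.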

Capabilities

  • Real-time streaming
  • Speaker diarization
Supported actions
transcribe_batch, transcribe_streaming, transcribe_websocket, transcribe_async_http, speaker_diarization, word_timestamps, interim_results, keyword_spotting, word_confidence_scores, smart_formatting, profanity_filtering, custom_language_model, custom_acoustic_model, language_identification, speech_activity_detection, transcript_enrichment, speech_begin_event_detection [6]
Regions
us-south (Dallas), us-east (Washington DC), eu-de (Frankfurt), eu-gb (London), au-syd (Sydney), jp-tok (Tokyo), kr-seo (Seoul) [7]
Languages
English (US), English (UK), English (Australian), English (Indian), French (France), French (Canadian), German, Spanish (Castilian), Spanish (Argentinian), Spanish (Chilean), Spanish (Colombian), Spanish (Mexican), Spanish (Peruvian), Brazilian Portuguese, Japanese, Italian, Dutch, Swedish, Arabic [8]
Input types
audio/wav, audio/mp3, audio/mpeg, audio/flac, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis, audio/l16, audio/alaw, audio/mulaw, audio/basic, audio/g729, WebSocket streaming, HTTP REST (batch), Asynchronous HTTP with callback URL [9]
Output types
JSON transcript, word timestamps, word confidence scores, speaker labels (diarization), keyword spotting results, interim results, WebVTT captions (via IBM Video Streaming integration) [10]
Webhooks
Yes [11]
Sandbox / test mode
No [12]
SDK languages
Python, Node.js, Java, Swift, Go [13]
MCP server
No [14]
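
To show how the capability fields above might translate into a request, the following Python sketch picks a Content-Type from the listed input types and builds a query string for a few of the listed actions. The parameter names (`speaker_labels`, `timestamps`, `word_confidence`, `keywords`, `keywords_threshold`) are assumptions based on IBM's v1 API reference, not on this profile, and the extension mapping and helper names are hypothetical; check them against the API reference before relying on them.

```python
from pathlib import PurePath
from urllib.parse import urlencode

# Subset of the input types listed above, keyed by an assumed file extension.
CONTENT_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mp3",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg",
    ".webm": "audio/webm",
}


def content_type_for(filename: str) -> str:
    """Return the Content-Type for an audio file, based on its extension."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"no known content type for {suffix!r}")
    return CONTENT_TYPES[suffix]


def recognize_query(diarization=False, word_timestamps=False,
                    word_confidence=False, keywords=None,
                    keywords_threshold=0.5) -> str:
    """Build a query string; parameter names are assumed, not from this profile."""
    params = {}
    if diarization:
        params["speaker_labels"] = "true"
    if word_timestamps:
        params["timestamps"] = "true"
    if word_confidence:
        params["word_confidence"] = "true"
    if keywords:
        params["keywords"] = ",".join(keywords)
        params["keywords_threshold"] = str(keywords_threshold)
    return urlencode(params)
```

Keeping the options in one helper makes it easier to account for the restrictions listed later in this profile, such as language limits on smart formatting and the profanity filter.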

Trust & compliance

SOC 2
SOC 2 Type II [15]
HIPAA
Yes [16]
GDPR
Yes [17]
ISO 27001
Yes [18]
PCI DSS
Unknown [19]
Published SLA
Yes [20]
Rate limits
Lite plan: 500 minutes/month. Plus plan: maximum 100 concurrent transcription requests. Premium plan: unlimited concurrent transcription requests. No explicit per-request rate limit documented publicly beyond concurrency caps. [21]
Known restrictions [22]

  • Lite plan service instances are deleted after 30 days of inactivity
  • Lite plan has no access to customization (custom language/acoustic models)
  • Standard plan is no longer available for new purchases
  • Smart formatting is limited to US English and Spanish
  • Profanity filter is available for US English only
  • Speaker diarization language support varies by model generation
  • Audio is billed by the minute, including silence
  • Custom model training requires a paid plan (Plus or Premium)
  • Premium plan requires direct IBM sales contact for provisioning
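
One way a client might stay under the Plus plan's cap of 100 concurrent transcription requests is to gate calls with a semaphore. The Python sketch below is a generic pattern, not IBM-provided code; `run_bounded` and the `worker` callable are hypothetical, and `worker` stands in for whatever coroutine actually sends audio to the service.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

PLUS_MAX_CONCURRENT = 100  # concurrency cap listed for the Plus plan above


async def run_bounded(jobs: Iterable[T],
                      worker: Callable[[T], Awaitable[R]],
                      limit: int = PLUS_MAX_CONCURRENT) -> List[R]:
    """Run worker(job) for every job with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: T) -> R:
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await worker(job)

    # gather preserves the order of the input jobs in its result list.
    return list(await asyncio.gather(*(guarded(job) for job in jobs)))
```

A caller would run it with `asyncio.run(run_bounded(audio_files, transcribe))`, where `transcribe` is the caller's own coroutine. Setting `limit` somewhat below 100 leaves headroom for other clients sharing the same service instance.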

Developer surface

Docs rendering: static

Integration

API style
rest
Base URL
https://api.{location}.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}
Version
v1
Versioning
url
Stability
ga
Auth methods
api_key, oauth2
Idempotency keys
No
Error format
vendor-specific
Webhook signing
hmac_sha1
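
The integration fields above can be sketched in Python as follows: filling in the base URL template, building an API-key Authorization header, and checking an HMAC-SHA1 callback signature. Several details are assumptions taken from IBM's public documentation rather than from this profile: the `/v1/recognize` path, the `wss://` form of the URL, the `apikey` Basic-auth username, and base64 encoding of the signature. All function names are hypothetical.

```python
import base64
import hashlib
import hmac

# Template copied from the Base URL field above.
BASE_URL_TEMPLATE = (
    "https://api.{location}.speech-to-text.watson.cloud.ibm.com"
    "/instances/{instance_id}"
)


def recognize_url(location: str, instance_id: str, websocket: bool = False) -> str:
    """Fill in the base URL and append the assumed /v1/recognize path."""
    base = BASE_URL_TEMPLATE.format(location=location, instance_id=instance_id)
    if websocket:
        # Assumed: the WebSocket interface uses the same host with wss://.
        base = "wss://" + base[len("https://"):]
    return f"{base}/v1/recognize"


def api_key_header(api_key: str) -> dict:
    """HTTP Basic header with the assumed literal username 'apikey'."""
    raw = f"apikey:{api_key}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def callback_signature_ok(secret: str, body: bytes, received: str) -> bool:
    """Compare a received signature with HMAC-SHA1(secret, body), base64-encoded.

    The base64 encoding is an assumption; confirm it in the async HTTP docs.
    """
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode("ascii")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, received)
```

For example, `recognize_url("us-south", "my-instance")` yields a Dallas endpoint URL. With `oauth2` (IAM) authentication the header would instead carry a bearer token.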

SDKs

  • Python ibm-watson
  • Node.js ibm-watson
  • Java com.ibm.watson:ibm-watson
  • Swift IBMWatsonSpeechToTextV1
  • Go github.com/watson-developer-cloud/go-sdk/v2

Adoption & maturity

Launched
2015-01-01
GA
2024-08-23
Notable customers
Citibank, Bradesco, Humana

Other Speech-to-Text & Transcription APIs

  • ElevenLabs Scribe (Speech to Text)

    "Scribe v2 is the most accurate Speech to Text model" offering "real-time Speech to Text in under 150 ms" across "90+ languages."

    Hybrid · free tier · public pricing · self-serve

  • Azure AI Speech to Text

    "Azure Speech in Foundry Tools provides speech to text, text to speech, and other capabilities through a Microsoft Foundry resource. You can transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and conduct live AI voice conversations."

    Usage · free tier · public pricing · self-serve

  • Amazon Transcribe

    "Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or to add speech-to-text capabilities to any application."

    Usage · free tier · public pricing · self-serve

  • Google Cloud Speech-to-Text

    "Accurate voice typing and transcription powered by Gemini."

    Usage · free tier · public pricing · self-serve

  • AssemblyAI

    "Voice AI infrastructure for developers building products that transcribe, understand, and act on speech."

    Usage · public pricing · self-serve

  • Speechmatics

    "Low-latency speech-to-text for multilingual, multi-speaker conversations."

    Usage · free tier · public pricing · self-serve


References

Each field above carries a numbered source.

  1. Description: cloud.ibm.com · ibm.com
  2. Pricing model: github.com · cloud.ibm.com
  3. Free tier: github.com · cloud.ibm.com
  4. Self-serve signup: cloud.ibm.com
  5. Enterprise plan: github.com
  6. Supported actions: cloud.ibm.com · cloud.ibm.com
  7. Regions: cloud.ibm.com
  8. Languages: cloud.ibm.com · github.com
  9. Input types: github.com
  10. Output types: cloud.ibm.com
  11. Webhooks: cloud.ibm.com
  12. Sandbox: cloud.ibm.com
  13. SDK languages: cloud.ibm.com · watson-developer-cloud.github.io
  14. MCP server: github.com
  15. SOC 2: ibm.com · ibm.com
  16. HIPAA: github.com · cloud.ibm.com
  17. GDPR: cloud.ibm.com
  18. ISO 27001: ibm.com · ibm.com
  19. PCI DSS: ibm.com
  20. Published SLA: cloud.ibm.com
  21. Rate limits: github.com · cloud.ibm.com
  22. Known restrictions: cloud.ibm.com · github.com

Change history

Every field change, who made it, and when - from our audited data pipeline and editors.

  1. 2026-06-21 Capabilities: {} → {"real_time_streaming":true,"speaker_diarization":true}
  2. 2026-06-21 Summary Md: (none) → IBM watsonx Speech to Text is a REST API for fast, accurate transcription suppo…
  3. 2026-06-21 Score Setup Speed: (none) → 85
  4. 2026-06-21 Score Pricing Transparency: (none) → 100
  5. 2026-06-21 Score Docs Quality: (none) → 35
  6. 2026-06-21 Score Procurement Friction: (none) → 100
  7. 2026-06-21 Score Trust Readiness: (none) → 90
  8. 2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
  9. 2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
  10. 2026-06-21 Score Agent Friendliness: (none) → 30
  11. 2026-06-21 Llms Txt Present: (none) → No
  12. 2026-06-21 Rendering: (none) → static
  13. 2026-06-21 Has Structured Data: (none) → Yes
  14. 2026-06-21 API Reference URL: (none) → https://cloud.ibm.com/apidocs/speech-to-text
  15. 2026-06-21 Docs URL: (none) → https://developer.ibm.com/
  16. 2026-06-21 Robots Allows Agents: (none) → Yes
  17. 2026-06-21 Pricing Model: set to usage_based
  18. 2026-06-21 Has Published Pricing: set to Yes
  19. 2026-06-21 Free Tier Available: set to Yes
  20. 2026-06-21 Free Tier Details: set to Lite plan: 500 minutes per month at no cost (recurring monthly allowance). No c…
  21. 2026-06-21 Self Serve Signup: set to Yes
  22. 2026-06-21 Requires Sales Call: set to No
  23. 2026-06-21 Enterprise Plan Available: set to Yes
  24. 2026-06-21 SOC 2: set to type_2
  25. 2026-06-21 HIPAA: set to Yes
  26. 2026-06-21 GDPR: set to Yes
  27. 2026-06-21 ISO 27001: set to Yes
  28. 2026-06-21 SLA Published: set to Yes
  29. 2026-06-21 SLA URL: set to https://cloud.ibm.com/docs/overview?topic=overview-slas
  30. 2026-06-21 Data Retention Policy URL: set to https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-information-secu…
  31. 2026-06-21 Documented Rate Limits: set to Lite plan: 500 minutes/month. Plus plan: maximum 100 concurrent transcription r…
  32. 2026-06-21 Known Restrictions: set to Lite plan services deleted after 30 days of inactivity, Lite plan has no access…
  33. 2026-06-21 Auth Methods: set to api_key, oauth2
  34. 2026-06-21 Auth Docs URL: set to https://cloud.ibm.com/docs/watson?topic=watson-iam
  35. 2026-06-21 API Style: set to rest
  36. 2026-06-21 Base URL: set to https://api.{location}.speech-to-text.watson.cloud.ibm.com/instances/{instance_…
  37. 2026-06-21 API Version: set to v1
  38. 2026-06-21 Versioning Scheme: set to url
  39. 2026-06-21 Stability: set to ga
  40. 2026-06-21 Slug: set to ibm-watson-speech-to-text
  41. 2026-06-21 Quickstart URL: set to https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-gettingStarted
  42. 2026-06-21 Idempotency Supported: set to No
  43. 2026-06-21 Error Format: set to vendor-specific
  44. 2026-06-21 Webhook Signing: set to hmac_sha1
  45. 2026-06-21 Webhook Events URL: set to https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async
  46. 2026-06-21 Requires Verification: set to No
  47. 2026-06-21 Starting Price Usd: set to 0.02
  48. 2026-06-21 Price Basis: set to minute
  49. 2026-06-21 Free Tier Limit: set to 500 minutes/month
  50. 2026-06-21 Launched At: set to 2015-01-01

Suggest an edit / leave a review

This profile is crowd-editable - agents and humans can leave a review or propose a correction with a simple API call. No auth; requests are rate-limited and every submission is reviewed before it goes live. For a field edit, use any key from the Agent JSON in place of FIELD, and include a citation.

Leave a review or comment

curl -X POST https://apio.sh/api/feedback/ibm-watson-speech-to-text \
  -H 'Content-Type: application/json' \
  -d '{"kind":"review","rating":5,"body":"Your experience with this API…"}'

Suggest a correction to a field (cite a source)

curl -X POST https://apio.sh/api/suggest/ibm-watson-speech-to-text/FIELD \
  -H 'Content-Type: application/json' \
  -d '{"value":"corrected value","citations":[{"url":"https://source.example/page","excerpt":"supporting quote"}],"note":"what changed and why"}'
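
For agents that prefer Python over curl, the correction request above could be assembled along these lines. This is a sketch that only builds the request and does not send it; `build_suggestion` is a hypothetical helper, and the payload keys simply mirror the curl example.

```python
import json
from urllib.request import Request

# URL pattern copied from the curl example above; {field} replaces FIELD.
SUGGEST_URL = "https://apio.sh/api/suggest/ibm-watson-speech-to-text/{field}"


def build_suggestion(field: str, value: str, citation_url: str,
                     excerpt: str, note: str) -> Request:
    """Build (but do not send) a POST request proposing a field correction."""
    if not citation_url:
        # The page asks for a citation with every field edit.
        raise ValueError("a citation URL is required for field edits")
    payload = {
        "value": value,
        "citations": [{"url": citation_url, "excerpt": excerpt}],
        "note": note,
    }
    return Request(
        SUGGEST_URL.format(field=field),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it; since submissions are rate-limited and reviewed, it is worth inspecting `request.data` first.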
