IBM watsonx Speech to Text

"IBM Watson® Speech to Text technology enables fast and accurate speech transcription in multiple languages for a variety of use cases, including but not limited to customer self-service, agent assistance and speech analytics." [1]

Speech-to-Text & Transcription APIs

www.ibm.com/products/speech-to-text · By IBM · Last verified 2026-06-21 · Source confidence: high

IBM watsonx Speech to Text is a REST API for speech transcription that supports batch, streaming, and WebSocket modes and is aimed at customer self-service, call-center analytics, captioning, and accessibility applications. Paid usage starts at $0.02 per minute on the Plus plan, the Lite plan includes 500 free minutes per month, and neither requires a sales call. The Premium plan, which is provisioned through IBM sales, offers unlimited concurrent transcriptions. The service is available in seven regions, has SDKs for Python, Node.js, Java, Swift, and Go, and is covered by SOC 2 Type II reports and ISO 27001 certification, with HIPAA support (under a Business Associate Agreement) and GDPR support.

Best for / Avoid if

Best for: Prototypes and side projects - free to start, no sales call; Regulated or enterprise workloads - compliance attestations and an enterprise plan; Teams needing broad API coverage out of the box

Pricing & procurement

Pricing model: Usage-based [2]
    github.com/ibm-cloud-docs/speech-to-text/blob/master/faq-pricing.md: "Plus Plan: Tiered pricing: $0.02 (USD) per minute for 1-999,999 minutes monthly; $0.01 (USD) per minute for 1M+ minutes. No base subscription fee mentioned."
    cloud.ibm.com/docs/speech-to-text: "For the Plus plan, pricing starts at $0.02 USD per minute for 1 - 999,999 minutes used, and $0.01 USD per minute for 1,000,000+ minutes."
Published pricing: Yes
Free tier: Yes [3]
Free tier details: Lite plan: 500 minutes per month at no cost (recurring monthly allowance), with no customization access; a Lite service instance is deleted after 30 days of inactivity. Plus plan (paid, no base fee): $0.02 USD/minute for 1–999,999 minutes per month, $0.01 USD/minute for 1,000,000+ minutes. Premium plan (requires sales contact): the first 150,000 minutes per month are included at no charge; pricing beyond that is not publicly disclosed.
Self-serve signup: Yes [4]
Requires sales call: No
Enterprise plan: Yes [5]

Published prices
Plan	Item	Per	Amount	Source
Lite	Speech recognition	500 minutes per month (recurring free allowance)	$0	source
Plus	Speech recognition	minute (1–999,999 minutes/month)	$0.02	source
Plus	Speech recognition	minute (1,000,000+ minutes/month)	$0.01	source
Premium	Speech recognition	month (first 150,000 minutes included; pricing beyond that requires sales contact)	$0	source
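
The Plus rows in the table can be turned into a rough monthly estimate. The sketch below is illustrative only: the function name is hypothetical, and it assumes the tiers are graduated (only minutes past 999,999 get the lower rate), which the cited sources do not state explicitly.

```python
from decimal import Decimal

# Rates copied from the Published prices table (Plus plan, USD per minute).
PLUS_TIER1_RATE = Decimal("0.02")  # minutes 1-999,999 in a month
PLUS_TIER2_RATE = Decimal("0.01")  # minutes 1,000,000 and up
PLUS_TIER1_CAP = 999_999


def estimate_plus_monthly_cost(minutes: int) -> Decimal:
    """Estimate one month's Plus-plan bill in USD.

    ASSUMPTION: tiers are graduated, so only the minutes beyond the cap
    are billed at the lower rate. Confirm against IBM's pricing page.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    tier1_minutes = min(minutes, PLUS_TIER1_CAP)
    tier2_minutes = max(minutes - PLUS_TIER1_CAP, 0)
    return tier1_minutes * PLUS_TIER1_RATE + tier2_minutes * PLUS_TIER2_RATE
```

Decimal is used instead of float so that per-minute cents add up exactly. Because all audio, including silence, counts toward billed minutes, the input should be total audio duration rather than speech time.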

Capabilities

Real-time streaming
Speaker diarization

Supported actions: transcribe_batch, transcribe_streaming, transcribe_websocket, transcribe_async_http, speaker_diarization, word_timestamps, interim_results, keyword_spotting, word_confidence_scores, smart_formatting, profanity_filtering, custom_language_model, custom_acoustic_model, language_identification, speech_activity_detection, transcript_enrichment, speech_begin_event_detection [6]
    cloud.ibm.com/docs/speech-to-text: "IBM provides the Watson speech-to-text service over several different channels, such as REST, HTTP with webhook callbacks, and WebSockets."
    cloud.ibm.com/docs/speech-to-text: "Keyword spotting identifies spoken phrases from the audio that match specified keyword strings. Word timestamps reports confidence levels for each word. Profanity filter censors profanity from US English transcriptions."
Regions: us-south (Dallas), us-east (Washington DC), eu-de (Frankfurt), eu-gb (London), au-syd (Sydney), jp-tok (Tokyo), kr-seo (Seoul) [7]
Languages: English (US), English (UK), English (Australian), English (Indian), French (France), French (Canadian), German, Spanish (Castilian), Spanish (Argentinian), Spanish (Chilean), Spanish (Colombian), Spanish (Mexican), Spanish (Peruvian), Brazilian Portuguese, Japanese, Italian, Dutch, Swedish, Arabic [8]
    cloud.ibm.com/docs/speech-to-text: "The service supports speech recognition for numerous languages including English (US, Australian, Indian, UK dialects), Japanese, French (France, Canadian), German, Spanish (Castilian, Argentinian, Chilean, Colombian, Mexican, Peruvian), Brazilian Portuguese, Dutch, Italian, Swedish, Arabic."
    github.com/ibm-cloud-docs/speech-to-text/blob/master/release-notes.md: "Large speech model for Italian, it-IT is now generally available (GA) as of May 2026, supporting both 8kHz and 16kHz audio."
Input types: audio/wav, audio/mp3, audio/mpeg, audio/flac, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis, audio/l16, audio/alaw, audio/mulaw, audio/basic, audio/g729, WebSocket streaming, HTTP REST (batch), Asynchronous HTTP with callback URL [9]
Output types: JSON transcript, word timestamps, word confidence scores, speaker labels (diarization), keyword spotting results, interim results, WebVTT captions (via IBM Video Streaming integration) [10]
Webhooks: Yes [11]
Sandbox / test mode: No [12]
SDK languages: Python, Node.js, Java, Swift, Go [13]
MCP server: No [14]
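
Because the Input types field lists MIME types, a client typically has to pick one per audio file. The sketch below maps a few common file extensions to types from that list; the mapping and the function name are this example's own convention, not something the profile or the service defines.

```python
from pathlib import Path

# Subset of the Input types listed above, keyed by common file extension.
# ASSUMPTION: pairing extensions with these types is this sketch's
# convention; check the audio-formats documentation for edge cases.
CONTENT_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mp3",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg",
    ".webm": "audio/webm",
}


def content_type_for(filename: str) -> str:
    """Return the MIME type to send for an audio file, by extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"no known content type for {suffix!r}") from None
```

Raising on unknown extensions, rather than guessing, keeps an unsupported file from being uploaded and billed before it fails.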

Trust & compliance

SOC 2: SOC 2 Type II [15]
    ibm.com/products/cloud/compliance/soc-2: "IBM Cloud is SOC 2 compliant because it has implemented and maintained robust controls that meet SOC 2 requirements, and it undergoes regular, independent audits. Services issue SOC 2 Type 2 reports at least once each year."
    ibm.com/new/announcements/ibm-public-cloud-soc-framework: "IBM Public Cloud Services Added to SOC 1 Type 2, SOC 2 Type 2, and SOC 3 Reports"
HIPAA: Yes [16]
    github.com/ibm-cloud-docs/speech-to-text/blob/master/faq-pricing.md: "Premium Plan: HIPAA readiness support. Enterprise features: data isolation, encryption key management, mutual authentication."
    cloud.ibm.com/docs/speech-to-text: "IBM is committed to providing clients and partners with innovative data privacy, security and governance solutions to assist them on their journey to GDPR compliance. IBM clients who are subject to HIPAA and who wish to use IBM Cloud products for HIPAA regulated data must enter into a Business Associate Agreement (BAA) with IBM."
GDPR: Yes [17]
ISO 27001: Yes [18]
PCI DSS: Unknown [19]
Published SLA: Yes [20]
Rate limits: Lite plan: 500 minutes/month. Plus plan: maximum 100 concurrent transcription requests. Premium plan: unlimited concurrent transcription requests. No explicit per-request rate limit documented publicly beyond concurrency caps. [21]
    github.com/ibm-cloud-docs/speech-to-text/blob/master/faq-pricing.md: "Maximum of 100 concurrent transcription requests [Plus plan]. Premium plan includes unlimited concurrent transcriptions."
    cloud.ibm.com/docs/speech-to-text: "Plus version includes unlimited minutes per month and 100 concurrent transcriptions, and the Premium plan includes unlimited minutes per month and unlimited concurrent transcriptions."
Known restrictions: Lite plan services deleted after 30 days of inactivity, Lite plan has no access to customization (custom language/acoustic models), Standard plan no longer available for new purchases, Smart formatting limited to US English and Spanish, Profanity filter available for US English only, Speaker diarization language support varies by model generation, Audio billed by the minute including silence, Custom model training requires a paid plan (Plus or Premium), Premium plan requires direct IBM sales contact for provisioning [22]
    cloud.ibm.com/docs/speech-to-text: "Smart formatting to control how the engine transcribes numbers and dates for U.S. English and Spanish. Profanity filter censors profanity from US English transcriptions by default."
    github.com/ibm-cloud-docs/speech-to-text/blob/master/faq-pricing.md: "Lite Plan: No customization access. Services deleted after 30 days of inactivity. Billing: Monthly aggregation rounded to nearest minute; all audio (including silence) counts toward usage totals."
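
The Plus-plan cap of 100 concurrent transcription requests is something a bulk client can enforce on its own side. A minimal sketch, assuming an async client: `transcribe` is a hypothetical coroutine supplied by the caller (it is not an SDK function), and the semaphore simply keeps the number of in-flight calls at or below the cap.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List

PLUS_MAX_CONCURRENT = 100  # Plus-plan cap from the Rate limits field


async def transcribe_all(
    paths: Iterable[str],
    transcribe: Callable[[str], Awaitable[str]],  # hypothetical caller-supplied coroutine
    limit: int = PLUS_MAX_CONCURRENT,
) -> List[str]:
    """Run transcribe(path) for every path with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(path: str) -> str:
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await transcribe(path)

    # gather() preserves input order in its result list.
    return await asyncio.gather(*(guarded(p) for p in paths))
```

Setting `limit` somewhat below 100 leaves headroom for other processes sharing the same service instance.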

Developer surface

Docs rendering: static

Integration

API style: rest
Base URL: https://api.{location}.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}
Version: v1
Versioning: url
Stability: ga
Auth methods: api_key, oauth2
Idempotency keys: No
Error format: vendor-specific
Webhook signing: hmac_sha1
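
Since webhook signing is listed as hmac_sha1, a callback receiver would normally recompute the signature before trusting a notification. The sketch below shows the general technique only. It assumes the signature is an HMAC-SHA1 over the raw request body, keyed with the secret registered alongside the callback URL, and delivered base64-encoded; this profile states neither the header name nor the encoding, so verify both in the asynchronous HTTP documentation.

```python
import base64
import hashlib
import hmac


def signature_is_valid(body: bytes, received_signature: str, user_secret: str) -> bool:
    """Check an HMAC-SHA1 callback signature.

    ASSUMPTIONS: the signature covers the raw body bytes and arrives
    base64-encoded. Neither detail is confirmed by this profile.
    """
    expected = hmac.new(user_secret.encode("utf-8"), body, hashlib.sha1).digest()
    try:
        received = base64.b64decode(received_signature, validate=True)
    except ValueError:
        # Not valid base64 (binascii.Error is a ValueError subclass).
        return False
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, received)
```

The check has to run on the exact bytes received, before any JSON parsing or re-serialization, or the digest will not match.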

SDKs

Python ibm-watson · repo
Node.js ibm-watson · repo
Java com.ibm.watson:ibm-watson · repo
Swift IBMWatsonSpeechToTextV1 · repo
Go github.com/watson-developer-cloud/go-sdk/v2 · repo
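
For callers not using one of these SDKs, the Base URL and api_key auth method from the Integration section are enough to assemble a raw HTTP request. The sketch below only builds the request and does not send it. The `/v1/recognize` path and the Basic-auth username `apikey` are assumptions drawn from IBM's public API reference rather than from this profile, and the function name is hypothetical.

```python
import base64
import urllib.parse
import urllib.request
from typing import Mapping, Optional

# Base URL template copied from the Integration section.
BASE_URL_TEMPLATE = (
    "https://api.{location}.speech-to-text.watson.cloud.ibm.com"
    "/instances/{instance_id}"
)


def build_recognize_request(
    location: str,
    instance_id: str,
    api_key: str,
    audio: bytes,
    content_type: str,
    params: Optional[Mapping[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a synchronous transcription request."""
    base = BASE_URL_TEMPLATE.format(location=location, instance_id=instance_id)
    url = base + "/v1/recognize"  # ASSUMPTION: path per IBM's API reference
    if params:
        url += "?" + urllib.parse.urlencode(params)
    # ASSUMPTION: api_key auth is HTTP Basic with the literal user "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={"Content-Type": content_type, "Authorization": f"Basic {token}"},
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; keeping construction separate makes the URL and headers easy to inspect or log (minus the credential) first.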

Adoption & maturity

Launched: 2015-01-01
GA: 2024-08-23
Notable customers: Citibank, Bradesco, Humana

Other Speech-to-Text & Transcription APIs

ElevenLabs Scribe (Speech to Text)
"Scribe v2 is the most accurate Speech to Text model" offering "real-time Speech to Text in under 150 ms" across "90+ languages."
Hybrid · free tier · public pricing · self-serve
Azure AI Speech to Text
"Azure Speech in Foundry Tools provides speech to text, text to speech, and other capabilities through a Microsoft Foundry resource. You can transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and conduct live AI voice conversations."
Usage · free tier · public pricing · self-serve
Amazon Transcribe
"Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or to add speech-to-text capabilities to any application."
Usage · free tier · public pricing · self-serve
Google Cloud Speech-to-Text
"Accurate voice typing and transcription powered by Gemini."
Usage · free tier · public pricing · self-serve
AssemblyAI
"Voice AI infrastructure for developers building products that transcribe, understand, and act on speech."
Usage · public pricing · self-serve
Speechmatics
"Low-latency speech-to-text for multilingual, multi-speaker conversations."
Usage · free tier · public pricing · self-serve


References

Each field above carries a numbered source, listed here in citation order.

[1] Description: cloud.ibm.com · ibm.com
[2] Pricing model: github.com · cloud.ibm.com
[3] Free tier: github.com · cloud.ibm.com
[4] Self-serve signup: cloud.ibm.com
[5] Enterprise plan: github.com
[6] Supported actions: cloud.ibm.com · cloud.ibm.com
[7] Regions: cloud.ibm.com
[8] Languages: cloud.ibm.com · github.com
[9] Input types: github.com
[10] Output types: cloud.ibm.com
[11] Webhooks: cloud.ibm.com
[12] Sandbox: cloud.ibm.com
[13] SDK languages: cloud.ibm.com · watson-developer-cloud.github.io
[14] MCP server: github.com
[15] SOC 2: ibm.com · ibm.com
[16] HIPAA: github.com · cloud.ibm.com
[17] GDPR: cloud.ibm.com
[18] ISO 27001: ibm.com · ibm.com
[19] PCI DSS: ibm.com
[20] Published SLA: cloud.ibm.com
[21] Rate limits: github.com · cloud.ibm.com
[22] Known restrictions: cloud.ibm.com · github.com

Change history

Every field change, who made it, and when - from our audited data pipeline and editors.

2026-06-21 Capabilities: {} → {"real_time_streaming":true,"speaker_diarization":true}
2026-06-21 Summary Md: (none) → IBM watsonx Speech to Text is a REST API for fast, accurate transcription suppo…
2026-06-21 Score Setup Speed: (none) → 85
2026-06-21 Score Pricing Transparency: (none) → 100
2026-06-21 Score Docs Quality: (none) → 35
2026-06-21 Score Procurement Friction: (none) → 100
2026-06-21 Score Trust Readiness: (none) → 90
2026-06-21 Best For: (none) → Prototypes and side projects - free to start, no sales call, Regulated or enter…
2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
2026-06-21 Score Agent Friendliness: (none) → 30
2026-06-21 Llms Txt Present: (none) → No
2026-06-21 Rendering: (none) → static
2026-06-21 Has Structured Data: (none) → Yes
2026-06-21 API Reference URL: (none) → https://cloud.ibm.com/apidocs/speech-to-text
2026-06-21 Docs URL: (none) → https://developer.ibm.com/
2026-06-21 Robots Allows Agents: (none) → Yes
2026-06-21 Pricing Model: set to usage_based
2026-06-21 Has Published Pricing: set to Yes
2026-06-21 Free Tier Available: set to Yes
2026-06-21 Free Tier Details: set to Lite plan: 500 minutes per month at no cost (recurring monthly allowance). No c…
2026-06-21 Self Serve Signup: set to Yes
2026-06-21 Requires Sales Call: set to No
2026-06-21 Enterprise Plan Available: set to Yes
2026-06-21 SOC 2: set to type_2
2026-06-21 HIPAA: set to Yes
2026-06-21 GDPR: set to Yes
2026-06-21 ISO 27001: set to Yes
2026-06-21 SLA Published: set to Yes
2026-06-21 SLA URL: set to https://cloud.ibm.com/docs/overview?topic=overview-slas
2026-06-21 Data Retention Policy URL: set to https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-information-secu…
2026-06-21 Documented Rate Limits: set to Lite plan: 500 minutes/month. Plus plan: maximum 100 concurrent transcription r…
2026-06-21 Known Restrictions: set to Lite plan services deleted after 30 days of inactivity, Lite plan has no access…
2026-06-21 Auth Methods: set to api_key, oauth2
2026-06-21 Auth Docs URL: set to https://cloud.ibm.com/docs/watson?topic=watson-iam
2026-06-21 API Style: set to rest
2026-06-21 Base URL: set to https://api.{location}.speech-to-text.watson.cloud.ibm.com/instances/{instance_…
2026-06-21 API Version: set to v1
2026-06-21 Versioning Scheme: set to url
2026-06-21 Stability: set to ga
2026-06-21 Slug: set to ibm-watson-speech-to-text
2026-06-21 Quickstart URL: set to https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-gettingStarted
2026-06-21 Idempotency Supported: set to No
2026-06-21 Error Format: set to vendor-specific
2026-06-21 Webhook Signing: set to hmac_sha1
2026-06-21 Webhook Events URL: set to https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async
2026-06-21 Requires Verification: set to No
2026-06-21 Starting Price Usd: set to 0.02
2026-06-21 Price Basis: set to minute
2026-06-21 Free Tier Limit: set to 500 minutes/month
2026-06-21 Launched At: set to 2015-01-01

Suggest an edit / leave a review

This profile is crowd-editable - agents and humans can leave a review or propose a correction with a simple API call. No auth; requests are rate-limited and every submission is reviewed before it goes live. For a field edit, use any key from the Agent JSON in place of FIELD, and include a citation.

Leave a review or comment

curl -X POST https://apio.sh/api/feedback/ibm-watson-speech-to-text \
  -H 'Content-Type: application/json' \
  -d '{"kind":"review","rating":5,"body":"Your experience with this API…"}'

Suggest a correction to a field (cite a source)

curl -X POST https://apio.sh/api/suggest/ibm-watson-speech-to-text/FIELD \
  -H 'Content-Type: application/json' \
  -d '{"value":"corrected value","citations":[{"url":"https://source.example/page","excerpt":"supporting quote"}],"note":"what changed and why"}'
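
The same correction request can be assembled from a script. The sketch below mirrors the curl call above using only the standard library; the function name is hypothetical, and it builds the request without sending it so the payload can be reviewed first.

```python
import json
import urllib.request

# Endpoint pattern taken from the curl example above.
SUGGEST_URL = "https://apio.sh/api/suggest/{slug}/{field}"


def build_suggestion(
    slug: str, field: str, value: str, source_url: str, excerpt: str, note: str
) -> urllib.request.Request:
    """Build (but do not send) a field-correction request with one citation."""
    if not source_url:
        # The profile asks for a citation with every field edit.
        raise ValueError("a citation URL is required")
    payload = {
        "value": value,
        "citations": [{"url": source_url, "excerpt": excerpt}],
        "note": note,
    }
    return urllib.request.Request(
        SUGGEST_URL.format(slug=slug, field=field),
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

As with the curl version, FIELD must be a key from the profile's Agent JSON, and submissions are reviewed before they go live.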
