OpenAI Speech-to-Text

"The Audio API provides two speech-to-text endpoints, transcriptions and translations, based on our state-of-the-art open source large-v2 Whisper model." [1]

platform.openai.com/docs/guides/speech-to-text · By OpenAI · Last verified 2026-06-21 · Source confidence: high

OpenAI Speech-to-Text is a REST API offering batch, streaming, and real-time audio transcription, speaker diarization, language detection, and translation to English, built on Whisper and newer gpt-4o-based models. Pricing starts at an estimated $0.003 per minute (gpt-4o-mini-transcribe) on a self-serve, pay-as-you-go basis with no sales call required; an enterprise plan is also available. The API ships official SDKs for Python, Node.js, Java, Go, Ruby, and .NET, and this profile lists SOC 2 Type II, HIPAA, GDPR, ISO 27001, and PCI DSS under trust and compliance.

Best for / Avoid if

Best for: Regulated or enterprise workloads - compliance attestations and an enterprise plan; AI agents and automation - an agent-ready surface (MCP / llms.txt); Teams needing broad API coverage out of the box

Avoid if: You want to try it free before paying

Pricing & procurement

Pricing model
Usage-based [2]
Published pricing
Yes [3]
Free tier
No [4]
Self-serve signup
Yes
Requires sales call
No
Enterprise plan
Yes [5]
Published prices
Item · Per · Amount
whisper-1 transcription · minute · $0.006
gpt-4o-transcribe audio input · 1M tokens · $2.50
gpt-4o-transcribe text output · 1M tokens · $10
gpt-4o-transcribe estimated cost · minute · $0.006
gpt-4o-mini-transcribe audio input (snapshots: gpt-4o-mini-transcribe-2025-12-15, gpt-4o-mini-transcribe-2025-03-20) · 1M tokens · $1.25
gpt-4o-mini-transcribe text output · 1M tokens · $5
gpt-4o-mini-transcribe estimated cost · minute · $0.003
gpt-4o-transcribe-diarize audio input (speaker diarization) · 1M tokens · $2.50
gpt-4o-transcribe-diarize text output (speaker diarization) · 1M tokens · $10
gpt-4o-transcribe-diarize estimated cost (speaker diarization) · minute · $0.006
gpt-realtime-whisper streaming transcription (audio duration) · minute · $0.017
gpt-realtime-translate streaming speech translation (audio duration) · minute · $0.034
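
As a rough budgeting aid, the published prices above can be turned into a small estimator. This is a minimal sketch: the function names are hypothetical, the per-minute figures for the gpt-4o models are the profile's own estimates, and actual billing for those models is token-based, so real invoices may differ.

```python
from decimal import Decimal

# Per-minute prices in USD, copied from the published prices above.
# For the gpt-4o models these are the listed *estimated* costs.
PRICE_PER_MINUTE = {
    "whisper-1": Decimal("0.006"),
    "gpt-4o-transcribe": Decimal("0.006"),
    "gpt-4o-mini-transcribe": Decimal("0.003"),
    "gpt-4o-transcribe-diarize": Decimal("0.006"),
    "gpt-realtime-whisper": Decimal("0.017"),
    "gpt-realtime-translate": Decimal("0.034"),
}

# Token prices in USD per 1M tokens: (audio input, text output).
PRICE_PER_MILLION_TOKENS = {
    "gpt-4o-transcribe": (Decimal("2.5"), Decimal("10")),
    "gpt-4o-mini-transcribe": (Decimal("1.25"), Decimal("5")),
    "gpt-4o-transcribe-diarize": (Decimal("2.5"), Decimal("10")),
}


def estimate_by_minutes(model: str, minutes) -> Decimal:
    """Estimated cost in USD for `minutes` of audio on `model`."""
    return PRICE_PER_MINUTE[model] * Decimal(str(minutes))


def estimate_by_tokens(model: str, audio_tokens: int, text_tokens: int) -> Decimal:
    """Cost in USD from token counts, for the token-priced models."""
    audio_price, text_price = PRICE_PER_MILLION_TOKENS[model]
    total = audio_price * audio_tokens + text_price * text_tokens
    return total / Decimal(1_000_000)
```

Decimal is used instead of float so that small per-minute prices multiply without rounding noise.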

Capabilities

  • Real-time streaming
  • Speaker diarization
  • Speech translation
Supported actions
transcribe_batch, transcribe_streaming, transcribe_realtime, translation_to_english, speaker_diarization, word_timestamps, segment_timestamps, language_detection, prompting_for_accuracy, logprobs_confidence_scoring, voice_activity_detection [6]
Languages
99+ languages including Afrikaans, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bosnian, Breton, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, German, Greek, Gujarati, Haitian Creole, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Yoruba [7]
Input types
audio/mp3, audio/mp4, audio/mpeg, audio/mpga, audio/m4a, audio/wav, audio/webm, WebSocket (realtime streaming), WebRTC (realtime browser) [8]
Output types
JSON, plain text, SRT, VTT, verbose JSON, diarized JSON, word timestamps, segment timestamps, streaming transcript deltas [9]
Webhooks
No [10]
Sandbox / test mode
No [11]
SDK languages
Python, Node.js, Java, Go, Ruby, .NET [12]
MCP server
Yes [13]
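
The output types above include both segment timestamps and SRT. Where only timestamped segments are available, subtitles can be produced client-side. The sketch below assumes each segment is a dict with `start` and `end` in seconds and a `text` string; that shape is an assumption for illustration and is not confirmed by this profile, so check it against the actual response before relying on it.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (assumed keys: start, end, text) as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```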

Trust & compliance

SOC 2
SOC 2 Type II [14]
HIPAA
Yes [15]
GDPR
Yes [16]
ISO 27001
Yes [17]
PCI DSS
Yes [18]
Published SLA
Yes [19]
Rate limits
  • whisper-1: Free 3 RPM / 200 RPD; Tier 1: 500 RPM; Tier 2: 2,500 RPM; Tier 3: 5,000 RPM; Tier 4: 7,500 RPM; Tier 5: 10,000 RPM.
  • gpt-4o-transcribe / gpt-4o-transcribe-diarize: Tier 1: 500 RPM / 10K TPM; Tier 2: 2,000 RPM / 100K TPM; Tier 3: 5,000 RPM / 400K TPM; Tier 4: 10,000 RPM / 2M TPM; Tier 5: 10,000 RPM / 6M TPM.
  • gpt-4o-mini-transcribe: Tier 1: 500 RPM / 50K TPM; Tier 2: 2,000 RPM / 150K TPM; Tier 3: 5,000 RPM / 600K TPM; Tier 4: 10,000 RPM / 2M TPM; Tier 5: 10,000 RPM / 8M TPM.
  • gpt-realtime-whisper: Tier 1: 100 min/min; Tier 2: 350 min/min; Tier 3: 650 min/min; Tier 4: 1,000 min/min; Tier 5: 1,300 min/min. [20]
Known restrictions
  • Maximum file upload size: 25 MB
  • Translation endpoint outputs English only (whisper-1 only; not available on gpt-4o-transcribe models)
  • Speaker diarization (gpt-4o-transcribe-diarize) requires chunking_strategy for audio longer than 30 seconds
  • gpt-4o-transcribe-diarize does not support prompts, logprobs, or timestamp_granularities[]
  • Prompt steering not supported for gpt-realtime-whisper in realtime sessions
  • Context window: 16,000 tokens; max output: 2,000 tokens (gpt-4o-transcribe models)
  • gpt-4o-transcribe and gpt-4o-mini-transcribe output JSON or plain text only (not SRT/VTT) [21]
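
Several of the restrictions above can be checked before a request is sent. The following is a minimal sketch covering a subset of them; the function name and parameters are hypothetical, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption, since the profile does not say which definition of megabyte applies.

```python
# Assumption: "25 MB" is read as 25 MiB; the profile does not specify.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024

# Models the profile lists as JSON / plain text output only.
TEXT_ONLY_MODELS = {"gpt-4o-transcribe", "gpt-4o-mini-transcribe"}
DIARIZE_MODEL = "gpt-4o-transcribe-diarize"


def preflight(model, file_size_bytes, duration_seconds, response_format="json",
              prompt=None, chunking_strategy=None, translate=False):
    """Return a list of restriction violations; an empty list means none found."""
    problems = []
    if file_size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 25 MB upload limit")
    if translate and model != "whisper-1":
        problems.append("translation is only available on whisper-1")
    if model in TEXT_ONLY_MODELS and response_format not in {"json", "text"}:
        problems.append(f"{model} outputs JSON or plain text only")
    if model == DIARIZE_MODEL:
        if duration_seconds > 30 and chunking_strategy is None:
            problems.append(
                "diarization of audio longer than 30 seconds requires chunking_strategy"
            )
        if prompt is not None:
            problems.append("gpt-4o-transcribe-diarize does not support prompts")
    return problems
```

Returning a list rather than raising on the first violation lets a caller report every problem with a job in one pass.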

Developer surface

Docs rendering: static · markdown variants served

Integration

API style
rest
Base URL
https://api.openai.com/v1
Version
v1
Versioning
url
Stability
ga
Auth methods
api_key
Error format
vendor-specific
Webhook signing
hmac_sha256
Rate limit
500 / minute
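
The integration fields above (base URL, API-key auth, a 500 per minute rate limit) can be combined into a small client-side helper. This is a sketch using only the standard library. The `Authorization: Bearer` header form and any endpoint path passed in (for example `audio/transcriptions`) are assumptions drawn from general knowledge of the OpenAI API rather than from this profile; `build_request` and `Pacer` are hypothetical names, and body encoding is left to the caller.

```python
import time
import urllib.request

BASE_URL = "https://api.openai.com/v1"  # from the profile
REQUESTS_PER_MINUTE = 500               # from the profile


def build_request(api_key: str, path: str, body: bytes, content_type: str):
    """Build (but do not send) an authenticated POST request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{path.lstrip('/')}",
        data=body,
        method="POST",
        headers={
            # Assumption: the API key is sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )


class Pacer:
    """Fixed-interval pacing: space calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute=REQUESTS_PER_MINUTE,
                 clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_slot = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_slot is not None and now < self.next_slot:
            self.sleep(self.next_slot - now)
            now = self.next_slot
        self.next_slot = now + self.interval
```

Calling `Pacer().wait()` before each request keeps a single process under the listed limit; it does not coordinate across processes or account for token-based limits.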

SDKs

  • Python openai · repo
  • Node.js openai · repo
  • Java com.openai:openai-java · repo
  • Go github.com/openai/openai-go · repo
  • Ruby openai · repo
  • .NET OpenAI
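
For the Python SDK listed above, a transcription call might look like the sketch below. It assumes the `openai` package exposes `client.audio.transcriptions.create(model=..., file=...)` and that the result carries a `text` attribute; that is my understanding of the SDK, not something this profile states, so verify it against the SDK reference. `transcribe_file` is a hypothetical helper name.

```python
from pathlib import Path


def transcribe_file(client, audio_path, model="gpt-4o-mini-transcribe"):
    """Send one audio file for transcription and return the transcript text.

    `client` is expected to behave like an `openai.OpenAI()` instance
    (assumed call shape: client.audio.transcriptions.create).
    """
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text


# Possible usage, assuming the `openai` package is installed and
# an API key is configured in the environment:
#
#   from openai import OpenAI
#   print(transcribe_file(OpenAI(), "meeting.mp3"))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub in tests.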

Adoption & maturity

Launched
2023-03-01
GA
2025-04-01
Notable customers
Speak

Other Speech-to-Text & Transcription APIs

  • ElevenLabs Scribe (Speech to Text)

    "Scribe v2 is the most accurate Speech to Text model" offering "real-time Speech to Text in under 150 ms" across "90+ languages."

    Hybrid · free tier · public pricing · self-serve

  • Azure AI Speech to Text

    "Azure Speech in Foundry Tools provides speech to text, text to speech, and other capabilities through a Microsoft Foundry resource. You can transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and conduct live AI voice conversations."

    Usage · free tier · public pricing · self-serve

  • Amazon Transcribe

    "Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or to add speech-to-text capabilities to any application."

    Usage · free tier · public pricing · self-serve

  • Google Cloud Speech-to-Text

    "Accurate voice typing and transcription powered by Gemini."

    Usage · free tier · public pricing · self-serve

  • IBM watsonx Speech to Text

    "IBM Watson® Speech to Text technology enables fast and accurate speech transcription in multiple languages for a variety of use cases, including but not limited to customer self-service, agent assistance and speech analytics."

    Usage · free tier · public pricing · self-serve

  • AssemblyAI

    "Voice AI infrastructure for developers building products that transcribe, understand, and act on speech."

    Usage · public pricing · self-serve


References

Change history

Every field change, who made it, and when - from our audited data pipeline and editors.

  1. 2026-06-21 Summary Md: (none) → OpenAI Speech-to-Text is a REST API offering batch, streaming, and real-time au…
  2. 2026-06-21 Summary Md: OpenAI Speech-to-Text offers transcription and translation via two model famili… → (none)
  3. 2026-06-21 Score Trust Readiness: 90 → 100
  4. 2026-06-21 Supported Languages: Afrikaans, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bosnian, Breton, … → 99+ languages including Afrikaans, Arabic, Armenian, Azerbaijani, Basque, Belar…
  5. 2026-06-21 Input Types: audio/mp3, audio/mp4, audio/mpeg, audio/mpga, audio/m4a, audio/wav, audio/webm,… → audio/mp3, audio/mp4, audio/mpeg, audio/mpga, audio/m4a, audio/wav, audio/webm,…
  6. 2026-06-21 Output Types: JSON, plain text, SRT, VTT, verbose JSON, diarized JSON, word timestamps, segme… → JSON, plain text, SRT, VTT, verbose JSON, diarized JSON, word timestamps, segme…
  7. 2026-06-21 PCI DSS: No → Yes
  8. 2026-06-21 SDK Packages: Python, Node.js, Java, Go, Ruby, .NET → Python, Node.js, Java, Go, Ruby, .NET
  9. 2026-06-21 Name: OpenAI Speech-to-Text (gpt-4o-transcribe / Whisper API) → OpenAI Speech-to-Text
  10. 2026-06-21 Supported Actions: transcribe_batch, transcribe_streaming, transcribe_realtime, translation, speak… → transcribe_batch, transcribe_streaming, transcribe_realtime, translation_to_eng…
  11. 2026-06-21 Known Restrictions: Maximum file upload size: 25 MB, Translation endpoint outputs English only (whi… → Maximum file upload size: 25 MB, Translation endpoint outputs English only (whi…
  12. 2026-06-21 Documented Rate Limits: whisper-1: Free tier 3 RPM / 200 RPD; Tier 1: 500 RPM; Tier 2: 2,500 RPM; Tier … → whisper-1: Free 3 RPM / 200 RPD; Tier 1: 500 RPM; Tier 2: 2,500 RPM; Tier 3: 5,…
  13. 2026-06-21 Fields Not Found: supported_regions (no explicit data residency regions listed for the STT API sp… → supported_regions (no explicit data residency regions listed for the STT API), …
  14. 2026-06-21 Starting Price Usd: 0.003 → 0.003
  15. 2026-06-21 Capabilities: {} → {"translation":true,"real_time_streaming":true,"speaker_diarization":true}
  16. 2026-06-21 Summary Md: (none) → OpenAI Speech-to-Text offers transcription and translation via two model famili…
  17. 2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
  18. 2026-06-21 Score Agent Friendliness: (none) → 50
  19. 2026-06-21 Score Pricing Transparency: (none) → 85
  20. 2026-06-21 Score Setup Speed: (none) → 60
  21. 2026-06-21 Score Docs Quality: (none) → 50
  22. 2026-06-21 Score Procurement Friction: (none) → 85
  23. 2026-06-21 Score Trust Readiness: (none) → 90
  24. 2026-06-21 Best For: (none) → Regulated or enterprise workloads - compliance attestations and an enterprise p…
  25. 2026-06-21 Avoid If: (none) → You want to try it free before paying
  26. 2026-06-21 Llms Txt Present: (none) → No
  27. 2026-06-21 Docs URL: (none) → https://developers.openai.com/api/docs
  28. 2026-06-21 Markdown Docs URL: (none) → https://platform.openai.com/docs/guides/speech-to-text.md
  29. 2026-06-21 Markdown Docs Served: (none) → Yes
  30. 2026-06-21 API Reference URL: (none) → https://platform.openai.com/api/reference/overview
  31. 2026-06-21 Robots Allows Agents: (none) → Yes
  32. 2026-06-21 Has Structured Data: (none) → No
  33. 2026-06-21 Rendering: (none) → static
  34. 2026-06-21 Known Restrictions: set to Maximum file upload size: 25 MB, Translation endpoint outputs English only (whi…
  35. 2026-06-21 Auth Methods: set to api_key
  36. 2026-06-21 Auth Docs URL: set to https://developers.openai.com/api/docs/quickstart
  37. 2026-06-21 API Style: set to rest
  38. 2026-06-21 Base URL: set to https://api.openai.com/v1
  39. 2026-06-21 API Version: set to v1
  40. 2026-06-21 Versioning Scheme: set to url
  41. 2026-06-21 Stability: set to ga
  42. 2026-06-21 Deprecation Policy URL: set to https://developers.openai.com/api/docs/deprecations
  43. 2026-06-21 MCP URL: set to https://developers.openai.com/mcp
  44. 2026-06-21 Quickstart URL: set to https://developers.openai.com/api/docs/guides/speech-to-text
  45. 2026-06-21 Error Format: set to vendor-specific
  46. 2026-06-21 Webhook Signing: set to hmac_sha256
  47. 2026-06-21 Webhook Events URL: set to https://developers.openai.com/api/docs/guides/webhooks
  48. 2026-06-21 Requires Verification: set to No
  49. 2026-06-21 Starting Price Usd: set to 0.003
  50. 2026-06-21 Price Basis: set to minute

Suggest an edit / leave a review

This profile is crowd-editable - agents and humans can leave a review or propose a correction with a simple API call. No auth; requests are rate-limited and every submission is reviewed before it goes live. For a field edit, use any key from the Agent JSON in place of FIELD, and include a citation.

Leave a review or comment

curl -X POST https://apio.sh/api/feedback/openai-transcribe \
  -H 'Content-Type: application/json' \
  -d '{"kind":"review","rating":5,"body":"Your experience with this API…"}'

Suggest a correction to a field (cite a source)

curl -X POST https://apio.sh/api/suggest/openai-transcribe/FIELD \
  -H 'Content-Type: application/json' \
  -d '{"value":"corrected value","citations":[{"url":"https://source.example/page","excerpt":"supporting quote"}],"note":"what changed and why"}'
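
The same field correction can be prepared from Python. This sketch mirrors the curl call above using only the standard library; it builds the request without sending it, `build_suggestion` is a hypothetical helper name, and the field key passed in should be one taken from the Agent JSON.

```python
import json
import urllib.request

SUGGEST_URL = "https://apio.sh/api/suggest/openai-transcribe"  # from the curl example


def build_suggestion(field, value, source_url, excerpt, note):
    """Build (but do not send) a POST proposing a corrected value for `field`."""
    payload = {
        "value": value,
        "citations": [{"url": source_url, "excerpt": excerpt}],
        "note": note,
    }
    return urllib.request.Request(
        url=f"{SUGGEST_URL}/{field}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# To submit, pass the request to urllib.request.urlopen(); per the text
# above, requests are rate-limited and reviewed before going live.
```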
