Best Speech Translation APIs

Speech-to-text APIs that transcribe and translate spoken audio into another language in a single call.

Required capability: Speech translation.

Our pick: Azure AI Speech to Text

Azure AI Speech to Text is Microsoft's cloud speech recognition service, offering real-time transcription, batch processing, speaker diarization, pronunciation assessment, and speech translation across more than 30 Azure regions. It starts at $1.00 per hour of audio with a free tier of 5 hours per month, scales via usage-based pricing, and supports self-serve signup with no sales call required. SDKs cover C#, Python, JavaScript, Java, Go, and Objective-C, and the service holds SOC 2 Type II, HIPAA, GDPR, ISO 27001, and PCI DSS certifications.

Best for: Prototypes and side projects - free to start, no sales call; Regulated or enterprise workloads - compliance attestations and an enterprise plan; AI agents and automation - an agent-ready surface (MCP / llms.txt).

Best for…

Best overall
Azure AI Speech to Text - our default pick: strongest across pricing, trust and breadth
Best free pick
Azure AI Speech to Text - free tier: Free (F0) tier: 5 audio hours per month for Standard and Custom Speech to Text (shared; b…
Best for enterprise
Azure AI Speech to Text - for regulated or large teams: SOC 2 Type II, HIPAA, published SLA
Cheapest to start
Rev AI - from $0.0017 per minute to start; compare on your real usage, not the entry price
Best for agents
Azure AI Speech to Text - easiest to wire up programmatically: MCP server + llms.txt
Broadest surface
AssemblyAI - 22 documented actions; breadth isn't quality, but it's the most to build on
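
The "compare on your real usage, not the entry price" advice can be sketched as a small calculation. The entry prices on this page mix per-minute and per-hour units, so the sketch below normalizes them to a per-hour figure and estimates a monthly bill. The prices and monthly free allowances are the ones listed in the ranked entries on this page; the `EntryPrice` class, the `rank_by_monthly_cost` helper, and the 100-hour workload are hypothetical names and values chosen for illustration. It assumes the free allowance is deducted before billing starts and that the entry price applies to translation workloads, neither of which this page confirms. One-time credits and per-day limits are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryPrice:
    """One provider's entry price as listed on this page (hypothetical helper)."""
    name: str
    price_usd: float
    unit: str  # "minute" or "hour"
    free_hours_per_month: float = 0.0  # set only where a monthly allowance is listed

    def per_hour(self) -> float:
        # Normalize per-minute prices to per-hour so entries are comparable.
        return self.price_usd * 60 if self.unit == "minute" else self.price_usd

    def monthly_cost(self, hours: float) -> float:
        # Assumption: the free allowance is deducted before billing starts.
        billable_hours = max(0.0, hours - self.free_hours_per_month)
        return billable_hours * self.per_hour()


# Entry prices copied from the ranked entries on this page.
PRICES = [
    EntryPrice("Azure AI Speech to Text", 1.00, "hour", free_hours_per_month=5.0),
    EntryPrice("AssemblyAI", 0.0025, "minute"),
    EntryPrice("Speechmatics", 0.0022, "minute", free_hours_per_month=50.0),  # 3,000 min
    EntryPrice("OpenAI Speech-to-Text", 0.003, "minute"),
    EntryPrice("Gladia", 0.61, "hour", free_hours_per_month=10.0),
    EntryPrice("Rev AI", 0.0017, "minute"),
    EntryPrice("Soniox", 0.0017, "minute"),
    EntryPrice("Groq Speech-to-Text (Whisper)", 0.04, "hour"),
]


def rank_by_monthly_cost(hours: float) -> list[tuple[str, float]]:
    """Return (name, estimated monthly USD) pairs, cheapest first."""
    costs = [(p.name, round(p.monthly_cost(hours), 2)) for p in PRICES]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical workload: 100 hours of audio per month.
    for name, cost in rank_by_monthly_cost(100):
        print(f"{name}: ${cost:.2f}")
```

Once normalized, the listed entry prices span roughly $0.04 to $1.00 per hour of audio, so the ordering can change with the unit and with how much of a workload a free allowance absorbs. Treat the output as a starting point for your own numbers, not as a quote.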

Ranked (8)

  • #1 Azure AI Speech to Text

    79 / 100
    • Best overall
    • Best free pick
    • Best for enterprise
    • Best for agents

    Azure AI Speech to Text is Microsoft's cloud speech recognition service, offering real-time transcription, batch processing, speaker diarization, pronunciation assessment, and speech translation across more than 30 Azure regions. It starts at $1.00 per hour of audio with a free tier of 5 hours per month, scales via usage-based pricing, and supports self-serve signup with no sales call required. SDKs cover C#, Python, JavaScript, Java, Go, and Objective-C, and the service holds SOC 2 Type II, HIPAA, GDPR, ISO 27001, and PCI DSS certifications.

    Pricing: Usage · from $1 per hour of audio · free tier
    Trust: SOC 2 Type II · HIPAA · GDPR · ISO 27001 · PCI DSS
    Does:
    • Real-time streaming
    • Speaker diarization
    • Speech translation
    Used by: Microsoft Teams, Microsoft Office 365, Microsoft Edge

  • #2 AssemblyAI

    79 / 100
    • Broadest surface

    AssemblyAI is a voice AI platform providing speech-to-text transcription, speaker diarization, and audio intelligence features via REST API, aimed at developers building products on top of speech data. Pricing is usage-based at $0.0025 per minute with a $50 one-time free credit requiring no credit card, and enterprise plans are available. The service holds SOC 2 Type II, HIPAA, GDPR, ISO 27001, and PCI DSS certifications, with data processed in the US and EU. Customers include Zoom, Spotify, and Dovetail, and SDKs are actively maintained for Python and Node.js.

    Pricing: Usage · from $0.0025 per minute · free tier
    Trust: SOC 2 Type II · HIPAA · GDPR · ISO 27001 · PCI DSS
    Does:
    • Real-time streaming
    • Speaker diarization
    • Speech translation
    • Medical transcription
    • PII redaction
    Used by: Zoom, Spotify, Veed, CallRail
    Avoid if: you want to try it free before paying

  • #3 Speechmatics

    67 / 100

    Speechmatics is a speech-to-text API supporting batch and real-time transcription across EU, US, and Australia regions. Its capabilities include speaker diarization, language detection, translation, summarization, and audio event detection, which suits it to contact center, legal, medical, and broadcast use cases. Pricing starts at $0.0022 per minute with a free tier of 3,000 minutes per month and self-serve signup, scaling to enterprise plans with dedicated regional endpoints. The API is REST-based with SDK support for Python, Node.js, .NET, and Rust, and holds SOC 2 Type II, HIPAA, GDPR, and ISO 27001 certifications.

    Pricing: Usage · from $0.0022 per minute · free tier
    Trust: SOC 2 Type II · HIPAA · GDPR · ISO 27001
    Does:
    • Real-time streaming
    • Speaker diarization
    • Speech translation
    • Medical transcription
    Used by: what3words, 3Play Media, Veritone, Deloitte UK

  • #4 OpenAI Speech-to-Text

    72 / 100

    OpenAI Speech-to-Text is a REST API offering batch, streaming, and real-time audio transcription, speaker diarization, language detection, and translation to English, built on Whisper and newer gpt-4o-based models. It is priced at $0.003 per minute on a self-serve, pay-as-you-go basis with no sales call required, and an enterprise plan is available. The API ships official SDKs for Python, Node.js, Java, Go, Ruby, and .NET, and holds SOC 2 Type II, HIPAA, GDPR, ISO 27001, and PCI DSS certifications.

    Pricing: Usage · from $0.003 per minute · free tier
    Trust: SOC 2 Type II · HIPAA · GDPR · ISO 27001 · PCI DSS
    Does:
    • Real-time streaming
    • Speaker diarization
    • Speech translation
    Used by: Speak
    Avoid if: you want to try it free before paying

  • #5 Gladia

    71 / 100

    Gladia is an audio infrastructure API covering batch and real-time speech-to-text transcription, speaker diarization, translation, summarization, sentiment and emotion analysis, and named entity recognition, targeting voice agents, contact centers, meeting assistants, and media captioning workflows. Pricing is usage-based at $0.61 per hour with a free tier of 10 hours per month and no sales call required to start. The API is REST-based with TypeScript, JavaScript, and Python SDKs, webhooks, and an MCP server, and is hosted in EU (France, default) and US regions. Gladia holds SOC 2 Type II, HIPAA, and GDPR compliance, and counts Aircall, Citibank, Samsung, Oracle, and Microsoft among its customers.

    Pricing: Usage · from $0.61 per hour · free tier
    Trust: SOC 2 Type II · HIPAA · GDPR
    Does:
    • Real-time streaming
    • Speaker diarization
    • Speech translation
    • PII redaction
    Used by: Aircall, Attention, Recall, VEED

  • #6 Rev AI

    58 / 100
    • Cheapest to start

    Rev AI is a speech-to-text API from Rev, offering both asynchronous batch transcription and real-time streaming, with capabilities including speaker diarization, word timestamps, custom vocabulary, language detection, translation, sentiment analysis, and summarization. Pricing is usage-based at $0.0017 per minute with a 5-hour free tier and self-serve signup, so no sales call is needed to start. SDKs are available for Python, Node.js, Java, and Go, and the service is SOC 2 Type II certified, HIPAA compliant, and GDPR compliant, with data residency options in the US and EU.

    Pricing: Usage · from $0.0017 per minute · free tier
    Trust: SOC 2 Type II · HIPAA · GDPR
    Does:
    • Real-time streaming
    • Speaker diarization
    • Speech translation
    Avoid if: you want to try it free before paying

  • #7 Soniox

    68 / 100

    Soniox is a speech-to-text API built for real-time and batch transcription workloads, targeting voice agents, call centers, medical teams, and media producers who need multilingual support, speaker diarization, and word-level timestamps. Pricing is usage-based at $0.0017 per minute with self-serve signup and no sales call required, though free credits were discontinued in October 2025. The platform holds SOC 2 Type II, HIPAA, GDPR, and ISO 27001 certifications, with data residency options across the United States, European Union, and Japan. SDKs are available for Python, Node.js, and browser JavaScript, and an MCP server is also available.

    Pricing: Usage · from $0.0017 per minute · free tier
    Trust: SOC 2 Type II · HIPAA · GDPR · ISO 27001
    Does:
    • Real-time streaming
    • Speaker diarization
    • Speech translation
    • Medical transcription
    Used by: Scribe
    Avoid if: you want to try it free before paying

  • #8 Groq Speech-to-Text (Whisper)

    71 / 100

    Groq Speech-to-Text runs OpenAI-compatible Whisper endpoints (large-v3 and large-v3-turbo) optimized for speed, supporting batch transcription, word and segment timestamps, language detection, and audio translation to English across dozens of languages. Pricing starts at $0.04 per hour of audio on a self-serve basis, with a free tier covering 2,000 requests and 28,800 audio seconds per day. SDKs are available for Python, Node.js, C#, and PHP, an MCP server is available, and the platform holds SOC 2 Type II, HIPAA, and GDPR certifications with an enterprise plan for higher rate limits.

    Pricing: Usage · from $0.04 per hour of audio · free tier
    Trust: SOC 2 Type II · HIPAA · GDPR
    Does:
    • Speech translation
    Used by: IBM, PGA of America, Stats Perform, GPTZero

Scope: only APIs with the required capability, picked from published, cited data. The score is one input, not the verdict, and we lead with each one’s trade-off. No reviews yet, no paid placement. See the full Speech-to-Text & Transcription APIs directory.