Voicegain

"Voice AI under your Control" - Build AI Voice Agents and Voice AI Apps with Speech-to-Text and LLM APIs, deployable in datacenter or cloud. [1]

Speech-to-Text & Transcription APIs

www.voicegain.ai · By Voicegain · Last verified 2026-06-21 · Source confidence: high

Voicegain is a speech-to-text and voice AI platform aimed at contact centers, healthcare payers, and enterprises that need telephony transcription, PII/PCI redaction, real-time agent assist, and custom ASR model training. Pay-as-you-go pricing starts at $0.0015 per minute, with a $50 one-time signup credit and no credit card required; on-premise and private-cloud deployments are available but require an annual commitment and a minimum port purchase. The platform lists SOC 2 Type II, HIPAA, GDPR, and PCI DSS compliance, and its customers include Aetna, Samsung, and Sutherland.

Best for / Avoid if

Best for: Regulated or enterprise workloads - compliance attestations and an enterprise plan; Teams needing broad API coverage out of the box; Cost-sensitive teams - low, transparent entry price

Avoid if: You need a recurring free tier; only a one-time $50 signup credit is offered

Pricing & procurement

Pricing model: Usage-based [2]
Published pricing: Yes [3]
Free tier: No [4]
Free tier details: $50 one-time credit on signup, no credit card required (not a recurring free allowance)
Self-serve signup: Yes
Requires sales call: No
Enterprise plan: Yes [5]
Minimum commitment: Edge/on-premise deployment requires annual commitment and minimum port purchase
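
To get a feel for how far the one-time $50 signup credit stretches, the sketch below divides it by the pay-as-you-go per-minute rates listed under Published prices. It is illustrative arithmetic only: the function name and the rate dictionary are hypothetical, and the per-request billing minimum is ignored.

```python
from decimal import Decimal

SIGNUP_CREDIT_USD = Decimal("50")

# Per-minute pay-as-you-go rates copied from the Published prices table.
RATES_PER_MINUTE = {
    "offline_basic": Decimal("0.0015"),
    "offline_enhanced": Decimal("0.003"),
    "realtime_basic": Decimal("0.003"),
    "realtime_enhanced_mrcp": Decimal("0.0054"),
}


def minutes_covered(credit: Decimal, rate_per_minute: Decimal) -> int:
    """Whole minutes of audio the credit pays for at a flat per-minute rate."""
    return int(credit // rate_per_minute)


if __name__ == "__main__":
    for plan, rate in RATES_PER_MINUTE.items():
        print(f"{plan}: {minutes_covered(SIGNUP_CREDIT_USD, rate)} minutes")
```

At the lowest rate the credit covers roughly 33,000 minutes of audio, which is why the profile distinguishes this one-time credit from a recurring free tier.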

Published prices
Plan	Item	Per	Amount	Source
Pay As You Go	STT Offline Basic (mono-channel, no diarization)	second	$0.000025	source
Pay As You Go	STT Offline Basic (mono-channel, no diarization)	minute	$0.0015	source
Pay As You Go	STT Offline Basic (mono-channel, no diarization)	hour	$0.09	source
Pay As You Go	STT Offline Enhanced (two-channel call center, diarization, PII redaction)	second	$0.00005	source
Pay As You Go	STT Offline Enhanced (two-channel call center, diarization, PII redaction)	minute	$0.003	source
Pay As You Go	STT Offline Enhanced (two-channel call center, diarization, PII redaction)	hour	$0.18	source
Pay As You Go	STT Realtime Basic (streaming transcription)	second	$0.00005	source
Pay As You Go	STT Realtime Basic (streaming transcription)	minute	$0.003	source
Pay As You Go	STT Realtime Basic (streaming transcription)	hour	$0.18	source
Pay As You Go	STT Realtime Enhanced / MRCP ASR	second	$0.00009	source
Pay As You Go	STT Realtime Enhanced / MRCP ASR	minute	$0.0054	source
Pay As You Go	STT Realtime Enhanced / MRCP ASR	hour	$0.324	source
Edge Deployment	STT Offline Enhanced & Multi-channel - port-based license	port/month	$60	source
Edge Deployment	STT Offline Enhanced & Multi-channel - usage-based license	audio hour	$0.16	source
Edge Deployment	STT Realtime Transcription - port-based license	port/month	$72	source
Edge Deployment	STT Realtime Transcription - usage-based license	audio hour	$0.2	source
Edge Deployment	MRCP ASR Tier 1 - port-based license	port/month	$40	source
Edge Deployment	MRCP ASR Tier 2 - port-based license	port/month	$70	source
Voice Agent Platform	AI Voice Agent Standard (Voicegain STT + Standard TTS + SIP Stack + LLM integration)	minute	$0.04	source
Voice Agent Platform	AI Voice Agent Premium (Premium Neural TTS + Voicegain STT + SIP Stack + LLM integration)	minute	$0.06	source
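
The Edge Deployment rows list both a port-based and a usage-based license for the same products, so a rough break-even point can be computed: the number of audio hours per port per month above which the flat port price is the cheaper option. A minimal sketch using only the prices in the table; the helper name is hypothetical, and the annual commitment and minimum port purchase are ignored.

```python
from decimal import Decimal


def break_even_hours(port_price_month: Decimal, usage_price_hour: Decimal) -> Decimal:
    """Audio hours per port per month at which both license types cost the same."""
    return port_price_month / usage_price_hour


# STT Offline Enhanced & Multi-channel: $60 per port/month vs $0.16 per audio hour
offline = break_even_hours(Decimal("60"), Decimal("0.16"))
# STT Realtime Transcription: $72 per port/month vs $0.2 per audio hour
realtime = break_even_hours(Decimal("72"), Decimal("0.2"))

if __name__ == "__main__":
    print(f"offline break-even: {offline:.0f} audio hours per port per month")
    print(f"realtime break-even: {realtime:.0f} audio hours per port per month")
```

Under these prices the port-based license comes out ahead above 375 audio hours per port per month for offline transcription and 360 for real-time transcription.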

Capabilities

Real-time streaming
Speaker diarization
PII redaction
Self-hosted option

Supported actions: transcribe_batch, transcribe_streaming, speaker_diarization, word_timestamps, language_detection, sentiment_analysis, named_entity_recognition, keyword_extraction, intent_classification, pii_redaction, pci_redaction, custom_model_training, telephony_bot_api, mrcp_asr, speech_analytics, call_summarization, real_time_agent_assist, automated_qa [6]
  voicegain.ai/products: “PII redaction, Speaker diarization, 99 languages support, Real-time agent assist (AI co-pilot), Automated QA and coaching, Voice-of-Customer analytics”
  voicegain.ai/speech-to-text-apis: “Batch processing: Process audio 100x faster than real-time. Streaming: WebSocket support plus telephony protocols (SIPREC, MRCP). Speaker Diarization. PII/PCI redaction capabilities. Sentiment and emotion analysis.”
Regions: US (Google Cloud), AWS VPC, Azure VPC, IBM Cloud VPC, Oracle VPC, on-premise datacenter [7]
Languages: English, Spanish, Hindi, German, Portuguese (Alpha early access), Polish (Alpha early access), Korean (Alpha early access), Dutch (Alpha early access), Ukrainian (Alpha early access), French (coming soon), Arabic (coming soon), Italian (coming soon), 50+ languages for batch transcription via Whisper API [8]
  voicegain.ai/post/new-languages-available-in-voicegain-speech-to-text: “Generally Available: English, Spanish, Hindi, German. Alpha Early Access: Portuguese, Polish, Korean, Dutch, Ukrainian. Coming Soon: French, Arabic, Italian”
  support.voicegain.ai/hc/en-us/articles/6535757706772-Languages-supported-in-Voicegain-Speech-to-Text: “Voicegain supports over 50 languages for batch transcription, while for streaming, they support English and Spanish”
Input types: audio file upload (40+ formats via ffmpeg for batch), WebSocket streaming (L16 linear PCM 16-bit mono, F32 linear PCM 32-bit floating point mono), RTP audio stream, SIPREC, MRCP, stereo/two-channel audio
Output types: JSON transcript, word-level timestamps, speaker diarization labels, sentiment scores, named entities, keywords, intent labels, call summaries, redacted transcript, redacted audio
Webhooks: Yes [9]
Sandbox / test mode: No [10]
SDK languages: Python [11]
MCP server: No
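
The WebSocket input formats listed above (L16 and F32) are raw mono PCM, so the data rate is easy to estimate and F32 samples can be down-converted to L16 before sending. The sketch below rests on assumptions: the 16 kHz sample rate and the little-endian byte order are placeholders that this page does not state, and the helper names are hypothetical.

```python
import struct

SAMPLE_RATE_HZ = 16_000  # assumption: the sample rate is not stated on this page


def bytes_per_second(bytes_per_sample: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Raw mono PCM data rate in bytes per second."""
    return bytes_per_sample * sample_rate_hz


def f32_to_l16(samples: list[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit signed PCM.

    Little-endian byte order is an assumption; check the API docs before use.
    """
    ints = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against out-of-range input
        ints.append(int(round(clipped * 32767)))
    return struct.pack(f"<{len(ints)}h", *ints)


if __name__ == "__main__":
    print("L16 bytes/s:", bytes_per_second(2))
    print("F32 bytes/s:", bytes_per_second(4))
```

At the same sample rate, L16 needs half the bandwidth of F32, which may matter for long-running streams.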

Trust & compliance

SOC 2: SOC 2 Type II [12]
HIPAA: Yes [13]
GDPR: Yes [14]
ISO 27001: No [15]
PCI DSS: Yes [16]
Published SLA: Yes [17]
Rate limits: 4 concurrent/simultaneous requests or 4 hours of audio-processing per hour (standard pay-as-you-go). API Request Limit: 75 requests per minute (fixed 1-minute window). Higher limits available with volume/term commitments. [18]
Known restrictions: Streaming supports English and Spanish only (batch supports 50+ languages via Whisper); real-time transcription input limited to L16 and F32 PCM audio formats; Whisper API is batch-only (no real-time/streaming); minimum billing of 6 seconds per API request, then 1-second increments; Alpha/early-access language models initially available in offline/batch mode only; Edge/on-premise deployment requires annual commitment and minimum port purchase
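
The billing restriction above (a 6-second minimum per API request, then 1-second increments) can be sketched as follows. This assumes that partial seconds round up and that the per-second rate is the per-minute rate divided by 60; neither detail is confirmed on this page, and the function names are hypothetical.

```python
import math
from decimal import Decimal

MIN_BILLABLE_SECONDS = 6


def billable_seconds(duration_seconds: float) -> int:
    """Apply the 6-second minimum, then bill whole seconds (rounding up is an assumption)."""
    return max(MIN_BILLABLE_SECONDS, math.ceil(duration_seconds))


def request_cost(duration_seconds: float, rate_per_minute: Decimal) -> Decimal:
    """Cost of one request at a per-minute rate, prorated per second (an assumption)."""
    return rate_per_minute * billable_seconds(duration_seconds) / 60


if __name__ == "__main__":
    # A 2.3-second clip is billed as 6 seconds under these assumptions.
    print(billable_seconds(2.3), request_cost(2.3, Decimal("0.003")))
```

Because of the minimum, many very short requests cost more per second of audio than fewer, longer ones.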

Developer surface

Docs rendering: static

Integration

API style: rest
Base URL: https://api.voicegain.ai/v1
Version: v1
Versioning: url
Stability: ga
Auth methods: jwt
Rate limit: 4 concurrent requests
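
A client can stay within the documented standard limits (4 concurrent requests and 75 requests per fixed 1-minute window) by gating its own calls. The sketch below is one possible client-side approach, not something Voicegain provides; the class name and the injectable clock are hypothetical, and no network calls are made. The client's window boundaries will not necessarily line up with the server's, so treat it as a courtesy throttle rather than a guarantee.

```python
import threading
import time


class ClientLimiter:
    """Caps in-flight calls and calls per fixed window on the client side."""

    def __init__(self, max_concurrent=4, max_per_window=75,
                 window_seconds=60.0, clock=time.monotonic):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._max_per_window = max_per_window
        self._window_seconds = window_seconds
        self._clock = clock
        self._window_start = clock()
        self._count = 0

    def try_acquire(self) -> bool:
        """Return True if a call may start now; call release() when it finishes."""
        with self._lock:
            now = self._clock()
            if now - self._window_start >= self._window_seconds:
                self._window_start = now  # start a new fixed window
                self._count = 0
            if self._count >= self._max_per_window:
                return False  # request budget for this window is used up
            if not self._slots.acquire(blocking=False):
                return False  # all concurrency slots are busy
            self._count += 1
            return True

    def release(self) -> None:
        """Free the concurrency slot taken by a successful try_acquire()."""
        self._slots.release()
```

A caller would wrap each API request in `try_acquire()` and `release()`, backing off and retrying whenever `try_acquire()` returns False.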

SDKs

Python: voicegain-speech

Adoption & maturity

Launched: 2019-01-01
Notable customers: Sutherland, Samsung, Aetna, LevelAI, Onvisource, Hammer

Other Speech-to-Text & Transcription APIs

ElevenLabs Scribe (Speech to Text)
"Scribe v2 is the most accurate Speech to Text model" offering "real-time Speech to Text in under 150 ms" across "90+ languages."
Hybrid · free tier · public pricing · self-serve
Azure AI Speech to Text
"Azure Speech in Foundry Tools provides speech to text, text to speech, and other capabilities through a Microsoft Foundry resource. You can transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and conduct live AI voice conversations."
Usage · free tier · public pricing · self-serve
Amazon Transcribe
"Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or to add speech-to-text capabilities to any application."
Usage · free tier · public pricing · self-serve
Google Cloud Speech-to-Text
"Accurate voice typing and transcription powered by Gemini."
Usage · free tier · public pricing · self-serve
IBM watsonx Speech to Text
"IBM Watson® Speech to Text technology enables fast and accurate speech transcription in multiple languages for a variety of use cases, including but not limited to customer self-service, agent assistance and speech analytics."
Usage · free tier · public pricing · self-serve
AssemblyAI
"Voice AI infrastructure for developers building products that transcribe, understand, and act on speech."
Usage · public pricing · self-serve


References

Each field above carries a numbered source.

[1] Description: voicegain.ai
[2] Pricing model: voicegain.ai
[3] Published pricing: voicegain.ai
[4] Free tier: voicegain.ai
[5] Enterprise plan: voicegain.ai
[6] Supported actions: voicegain.ai · voicegain.ai
[7] Regions: voicegain.ai
[8] Languages: voicegain.ai · support.voicegain.ai
[9] Webhooks: voicegain.ai
[10] Sandbox: voicegain.ai
[11] SDK languages: voicegain.ai
[12] SOC 2: voicegain.ai · voicegain.ai
[13] HIPAA: voicegain.ai · voicegain.ai
[14] GDPR: voicegain.ai
[15] ISO 27001: voicegain.ai
[16] PCI DSS: voicegain.ai · voicegain.ai
[17] Published SLA: voicegain.ai
[18] Rate limits: voicegain.ai · support.voicegain.ai

Change history

Every field change, who made it, and when - from our audited data pipeline and editors.

2026-06-21 Capabilities: {} → {"self_hosted":true,"pii_redaction":true,"real_time_streaming":true,"speaker_di…
2026-06-21 Summary Md: (none) → Voicegain is a speech-to-text and voice AI platform aimed at contact centers, h…
2026-06-21 Score Setup Speed: (none) → 50
2026-06-21 Score Docs Quality: (none) → 35
2026-06-21 Score Procurement Friction: (none) → 85
2026-06-21 Score Trust Readiness: (none) → 85
2026-06-21 Best For: (none) → Regulated or enterprise workloads - compliance attestations and an enterprise p…
2026-06-21 Avoid If: (none) → You want to try it free before paying
2026-06-21 Scoring Methodology: (none) → Scores are computed deterministically from this profile's published, sourced fi…
2026-06-21 Score Agent Friendliness: (none) → 20
2026-06-21 Score Pricing Transparency: (none) → 85
2026-06-21 Docs URL: (none) → https://www.voicegain.ai/api
2026-06-21 Rendering: (none) → static
2026-06-21 Has Structured Data: (none) → No
2026-06-21 Llms Txt Present: (none) → No
2026-06-21 API Reference URL: (none) → https://www.voicegain.ai/api
2026-06-21 Robots Allows Agents: (none) → Yes
2026-06-21 Pricing Model: set to usage_based
2026-06-21 Has Published Pricing: set to Yes
2026-06-21 Free Tier Available: set to No
2026-06-21 Free Tier Details: set to $50 one-time credit on signup, no credit card required (not a recurring free al…
2026-06-21 Minimum Commitment: set to Edge/on-premise deployment requires annual commitment and minimum port purchase
2026-06-21 Self Serve Signup: set to Yes
2026-06-21 Requires Sales Call: set to No
2026-06-21 Enterprise Plan Available: set to Yes
2026-06-21 SOC 2: set to type_2
2026-06-21 HIPAA: set to Yes
2026-06-21 GDPR: set to Yes
2026-06-21 ISO 27001: set to No
2026-06-21 PCI DSS: set to Yes
2026-06-21 SLA Published: set to Yes
2026-06-21 SLA URL: set to https://www.voicegain.ai/post/voicegain-introduces-industry-first-relative-spee…
2026-06-21 Data Retention Policy URL: set to https://www.voicegain.ai/privacy-policy
2026-06-21 Documented Rate Limits: set to 4 concurrent/simultaneous requests or 4 hours of audio-processing per hour (sta…
2026-06-21 Rate Limit Requests: set to 4
2026-06-21 Rate Limit Window: set to concurrent
2026-06-21 Known Restrictions: set to Streaming supports English and Spanish only (batch supports 50+ languages via W…
2026-06-21 Auth Methods: set to jwt
2026-06-21 Auth Docs URL: set to https://console.voicegain.ai/api/v1/index.html
2026-06-21 API Style: set to rest
2026-06-21 API Version: set to v1
2026-06-21 Versioning Scheme: set to url
2026-06-21 Stability: set to ga
2026-06-21 Quickstart URL: set to https://www.voicegain.ai/trial
2026-06-21 Requires Verification: set to No
2026-06-21 Starting Price Usd: set to 0.0015
2026-06-21 Price Basis: set to minute
2026-06-21 Free Tier Limit: set to $50 one-time credit on signup, no credit card required
2026-06-21 Launched At: set to 2019-01-01
2026-06-21 Notable Customers: set to Sutherland, Samsung, Aetna, LevelAI, Onvisource, Hammer

Suggest an edit / leave a review

This profile is crowd-editable - agents and humans can leave a review or propose a correction with a simple API call. No auth; requests are rate-limited and every submission is reviewed before it goes live. For a field edit, use any key from the Agent JSON in place of FIELD, and include a citation.

Leave a review or comment

curl -X POST https://apio.sh/api/feedback/voicegain \
  -H 'Content-Type: application/json' \
  -d '{"kind":"review","rating":5,"body":"Your experience with this API…"}'

Suggest a correction to a field (cite a source)

curl -X POST https://apio.sh/api/suggest/voicegain/FIELD \
  -H 'Content-Type: application/json' \
  -d '{"value":"corrected value","citations":[{"url":"https://source.example/page","excerpt":"supporting quote"}],"note":"what changed and why"}'
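
For callers who prefer Python over curl, the same field-correction request can be assembled with the standard library. This sketch only builds the request object from the endpoint and JSON shape shown in the curl command; the helper name is hypothetical, and sending the request (for example with `urllib.request.urlopen`) is left to the caller.

```python
import json
import urllib.request


def build_suggestion(field: str, value: str, source_url: str,
                     excerpt: str, note: str) -> urllib.request.Request:
    """Build, but do not send, a POST that mirrors the curl field-correction example."""
    payload = {
        "value": value,
        "citations": [{"url": source_url, "excerpt": excerpt}],
        "note": note,
    }
    return urllib.request.Request(
        url=f"https://apio.sh/api/suggest/voicegain/{field}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_suggestion(
        "FIELD",  # placeholder, as in the curl example
        "corrected value",
        "https://source.example/page",
        "supporting quote",
        "what changed and why",
    )
    print(req.get_method(), req.full_url)
```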
