Product Overview | DolphinVoice's Docs

Speech recognition is an AI-driven technology that converts audio data into text. With the speech recognition service of the DolphinVoice platform, you can add this capability to your own products.

We offer four speech recognition products: short speech recognition, real-time speech recognition, audio file transcription (Standard), and audio file transcription (VIP). The table below gives a brief comparison:

Item	Real-Time Speech Recognition	Audio File Transcription (VIP)	Audio File Transcription (Standard)	Short Speech Recognition
Function	Recognizes streaming audio and returns recognition results in real time.	Transcribes long audio files into transcripts or subtitles, and returns the results for 1 hour of audio within 5 minutes.	Transcribes long audio files into transcripts or subtitles, and returns the results for 1 hour of audio within 15 minutes.	Recognizes a short utterance and returns the result either all at once when recognition finishes or incrementally while recognition is in progress.
Audio Limits	Up to 37 hours of continuous audio	Audio: 1 GB; Duration: 5 h	Audio: 1 GB; Video: 2 GB; Duration: 5 h	60 s
Supported Formats	WAV/PCM/MP3	WAV/PCM/OPUS/MP3/AMR/M4A/AAC	WAV/PCM/OPUS/MP3/MP4/M4A/AMR/3GP/AAC	WAV/PCM/MP3
Supported Sampling Rates	16 kHz, 8 kHz	16 kHz, 8 kHz	16 kHz, 8 kHz	16 kHz, 8 kHz
Typical Use Cases	Real-time subtitles	Subtitle generation for videos	Transcription of audio files	Voice assistants
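
The limits in the table can be checked on the client side before any audio is sent. The following is a minimal Python sketch under stated assumptions, not part of the DolphinVoice SDK: the product keys (`real_time`, `file_vip`, `file_standard`, `short`), the `Limits` class, and the `check_audio` function are hypothetical names chosen for illustration. It also assumes that "GB" means 2**30 bytes and that MP4 and 3GP are the formats subject to the video size limit, neither of which the table states.

```python
from dataclasses import dataclass
from typing import List, Optional

GB = 1024 ** 3                      # assumption: the table's "GB" read as 2**30 bytes
VIDEO_FORMATS = {"MP4", "3GP"}      # assumption: formats counted against the video limit
SAMPLE_RATES_HZ = {8000, 16000}     # same for all four products in the table


@dataclass(frozen=True)
class Limits:
    """Per-product limits copied from the comparison table (hypothetical type)."""
    formats: frozenset
    max_duration_s: int
    max_audio_bytes: Optional[int] = None   # None: no size limit listed
    max_video_bytes: Optional[int] = None


# Hypothetical product keys; the values mirror the table rows.
LIMITS = {
    "real_time": Limits(frozenset({"WAV", "PCM", "MP3"}), 37 * 3600),
    "file_vip": Limits(
        frozenset({"WAV", "PCM", "OPUS", "MP3", "AMR", "M4A", "AAC"}),
        5 * 3600, max_audio_bytes=GB),
    "file_standard": Limits(
        frozenset({"WAV", "PCM", "OPUS", "MP3", "MP4", "M4A", "AMR", "3GP", "AAC"}),
        5 * 3600, max_audio_bytes=GB, max_video_bytes=2 * GB),
    "short": Limits(frozenset({"WAV", "PCM", "MP3"}), 60),
}


def check_audio(product: str, fmt: str, sample_rate_hz: int,
                duration_s: float, size_bytes: int = 0) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    limits = LIMITS[product]
    fmt = fmt.upper()
    problems = []
    if fmt not in limits.formats:
        problems.append(f"format {fmt} is not listed for {product}")
    if sample_rate_hz not in SAMPLE_RATES_HZ:
        problems.append(f"sampling rate {sample_rate_hz} Hz is not 8 kHz or 16 kHz")
    if duration_s > limits.max_duration_s:
        problems.append(f"duration {duration_s} s exceeds {limits.max_duration_s} s")
    # Pick the size cap that applies to this kind of file, if the table lists one.
    cap = limits.max_video_bytes if fmt in VIDEO_FORMATS else limits.max_audio_bytes
    if cap is not None and size_bytes > cap:
        problems.append(f"size {size_bytes} bytes exceeds {cap} bytes")
    return problems
```

For example, `check_audio("short", "wav", 16000, 90)` reports a single duration violation, because short speech recognition accepts at most 60 s of audio. The service itself remains the authority on what it accepts; a check like this only avoids obviously invalid requests.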

For detailed product specifications, please refer to the Developer Guides.