
Product Overview

Speech recognition is an AI-driven technology that converts spoken audio into text. With the speech recognition service of the DolphinVoice platform, you can add this capability to your own products.

We offer four speech recognition services: short speech recognition, real-time speech recognition, audio file transcription (Standard), and audio file transcription (VIP). The table below gives a brief comparison:

| Project/Product | Real-Time Speech Recognition | Audio File Transcription (VIP) | Audio File Transcription (Standard) | Short Speech Recognition |
| --- | --- | --- | --- | --- |
| Function | Recognize streaming audio files and return recognition results in real-time. | Transcribe long audio files into manuscripts or subtitles, and return the results for 1-hour audio data within 5 minutes. | Transcribe long audio files into manuscripts or subtitles, and return the results for 1-hour audio data within 15 minutes. | Recognize a short speech all at once and return the recognition result, or return the result while recognizing. |
| Audio Limitations | Up to 37 hours continuously | Audio: 1GB; Duration: 5h | Audio: 1GB; Video: 2GB; Duration: 5h | 60s |
| Supported Formats | WAV/PCM/MP3 | WAV/PCM/OPUS/MP3/AMR/3GP/AAC | WAV/PCM/OPUS/MP3/MP4/M4A/AMR/3GP/AAC | WAV/PCM/MP3 |
| Supported Sampling Rates | 16kHz, 8kHz | 16kHz, 8kHz | 16kHz, 8kHz | 16kHz, 8kHz |
| Typical Use Cases | Real-time subtitles | Subtitle generation for videos | Transcription of audio files | Voice Assistant |
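
As a rough illustration of how the limits in the comparison above might be applied, the following sketch encodes them as data and filters the services whose duration, format, size, and sampling-rate limits a given input satisfies. It is not part of the DolphinVoice SDK: the names `ServiceLimits`, `SERVICES`, and `matching_services` are hypothetical, 1GB is assumed to mean 1024^3 bytes, and the separate 2GB video limit of the Standard service is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

GB = 1024 ** 3  # assumption: the source does not say whether 1GB is 10^9 or 2^30 bytes
SUPPORTED_RATES_HZ = {8000, 16000}  # the same for all four services


@dataclass(frozen=True)
class ServiceLimits:
    """Hypothetical container for one column of the comparison table."""
    name: str
    max_duration_s: int
    formats: frozenset
    max_audio_bytes: Optional[int] = None  # None: no size limit listed


# Values copied from the comparison table, in table order.
SERVICES = [
    ServiceLimits("Real-Time Speech Recognition", 37 * 3600,
                  frozenset({"WAV", "PCM", "MP3"})),
    ServiceLimits("Audio File Transcription (VIP)", 5 * 3600,
                  frozenset({"WAV", "PCM", "OPUS", "MP3", "AMR", "3GP", "AAC"}),
                  max_audio_bytes=1 * GB),
    ServiceLimits("Audio File Transcription (Standard)", 5 * 3600,
                  frozenset({"WAV", "PCM", "OPUS", "MP3", "MP4", "M4A",
                             "AMR", "3GP", "AAC"}),
                  max_audio_bytes=1 * GB),
    ServiceLimits("Short Speech Recognition", 60,
                  frozenset({"WAV", "PCM", "MP3"})),
]


def matching_services(duration_s: float, fmt: str, sample_rate_hz: int,
                      size_bytes: int = 0) -> List[str]:
    """Return the names of services whose listed limits the input satisfies."""
    if sample_rate_hz not in SUPPORTED_RATES_HZ:
        return []
    fmt = fmt.upper()
    return [
        s.name for s in SERVICES
        if duration_s <= s.max_duration_s
        and fmt in s.formats
        and (s.max_audio_bytes is None or size_bytes <= s.max_audio_bytes)
    ]


# A one-hour OPUS file of 500 MB fits only the two file transcription services.
print(matching_services(3600, "opus", 16000, 500 * 1024 ** 2))
```

The filter only checks the listed limits. Choosing between the remaining candidates still depends on the use case, for example whether the audio arrives as a live stream (real-time) or as a finished file (transcription).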

For detailed product specifications, please refer to the Developer Guides.