Product Overview
Speech recognition is an AI-driven technology that converts audio into text. With the speech recognition service of the DolphinVoice platform, you can integrate this capability into your own products.
Our speech recognition services fall into four categories: short speech recognition, real-time speech recognition, audio file transcription (Standard), and audio file transcription (VIP). The table below gives a brief comparison of each product:
| Item | Real-Time Speech Recognition | Audio File Transcription (VIP) | Audio File Transcription (Standard) | Short Speech Recognition |
|---|---|---|---|---|
| Function | Recognize streaming audio and return recognition results in real time. | Transcribe long audio files into transcripts or subtitles; results for 1 hour of audio are returned within 5 minutes. | Transcribe long audio files into transcripts or subtitles; results for 1 hour of audio are returned within 15 minutes. | Recognize a short utterance in one pass and return the final result, or return results while recognition is in progress. |
| Audio Limitations | Up to 37 hours continuously | Audio: 1GB; Duration: 5h | Audio: 1GB; Video: 2GB; Duration: 5h | 60s |
| Supported Formats | WAV/PCM/MP3 | WAV/PCM/OPUS/MP3/AMR/3GP/AAC | WAV/PCM/OPUS/MP3/MP4/M4A/AMR/3GP/AAC | WAV/PCM/MP3 |
| Supported Sampling Rates | 16kHz, 8kHz | 16kHz, 8kHz | 16kHz, 8kHz | 16kHz, 8kHz |
| Typical Use Cases | Real-time subtitles | Subtitle generation for videos | Transcription of audio files | Voice assistants |
For detailed product specifications, please refer to the Developer Guides.
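
As a rough illustration of how the comparison table might be used when choosing a product, the sketch below encodes the duration, format, and sampling-rate limits from the table and lists the products that would accept a given input. It is not part of the DolphinVoice SDK: the names `ServiceLimits`, `SERVICES`, and `candidate_services` are hypothetical, the file-size limits (1GB audio, 2GB video) are left out for brevity, and real-time recognition is treated here only in terms of its limits, although it expects streaming input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLimits:
    """Hypothetical container for one column of the comparison table."""
    name: str
    max_duration_s: int
    formats: frozenset


# Values copied from the comparison table; the identifiers are illustrative only.
SERVICES = [
    ServiceLimits("short", 60, frozenset({"WAV", "PCM", "MP3"})),
    ServiceLimits("real-time", 37 * 3600, frozenset({"WAV", "PCM", "MP3"})),
    ServiceLimits(
        "file-vip",
        5 * 3600,
        frozenset({"WAV", "PCM", "OPUS", "MP3", "AMR", "3GP", "AAC"}),
    ),
    ServiceLimits(
        "file-standard",
        5 * 3600,
        frozenset({"WAV", "PCM", "OPUS", "MP3", "MP4", "M4A", "AMR", "3GP", "AAC"}),
    ),
]

# All four products list the same sampling rates in the table.
SUPPORTED_SAMPLE_RATES_HZ = (8000, 16000)


def candidate_services(fmt: str, duration_s: float, sample_rate_hz: int) -> list:
    """Return the names of products whose table limits the input satisfies."""
    if sample_rate_hz not in SUPPORTED_SAMPLE_RATES_HZ:
        return []
    fmt = fmt.upper()
    return [
        s.name
        for s in SERVICES
        if fmt in s.formats and duration_s <= s.max_duration_s
    ]


if __name__ == "__main__":
    # A 10-minute MP4 recording sampled at 16 kHz.
    print(candidate_services("mp4", 600, 16000))
```

A check like this only narrows the options; the choice between the VIP and Standard transcription products still depends on the turnaround time you need, and the Developer Guides remain the authoritative source for limits.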