FAQs
1 General Questions
1. How do I apply for a DolphinVoice account?
Navigate to the DolphinVoice website and click on the login button to register. For detailed steps, please refer to the Quick Start.
2. How do I create a project?
Once your account has been created successfully, a default project is created automatically. If you need another project, click the New Project button.
3. Where can I find the AppID and AppSecret?
Please refer to the Connection URL on the DolphinVoice User Center.
4. Will the existing Token become invalid upon re-obtaining a new one?
No. Re-obtaining a Token does not invalidate a Token you already hold. A Token's validity depends only on its validity period.
5. How long is the default validity period for the Token?
The default validity period for the Token is 7 days.
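Because an older Token stays valid until its own period ends, a client can cache one Token and re-obtain it shortly before expiry. The following is a minimal sketch under stated assumptions: `fetch_token` is a hypothetical callable standing in for whatever call your application uses to obtain a Token, and the 300-second safety margin is an arbitrary choice, not a platform requirement.

```python
import time

# Default validity period stated in this FAQ: 7 days.
TOKEN_TTL_SECONDS = 7 * 24 * 60 * 60


class TokenCache:
    """Caches a Token and re-obtains it shortly before it expires."""

    def __init__(self, fetch_token, ttl=TOKEN_TTL_SECONDS, margin=300, clock=time.time):
        self._fetch = fetch_token  # hypothetical: returns a new Token string
        self._ttl = ttl
        self._margin = margin      # assumption: refresh 5 minutes early
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        # Re-obtain when there is no Token yet or the cached one is about to expire.
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch()
            self._expires_at = now + self._ttl
        return self._token
```

If the validity period of your Tokens differs from the default, pass the actual value as `ttl`.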
2 Short Speech Recognition/Real-Time Speech Recognition
1. What is the difference between RESTful API and WebSocket connections?
In the case of RESTful, the service only returns a single recognition result after the user's speech is finished. In contrast, with WebSocket, recognition results are returned while the user is still speaking, and many intermediate results are returned before the final recognition result.
2. What is the difference between WebSocket/HTTP APIs and SDK?
Recognition engine API: With the WebSocket and HTTP APIs, developers write their own integration code to call the engine from their applications.
SDK: The SDK package wraps the engine's recognition function and its interface, which makes development easier.
3. What programming languages does the SDK support?
Real-time speech recognition uses the WS protocol, and short speech recognition supports both WS and HTTP protocols. WS protocol integration with the engine is more challenging, so SDK support is provided.
Types of SDK: Python, Android, iOS, H5/JS.
4. If an error occurs during SDK development and it does not work properly, what should I do?
First, try our official demo. Once the demo works correctly, add your own code to it step by step so that any problem can be traced to the code you added.
5. How long is the response time?
The response time for recognition results is ≤500ms.
6. Which languages are supported?
The supported languages include Japanese, English, Chinese, and others.
For details, please refer to the Developer Guides.
7. What sampling rates and bit depths are supported?
Default sampling rates: 16000 Hz and 8000 Hz.
Other sampling rates: If the original audio sampling rate is known, you can use ffmpeg to convert the sampling rate.
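As an illustration of the ffmpeg conversion mentioned above, the sketch below builds and runs an ffmpeg command from Python. It uses the standard ffmpeg options `-ar` (output sampling rate), `-ac` (output channel count), and `-y` (overwrite the output file). The function names are hypothetical, and downmixing to mono with `-ac 1` is an assumption that suits the services that accept only single-channel audio.

```python
import subprocess

SUPPORTED_RATES = (8000, 16000)  # default sampling rates listed in this FAQ


def build_resample_command(src, dst, sample_rate=16000):
    """Return the ffmpeg argument list that resamples src into dst."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError("sample_rate must be one of %s" % (SUPPORTED_RATES,))
    return [
        "ffmpeg", "-y",           # overwrite dst if it already exists
        "-i", src,                # input file
        "-ar", str(sample_rate),  # target sampling rate in Hz
        "-ac", "1",               # assumption: downmix to mono
        dst,
    ]


def resample(src, dst, sample_rate=16000):
    """Run ffmpeg; requires the ffmpeg binary to be on PATH."""
    subprocess.run(build_resample_command(src, dst, sample_rate), check=True)
```

For example, `resample("input_44k.wav", "output_16k.wav")` would write a 16000 Hz mono copy of the input.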
8. What audio formats are supported?
Short speech recognition & real-time speech recognition: WAV, PCM, MP3, OPUS formats are supported.
9. What audio channels does speech recognition support?
A channel refers to an independent audio signal that is captured or played back at different spatial positions during recording or playback. Therefore, the number of channels is equivalent to the number of sound sources during recording or the corresponding number of speakers during playback. Except for the transcription of audio files, all other speech recognition services currently only support single-channel (mono) audio.
10. What domains does speech recognition support?
Currently, speech recognition supports two domains: general and call center. The general domain supports a sampling rate of 16000 Hz, and the call center domain supports 8000 Hz.
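A WAV file can be checked locally against the channel and sampling-rate requirements above before it is sent. This is a minimal sketch using Python's standard `wave` module; the domain labels `"general"` and `"call_center"` are hypothetical names chosen for this example, not platform parameter values.

```python
import wave

# Sampling rate per domain, as stated in this FAQ.
# The dictionary keys are illustrative labels only.
DOMAIN_SAMPLE_RATES = {"general": 16000, "call_center": 8000}


def check_wav(path, domain="general"):
    """Return a list of problems found in the WAV file (empty if none)."""
    expected_rate = DOMAIN_SAMPLE_RATES[domain]
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("audio has %d channels, expected mono" % wav.getnchannels())
        if wav.getframerate() != expected_rate:
            problems.append(
                "sampling rate is %d Hz, expected %d Hz" % (wav.getframerate(), expected_rate)
            )
    return problems
```

An empty list means the file is mono and matches the sampling rate of the chosen domain.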
11. Is it necessary to send audio data continuously?
Yes. Audio data must be sent continuously. If the server does not receive audio data within a certain period, it times out and returns an error message. To send data again, the client must initiate a new request.
12. Why can I still receive data from the server after the audio data transmission is interrupted?
When transmission stops and the connection times out, the server may still hold audio that it received earlier but has not yet processed. It continues to return recognition results for that data.
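One way to keep audio flowing without gaps is to split it into small fixed-duration chunks and send them at roughly real-time pace. The sketch below assumes 16-bit mono PCM and a 100 ms chunk size, both chosen for illustration; `send` is a hypothetical callable standing in for whatever writes a binary frame to your connection.

```python
import time


def iter_pcm_chunks(pcm, sample_rate=16000, sample_width=2, chunk_ms=100):
    """Yield consecutive chunks of raw mono PCM, each about chunk_ms long."""
    bytes_per_chunk = sample_rate * sample_width * chunk_ms // 1000
    for offset in range(0, len(pcm), bytes_per_chunk):
        yield pcm[offset:offset + bytes_per_chunk]


def stream_audio(pcm, send, sample_rate=16000, chunk_ms=100, sleep=time.sleep):
    """Send PCM chunk by chunk, pausing between chunks to mimic live audio."""
    for chunk in iter_pcm_chunks(pcm, sample_rate, chunk_ms=chunk_ms):
        send(chunk)              # hypothetical: e.g. a WebSocket binary send
        sleep(chunk_ms / 1000)   # pace the stream so data arrives continuously
```

Passing `sleep` as a parameter makes it possible to replace the pause with a no-op when testing.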
3 Audio Transcription
1. What SDK versions are available?
Audio file transcription uses an HTTP interface, which is convenient to call directly, so no SDK is provided.
2. What programming languages are supported?
Java, Python, C++, C, C#, H5/JS, etc.
3. What is the output format of the transcription result?
It supports two formats: manuscript and subtitle. You can choose the appropriate format according to your needs.
4. What audio file formats are supported for audio transcription?
.wav/.mp3/.opus/.pcm/.amr/.3gp/.aac formats are supported.
5. What are the upper limits of file duration and file size?
The duration limit is 5 hours. The size of the audio file must be less than 1GB.
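These limits can be checked on the client before uploading. The sketch below makes two assumptions: 1GB is interpreted as 1024³ bytes, and the duration is read with Python's standard `wave` module, which only handles WAV files (other formats would need a different tool to measure duration). The function names are illustrative.

```python
import os
import wave

MAX_SIZE_BYTES = 1024 ** 3           # assumption: 1GB means 2**30 bytes
MAX_DURATION_SECONDS = 5 * 60 * 60   # 5 hours


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_limits(size_bytes, duration_seconds):
    """Return a list of limit violations (empty if the file is acceptable)."""
    problems = []
    if size_bytes >= MAX_SIZE_BYTES:
        problems.append("file size must be less than 1GB")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("duration must not exceed 5 hours")
    return problems


def check_wav_upload(path):
    """Check a WAV file on disk against both limits."""
    return check_limits(os.path.getsize(path), wav_duration_seconds(path))
```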
6. Are there timestamps in the transcription results?
Yes. The audio is processed as follows:
(1) The audio is first segmented into small pieces using a 450 ms VAD (Voice Activity Detection) threshold. Sentences are defined as 5 to 30 seconds long, so the pieces are combined or forcibly cut according to the slices and tasks, and the result is then transcribed synchronously.
(2) The transcription results are returned with the start and end time of each audio segment, precise to the millisecond.
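To make the combine-or-cut idea in step (1) concrete, here is a rough illustration and not the platform's actual algorithm: it merges consecutive VAD pieces until a sentence reaches 5 seconds and forcibly cuts anything longer than 30 seconds. The input format, a time-ordered list of `(start_ms, end_ms)` pairs, and the exact merging rule are assumptions made for this sketch.

```python
MIN_SENTENCE_MS = 5_000
MAX_SENTENCE_MS = 30_000


def build_sentences(pieces):
    """Combine VAD pieces into sentences of roughly 5 to 30 seconds.

    pieces: list of (start_ms, end_ms) tuples in time order.
    Returns a list of (start_ms, end_ms) sentence boundaries.
    """
    sentences = []
    current = None  # [start_ms, end_ms] of the sentence being built
    for start, end in pieces:
        if current is None:
            current = [start, end]
        else:
            current[1] = end  # combine with the previous piece(s)
        # Forcibly cut while the sentence is longer than the maximum.
        while current[1] - current[0] > MAX_SENTENCE_MS:
            sentences.append((current[0], current[0] + MAX_SENTENCE_MS))
            current[0] += MAX_SENTENCE_MS
        # Close the sentence once it is long enough.
        if current[1] - current[0] >= MIN_SENTENCE_MS:
            sentences.append((current[0], current[1]))
            current = None
    if current is not None:
        sentences.append((current[0], current[1]))  # short trailing remainder
    return sentences
```

Each returned pair corresponds to the kind of millisecond start and end times described in step (2).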
7. Is speaker diarization supported?
Yes. The platform supports speaker diarization either by audio channel or by speaker diarization technology, but the two methods cannot be used at the same time.
(1) Based on the number of channels: The uploaded file must have two or more channels. Text from the same channel is attributed to one speaker.
(2) Based on speaker diarization technology: The uploaded audio file is analyzed, and speaker IDs with their corresponding start and end times are distinguished based on voiceprint information.
8. How is speaker diarization done in subtitle mode?
Currently, speaker diarization is not supported in subtitle mode. It is only possible when the output format is manuscript.
9. What are the differences between Audio File Transcription Standard and VIP?
(1) The supported file formats differ:
Standard: wav, mp3, wma, mp4, pcm, m4a, amr, 3gp, aac.
VIP: wav, mp3, wma, pcm, amr, 3gp, aac.
(2) The requirements for file size differ:
Standard: The size of audio files should not exceed 1GB; the size of video files should not exceed 2GB.
VIP: The size of audio files should not exceed 1GB.
(3) The speed of returning results differs:
Standard: For a 1-hour audio file, results are returned within an average of 15 minutes.
VIP: For a 1-hour audio file, results are returned within an average of 5 minutes.
4 Hotwords
1. What services and languages do hotwords support?
All speech recognition services support hotwords: Short Speech Recognition, Real-time Speech Recognition, Audio File Transcription (Standard), and Audio File Transcription (VIP).
All languages launched on the platform support hotword settings, including Japanese, Japanese-English mixed, English, Chinese, and Chinese-English mixed.
2. How to set hotwords?
(1) Japanese & Japanese-English Mixed Language: Set in the form of hotword groups, composed of "writing, pronunciation, category". For example: 早稲田大学,ワセダダイガク,固有名詞.
- Writing and pronunciation should not exceed 30 characters.
(2) Chinese & Chinese-English Mixed & English Language: Set in the form of hotwords, consisting only of "writing". For example: 汇演.
- A single hotword should not exceed 30 characters.
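The two entry formats above can be validated before submission. The sketch below assumes that the three fields of a hotword group are separated by ASCII commas, as in the example, and that "characters" can be counted as Python string length; both are assumptions, and the function names are illustrative.

```python
MAX_CHARS = 30  # per-field limit stated above


def parse_hotword_group(line):
    """Parse a Japanese hotword group: writing,pronunciation,category."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 3:
        raise ValueError("expected 'writing,pronunciation,category': %r" % line)
    writing, pronunciation, category = parts
    for label, value in (("writing", writing), ("pronunciation", pronunciation)):
        if not value or len(value) > MAX_CHARS:
            raise ValueError("%s must be 1 to %d characters" % (label, MAX_CHARS))
    return {"writing": writing, "pronunciation": pronunciation, "category": category}


def validate_hotword(word):
    """Validate a Chinese or English hotword, which consists only of the writing."""
    word = word.strip()
    if not word or len(word) > MAX_CHARS:
        raise ValueError("hotword must be 1 to %d characters" % MAX_CHARS)
    return word
```

For instance, `parse_hotword_group("早稲田大学,ワセダダイガク,固有名詞")` splits the example above into its three fields.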
3. How to create hotwords?
Real-time hotwords: When calling a speech recognition service, pass the hotword list in the 'hotwords_list' parameter of a single connection/request for it to take effect.
Non-real-time hotwords: Create them with either of the two methods below. Hotwords created with either method can be viewed by logging into the DolphinVoice platform and selecting "Customized Word Lists-Hot Words".
- Method One: Create by calling the hotwords related interface. See the Hotwords API for details.
- Method Two: Create hotwords by logging into the DolphinVoice platform and selecting "Customized Word Lists-Hot Words".
4. What are the differences between real-time and non-real-time hotwords?
(1) Real-time hotwords do not require creating a hotword database ID in advance. You can simply pass the hotwords list in a single connection/request. Non-real-time hotwords require creating a hotword database ID by calling the Hotwords API first, and then using the hotwords database ID in subsequent recognition service calls.
(2) Real-time hotwords are deleted after use, i.e., one-time hotwords. Non-real-time hotwords can be used multiple times.
(3) Up to 100 hotwords/hotword groups can be set at a time for real-time hotwords. There is no upper limit on the number of non-real-time hotword databases, but a single hotword database should not exceed 20000 entries.
(4) If both real-time and non-real-time hotwords are used in a single call, the real-time hotwords will take precedence.
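The 100-entry limit for real-time hotwords can be enforced on the client before a request is made. In this sketch only the parameter name `hotwords_list` comes from this FAQ; representing it as a list of strings, and the helper name itself, are assumptions. The rest of the request is not shown.

```python
REALTIME_MAX_ENTRIES = 100  # limit for real-time hotwords per connection/request


def build_realtime_hotword_params(hotwords):
    """Return a parameter fragment carrying real-time hotwords.

    Blank entries are dropped; more than 100 entries raises ValueError.
    """
    entries = [word.strip() for word in hotwords if word.strip()]
    if len(entries) > REALTIME_MAX_ENTRIES:
        raise ValueError(
            "real-time hotwords are limited to %d entries, got %d"
            % (REALTIME_MAX_ENTRIES, len(entries))
        )
    # Assumption: the list is passed as plain strings under 'hotwords_list'.
    return {"hotwords_list": entries}
```

Lists larger than 100 entries are a better fit for a non-real-time hotword database, which allows up to 20000 entries.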
5 Status Codes
5.1 General Status Codes
| Error Code | Error Message | Description | Solution |
|---|---|---|---|
| 110000 | Token Missing | Token is missing | Add the token parameter |
| 110001 | Invalid Token | Token error | Pass the correct token value |
| 110005 | Concurrency Quota Exceeded | The concurrency quota has been exceeded | Please contact the business department |
| 110006 | Failed To Create Token | Token creation failed | Recreate the token |
| 110007 | APP ID Not Found | APP ID Not Found | Please check and enter the correct app_id |
| 110008 | Invalid Signature | Invalid Signature | Please regenerate the correct signature |
| 110009 | Token Expired | Token has expired | Re-obtain the token |
| 110011 | Illegal Current Time | Invalid current time | Check if the time is correct |
| 110012 | Payment Status Abnormal, Service Unavailable | The payment status is abnormal and the service is unavailable | Please contact the business department |
| 120000 | Network Error | Network error | Check your network |
| 120001 | Lack Of Network Permissions | Lack of network access rights | Check your network |
| 120002 | Network Disconnected | Network connection has been disconnected | Check your network |
| 120003 | No Network Connection | No network connection | Check your network |
| 130000 | Lack Of Recording Permissions | Lack of recording access rights | Check your recording access rights |
| 130001 | Microphone is not initialized, please call initRecorder after obtaining recording permissions | Microphone is not initialized. Please call initRecorder after obtaining recording permissions | Initialize the microphone |
| 130002 | No Recording Devices Available | No recording devices were found | Check your device |
| 140000 | Database is busy, please try again later | The database is busy, please try again later | Try again later; if the problem persists, please contact the business department |
| 140004 | APPID/APPSecret Cannot Be Null | APPID/APPSecret cannot be null | Enter APPID/APPSecret |
| 140005 | Listener is null, please call setListener method first | Listener is null. Please call the setListener method first | Call the setListener method first |
| 140006 | InitListener Cannot Be Null | InitListener cannot be null | Pass a non-null InitListener |
| 140010 | Invalid Parameter | Parameter error | Check the parameters (to ensure they are not unspecified, incorrect, or empty strings) |
| 140011 | Parameter Missing | Parameter missing | Add the required parameter |
| 140012 | Invalid Parameter Type | Parameter type error | Check the type of the parameter |
| 140013 | Invalid Parameter Format | Parameter format error | Check the format of the parameter |
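
A client can translate the general status codes above into a small set of follow-up actions. The sketch below groups codes by the solution column of the table; the action labels (`"refresh_token"`, `"check_network"`, and so on) and the function name are hypothetical names introduced for this example.

```python
# Codes grouped by the "Solution" column of the general status code table.
REFRESH_TOKEN_CODES = {110006, 110009}          # recreate / re-obtain the token
CHECK_NETWORK_CODES = {120000, 120001, 120002, 120003}
CONTACT_BUSINESS_CODES = {110005, 110012, 140000}


def suggest_action(code):
    """Map a general status code to an illustrative follow-up action."""
    if code in REFRESH_TOKEN_CODES:
        return "refresh_token"
    if code in CHECK_NETWORK_CODES:
        return "check_network"
    if code in CONTACT_BUSINESS_CODES:
        return "contact_business"
    # Everything else in the table points at the request or local setup.
    return "check_request"
```

For example, a response carrying 110009 (Token Expired) maps to `"refresh_token"`, after which the request can be retried with a new Token.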
5.2 Short Speech Recognition/Real-Time Speech Recognition
| Error Code | Error Message | Description | Solution |
|---|---|---|---|
| 200000 | Invalid Parameter | Parameter error | Check the parameters (to ensure they are not unspecified, incorrect, or empty strings) |
| 200001 | Parameter Missing | Parameter missing | Add the required parameter |
| 200002 | Invalid Parameter Type | Parameter type error | Check the type of the parameter |
| 200003 | Invalid Parameter Format | Parameter format error | Check the format of the parameter |
| 210500 | Failed To Call Engine | Service call failed | Please contact the business department |
| 210200 | Audio Format Is Inconsistent With Parameters | Audio format does not match parameters | Ensure the audio format and parameters match |
| 210201 | Reading Audio Failed | Audio reading failed | Please resend the audio |
| 210202 | Invalid Audio Sample Rate | Audio sampling rate error | Ensure the wav audio sampling rate matches the request parameters |
| 210203 | Invalid Number Of Channels | Incorrect number of audio channels | Check if the audio is single-channel |
| 210204 | Failed To Save Audio | Failed to save audio | Please contact the business department |
| 210000 | Gateway Timeout In Receiving Data | Gateway timeout in receiving data | Please resend the data |
| 210001 | Connection Error 1 | Connection error 1 | Please contact the business department |
| 210002 | Connection Error 2 | Connection error 2 | Please contact the business department |
| 210003 | Disconnected | The connection has been disconnected | Please contact the business department |
| 210004 | Service Not Started | Service has not started | Please contact the business department |
| 210100 | Invalid Calling Sequence | Incorrect calling sequence | Please contact the business department |
5.3 Audio Transcription
| Error Code | Error Message | Description | Solution |
|---|---|---|---|
| 200000 | Invalid Parameter | Parameter error | Check the parameters (to ensure they are not unspecified, incorrect, or empty strings) |
| 200001 | Parameter Missing | Parameter missing | Add the required parameter |
| 200002 | Invalid Parameter Type | Parameter type error | Check the type of the parameter |
| 200003 | Invalid Parameter Format | Parameter format error | Check the format of the parameter |
| 220500 | Failed To Call Engine | Service call failed | Please contact the business department |
| 220502 | VAD Engine Error | VAD error | Please contact the business department |
| 220200 | Audio Format Is Inconsistent With Parameters | Audio format does not match parameters | Ensure the audio format and parameters match |
| 220201 | File Size Exceeds Limit | File size exceeds limit | Please upload a file that meets the requirements |
| 220202 | File Duration Exceeds Limit | File duration exceeds limit | Please upload a file that meets the requirements |
| 220203 | Invalid Number Of Channels | Audio channel number error | Ensure the number of channels in the uploaded file matches the parameters passed |
| 220403 | Audio Download Failed | Audio download failed | Check that the file URL can be accessed normally; if the problem persists, please contact the business department |
| 220301 | Connection Error 1 | Connection error 1 | Please contact the business department |
| 220400 | Failed To Get Audio Duration | Failed to get audio duration | Please contact the business department |
| 220401 | Failed To Save File | Failed to save the file | Please contact the business department |
| 220402 | Failed To Open File | Failed to open the file | Please contact the business department |
| 220404 | Task ID Not Found | taskid does not exist | Please enter the correct task ID |
| 220405 | Task Execution Timeout | Task execution timeout | Please re-upload |
5.4 Hotwords Related
| Error Code | Error Message | Description | Solution |
|---|---|---|---|
| 200100 | Hot Word File Format Error | Hot Word File Format Error | Please upload a txt file |
| 200101 | Hot word file content is empty | Hot word file content is empty | Please check the content of the hot word file |
| 200102 | Failed To Read Hot Word Library | Failed to read the hot word library | Please re-upload or contact the business department |
| 200103 | Character count exceeds limit | Character count exceeds limit | Keep the text for document-based optimization within 1000000 characters |
| 200104 | Language Not Supported | Language not supported | Use a language that document-based optimization supports |
| 200105 | Failed To Create Document-based Optimization | Failed to create document-based optimization | Please recreate or contact the business department |
| 200106 | Hot word file size exceeds limit | Hot word file size exceeds limit | The hot word file size must be within 3MB |
| 200107 | Hot word library ID does not exist | Hot word library ID does not exist | Please enter the correct hot word library ID/hot word |
| 200108 | Hot Words In Use; Operations Prohibited | The hot words are in use, so the operation is prohibited | Retry the operation when the hot words are no longer in use |
| 200109 | Number of hot words exceeds limit | Number of hot words exceeds limit | Max 20000 hotwords/groups per library |