
FAQs

1 General Questions

1. How to apply for a DolphinVoice account?

Navigate to the DolphinVoice website and click the Login button to register. For detailed steps, please refer to the Quick Start.

2. How to create a project?

Once your account is approved, a default project is created automatically. If you need another project, click the New Project button to create one.

Each project has all AI capabilities enabled by default.

3. Where to check the AppID and AppSecret?

Please refer to the Connection URL on the DolphinVoice User Center.

4. Will the existing Token become invalid upon re-obtaining a new one?

Obtaining a new Token does not affect the validity of Tokens you already hold. A Token's validity depends only on its own validity period.

5. How long is the default validity period for the Token?

The default validity period for the Token is 7 days.
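Because the validity window is fixed, clients typically cache the Token and re-obtain it shortly before expiry. A minimal sketch; `fetch_token` is a hypothetical placeholder for your actual token-request call:

```python
import time

TOKEN_TTL_SECONDS = 7 * 24 * 3600  # default validity: 7 days


class TokenCache:
    """Caches a token and refreshes it shortly before it expires."""

    def __init__(self, fetch_token, margin_seconds=3600):
        # fetch_token: your function that requests a new token from the service
        self._fetch = fetch_token
        self._margin = margin_seconds  # refresh this long before expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = time.time()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch()
            self._expires_at = now + TOKEN_TTL_SECONDS
        return self._token
```

Since re-obtaining a Token does not invalidate older ones, refreshing early is safe: in-flight requests using the previous Token keep working until its own period ends.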

2 Short Speech Recognition/Real-Time Speech Recognition

1. What is the difference between RESTful API and WebSocket connections?

With the RESTful API, the service returns a single recognition result only after the user finishes speaking. With WebSocket, recognition results are returned while the user is still speaking: many intermediate results arrive before the final one.
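The streaming behavior can be sketched as a small helper that separates intermediate results from the final one. This is a minimal sketch, assuming each WebSocket message is a JSON object with illustrative `text` and `is_final` fields; check the API reference for the actual field names:

```python
def collect_results(messages):
    """Separate intermediate (partial) results from final ones.

    Each message is a dict; the 'is_final' flag and 'text' field here are
    illustrative placeholders for whatever the service actually returns.
    """
    partials, finals = [], []
    for msg in messages:
        (finals if msg.get("is_final") else partials).append(msg["text"])
    return partials, finals
```

A typical client would display each partial as it arrives (overwriting the previous one) and commit the text only when a final result is received.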

2. What is the difference between WebSocket/HTTP APIs and SDK?

API (WebSocket/HTTP): developers write their own integration code to call the recognition engine and build the application.

SDK: the package bundles the engine's recognition functions and their interface, which simplifies development.

3. What programming languages does the SDK support?

Real-time speech recognition uses the WebSocket (WS) protocol, and short speech recognition supports both WS and HTTP. Because integrating with the engine over WS is more involved, SDKs are provided.

Types of SDK: Python, Android, iOS, H5/JS.

4. If an error occurs during SDK development and it does not work properly, what should I do?

First, run our official demo. Once the demo works correctly, add your own code on top of it; if something then breaks, the issue lies in your additions.

5. How long is the response time?

The response time for recognition results is ≤500ms.

6. Which languages are supported?

The supported languages include Japanese, English, Chinese, and others.

For details, please refer to the Developer Guides.

7. What sampling rates and bit depths are supported?

Default sampling rates: 16000 Hz and 8000 Hz.

Other sampling rates: if the original audio's sampling rate is known, you can use ffmpeg to convert it to a supported rate.
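For example, a known-rate file can be converted with ffmpeg's `-ar` (sample rate) and `-ac` (channel count) options. A small Python helper that builds such a command (the file names are placeholders):

```python
import subprocess


def resample_cmd(src, dst, rate=16000, mono=True):
    """Build an ffmpeg command converting `src` to `rate` Hz, 16-bit PCM."""
    cmd = ["ffmpeg", "-y", "-i", src, "-ar", str(rate)]
    if mono:
        cmd += ["-ac", "1"]  # downmix to a single channel
    cmd += ["-sample_fmt", "s16", dst]
    return cmd


# To actually run it (requires ffmpeg on PATH):
# subprocess.run(resample_cmd("input.mp3", "output.wav"), check=True)
```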

8. What audio formats are supported?

Short speech recognition & real-time speech recognition: WAV, PCM, MP3, OPUS formats are supported.

9. What audio channels does speech recognition support?

A channel refers to an independent audio signal that is captured or played back at different spatial positions during recording or playback. Therefore, the number of channels is equivalent to the number of sound sources during recording or the corresponding number of speakers during playback. Except for the transcription of audio files, all other speech recognition services currently only support single-channel (mono) audio.

10. What fields does speech recognition support?

Currently, speech recognition supports two primary domains: the general field and the call center field. For the general field, the supported sampling rate is 16000Hz, while for the call center field, it is 8000Hz.

11. Is it necessary to send audio data continuously?

Audio data must be sent continuously. If the server does not receive audio data within a certain period, it will timeout and return an error message. To send data again, the client needs to initiate a new request.
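One common way to keep the stream continuous is to slice the audio into small fixed-size frames and send them at a steady pace. A sketch of the slicing step; the 3200-byte frame (100 ms of 16 kHz, 16-bit mono PCM) is an illustrative choice, and the actual frame size and pacing should follow the service documentation:

```python
def frames(audio: bytes, frame_bytes: int = 3200):
    """Split raw audio into fixed-size frames for continuous streaming.

    3200 bytes = 100 ms of 16 kHz, 16-bit mono PCM. The last frame may be
    shorter than frame_bytes.
    """
    for i in range(0, len(audio), frame_bytes):
        yield audio[i:i + frame_bytes]
```

A streaming client would send one frame roughly every 100 ms, so the server never hits its no-data timeout while audio remains.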

12. Why can I still receive data from the server after the audio data transmission is interrupted?

If audio transmission is interrupted and the connection times out, the server will still return recognition results for any audio it had already received but not yet processed.

3 Audio Transcription

1. What SDK versions are available?

Audio file transcription uses an HTTP interface, which is straightforward to call, so no SDK is provided.

2. What programming languages are supported?

Java, Python, C++, C, C#, H5/JS, etc.

3. What is the output format of the transcription result?

It supports two formats: manuscript and subtitle. You can choose the appropriate format according to your needs.

4. What audio file formats are supported for audio transcription?

.wav/.mp3/.opus/.pcm/.amr/.3gp/.aac formats are supported.

5. What are the upper limits of file duration and file size?

The duration limit is 5 hours. The size of the audio file must be less than 1GB.
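These two limits can be checked client-side before uploading. A minimal sketch; obtaining the duration and size of your file is left to the caller:

```python
MAX_DURATION_S = 5 * 3600   # 5 hours
MAX_SIZE_BYTES = 1 << 30    # 1 GB


def check_limits(duration_s: float, size_bytes: int):
    """Return a list of limit violations (empty means the file is acceptable)."""
    problems = []
    if duration_s > MAX_DURATION_S:
        problems.append("duration exceeds 5 hours")
    if size_bytes >= MAX_SIZE_BYTES:
        problems.append("size must be less than 1GB")
    return problems
```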

6. Are there timestamps in the transcription results?

Audio processing steps:

(1) First, the audio is segmented using a 450 ms VAD (Voice Activity Detection) silence threshold; segments are then combined, or force-split, so that each sentence is between 5 and 30 seconds long, and the resulting slices are transcribed.

(2) The transcription results include the start and end times of each audio segment, with millisecond precision.
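Since segment boundaries are reported in milliseconds, a small formatter is handy when turning results into subtitle-style timestamps:

```python
def ms_to_clock(ms: int) -> str:
    """Format a millisecond timestamp as HH:MM:SS.mmm."""
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"
```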

7. Is speaker diarization supported?

Yes, our platform supports speaker diarization through either the number of audio channels or speaker diarization technology, but the two methods cannot be used simultaneously.

(1) Based on the number of channels: the uploaded file must have two or more channels, and text from the same channel number is attributed to one speaker.

(2) Based on speaker diarization technology: the uploaded audio is analyzed, and speaker IDs with their corresponding start and end times are distinguished based on voiceprint information.
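Downstream code often regroups the per-segment results by speaker. A minimal sketch, assuming each result segment is a dict with illustrative `speaker` and `text` keys; check the transcription response format for the real field names:

```python
from collections import defaultdict


def group_by_speaker(segments):
    """Collect transcribed text per speaker ID, preserving segment order."""
    by_speaker = defaultdict(list)
    for seg in segments:
        by_speaker[seg["speaker"]].append(seg["text"])
    return dict(by_speaker)
```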

8. How is speaker diarization done in subtitle mode?

Currently, speaker diarization is not supported in subtitle mode. It is only possible when the output format is manuscript.

9. What are the differences between Audio File Transcription Standard and VIP?

(1) The supported file formats differ:

Standard: wav, mp3, wma, mp4, pcm, m4a, amr, 3gp, aac.

VIP: wav, mp3, wma, pcm, amr, 3gp, aac.

(2) The requirements for file size differ:

Standard: The size of audio files should not exceed 1GB; the size of video files should not exceed 2GB.

VIP: The size of audio files should not exceed 1GB.

(3) The speed of returning results differs:

Standard: For a 1-hour audio file, results are returned within an average of 15 minutes.

VIP: For a 1-hour audio file, results are returned within an average of 5 minutes.

4 Hotwords

1. What services and languages does hotwords support?

All speech recognition-related services support hotword settings: Short Speech Recognition, Real-time Speech Recognition, Audio File Transcription (Standard), and Audio File Transcription (VIP).

All languages launched on the platform support hotword settings, including Japanese, Japanese-English mixed, English, Chinese, and Chinese-English mixed.

2. How to set hotwords?

(1) Japanese & Japanese-English mixed: set in the form of hotword groups, composed of "writing, pronunciation, category". For example: 早稲田大学,ワセダダイガク,固有名詞.

  • Writing and pronunciation should each not exceed 30 characters.

(2) Chinese, Chinese-English mixed & English: set in the form of plain hotwords, consisting only of "writing". For example: 汇演.

  • A single hotword should not exceed 30 characters.

3. How to create hotwords?

Real-time hotwords: when calling speech recognition-related services, pass the hotword list via the 'hotwords_list' parameter in a single connection/request to make it take effect.

Non-real-time hotwords: created via either method below. They can be viewed by logging into the DolphinVoice platform and selecting "Customized Word Lists - Hot Words".

  • Method One: create them by calling the hotwords-related interface. See the Hotwords API for details.

  • Method Two: create them by logging into the DolphinVoice platform and selecting "Customized Word Lists - Hot Words".

4. What are the differences between real-time and non-real-time hotwords?

(1) Real-time hotwords do not require creating a hotword database ID in advance. You can simply pass the hotwords list in a single connection/request. Non-real-time hotwords require creating a hotword database ID by calling the Hotwords API first, and then using the hotwords database ID in subsequent recognition service calls.

(2) Real-time hotwords are deleted after use, i.e., one-time hotwords. Non-real-time hotwords can be used multiple times.

(3) Up to 100 hotwords/hotword groups can be set at a time for real-time hotwords. There is no upper limit on the number of non-real-time hotword databases, but a single database should not exceed 20,000 entries.

(4) If both real-time and non-real-time hotwords are used in a single call, the real-time hotwords will take precedence.
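The per-request limits for real-time hotwords can be enforced client-side before sending. A minimal sketch; the comma-separated group format follows the Japanese hotword-group example above:

```python
MAX_REALTIME_HOTWORDS = 100
MAX_HOTWORD_CHARS = 30


def validate_hotwords_list(hotwords):
    """Check a real-time hotwords list against the documented limits.

    For 'writing,pronunciation,category' groups, the writing and
    pronunciation parts are length-checked. Raises ValueError on violation.
    """
    if len(hotwords) > MAX_REALTIME_HOTWORDS:
        raise ValueError(f"at most {MAX_REALTIME_HOTWORDS} hotwords per request")
    for hw in hotwords:
        parts = hw.split(",")
        for part in parts[:2]:  # writing (and pronunciation, if present)
            if len(part) > MAX_HOTWORD_CHARS:
                raise ValueError(f"hotword part too long: {part!r}")
    return hotwords
```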

5 Status Codes

5.1 General Status Codes

| Error Code | Error Message | Description | Solution |
| --- | --- | --- | --- |
| 110000 | Token Missing | Token is missing | Add the token parameter |
| 110001 | Invalid Token | Token error | Pass the correct token value |
| 110005 | Concurrency Quota Exceeded | Concurrency quota exceeded | Please contact the business department |
| 110006 | Failed To Create Token | Token creation failed | Recreate the token |
| 110007 | APP ID Not Found | APP ID not found | Please check and enter the correct app_id |
| 110008 | Invalid Signature | Invalid signature | Please regenerate the correct signature |
| 110009 | Token Expired | Token has expired | Re-obtain the token |
| 110011 | Illegal Current Time | Invalid current time | Check if the time is correct |
| 110012 | Payment Status Abnormal, Service Unavailable | Payment status abnormal; service unavailable | Please contact the business department |
| 120000 | Network Error | Network error | Check your network |
| 120001 | Lack Of Network Permissions | Lack of network access rights | Check your network |
| 120002 | Network Disconnected | Network connection has been disconnected | Check your network |
| 120003 | No Network Connection | No network connection | Check your network |
| 130000 | Lack Of Recording Permissions | Lack of recording access rights | Check your recording access rights |
| 130001 | Microphone Is Not Initialized | Microphone is not initialized | Call initRecorder after obtaining recording permissions |
| 130002 | No Recording Devices Available | No recording devices were found | Check your device |
| 140000 | Database Is Busy | The database is busy | Try again later or contact the business department |
| 140004 | APPID/APPSecret Cannot Be Null | APPID/APPSecret cannot be null | Enter the APPID/APPSecret |
| 140005 | Listener Is Null | Listener is null | Call the setListener method first |
| 140006 | InitListener Cannot Be Null | InitListener cannot be null | Pass a non-null InitListener |
| 140010 | Invalid Parameter | Parameter error | Check the parameters (ensure they are not unspecified, incorrect, or empty strings) |
| 140011 | Parameter Missing | Parameter missing | Add the required parameter |
| 140012 | Invalid Parameter Type | Parameter type error | Check the type of the parameter |
| 140013 | Invalid Parameter Format | Parameter format error | Check the format of the parameter |

5.2 Short Speech Recognition/Real-Time Speech Recognition

| Error Code | Error Message | Description | Solution |
| --- | --- | --- | --- |
| 200000 | Invalid Parameter | Parameter error | Check the parameters (ensure they are not unspecified, incorrect, or empty strings) |
| 200001 | Parameter Missing | Parameter missing | Add the required parameter |
| 200002 | Invalid Parameter Type | Parameter type error | Check the type of the parameter |
| 200003 | Invalid Parameter Format | Parameter format error | Check the format of the parameter |
| 210000 | Gateway Timeout In Receiving Data | Gateway timeout in receiving data | Please resend the data |
| 210001 | Connection Error 1 | Connection error 1 | Please contact the business department |
| 210002 | Connection Error 2 | Connection error 2 | Please contact the business department |
| 210003 | Disconnected | The connection has been disconnected | Please contact the business department |
| 210004 | Service Not Started | Service has not started | Please contact the business department |
| 210100 | Invalid Calling Sequence | Incorrect calling sequence | Please contact the business department |
| 210200 | Audio Format Is Inconsistent With Parameters | Audio format does not match the parameters | Ensure the audio format and parameters match |
| 210201 | Reading Audio Failed | Audio reading failed | Please resend the audio |
| 210202 | Invalid Audio Sample Rate | Audio sampling rate error | Ensure the WAV audio sampling rate matches the request parameters |
| 210203 | Invalid Number Of Channels | Incorrect number of audio channels | Check if the audio is single-channel |
| 210204 | Failed To Save Audio | Failed to save audio | Please contact the business department |
| 210500 | Failed To Call Engine | Service call failed | Please contact the business department |

5.3 Audio Transcription

| Error Code | Error Message | Description | Solution |
| --- | --- | --- | --- |
| 200000 | Invalid Parameter | Parameter error | Check the parameters (ensure they are not unspecified, incorrect, or empty strings) |
| 200001 | Parameter Missing | Parameter is missing | Add the required parameter |
| 200002 | Invalid Parameter Type | Parameter type error | Check the type of the parameter |
| 200003 | Invalid Parameter Format | Parameter format error | Check the format of the parameter |
| 220200 | Audio Format Is Inconsistent With Parameters | Audio format does not match the parameters | Ensure the audio format and parameters match |
| 220201 | File Size Exceeds Limit | File size exceeds the limit | Please upload a file that meets the requirements |
| 220202 | File Duration Exceeds Limit | File duration exceeds the limit | Please upload a file that meets the requirements |
| 220203 | Invalid Number Of Channels | Audio channel number error | Ensure the channel count of the uploaded file matches the parameters passed |
| 220301 | Connection Error 1 | Connection error 1 | Please contact the business department |
| 220400 | Failed To Get Audio Duration | Failed to get the audio duration | Please contact the business department |
| 220401 | Failed To Save File | Failed to save the file | Please contact the business department |
| 220402 | Failed To Open File | Failed to open the file | Please contact the business department |
| 220403 | Audio Download Failed | Failed to download the audio | Check that the file URL is accessible; if the problem persists, contact the business department |
| 220404 | Task ID Not Found | The task ID does not exist | Please enter the correct task ID |
| 220405 | Task Execution Timeout | Task execution timed out | Please re-upload |
| 220500 | Failed To Call Engine | Service call failed | Please contact the business department |
| 220502 | VAD Engine Error | VAD error | Please contact the business department |
5.4 Hotwords Related

| Error Code | Error Message | Description | Solution |
| --- | --- | --- | --- |
| 200100 | Hot Word File Format Error | Hot word file format error | Please upload a txt file |
| 200101 | Hot Word File Content Is Empty | The hot word file content is empty | Check the content of the hot word file |
| 200102 | Failed To Read Hot Word Library | Failed to read the hot word library | Re-upload or contact the business department |
| 200103 | Character Count Exceeds Limit | The document-based optimization character count exceeds the limit (1,000,000 characters) | Reduce the character count |
| 200104 | Language Not Supported | The language is not supported for document-based optimization | Use a supported language |
| 200105 | Failed To Create Document-based Optimization | Failed to create document-based optimization | Recreate it or contact the business department |
| 200106 | Hot Word File Size Exceeds Limit | The hot word file size exceeds the limit | The hot word file size must be within 3MB |
| 200107 | Hot Word Library ID Does Not Exist | The hot word library ID does not exist | Enter the correct hot word library ID / hot word |
| 200108 | Hot Words In Use; Operations Prohibited | The hot words are currently in use | Retry after the hot words are no longer in use |
| 200109 | Number Of Hot Words Exceeds Limit | The number of hot words exceeds the limit | A single library may hold at most 20,000 hotwords/groups |
