
FAQs

1 General Questions

1. How to apply for a DolphinVoice account?

Navigate to the DolphinVoice website and click the Login button to register. For detailed steps, please refer to the Quick Start.

2. How to create a project?

Once your account is approved, a default project is created automatically. If you need another project, click the New Project button to create one.

Each project has all AI capabilities enabled by default.

3. Where to check the AppID and AppSecret?

Please refer to the Connection URL on the DolphinVoice User Center.

4. Will the existing Token become invalid upon re-obtaining a new one?

Obtaining a new Token does not affect the validity of Tokens you already hold. A Token's validity depends only on its own validity period.

5. How long is the default validity period for the Token?

The default validity period for the Token is 7 days.
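Because the validity window is fixed, clients typically cache the Token and re-obtain it shortly before expiry. A minimal sketch; `fetch_token` is a hypothetical placeholder for your actual token-request call:

```python
import time

TOKEN_TTL_SECONDS = 7 * 24 * 3600  # default validity: 7 days


class TokenCache:
    """Caches a token and refreshes it shortly before it expires."""

    def __init__(self, fetch_token, margin_seconds=3600):
        # fetch_token: your function that requests a new token from the service
        self._fetch = fetch_token
        self._margin = margin_seconds  # refresh this long before expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = time.time()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch()
            self._expires_at = now + TOKEN_TTL_SECONDS
        return self._token
```

Since re-obtaining a Token does not invalidate older ones, refreshing early is safe: in-flight requests using the previous Token keep working until its own period ends.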

2 Short Speech Recognition/Real-Time Speech Recognition

1. What is the difference between RESTful API and WebSocket connections?

With the RESTful API, the service returns a single recognition result only after the user finishes speaking. With WebSocket, recognition results are returned while the user is still speaking: many intermediate results arrive before the final one.
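The streaming behavior can be sketched as a small helper that separates intermediate results from the final one. This is a minimal sketch, assuming each WebSocket message is a JSON object with illustrative `text` and `is_final` fields; check the API reference for the actual field names:

```python
def collect_results(messages):
    """Separate intermediate (partial) results from final ones.

    Each message is a dict; the 'is_final' flag and 'text' field here are
    illustrative placeholders for whatever the service actually returns.
    """
    partials, finals = [], []
    for msg in messages:
        (finals if msg.get("is_final") else partials).append(msg["text"])
    return partials, finals
```

A typical client would display each partial as it arrives (overwriting the previous one) and commit the text only when a final result is received.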

2. What is the difference between WebSocket/HTTP APIs and SDK?

API (WebSocket/HTTP): developers write their own integration code to call the recognition engine and build the application.

SDK: the package bundles the engine's recognition functions and their interface, which simplifies development.

3. What programming languages does the SDK support?

Real-time speech recognition uses the WebSocket (WS) protocol, and short speech recognition supports both WS and HTTP. Because integrating with the engine over WS is more involved, SDKs are provided.

Types of SDK: Python, Android, iOS, H5/JS.

4. If an error occurs during SDK development and it does not work properly, what should I do?

First, run our official demo. Once the demo works correctly, add your own code on top of it; if something then breaks, the issue lies in your additions.

5. How long is the response time?

The response time for recognition results is ≤500ms.

6. Which languages are supported?

The supported languages include Japanese, English, Chinese, and others.

For details, please refer to the Developer Guides.

7. What sampling rates and bit depths are supported?

Default sampling rates: 16000 Hz and 8000 Hz.

Other sampling rates: if the original audio's sampling rate is known, you can use ffmpeg to convert it to a supported rate.
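For example, a known-rate file can be converted with ffmpeg's `-ar` (sample rate) and `-ac` (channel count) options. A small Python helper that builds such a command (the file names are placeholders):

```python
import subprocess


def resample_cmd(src, dst, rate=16000, mono=True):
    """Build an ffmpeg command converting `src` to `rate` Hz, 16-bit PCM."""
    cmd = ["ffmpeg", "-y", "-i", src, "-ar", str(rate)]
    if mono:
        cmd += ["-ac", "1"]  # downmix to a single channel
    cmd += ["-sample_fmt", "s16", dst]
    return cmd


# To actually run it (requires ffmpeg on PATH):
# subprocess.run(resample_cmd("input.mp3", "output.wav"), check=True)
```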

8. What audio formats are supported?

Short speech recognition & real-time speech recognition: WAV, PCM, MP3, OPUS formats are supported.

9. What audio channels does speech recognition support?

A channel refers to an independent audio signal that is captured or played back at different spatial positions during recording or playback. Therefore, the number of channels is equivalent to the number of sound sources during recording or the corresponding number of speakers during playback. Except for the transcription of audio files, all other speech recognition services currently only support single-channel (mono) audio.

10. What fields does speech recognition support?

Currently, speech recognition supports two primary domains: the general field and the call center field. For the general field, the supported sampling rate is 16000Hz, while for the call center field, it is 8000Hz.

11. Is it necessary to send audio data continuously?

Audio data must be sent continuously. If the server does not receive audio data within a certain period, it will timeout and return an error message. To send data again, the client needs to initiate a new request.
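One common way to keep the stream continuous is to slice the audio into small fixed-size frames and send them at a steady pace. A sketch of the slicing step; the 3200-byte frame (100 ms of 16 kHz, 16-bit mono PCM) is an illustrative choice, and the actual frame size and pacing should follow the service documentation:

```python
def frames(audio: bytes, frame_bytes: int = 3200):
    """Split raw audio into fixed-size frames for continuous streaming.

    3200 bytes = 100 ms of 16 kHz, 16-bit mono PCM. The last frame may be
    shorter than frame_bytes.
    """
    for i in range(0, len(audio), frame_bytes):
        yield audio[i:i + frame_bytes]
```

A streaming client would send one frame roughly every 100 ms, so the server never hits its no-data timeout while audio remains.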

12. Why can I still receive data from the server after the audio data transmission is interrupted?

If audio transmission is interrupted and the connection times out, the server will still return recognition results for any audio it had already received but not yet processed.

3 Audio Transcription

1. What SDK versions are available?

Audio file transcription uses an HTTP interface, which is straightforward to call, so no SDK is provided.

2. What programming languages are supported?

Java, Python, C++, C, C#, H5/JS, etc.

3. What is the output format of the transcription result?

It supports two formats: manuscript and subtitle. You can choose the appropriate format according to your needs.

4. What audio file formats are supported for audio transcription?

.wav/.mp3/.opus/.pcm/.amr/.3gp/.aac formats are supported.

5. What are the upper limits of file duration and file size?

The duration limit is 5 hours. The size of the audio file must be less than 1GB.
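These two limits can be checked client-side before uploading. A minimal sketch; obtaining the duration and size of your file is left to the caller:

```python
MAX_DURATION_S = 5 * 3600   # 5 hours
MAX_SIZE_BYTES = 1 << 30    # 1 GB


def check_limits(duration_s: float, size_bytes: int):
    """Return a list of limit violations (empty means the file is acceptable)."""
    problems = []
    if duration_s > MAX_DURATION_S:
        problems.append("duration exceeds 5 hours")
    if size_bytes >= MAX_SIZE_BYTES:
        problems.append("size must be less than 1GB")
    return problems
```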

6. Are there timestamps in the transcription results?

Audio processing steps:

(1) First, the audio is segmented using a 450 ms VAD (Voice Activity Detection) silence threshold; segments are then combined, or force-split, so that each sentence is between 5 and 30 seconds long, and the resulting slices are transcribed.

(2) The transcription results include the start and end times of each audio segment, with millisecond precision.
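Since segment boundaries are reported in milliseconds, a small formatter is handy when turning results into subtitle-style timestamps:

```python
def ms_to_clock(ms: int) -> str:
    """Format a millisecond timestamp as HH:MM:SS.mmm."""
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"
```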

7. Is speaker diarization supported?

Yes, our platform supports speaker diarization through either the number of audio channels or speaker diarization technology, but the two methods cannot be used simultaneously.

(1) Based on the number of channels: the uploaded file must have two or more channels, and text from the same channel number is attributed to one speaker.

(2) Based on speaker diarization technology: the uploaded audio is analyzed, and speaker IDs with their corresponding start and end times are distinguished based on voiceprint information.
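Downstream code often regroups the per-segment results by speaker. A minimal sketch, assuming each result segment is a dict with illustrative `speaker` and `text` keys; check the transcription response format for the real field names:

```python
from collections import defaultdict


def group_by_speaker(segments):
    """Collect transcribed text per speaker ID, preserving segment order."""
    by_speaker = defaultdict(list)
    for seg in segments:
        by_speaker[seg["speaker"]].append(seg["text"])
    return dict(by_speaker)
```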

8. How is speaker diarization done in subtitle mode?

Currently, speaker diarization is not supported in subtitle mode. It is only possible when the output format is manuscript.

9. What are the differences between Audio File Transcription Standard and VIP?

(1) The supported file formats differ:

Standard: wav, mp3, wma, mp4, pcm, m4a, amr, 3gp, aac.

VIP: wav, mp3, wma, pcm, amr, 3gp, aac.

(2) The requirements for file size differ:

Standard: The size of audio files should not exceed 1GB; the size of video files should not exceed 2GB.

VIP: The size of audio files should not exceed 1GB.

(3) The speed of returning results differs:

Standard: For a 1-hour audio file, results are returned within an average of 15 minutes.

VIP: For a 1-hour audio file, results are returned within an average of 5 minutes.

4 Hotwords

1. What services and languages does hotwords support?

All speech recognition-related services support hotword settings: Short Speech Recognition, Real-time Speech Recognition, Audio File Transcription (Standard), and Audio File Transcription (VIP).

All languages launched on the platform support hotword settings, including Japanese, Japanese-English mixed, English, Chinese, and Chinese-English mixed.

2. How to set hotwords?

(1) Japanese & Japanese-English mixed: set in the form of hotword groups, composed of "writing, pronunciation, category". For example: 早稲田大学,ワセダダイガク,固有名詞.

  • Writing and pronunciation should each not exceed 30 characters.

(2) Chinese, Chinese-English mixed & English: set in the form of plain hotwords, consisting only of "writing". For example: 汇演.

  • A single hotword should not exceed 30 characters.

3. How to create hotwords?

Real-time hotwords: when calling speech recognition-related services, pass the hotword list via the 'hotwords_list' parameter in a single connection/request to make it take effect.

Non-real-time hotwords: created via either method below. They can be viewed by logging into the DolphinVoice platform and selecting "Customized Word Lists - Hot Words".

  • Method One: create them by calling the hotwords-related interface. See the Hotwords API for details.

  • Method Two: create them by logging into the DolphinVoice platform and selecting "Customized Word Lists - Hot Words".

4. What are the differences between real-time and non-real-time hotwords?

(1) Real-time hotwords do not require creating a hotword database ID in advance. You can simply pass the hotwords list in a single connection/request. Non-real-time hotwords require creating a hotword database ID by calling the Hotwords API first, and then using the hotwords database ID in subsequent recognition service calls.

(2) Real-time hotwords are deleted after use, i.e., one-time hotwords. Non-real-time hotwords can be used multiple times.

(3) Up to 100 hotwords/hotword groups can be set at a time for real-time hotwords. There is no upper limit on the number of non-real-time hotword databases, but a single database should not exceed 20,000 entries.

(4) If both real-time and non-real-time hotwords are used in a single call, the real-time hotwords will take precedence.
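The per-request limits for real-time hotwords can be enforced client-side before sending. A minimal sketch; the comma-separated group format follows the Japanese hotword-group example above:

```python
MAX_REALTIME_HOTWORDS = 100
MAX_HOTWORD_CHARS = 30


def validate_hotwords_list(hotwords):
    """Check a real-time hotwords list against the documented limits.

    For 'writing,pronunciation,category' groups, the writing and
    pronunciation parts are length-checked. Raises ValueError on violation.
    """
    if len(hotwords) > MAX_REALTIME_HOTWORDS:
        raise ValueError(f"at most {MAX_REALTIME_HOTWORDS} hotwords per request")
    for hw in hotwords:
        parts = hw.split(",")
        for part in parts[:2]:  # writing (and pronunciation, if present)
            if len(part) > MAX_HOTWORD_CHARS:
                raise ValueError(f"hotword part too long: {part!r}")
    return hotwords
```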

5 Status Codes

5.1 General Status Codes

| Error Code | Error Message | Description | Solution |
| --- | --- | --- | --- |
| 110000 | Token Missing | Token is missing | Add the token parameter |
| 110001 | Invalid Token | Token error | Pass the correct token value |
| 110005 | Concurrency Quota Exceeded | Concurrency quota exceeded | Please contact the business department |
| 110006 | Failed To Create Token | Token creation failed | Recreate the token |
| 110007 | APP ID Not Found | APP ID not found | Please check and enter the correct app_id |
| 110008 | Invalid Signature | Invalid signature | Please regenerate the correct signature |
| 110009 | Token Expired | Token has expired | Re-obtain the token |
| 110011 | Illegal Current Time | Invalid current time | Check if the time is correct |
| 110012 | Payment Status Abnormal, Service Unavailable | Payment status abnormal; service unavailable | Please contact the business department |
| 120000 | Network Error | Network error | Check your network |
| 120001 | Lack Of Network Permissions | Lack of network access rights | Check your network |
| 120002 | Network Disconnected | Network connection has been disconnected | Check your network |
| 120003 | No Network Connection | No network connection | Check your network |
| 130000 | Lack Of Recording Permissions | Lack of recording access rights | Check your recording access rights |
| 130001 | Microphone Is Not Initialized | Microphone is not initialized | Call initRecorder after obtaining recording permissions |
| 130002 | No Recording Devices Available | No recording devices were found | Check your device |
| 140000 | Database Is Busy | The database is busy | Try again later or contact the business department |
| 140004 | APPID/APPSecret Cannot Be Null | APPID/APPSecret cannot be null | Enter the APPID/APPSecret |
| 140005 | Listener Is Null | Listener is null | Call the setListener method first |
| 140006 | InitListener Cannot Be Null | InitListener cannot be null | Pass a non-null InitListener |
| 140010 | Invalid Parameter | Parameter error | Check the parameters (ensure they are not unspecified, incorrect, or empty strings) |
| 140011 | Parameter Missing | Parameter missing | Add the required parameter |
| 140012 | Invalid Parameter Type | Parameter type error | Check the type of the parameter |
| 140013 | Invalid Parameter Format | Parameter format error | Check the format of the parameter |

5.2 Short Speech Recognition/Real-Time Speech Recognition

| Error Code | Error Message | Description | Solution |
| --- | --- | --- | --- |
| 200000 | Invalid Parameter | Parameter error | Check the parameters (ensure they are not unspecified, incorrect, or empty strings) |
| 200001 | Parameter Missing | Parameter missing | Add the required parameter |
| 200002 | Invalid Parameter Type | Parameter type error | Check the type of the parameter |
| 200003 | Invalid Parameter Format | Parameter format error | Check the format of the parameter |
| 210000 | Gateway Timeout In Receiving Data | Gateway timeout in receiving data | Please resend the data |
| 210001 | Connection Error 1 | Connection error 1 | Please contact the business department |
| 210002 | Connection Error 2 | Connection error 2 | Please contact the business department |
| 210003 | Disconnected | The connection has been disconnected | Please contact the business department |
| 210004 | Service Not Started | Service has not started | Please contact the business department |
| 210100 | Invalid Calling Sequence | Incorrect calling sequence | Please contact the business department |
| 210200 | Audio Format Is Inconsistent With Parameters | Audio format does not match the parameters | Ensure the audio format and parameters match |
| 210201 | Reading Audio Failed | Audio reading failed | Please resend the audio |
| 210202 | Invalid Audio Sample Rate | Audio sampling rate error | Ensure the WAV audio sampling rate matches the request parameters |
| 210203 | Invalid Number Of Channels | Incorrect number of audio channels | Check if the audio is single-channel |
| 210204 | Failed To Save Audio | Failed to save audio | Please contact the business department |
| 210500 | Failed To Call Engine | Service call failed | Please contact the business department |

5.3 Audio Transcription

| Error Code | Error Message | Description | Solution |
| --- | --- | --- | --- |
| 200000 | Invalid Parameter | Parameter error | Check the parameters (ensure they are not unspecified, incorrect, or empty strings) |
| 200001 | Parameter Missing | Parameter is missing | Add the required parameter |
| 200002 | Invalid Parameter Type | Parameter type error | Check the type of the parameter |
| 200003 | Invalid Parameter Format | Parameter format error | Check the format of the parameter |
| 220200 | Audio Format Is Inconsistent With Parameters | Audio format does not match the parameters | Ensure the audio format and parameters match |
| 220201 | File Size Exceeds Limit | File size exceeds the limit | Please upload a file that meets the requirements |
| 220202 | File Duration Exceeds Limit | File duration exceeds the limit | Please upload a file that meets the requirements |
| 220203 | Invalid Number Of Channels | Audio channel number error | Ensure the channel count of the uploaded file matches the parameters passed |
| 220301 | Connection Error 1 | Connection error 1 | Please contact the business department |
| 220400 | Failed To Get Audio Duration | Failed to get the audio duration | Please contact the business department |
| 220401 | Failed To Save File | Failed to save the file | Please contact the business department |
| 220402 | Failed To Open File | Failed to open the file | Please contact the business department |
| 220403 | Audio Download Failed | Failed to download the audio | Check that the file URL is accessible; if the problem persists, contact the business department |
| 220404 | Task ID Not Found | The task ID does not exist | Please enter the correct task ID |
| 220405 | Task Execution Timeout | Task execution timed out | Please re-upload |
| 220500 | Failed To Call Engine | Service call failed | Please contact the business department |
| 220502 | VAD Engine Error | VAD error | Please contact the business department |
5.4 Hotwords Related

| Error Code | Error Message | Description | Solution |
| --- | --- | --- | --- |
| 200100 | Hot Word File Format Error | Hot word file format error | Please upload a txt file |
| 200101 | Hot Word File Content Is Empty | The hot word file content is empty | Check the content of the hot word file |
| 200102 | Failed To Read Hot Word Library | Failed to read the hot word library | Re-upload or contact the business department |
| 200103 | Character Count Exceeds Limit | The document-based optimization character count exceeds the limit (1,000,000 characters) | Reduce the character count |
| 200104 | Language Not Supported | The language is not supported for document-based optimization | Use a supported language |
| 200105 | Failed To Create Document-based Optimization | Failed to create document-based optimization | Recreate it or contact the business department |
| 200106 | Hot Word File Size Exceeds Limit | The hot word file size exceeds the limit | The hot word file size must be within 3MB |
| 200107 | Hot Word Library ID Does Not Exist | The hot word library ID does not exist | Enter the correct hot word library ID / hot word |
| 200108 | Hot Words In Use; Operations Prohibited | The hot words are currently in use | Retry after the hot words are no longer in use |
| 200109 | Number Of Hot Words Exceeds Limit | The number of hot words exceeds the limit | A single library may hold at most 20,000 hotwords/groups |
