Short Speech Recognition H5/JS SDK
Before using the SDK, please read the Interface Protocol first. For details, refer to Cloud API.
1 Browser Compatibility
| Operating System | Minimum Chrome Version | Minimum Firefox Version |
|---|---|---|
| Windows 7 | 41.0.2272.76 | 46 |
| Windows 10/11 | 43.0.2357.81 | 46 |
| macOS | 43.0.2357.81 | 46 |
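As a rough illustration of applying the table above at runtime, the sketch below compares a dotted browser version string against the listed minimum. The names `MIN_VERSIONS`, `compareVersions`, and `isSupported` are hypothetical helpers, not part of the SDK, and detecting the operating system and browser is left to the caller.

```javascript
// Hypothetical helpers (not part of the SDK). Minimum versions are copied from the table above.
const MIN_VERSIONS = {
  'Windows 7': { chrome: '41.0.2272.76', firefox: '46' },
  'Windows 10/11': { chrome: '43.0.2357.81', firefox: '46' },
  'macOS': { chrome: '43.0.2357.81', firefox: '46' },
};

// Compare two dotted version strings numerically; returns -1, 0, or 1.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = pa[i] || 0; // missing segments count as 0
    const y = pb[i] || 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// Returns true when the given browser version meets the minimum for the given OS.
function isSupported(os, browser, version) {
  const entry = MIN_VERSIONS[os];
  if (!entry || !entry[browser]) return false;
  return compareVersions(version, entry[browser]) >= 0;
}

console.log(isSupported('Windows 7', 'chrome', '41.0.2272.76')); // true
console.log(isSupported('macOS', 'chrome', '42.0.0.0')); // false
```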
2 Integration
<!-- 【1.1】For mp3 audio, include the core recording file; it is not needed for pcm audio. -->
<script type="text/javascript" src="static/recorder/recorder.min.js"></script>
<!-- 【1.2】Include the SDK core file -->
<script type="text/javascript" src="sdk/AsrSDK.min.js"></script>
3 Parameter Settings
1. Engine Parameters
(1)Parameter Instance
- Set app_id and secret
  - Obtain app_id and secret from the user backend
- Signature and timestamp
  - Obtain timestamp and signature by calling the interface
- Set the capability parameter engine
  - Real-time speech recognition: "SpeechTranscriber" (default)
  - Short speech recognition: "SpeechRecognizer"
- Set recognition parameters
  - Set recognition parameters as JSON in the payload field
  - If a parameter is missing or its value is out of range, an error code and error message are returned in the onError callback
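The steps in the list above could be combined roughly as follows. This is a minimal sketch, assuming your own backend exposes an endpoint that returns a signature and timestamp: the path `/api/asr-sign`, the response shape `{ signature, timestamp }`, and the helper name `buildEngineOptions` are all hypothetical. The actual signing interface is the one described in the Interface Protocol.

```javascript
// Sketch only: the endpoint and response shape below are assumptions, not part of the SDK.
// fetchImpl is injected so the helper can run outside the browser (for example, in tests).
async function buildEngineOptions(fetchImpl, appId, payload) {
  // Ask your own backend for the signature so the secret never reaches the browser.
  const res = await fetchImpl('/api/asr-sign'); // hypothetical endpoint
  if (!res.ok) {
    throw new Error('Signing request failed with HTTP status ' + res.status);
  }
  const body = await res.json(); // assumed shape: { signature, timestamp }
  return {
    engine: 'SpeechRecognizer', // short speech recognition
    app_id: appId,
    signature: body.signature,
    timestamp: body.timestamp,
    payload: payload,
  };
}

// Possible browser usage, once the SDK script has loaded:
// buildEngineOptions(window.fetch.bind(window), 'your app_id', { lang_type: 'ja-JP' })
//   .then(function (options) { new AsrEngine(options); });
```

Injecting the fetch function is a design choice for testability; calling `fetch` directly would work equally well in the browser.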
| Parameter | Type | Required | Description | Default Value |
|---|---|---|---|---|
| lang_type | String | Yes | Language option | Required |
| format | String | No | Audio encoding format | pcm |
| sample_rate | Integer | No | Audio sampling rate. When sample_rate=8000, the field parameter is required and must be set to call-center. | 16000 |
| enable_intermediate_result | Boolean | No | Whether to return intermediate recognition results | true |
| enable_punctuation_prediction | Boolean | No | Whether to add punctuation in post-processing | true |
| enable_inverse_text_normalization | Boolean | No | Whether to perform ITN in post-processing | true |
| max_sentence_silence | Integer | No | Sentence-break detection threshold. Silence longer than this threshold is treated as a sentence break. Valid range: 200~1200. Unit: milliseconds | 800 when sample_rate=16000; 250 when sample_rate=8000 |
| enable_words | Boolean | No | Whether to return word information | false |
| enable_intermediate_words | Boolean | No | Whether to return intermediate result word information | false |
| enable_modal_particle_filter | Boolean | No | Whether to enable modal particle filtering | true |
| hotwords_list | List<String> | No | One-time hotwords list, effective only for the current connection. If both hotwords_list and hotwords_id parameters exist, hotwords_list will be used. Up to 100 entries can be provided at a time. | None |
| hotwords_id | String | No | Hotwords ID | None |
| hotwords_weight | Float | No | Hotwords weight. Valid range: [0.1, 1.0] | 0.4 |
| correction_words_id | String | No | Forced correction vocabulary ID. Supports multiple IDs separated by a vertical bar (\|); all means use all IDs. | None |
| forbidden_words_id | String | No | Forbidden words ID. Supports multiple IDs separated by a vertical bar (\|); all means use all IDs. | None |
| field | String | No | Field. General: general (supports a sampling rate of 16000 Hz). Call center: call-center (supports a sampling rate of 8000 Hz). | None |
| audio_url | String | No | Format of the returned audio (stored on the platform for only 30 days). mp3: returns a URL for the audio in mp3 format. pcm: returns a URL for the audio in pcm format. wav: returns a URL for the audio in wav format. | None |
| connect_timeout | Integer | No | Connection timeout (seconds), range: 5-60 | 10 |
| gain | Integer | No | Amplitude gain factor. Valid range: [1, 20]. 1 means no amplification, 2 means the amplitude is doubled, and so on. | 1 when sample_rate=16000; 2 when sample_rate=8000 |
| max_suffix_silence | Integer | No | Post-speech silence detection threshold in seconds. Valid range: 1 to 10. If the silence at the end of a sentence exceeds this threshold, recognition stops automatically. When the value is 0 or the parameter is omitted, post-speech silence detection is disabled. | 0 |
| user_id | String | No | Custom user information, which will be returned unchanged in the response message, with a maximum length of 36 characters | None |
| enable_save_log | Boolean | No | Whether to provide logs of audio data and recognition results to help us improve the quality of our products and services. | true |
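The ranges in the table above can also be checked on the client before the engine is created, which may give earlier feedback than waiting for the onError callback. The following is a minimal sketch of such a pre-check; `validatePayload` is a hypothetical helper, not part of the SDK, and it covers only the constraints stated in the table.

```javascript
// Hypothetical client-side pre-check (not part of the SDK).
// Returns a list of human-readable problems; an empty list means no issue was found.
function validatePayload(payload) {
  const errors = [];
  const inRange = function (v, min, max) {
    return typeof v === 'number' && v >= min && v <= max;
  };

  if (typeof payload.lang_type !== 'string' || payload.lang_type === '') {
    errors.push('lang_type is required');
  }
  if (payload.sample_rate === 8000 && payload.field !== 'call-center') {
    errors.push("field must be 'call-center' when sample_rate is 8000");
  }
  if ('max_sentence_silence' in payload && !inRange(payload.max_sentence_silence, 200, 1200)) {
    errors.push('max_sentence_silence must be within 200~1200 ms');
  }
  if ('hotwords_weight' in payload && !inRange(payload.hotwords_weight, 0.1, 1.0)) {
    errors.push('hotwords_weight must be within [0.1, 1.0]');
  }
  if ('connect_timeout' in payload && !inRange(payload.connect_timeout, 5, 60)) {
    errors.push('connect_timeout must be within 5-60 seconds');
  }
  if ('gain' in payload && !inRange(payload.gain, 1, 20)) {
    errors.push('gain must be within [1, 20]');
  }
  // 0 disables post-speech silence detection; otherwise the value must be 1 to 10 seconds.
  if ('max_suffix_silence' in payload && payload.max_suffix_silence !== 0 &&
      !inRange(payload.max_suffix_silence, 1, 10)) {
    errors.push('max_suffix_silence must be 0 or within 1 to 10 seconds');
  }
  if (Array.isArray(payload.hotwords_list) && payload.hotwords_list.length > 100) {
    errors.push('hotwords_list accepts up to 100 entries');
  }
  if (typeof payload.user_id === 'string' && payload.user_id.length > 36) {
    errors.push('user_id must not exceed 36 characters');
  }
  return errors;
}

console.log(validatePayload({ lang_type: 'ja-JP', sample_rate: 8000 }));
```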
Example (Method 1, recommended): Obtain timestamp and signature from the backend
new AsrEngine({
  engine: '', // 'SpeechTranscriber' (default) or 'SpeechRecognizer'
  app_id: '',
  signature: '', // obtained from the backend
  timestamp: 1234567890, // obtained from the backend
  payload: {
    lang_type: 'ja-JP',
    format: 'pcm',
    sample_rate: 16000,
    enable_intermediate_result: true,
    enable_punctuation_prediction: true,
    enable_inverse_text_normalization: true,
    enable_words: true,
  }
})
Example (Method 2, insecure): Embed the secret in the front end and have the front end generate the signature
new AsrEngine({
  engine: '', // 'SpeechTranscriber' (default) or 'SpeechRecognizer'
  app_id: '',
  secret: '', // visible to anyone who can read the page source
  payload: {
    lang_type: 'ja-JP',
    format: 'pcm',
    sample_rate: 16000,
    enable_intermediate_result: true,
    enable_punctuation_prediction: true,
    enable_inverse_text_normalization: true,
    enable_words: true,
  }
})
2. Microphone Methods
(1)Parameter Settings
| Name | Type | Description | Default Value |
|---|---|---|---|
| micAllowCallback | Function | Callback invoked when microphone permission is granted | None |
| micForbidCallback | Function | Callback invoked when microphone permission is denied | None |
(2)Parameter Instance
new AsrEngine({
  micAllowCallback: function () {},
  micForbidCallback: function (status, msg) {}
})
3. Initialization Methods
(1)Parameter Settings
| Name | Type | Description | Default Value |
|---|---|---|---|
| engineFirstInitDone | Function | Callback method for successful initialization | None |
| engineFirstInitFail | Function | Callback method for failed initialization | None |
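If you prefer promise-based code, the two initialization callbacks above could be wrapped roughly as follows. This is a sketch, not an SDK feature: `createEngine` is a hypothetical helper, and the constructor is passed in as `EngineCtor` so the helper can be exercised without the SDK (in the browser you would pass `AsrEngine`).

```javascript
// Hypothetical helper (not part of the SDK): resolve with the engine instance once
// engineFirstInitDone fires, or reject when engineFirstInitFail fires.
// Note: any engineFirstInitDone/engineFirstInitFail supplied in `options` is overridden.
function createEngine(EngineCtor, options) {
  return new Promise(function (resolve, reject) {
    let engine;
    const merged = Object.assign({}, options, {
      engineFirstInitDone: function () {
        // Defer to a microtask so `engine` is assigned even if this callback
        // fires synchronously inside the constructor.
        Promise.resolve().then(function () { resolve(engine); });
      },
      engineFirstInitFail: function (status, msg) {
        reject(new Error('Init failed (' + status + '): ' + msg));
      },
    });
    engine = new EngineCtor(merged);
  });
}

// Possible browser usage:
// createEngine(AsrEngine, { engine: 'SpeechRecognizer', app_id: '...', payload: { lang_type: 'ja-JP' } })
//   .then(function (engine) { engine.startAsr(); })
//   .catch(function (err) { console.error(err.message); });
```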
(2)Parameter Instance
new AsrEngine({
  engineFirstInitDone: function () {},
  engineFirstInitFail: function (status, msg) {}
})
4. Recognition Result Return
(1)Parameter Settings
| Name | Type | Description | Default Value |
|---|---|---|---|
| onStart | Function | Callback invoked when the engine connection starts | None |
| onSentenceBegin | Function | Callback invoked when the engine returns the start of a sentence. Note: applies only to real-time speech recognition | None |
| onIntermediateResult | Function | Callback invoked when the engine returns an intermediate result | None |
| onSentenceEnd | Function | Callback invoked when the engine returns the end of a sentence. Note: applies only to real-time speech recognition | None |
| onStop | Function | Callback invoked when the engine connection ends. Note: applies only to real-time speech recognition | None |
| onResult | Function | Callback invoked when the engine returns the end-of-sentence result. Note: applies only to short speech recognition | None |
| onWarning | Function | Callback invoked when the engine returns a warning | None |
| onError | Function | Callback invoked when the engine returns an error | None |
Note: When an onError callback is received, recognition will automatically stop and the connection will be disconnected, so there is no need to call the stopAsr() method again.
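One way to honor the note above is to track whether recognition is still running and skip stopAsr() once onError has fired. The sketch below assumes that onStart marks the beginning of an active session; `createSession` is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): remembers whether recognition is active
// so that stopAsr() is not called again after onError has already ended the session.
function createSession() {
  const session = {
    engine: null, // assign the AsrEngine instance here after construction
    active: false,
    callbacks: {
      onStart: function (data, taskId) {
        session.active = true; // assumption: onStart means recognition is running
      },
      onError: function (status, msg, taskId) {
        // Recognition has already stopped and the connection is closed.
        session.active = false;
      },
    },
    stop: function () {
      if (session.active && session.engine) {
        session.engine.stopAsr();
        session.active = false;
      }
    },
  };
  return session;
}

// Possible browser usage:
// const session = createSession();
// session.engine = new AsrEngine(Object.assign({ engine: 'SpeechRecognizer' }, session.callbacks));
// stopButton.onclick = session.stop;
```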
(2)Parameter Instance
new AsrEngine({
  onStart: function (data, taskId) {},
  onSentenceBegin: function (data, taskId) {},
  onIntermediateResult: function (data, taskId) {},
  onSentenceEnd: function (data, taskId) {},
  onStop: function (data, taskId) {},
  onResult: function (data, taskId) {},
  onWarning: function (status, msg, taskId) {},
  onError: function (status, msg, taskId) {}
})
5. Network
(1)Parameter Settings
| Name | Type | Description | Default Value |
|---|---|---|---|
| onNetworkError | Function | Network monitoring callback | None |
Note: When an onNetworkError callback is received, recognition will automatically stop and the connection will be disconnected, so there is no need to call the stopAsr() method again.
(2)Parameter Instance
new AsrEngine({
  onNetworkError: function (status, msg) {},
})
4 Method Invocation
4.1 startAsr()
Function: Start recognition;
4.2 stopAsr()
Function: Stop recognition;
4.3 sentenceEnd()
Function: Force sentence ending;
Note: This method is only used for real-time speech recognition.
4.4 speakerStart(speaker_id)
Function: Customize speaker number;
Note: This method is only used for real-time speech recognition;
Parameter: @speaker_id The custom speaker number, as a string
- The speaker_id supports up to 36 characters; anything beyond that is truncated.
- If the speaker_id parameter is not provided in the SpeakerStart event, the returned speaker_id will be empty.
- The SpeakerStart event triggers a forced sentence ending. Therefore, please send the SpeakerStart event only before the speaker switches.
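Because longer values are cut off, it may be clearer to trim the speaker ID on the client so that the value you send matches the value you get back. A minimal sketch follows; `normalizeSpeakerId` is a hypothetical helper, not part of the SDK, and it counts JavaScript string length (UTF-16 code units), which may differ from how the service counts characters.

```javascript
// Hypothetical helper (not part of the SDK): trim a speaker ID to the documented 36-character limit.
const MAX_SPEAKER_ID_LENGTH = 36;

function normalizeSpeakerId(speakerId) {
  const id = String(speakerId);
  return id.length > MAX_SPEAKER_ID_LENGTH ? id.slice(0, MAX_SPEAKER_ID_LENGTH) : id;
}

// Possible usage with an engine instance (real-time speech recognition only):
// engine.speakerStart(normalizeSpeakerId('agent-001'));
console.log(normalizeSpeakerId('agent-001')); // agent-001
```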
4.5 cancelAsr()
Function: Cancel recognition;
4.6 destroyAsr()
Function: Destroy instance;
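The methods above might be combined as in the following sketch, which guards against calling stopAsr() or cancelAsr() when nothing is running. `createController` is a hypothetical wrapper, not part of the SDK, and cancelling before destroying is an assumption about a sensible order rather than a documented requirement. A stub engine stands in for an AsrEngine instance so the call order can be seen without the SDK.

```javascript
// Hypothetical wrapper (not part of the SDK) around an object exposing
// startAsr(), stopAsr(), cancelAsr(), and destroyAsr().
function createController(engine) {
  let running = false;
  return {
    begin: function () {
      if (!running) { engine.startAsr(); running = true; }
    },
    finish: function () {
      if (running) { engine.stopAsr(); running = false; }
    },
    abort: function () {
      if (running) { engine.cancelAsr(); running = false; }
    },
    dispose: function () {
      // Assumption: cancel any running recognition before destroying the instance.
      if (running) { engine.cancelAsr(); running = false; }
      engine.destroyAsr();
    },
  };
}

// Stub engine that records calls; in the browser, pass an AsrEngine instance instead.
const calls = [];
const stubEngine = {
  startAsr: function () { calls.push('startAsr'); },
  stopAsr: function () { calls.push('stopAsr'); },
  cancelAsr: function () { calls.push('cancelAsr'); },
  destroyAsr: function () { calls.push('destroyAsr'); },
};

const controller = createController(stubEngine);
controller.begin();
controller.finish();
controller.dispose();
console.log(calls.join(' -> ')); // startAsr -> stopAsr -> destroyAsr
```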