H5/JS SDK

Before using the SDK, please read the Interface Protocol first. For details, refer to Cloud API.

1 Browser Compatibility

Operating System	Minimum Chrome Version	Minimum Firefox Version
Windows 7	41.0.2272.76	46
Windows 10/11	43.0.2357.81	46
macOS	43.0.2357.81	46

2 Integration

<!-- 【1.1】Include the core recording file when the audio format is mp3; it is not required for pcm. -->
<script type="text/javascript" src="static/recorder/recorder.min.js"></script>

<!-- 【1.2】Include the SDK core file -->
<script type="text/javascript" src="sdk/AsrSDK.min.js"></script>

3 Parameter Settings

1. Engine Parameters

（1）Parameter Settings

Set app_id and secret
- Obtain app_id and secret from the user backend
Set signature and timestamp
- Obtain timestamp and signature by calling the interface
Set the capability parameter engine
- Real-time speech recognition: “SpeechTranscriber” (default)
- Short speech recognition: “SpeechRecognizer”
Set recognition parameters
- Please set recognition parameters in the payload field through JSON
- If parameters are missing or assigned values are out of range, an error code and error message will be returned in the onError callback

Parameter	Type	Required	Description	Default Value
lang_type	String	Yes	Language option	Required
format	String	No	Audio encoding format	pcm
sample_rate	Integer	No	Audio sampling rate. When sample_rate=8000, the field parameter is required and must be set to call-center	16000
enable_intermediate_result	Boolean	No	Whether to return intermediate recognition results	true
enable_punctuation_prediction	Boolean	No	Whether to add punctuation in post-processing	true
enable_inverse_text_normalization	Boolean	No	Whether to perform ITN in post-processing	true
max_sentence_silence	Integer	No	Sentence-break detection threshold in milliseconds. Silence longer than this threshold is treated as a sentence break. Valid range: 200 to 1200	sample_rate=16000: 800; sample_rate=8000: 250
enable_words	Boolean	No	Whether to return word information	false
enable_intermediate_words	Boolean	No	Whether to return intermediate result word information	false
enable_modal_particle_filter	Boolean	No	Whether to enable modal particle filtering	true
hotwords_list	List`<String>`	No	One-time hotwords list, effective only for the current connection. If both `hotwords_list` and `hotwords_id` parameters exist, `hotwords_list` will be used. Up to 100 entries can be provided at a time.	None
hotwords_id	String	No	Hotwords ID	None
hotwords_weight	Float	No	Hotwords weight. Valid range: [0.1, 1.0]	0.4
correction_words_id	String	No	Forced correction vocabulary ID. Supports multiple IDs separated by a vertical bar `|`; `all` uses all IDs.	None
forbidden_words_id	String	No	Forbidden words ID. Supports multiple IDs separated by a vertical bar `|`; `all` uses all IDs.	None
field	String	No	Field. General: general (supports a sampling rate of 16000 Hz). Call center: call-center (supports a sampling rate of 8000 Hz)	None
audio_url	String	No	Format of the returned audio (stored on the platform for only 30 days). mp3: returns a URL for the audio in mp3 format. pcm: returns a URL for the audio in pcm format. wav: returns a URL for the audio in wav format	None
connect_timeout	Integer	No	Connection timeout (seconds), range: 5-60	10
gain	Integer	No	Amplitude gain factor, range [1, 20]. 1 means no amplification, 2 means the original amplitude is doubled, and so on	sample_rate=16000: 1; sample_rate=8000: 2
max_suffix_silence	Integer	No	Post-speech silence detection threshold in seconds, range 1 to 10. If the silence at the end of a sentence exceeds this threshold, recognition stops automatically. When the value is 0 or the parameter is not provided, post-speech silence detection is disabled	0
user_id	String	No	Custom user information, which will be returned unchanged in the response message, with a maximum length of 36 characters	None
enable_save_log	Boolean	No	Whether to provide logs of audio data and recognition results to help us improve the quality of our products and services	true
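
The ranges in the table above can be checked on the client before the engine is created, so that obvious mistakes surface before a connection is opened. The following is a minimal sketch; validatePayload is a hypothetical helper and not part of the SDK, which reports invalid values through the onError callback. It only covers the constraints stated in the table.

```javascript
// Hypothetical helper (not part of the SDK): checks a payload object against
// the ranges documented in the parameter table and returns a list of problems.
function validatePayload(payload) {
  const errors = [];
  const inRange = (v, min, max) => typeof v === 'number' && v >= min && v <= max;

  // lang_type is the only required parameter.
  if (!payload.lang_type) errors.push('lang_type is required');

  // An 8000 Hz sampling rate requires field = "call-center".
  if (payload.sample_rate === 8000 && payload.field !== 'call-center') {
    errors.push('sample_rate=8000 requires field="call-center"');
  }

  // Simple numeric ranges taken from the table.
  const ranges = {
    max_sentence_silence: [200, 1200], // milliseconds
    hotwords_weight: [0.1, 1.0],
    connect_timeout: [5, 60],          // seconds
    gain: [1, 20],
  };
  for (const [name, [min, max]] of Object.entries(ranges)) {
    if (name in payload && !inRange(payload[name], min, max)) {
      errors.push(`${name} must be between ${min} and ${max}`);
    }
  }

  // max_suffix_silence: 0 disables the feature, otherwise 1 to 10 seconds.
  if ('max_suffix_silence' in payload) {
    const v = payload.max_suffix_silence;
    if (v !== 0 && !inRange(v, 1, 10)) {
      errors.push('max_suffix_silence must be 0 or between 1 and 10');
    }
  }

  // hotwords_list accepts up to 100 entries; user_id up to 36 characters.
  if (Array.isArray(payload.hotwords_list) && payload.hotwords_list.length > 100) {
    errors.push('hotwords_list accepts at most 100 entries');
  }
  if (typeof payload.user_id === 'string' && payload.user_id.length > 36) {
    errors.push('user_id accepts at most 36 characters');
  }
  return errors;
}
```

An empty array means that none of the checked constraints is violated; the server may still reject the payload for reasons this sketch does not cover.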

Example (Method 1, recommended): Obtain timestamp and signature from the backend

new AsrEngine({
    engine: '',
    app_id: '',
    signature : '',
    timestamp : 1234567890,
    payload: {
      lang_type: 'ja-JP',
      format: 'pcm',
      sample_rate: 16000,
      enable_intermediate_result : true,
      enable_punctuation_prediction : true,
      enable_inverse_text_normalization : true,
      enable_words : true,
    }
})

Example (Method 2, Insecure): Embed the secret in the front-end, and have the front-end generate the signature

new AsrEngine ({
    engine: '',
    app_id: '',
    secret : '',
    payload: {
      lang_type: 'ja-JP',
      format: 'pcm',
      sample_rate: 16000,
      enable_intermediate_result : true,
      enable_punctuation_prediction : true,
      enable_inverse_text_normalization : true,
      enable_words : true,
    }
})
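
Method 1 keeps the secret on the server: the page receives only app_id, signature and timestamp and passes them to the engine. The sketch below shows one way to assemble the options object from such a response. buildEngineOptions, the shape of the credentials object and the endpoint name in the usage comment are all assumptions made for illustration; how the backend computes the signature is defined by the Interface Protocol and is not shown here.

```javascript
// Hypothetical helper (not part of the SDK): builds the AsrEngine options for
// Method 1 from credentials issued by your own backend.
function buildEngineOptions(credentials, payload, engine = 'SpeechTranscriber') {
  // The secret must never reach the browser in Method 1.
  if ('secret' in credentials) {
    throw new Error('credentials must not contain the secret');
  }
  const { app_id, signature, timestamp } = credentials;
  if (!app_id || !signature || typeof timestamp !== 'number') {
    throw new Error('credentials need app_id, signature and a numeric timestamp');
  }
  // Copy the payload so later changes by the caller do not affect the options.
  return { engine, app_id, signature, timestamp, payload: { ...payload } };
}

// Usage in the browser. '/api/asr-credentials' is an assumed endpoint name on
// your own server, not something the SDK provides:
//
// fetch('/api/asr-credentials')
//   .then((res) => res.json())
//   .then((cred) => new AsrEngine(buildEngineOptions(cred, { lang_type: 'ja-JP' })));
```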

2. Microphone Methods

（1）Parameter Settings

Name	Type	Description	Default Value
micAllowCallback	Function	Callback invoked when microphone permission is granted	None
micForbidCallback	Function	Callback invoked when microphone permission is denied	None

（2）Parameter Instance

new AsrEngine ({ 
    micAllowCallback:function(){},
    micForbidCallback:function(status,msg){} 
})

3. Initialization Methods

（1）Parameter Settings

Name	Type	Description	Default Value
engineFirstInitDone	Function	Callback method for successful initialization	None
engineFirstInitFail	Function	Callback method for failed initialization	None

（2）Parameter Instance

new AsrEngine ({
    engineFirstInitDone:function(){},
    engineFirstInitFail:function(status,msg){} 
})

4. Recognition Result Return

（1）Parameter Settings

Name	Type	Description	Default Value
onStart	Function	Called when the engine connection starts	None
onSentenceBegin	Function	Called when the engine returns a sentence-start result. Applies to real-time speech recognition only	None
onIntermediateResult	Function	Called when the engine returns intermediate results	None
onSentenceEnd	Function	Called when the engine returns a sentence-end result. Applies to real-time speech recognition only	None
onStop	Function	Called when the engine connection ends. Applies to real-time speech recognition only	None
onResult	Function	Called when the engine returns a sentence-end result. Applies to short speech recognition only	None
onWarning	Function	Called when the engine returns a warning	None
onError	Function	Called when the engine returns an error	None

Note: When the onError callback is received, recognition stops automatically and the connection is closed, so there is no need to call the stopAsr() method again.

（2）Parameter Instance

new AsrEngine ({
    onStart: function (data,taskId){},
    onSentenceBegin: function (data,taskId){},
    onIntermediateResult: function (data,taskId){},
    onSentenceEnd: function (data,taskId) {},
    onStop: function (data,taskId) {},
    onResult: function (data,taskId) {},
    onWarning:function(status,msg,taskId){},
    onError:function(status,msg,taskId){}
})
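
For real-time recognition, a common pattern is to show the latest intermediate result while a sentence is in progress and to commit the text when onSentenceEnd arrives. The sketch below keeps that state in a small collector. createTranscriptCollector is a hypothetical helper, and because the structure of the data argument is defined by the Interface Protocol rather than by this page, the caller supplies an extractText function instead of the sketch guessing a field name.

```javascript
// Hypothetical helper (not part of the SDK): accumulates finished sentences
// and tracks the current intermediate text for real-time recognition.
// extractText(data) must return the recognized text from a callback payload;
// its implementation depends on the response format in the Interface Protocol.
function createTranscriptCollector(extractText) {
  const sentences = [];
  let partial = '';
  return {
    callbacks: {
      // A new sentence begins: discard any stale intermediate text.
      onSentenceBegin() { partial = ''; },
      // Intermediate results replace each other until the sentence ends.
      onIntermediateResult(data) { partial = extractText(data); },
      // The final text of the sentence replaces the intermediate one.
      onSentenceEnd(data) {
        sentences.push(extractText(data));
        partial = '';
      },
    },
    // Returns a copy of the state for rendering.
    snapshot() { return { sentences: sentences.slice(), partial }; },
  };
}
```

Assuming the callbacks share one options object with the engine parameters, collector.callbacks can be spread into the object passed to new AsrEngine, and snapshot() can be called from any of the callbacks to refresh the display.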

5. Network

（1）Parameter Settings

Name	Type	Description	Default Value
onNetworkError	Function	Network monitoring callback	None

Note: When the onNetworkError callback is received, recognition stops automatically and the connection is closed, so there is no need to call the stopAsr() method again.

（2）Parameter Instance

new AsrEngine ({ 
    onNetworkError:function(status,msg){},
})
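
Because onError and onNetworkError already stop recognition, application code needs to remember whether a session is still running before it calls stopAsr(). The sketch below tracks that with a flag. createSessionGuard is a hypothetical helper intended for real-time recognition, where onStop is delivered; it is an illustration of the notes above, not SDK behavior.

```javascript
// Hypothetical helper (not part of the SDK): remembers whether a recognition
// session is running so that stopAsr() is not called after the SDK has
// already stopped on its own.
function createSessionGuard() {
  let active = false;
  return {
    callbacks: {
      onStart() { active = true; },
      onStop() { active = false; },
      // Both error callbacks mean the SDK has stopped and disconnected.
      onError() { active = false; },
      onNetworkError() { active = false; },
    },
    isActive() { return active; },
    // Calls engine.stopAsr() only while a session is running.
    // Returns true if stopAsr() was called.
    stopIfActive(engine) {
      if (!active) return false;
      active = false;
      engine.stopAsr();
      return true;
    },
  };
}
```

A stop button handler can then call guard.stopIfActive(engine) unconditionally. If the application also needs its own onError logic, it can wrap the guard's callback and call it first.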

4 Method Invocation

4.1 startAsr()

Function: Start recognition;

4.2 stopAsr()

Function: Stop recognition;

4.3 sentenceEnd()

Function: Force sentence ending;

Note: This method is only used for real-time speech recognition.

4.4 speakerStart(speaker_id)

Function: Customize speaker number;

Note: This method is only used for real-time speech recognition;

Parameter: @speaker_id The value of the customized speaker number, string type

The speaker_id supports up to 36 characters; anything beyond that is truncated.
If the speaker_id parameter is not provided in the SpeakerStart event, the returned speaker_id will be empty.
The SpeakerStart event triggers a forced sentence ending, so send it only when the speaker is about to change.
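
Since every SpeakerStart event forces a sentence ending, it helps to skip the call when the speaker has not actually changed, and to catch over-long IDs before they are silently truncated. The sketch below does both. createSpeakerSwitcher is a hypothetical wrapper around speakerStart(); rejecting empty or over-long IDs is a design choice of this sketch, and the length check counts JavaScript string units, which may differ from the server's notion of a character.

```javascript
// Limit taken from the speaker_id description above.
const MAX_SPEAKER_ID_LENGTH = 36;

// Hypothetical wrapper (not part of the SDK): calls engine.speakerStart()
// only when the speaker really changes, because each call forces a
// sentence ending.
function createSpeakerSwitcher(engine) {
  let current = null;
  // Returns true if speakerStart() was called, false if the speaker is unchanged.
  return function switchSpeaker(speakerId) {
    const id = String(speakerId);
    // Fail loudly instead of letting the server cut the ID off.
    if (id.length === 0 || id.length > MAX_SPEAKER_ID_LENGTH) {
      throw new RangeError(`speaker_id must be 1 to ${MAX_SPEAKER_ID_LENGTH} characters`);
    }
    if (id === current) return false;
    engine.speakerStart(id);
    current = id;
    return true;
  };
}
```

Create a new switcher for each recognition session, so that the first speaker of a new session is always announced.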

4.5 cancelAsr()

Function: Cancel recognition;

4.6 destroyAsr()

Function: Destroy instance;

5 SDK Download

H5 SDK
