Short Speech Recognition H5/JS SDK

Before using the SDK, please read the Interface Protocol first. For details, refer to Cloud API.

1 Browser Compatibility

| Operating System | Chrome (minimum version) | Firefox (minimum version) |
| --- | --- | --- |
| Windows 7 | 41.0.2272.76 | 46 |
| Windows 10/11 | 43.0.2357.81 | 46 |
| macOS | 43.0.2357.81 | 46 |
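
As a rough illustration of the minimum versions listed above, the sketch below checks a user-agent string before the SDK is loaded. The function name isSupportedBrowser is hypothetical and not part of the SDK. It compares major versions only and assumes that "Windows NT 6.1" in the user agent identifies Windows 7. Other Chromium-based browsers also report a Chrome/ token, so the result is a hint rather than a guarantee.

```javascript
// Hypothetical helper (not part of the SDK): compares the major browser
// version in a user-agent string with the minimums listed in the table.
function isSupportedBrowser(userAgent) {
  const chrome = /Chrome\/(\d+)/.exec(userAgent);
  if (chrome) {
    // Assumption: "Windows NT 6.1" identifies Windows 7 (minimum Chrome 41);
    // the other listed platforms need Chrome 43 or later.
    const minimum = userAgent.indexOf('Windows NT 6.1') !== -1 ? 41 : 43;
    return Number(chrome[1]) >= minimum;
  }
  const firefox = /Firefox\/(\d+)/.exec(userAgent);
  if (firefox) {
    return Number(firefox[1]) >= 46;
  }
  return false; // browser not listed in the table
}

// In a page: if (!isSupportedBrowser(navigator.userAgent)) { /* show a notice */ }
```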

2 Integration

<!-- [1.1] For the mp3 audio format, include the core recorder file; it is not required for the pcm format. -->
<script type="text/javascript" src="static/recorder/recorder.min.js"></script>

<!-- [1.2] Include the SDK core file -->
<script type="text/javascript" src="sdk/AsrSDK.min.js"></script>

3 Parameter Settings

1. Engine Parameters

(1) Parameter Instance

  • Set app_id and secret
    • Obtain app_id and secret from the user backend
  • Signature and Timestamp
    • Obtain timestamp and signature by calling the interface
  • Set the capability parameter engine
    • Real-time speech recognition: “SpeechTranscriber” (default)
    • Short speech recognition: “SpeechRecognizer”
  • Set recognition parameters
    • Set recognition parameters as JSON in the payload field
    • If a parameter is missing or its value is out of range, an error code and error message are returned in the onError callback
| Parameter | Type | Required | Description | Default Value |
| --- | --- | --- | --- | --- |
| lang_type | String | Yes | Language option | None (required) |
| format | String | No | Audio encoding format | pcm |
| sample_rate | Integer | No | Audio sampling rate. When sample_rate=8000, the field parameter is required and must be set to call-center. | 16000 |
| enable_intermediate_result | Boolean | No | Whether to return intermediate recognition results | true |
| enable_punctuation_prediction | Boolean | No | Whether to add punctuation in post-processing | true |
| enable_inverse_text_normalization | Boolean | No | Whether to perform ITN in post-processing | true |
| max_sentence_silence | Integer | No | Sentence-break detection threshold. Silence longer than this threshold is treated as a sentence break. Valid range: 200 to 1200. Unit: milliseconds. | 800 when sample_rate=16000; 250 when sample_rate=8000 |
| enable_words | Boolean | No | Whether to return word information | false |
| enable_intermediate_words | Boolean | No | Whether to return word information in intermediate results | false |
| enable_modal_particle_filter | Boolean | No | Whether to enable modal particle filtering | true |
| hotwords_list | List<String> | No | One-time hotwords list, effective only for the current connection. If both hotwords_list and hotwords_id are set, hotwords_list is used. Up to 100 entries can be provided at a time. | None |
| hotwords_id | String | No | Hotwords ID | None |
| hotwords_weight | Float | No | Hotwords weight. Range: [0.1, 1.0]. | 0.4 |
| correction_words_id | String | No | Forced correction vocabulary ID. Supports multiple IDs separated by a vertical bar; all means that all IDs are used. | None |
| forbidden_words_id | String | No | Forbidden words ID. Supports multiple IDs separated by a vertical bar; all means that all IDs are used. | None |
| field | String | No | Field. General: general (supports the 16000 Hz sampling rate). Call center: call-center (supports the 8000 Hz sampling rate). | None |
| audio_url | String | No | Format of the returned audio (stored on the platform for only 30 days). mp3: returns a URL for the audio in mp3 format. pcm: returns a URL for the audio in pcm format. wav: returns a URL for the audio in wav format. | None |
| connect_timeout | Integer | No | Connection timeout in seconds. Range: 5 to 60. | 10 |
| gain | Integer | No | Amplitude gain factor. Range: [1, 20]. 1 means no amplification, 2 means the original amplitude is doubled, and so on. | 1 when sample_rate=16000; 2 when sample_rate=8000 |
| max_suffix_silence | Integer | No | Post-speech silence detection threshold in seconds. Range: 1 to 10. If the silence at the end of a sentence exceeds this threshold, recognition stops automatically. When the value is 0 or the parameter is not provided, post-speech silence detection is disabled. | 0 |
| user_id | String | No | Custom user information, returned unchanged in the response message. Maximum length: 36 characters. | None |
| enable_save_log | Boolean | No | Whether to provide logs of audio data and recognition results to help us improve the quality of our products and services | true |
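
Because out-of-range values are only reported through onError after a connection is attempted, it can be convenient to check a payload locally first. The following is a minimal sketch of such a check, based only on the ranges stated in the table above. The function validatePayload is hypothetical and not part of the SDK, and the server remains the authority on what is accepted.

```javascript
// Hypothetical pre-flight check (not part of the SDK). Returns an array of
// problem descriptions; an empty array means no documented rule is violated.
function validatePayload(payload) {
  const problems = [];
  // True when the value is absent or is a number inside [min, max].
  const absentOrInRange = function (value, min, max) {
    return value === undefined ||
      (typeof value === 'number' && value >= min && value <= max);
  };

  if (typeof payload.lang_type !== 'string' || payload.lang_type === '') {
    problems.push('lang_type is required');
  }
  // 16000 is the documented default sampling rate.
  const sampleRate = payload.sample_rate === undefined ? 16000 : payload.sample_rate;
  if (sampleRate === 8000 && payload.field !== 'call-center') {
    problems.push("field must be 'call-center' when sample_rate is 8000");
  }
  if (!absentOrInRange(payload.max_sentence_silence, 200, 1200)) {
    problems.push('max_sentence_silence must be between 200 and 1200 ms');
  }
  if (!absentOrInRange(payload.hotwords_weight, 0.1, 1.0)) {
    problems.push('hotwords_weight must be between 0.1 and 1.0');
  }
  if (!absentOrInRange(payload.connect_timeout, 5, 60)) {
    problems.push('connect_timeout must be between 5 and 60 seconds');
  }
  if (!absentOrInRange(payload.gain, 1, 20)) {
    problems.push('gain must be between 1 and 20');
  }
  // 0 disables post-speech silence detection, so it is valid as well.
  if (payload.max_suffix_silence !== 0 &&
      !absentOrInRange(payload.max_suffix_silence, 1, 10)) {
    problems.push('max_suffix_silence must be 0 or between 1 and 10 seconds');
  }
  if (Array.isArray(payload.hotwords_list) && payload.hotwords_list.length > 100) {
    problems.push('hotwords_list accepts at most 100 entries');
  }
  if (typeof payload.user_id === 'string' && payload.user_id.length > 36) {
    problems.push('user_id accepts at most 36 characters');
  }
  return problems;
}
```

Calling validatePayload({ lang_type: 'ja-JP', sample_rate: 8000 }) returns one problem, because field is not set to call-center.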

Example (Method 1, recommended): Obtain timestamp and signature from the backend

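new AsrEngine({
    engine: '',             // capability parameter, e.g. 'SpeechRecognizer' for short speech recognition
    app_id: '',             // from the user backend
    signature: '',          // obtained by calling the interface
    timestamp: 1234567890,  // obtained together with the signature
    payload: {
      lang_type: 'ja-JP',
      format: 'pcm',
      sample_rate: 16000,
      enable_intermediate_result: true,
      enable_punctuation_prediction: true,
      enable_inverse_text_normalization: true,
      enable_words: true
    }
})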
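
Method 1 leaves open how the page receives timestamp and signature. One possible arrangement, sketched below, is for the page to request them from your own server and then assemble the constructor options. The endpoint path /api/asr-signature, the response shape { app_id, timestamp, signature }, and the helper names fetchCredentials and buildEngineConfig are all assumptions made for illustration; none of them are defined by the SDK.

```javascript
// Hypothetical: asks your own backend for the signing data. The endpoint and
// the JSON shape { app_id, timestamp, signature } are assumptions.
async function fetchCredentials(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error('Signature request failed with status ' + response.status);
  }
  return response.json();
}

// Hypothetical: combines the signing data with recognition parameters into
// the options object shown in the Method 1 example.
function buildEngineConfig(credentials, payload) {
  return {
    engine: 'SpeechRecognizer', // short speech recognition
    app_id: credentials.app_id,
    signature: credentials.signature,
    timestamp: credentials.timestamp,
    payload: payload
  };
}

// Possible use in a page, where AsrEngine is provided by AsrSDK.min.js:
// fetchCredentials('/api/asr-signature').then(function (credentials) {
//   new AsrEngine(buildEngineConfig(credentials, { lang_type: 'ja-JP' }));
// });
```

Keeping the secret on the server means that only the short-lived signature reaches the browser, which is the reason Method 1 is recommended over Method 2.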

Example (Method 2, Insecure): Embed the secret in the front-end, and have the front-end generate the signature

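new AsrEngine({
    engine: '',   // capability parameter, e.g. 'SpeechRecognizer' for short speech recognition
    app_id: '',   // from the user backend
    secret: '',   // embedded in the front-end, which is why this method is insecure
    payload: {
      lang_type: 'ja-JP',
      format: 'pcm',
      sample_rate: 16000,
      enable_intermediate_result: true,
      enable_punctuation_prediction: true,
      enable_inverse_text_normalization: true,
      enable_words: true
    }
})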

2. Microphone Methods

(1) Parameter Settings

| Name | Type | Description | Default Value |
| --- | --- | --- | --- |
| micAllowCallback | Function | Callback invoked when microphone access is allowed | None |
| micForbidCallback | Function | Callback invoked when microphone access is denied | None |

(2) Parameter Instance

new AsrEngine({
    micAllowCallback: function () {},
    micForbidCallback: function (status, msg) {}
})

3. Initialization Methods

(1) Parameter Settings

| Name | Type | Description | Default Value |
| --- | --- | --- | --- |
| engineFirstInitDone | Function | Callback invoked when initialization succeeds | None |
| engineFirstInitFail | Function | Callback invoked when initialization fails | None |

(2) Parameter Instance

new AsrEngine({
    engineFirstInitDone: function () {},
    engineFirstInitFail: function (status, msg) {}
})

4. Recognition Result Return

(1) Parameter Settings

| Name | Type | Description | Default Value |
| --- | --- | --- | --- |
| onStart | Function | Callback invoked when the engine connection starts | None |
| onSentenceBegin | Function | Callback invoked when the engine returns the start-of-sentence result. Applies only to real-time speech recognition. | None |
| onIntermediateResult | Function | Callback invoked when the engine returns intermediate results | None |
| onSentenceEnd | Function | Callback invoked when the engine returns the end-of-sentence result. Applies only to real-time speech recognition. | None |
| onStop | Function | Callback invoked when the engine connection ends. Applies only to real-time speech recognition. | None |
| onResult | Function | Callback invoked when the engine returns the end-of-sentence result. Applies only to short speech recognition. | None |
| onWarning | Function | Callback invoked when the engine returns a warning | None |
| onError | Function | Callback invoked when the engine returns an error | None |

Note: When an onError callback is received, recognition will automatically stop and the connection will be disconnected, so there is no need to call the stopAsr() method again.

(2) Parameter Instance

new AsrEngine({
    onStart: function (data, taskId) {},
    onSentenceBegin: function (data, taskId) {},
    onIntermediateResult: function (data, taskId) {},
    onSentenceEnd: function (data, taskId) {},
    onStop: function (data, taskId) {},
    onResult: function (data, taskId) {},
    onWarning: function (status, msg, taskId) {},
    onError: function (status, msg, taskId) {}
})
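
For short speech recognition, the callbacks of interest are onStart, onIntermediateResult, onResult, onWarning, and onError. The sketch below shows one way to gather what they receive into a single state object. The helper createResultCollector is hypothetical and not part of the SDK. Because this page does not describe the structure of data, the sketch stores it as received rather than reading specific fields from it.

```javascript
// Hypothetical helper (not part of the SDK): builds callbacks that record
// whatever the engine passes in, without assuming the shape of `data`.
function createResultCollector() {
  const state = {
    taskId: null,
    intermediate: null, // most recent intermediate result
    finalResult: null,  // result delivered to onResult
    warnings: [],
    error: null
  };
  const callbacks = {
    onStart: function (data, taskId) { state.taskId = taskId; },
    onIntermediateResult: function (data, taskId) { state.intermediate = data; },
    onResult: function (data, taskId) { state.finalResult = data; },
    onWarning: function (status, msg, taskId) {
      state.warnings.push({ status: status, msg: msg });
    },
    onError: function (status, msg, taskId) {
      state.error = { status: status, msg: msg };
    }
  };
  return { state: state, callbacks: callbacks };
}

// Possible use in a page, where AsrEngine is provided by AsrSDK.min.js:
// const collector = createResultCollector();
// new AsrEngine(Object.assign({ engine: 'SpeechRecognizer' }, collector.callbacks));
```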

5. Network

(1) Parameter Settings

| Name | Type | Description | Default Value |
| --- | --- | --- | --- |
| onNetworkError | Function | Network monitoring callback | None |

Note: When an onNetworkError callback is received, recognition will automatically stop and the connection will be disconnected, so there is no need to call the stopAsr() method again.

(2) Parameter Instance

new AsrEngine({
    onNetworkError: function (status, msg) {}
})
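
Since recognition stops automatically after onError or onNetworkError, application code that offers a stop button may want to remember whether a recognition is still running. The following sketch tracks that with a small guard. The helper createStopGuard is hypothetical and not part of the SDK, and it assumes that stopAsr() is a method of the object created with new AsrEngine(...).

```javascript
// Hypothetical helper (not part of the SDK): remembers whether recognition is
// running so that stopAsr() is not called again after an automatic stop.
function createStopGuard() {
  let active = false;
  return {
    callbacks: {
      onStart: function (data, taskId) { active = true; },
      // Both callbacks mean the SDK has already stopped and disconnected.
      onError: function (status, msg, taskId) { active = false; },
      onNetworkError: function (status, msg) { active = false; }
    },
    // Calls engine.stopAsr() only while a recognition is still running.
    // Returns true when stopAsr() was actually called.
    stopIfActive: function (engine) {
      if (!active) {
        return false;
      }
      active = false;
      engine.stopAsr();
      return true;
    }
  };
}
```

The guard's callbacks can be merged into the constructor options alongside any other callbacks, and a stop button handler then calls guard.stopIfActive(engine) instead of engine.stopAsr().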

4 Method Invocation

4.1 startAsr()

Function: Start recognition.

4.2 stopAsr()

Function: Stop recognition.

4.3 sentenceEnd()

Function: Force a sentence ending.

Note: This method is only used for real-time speech recognition.

4.4 speakerStart(speaker_id)

Function: Set a custom speaker number.

Note: This method is only used for real-time speech recognition.

Parameter: @speaker_id The custom speaker number, as a string

  • The speaker_id supports up to 36 characters, with anything beyond that being cut off.
  • If the speaker_id parameter is not provided in the SpeakerStart event, the returned speaker_id will be empty.
  • The SpeakerStart event triggers a forced sentence ending. Therefore, please send the SpeakerStart event only before the speaker switches.
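
Because anything beyond 36 characters is cut off, it may help to apply the same limit before calling speakerStart, so that the application knows which value will come back. The sketch below does that. The helper normalizeSpeakerId is hypothetical and not part of the SDK, and it counts characters with JavaScript's string length (UTF-16 code units), which is an assumption about how the service counts them.

```javascript
// Hypothetical helper (not part of the SDK): applies the documented
// 36-character limit locally and reports whether anything was cut off.
function normalizeSpeakerId(speakerId) {
  const text = String(speakerId);
  const MAX_LENGTH = 36; // limit stated in the notes above
  return {
    value: text.slice(0, MAX_LENGTH),
    truncated: text.length > MAX_LENGTH
  };
}

// Possible use, assuming `engine` was created with new AsrEngine(...):
// const speaker = normalizeSpeakerId(userName);
// engine.speakerStart(speaker.value);
```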

4.5 cancelAsr()

Function: Cancel recognition.

4.6 destroyAsr()

Function: Destroy the instance.

5 SDK Download

H5 SDK