Short Speech Recognition iOS SDK
- Supports iOS versions 11.0 and above.
- Before using the SDK, please read the interface protocol first. For details, refer to the Cloud API documentation.
1 Integration Steps
- Manual import: Drag SpeechEvaluate.framework into your project. Then, under General -> Frameworks, Libraries, and Embedded Content, change the Embed setting for SpeechRecognitionSDK.framework to Embed & Sign.
- Ensure the following pods are installed: SocketRocket 0.6.0, AFNetworking.
1.1 Add App-Related Permissions
Add the Privacy - Microphone Usage Description key (NSMicrophoneUsageDescription) to your project's Info.plist file to request microphone access permission.
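For reference, the corresponding entry in the Info.plist source view might look like the following (the usage string shown here is illustrative; use wording appropriate for your app):

```xml
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access is required for speech recognition.</string>
```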
1.2 Invocation Steps/Sample Code
In the file that requires the speech recognition function, conform to the EvalListener delegate protocol.
//Configure main parameters
SDKParams *params = [[SDKParams alloc] init];
params.appId = @"";
params.appSecret = @"";
params.sample_rate = 16000;//Sampling rate
params.format = @"pcm";//Audio encoding format
params.realtime = YES;//Whether it is Real-time Recognition, true represents Real-time Recognition, false represents Short Speech Recognition
params.langType = @"zh-cmn-Hans-CN";//Required Language Type
params.enable_intermediate_result = YES;//Whether to return intermediate results
params.enable_punctuation_prediction = YES;//Whether to add punctuation in post-processing
params.max_sentence_silence = 450;//Speech sentence breaking detection threshold. Silence longer than this threshold is considered a sentence break. The valid range is 200~1200 (ms); the default value is 450 ms
params.enable_words = YES;//Whether to return word information

1.2.1 Create a Speech Recognition Class and Grant Authorization
| Name | Type | Description |
|---|---|---|
| listener | id | Delegate object that receives recognition callbacks |
| params | SDKParams | Parameters and configuration |
//Initialize the engine
SpeechRecognition *speechManger = [[SpeechRecognition alloc] init];
[speechManger setInitSDK:self params:params];
self.speechManger = speechManger;

1.2.2 Callback Methods
| Name | Parameter Type | Description |
|---|---|---|
| onRecognitionStart | String | Callback when the engine connection starts |
| onRecognitionResult | String | Callback when the engine returns final results |
| onRecognitionRealtimeResult | String | Callback when the engine returns intermediate results |
| onRecognitionStop | - | Callback when recognition ends |
| onRecognitionGetAudio | NSData | Callback returning real-time recorded audio data |
| onRecognitionWarning | String | Callback when the engine returns a warning |
| onRecognitionError | String | Callback when the engine returns an error |
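A minimal sketch of implementing some of these callbacks in a class that conforms to EvalListener might look like this (the class context and logging are illustrative; the format of the result payload is defined by the Cloud API):

```objectivec
// Illustrative implementations inside a class that adopts EvalListener
- (void)onRecognitionStart:(NSString *)taskId {
    NSLog(@"Recognition started, taskId: %@", taskId);
}

- (void)onRecognitionRealtimeResult:(NSString *)result {
    // Intermediate hypotheses; delivered only when enable_intermediate_result is YES
    NSLog(@"Intermediate result: %@", result);
}

- (void)onRecognitionResult:(NSString *)result {
    NSLog(@"Final result: %@", result);
}

- (void)onRecognitionStop {
    NSLog(@"Recognition finished");
}

- (void)onRecognitionError:(NSString *)code msg:(NSString *)msg taskId:(nullable NSString *)taskId {
    NSLog(@"Error %@: %@ (taskId: %@)", code, msg, taskId);
}
```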
/**
* Return intermediate recognition results
*/
- (void) onRecognitionRealtimeResult: (NSString *) result;
/**
* Return recognition results
*/
- (void) onRecognitionResult: (NSString *) result;
/**
* Indicates successful start of recording
*/
- (void) onRecognitionStart: (NSString *) taskId;
/**
* Indicates successful end of recognition
*/
- (void) onRecognitionStop;
/**
* Return real-time recorded audio data
*/
- (void) onRecognitionGetAudio: (NSData *)data;
/**
* Error callback, return error code and message
*/
- (void) onRecognitionError: (NSString *)code msg:(NSString*)msg taskId:(nullable NSString*)taskId;
/**
* Warning callback
*/
- (void) onRecognitionWarning: (NSString *)code msg:(NSString*)msg taskId:(nullable NSString*)taskId;

1.2.3 Parameter Description
| Parameter | Type | Required | Description | Default Value |
|---|---|---|---|---|
| lang_type | String | Yes | Language option | None |
| format | String | No | Audio encoding format | pcm |
| sample_rate | Integer | No | Audio sampling rate | 16000 |
| enable_intermediate_result | Boolean | No | Whether to return intermediate recognition results | false |
| enable_punctuation_prediction | Boolean | No | Whether to add punctuation in post-processing | false |
| enable_inverse_text_normalization | Boolean | No | Whether to perform ITN in post-processing | false |
| max_sentence_silence | Integer | No | Speech sentence breaking detection threshold. Silence longer than this threshold is considered as a sentence break. The valid parameter range is 200~1200. Unit: Milliseconds | 450 |
| enable_words | Boolean | No | Whether to return word information | false |
| enable_modal_particle_filter | Boolean | No | Whether to enable modal particle filtering | false |
| hotwords_id | String | No | Hotwords ID | None |
| hotwords_weight | Float | No | Hotwords weight, the range is [0.1, 1.0] | 0.4 |
| correction_words_id | String | No | Forced correction vocabulary ID. Supports multiple IDs, separated by a vertical bar; all indicates using all IDs. | None |
| forbidden_words_id | String | No | Forbidden words ID. Supports multiple IDs, separated by a vertical bar; all indicates using all IDs. | None |
1.2.4 Start/Stop Recognition
<1> Start Recognition (internal recording by the SDK)
[self.speechManger startRecording];
End Recognition
[self.speechManger stopRecording];
<2> File Recognition (pass the local path of the audio file directly)
[self.speechManger startRecognitionOralWithWavPath:@"wav audio path"];
<3> Audio Data Recognition (recording handled outside the SDK, or a file converted to NSData for recognition)
- (void)doStart:(FinishBlock)finishBlock;
- (BOOL)doSetData:(NSData *) data isLast:(bool)isLast;
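The two methods above can be combined to stream audio in chunks. The following sketch reads a local PCM file and feeds it to the engine in fixed-size segments (the file path and chunk size are illustrative assumptions; 3200 bytes corresponds to 100 ms of 16 kHz, 16-bit mono audio):

```objectivec
// Illustrative: stream a local PCM file to the recognizer in chunks
NSData *audio = [NSData dataWithContentsOfFile:@"/path/to/audio.pcm"];
NSUInteger chunkSize = 3200; // 100 ms at 16000 Hz, 16-bit mono

[self.speechManger doStart:^(_Bool success) {
    if (!success) { return; }
    for (NSUInteger offset = 0; offset < audio.length; offset += chunkSize) {
        NSUInteger len = MIN(chunkSize, audio.length - offset);
        NSData *chunk = [audio subdataWithRange:NSMakeRange(offset, len)];
        BOOL isLast = (offset + len >= audio.length); // YES for the final chunk
        [self.speechManger doSetData:chunk isLast:isLast];
    }
}];
```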
Call Method
[self.speechManger doStart:^(_Bool success) {
    if (success) {
        // isLastSegment: YES when this is the final chunk of audio
        if (isLastSegment) {
            [self.speechManger doSetData:data isLast:YES];
        } else {
            [self.speechManger doSetData:data isLast:NO];
        }
    }
}];

1.2.5 Force Sentence Ending
[self.speechManger sentenceEnd];

1.2.6 Customize Speaker
[self.speechManger speakerStart:@"speaker_name"];