Short Speech Recognition Python SDK

1 Summary

The Python SDK for voice interaction services.

Supported services: one-sentence recognition and real-time speech recognition.

1.1 SDK file description

| File/Directory | Description |
| --- | --- |
| speech_rec | SDK-related files |
| demo | Example code |
| ├─ transcriber_demo.py | Real-time speech recognition example code |
| ├─ recognizer_demo.py | One-sentence recognition example code |
| ├─ demo.wav | Chinese Mandarin sample audio (WAV format) |
| ├─ demo.mp3 | Chinese Mandarin sample audio (MP3 format) |
| setup.py | Installation file |
| README-JA.md | Japanese operator's manual |
| README-EN.md | English operator's manual |
The recognition results for the sample audio files provided in the SDK are identical. The demos use MP3 audio by default; if the input audio is in WAV or another format, it is converted to MP3 first.

2 Operating environment

Python 3.4 or later and ffmpeg are required. It is recommended to create a separate Python runtime environment; otherwise, version conflicts may occur.
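For example, a dedicated environment can be created with Python's built-in venv module (one possible approach; any virtual environment tool will do):

$ python3 -m venv speech-sdk-env
$ source speech-sdk-env/bin/activate   # on Windows: speech-sdk-env\Scripts\activate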

3 Installation method

  1. Ensure that the Python package management tool setuptools is installed. If it is not installed, install it from the command line:
$ pip install setuptools
  2. Unzip the SDK, go to the folder that contains the setup.py file, and run the following command in the SDK directory:
$ python setup.py install
  • The pip and python commands above refer to Python 3.
  • If the following message is displayed, the installation succeeded: Finished processing dependencies for speech-python-rec-sdk==1.0.0.8
  • After installation, the build, dist, and speech_python_rec_sdk.egg-info directories are generated.

  3. Modify the parameters in the demo files:

# recognizer_demo.py and transcriber_demo.py are the entry scripts for one-sentence recognition and real-time speech recognition, respectively.

# Enter the appID obtained when you purchased the service on the platform
app_id = '#####'

# Enter the appSecret obtained when you purchased the service on the platform
app_secret = '#####'

# Enter the path of the audio file to be recognized; change it to the path of your own audio file
audio_path = '####'

# Input language; for the supported values, see the platform Documentation Center > Speech Recognition > Development Guide
lang_type = 'ja-JP'
  4. Run recognizer_demo.py or transcriber_demo.py to recognize the speech. If the token is invalid or has expired, delete the local SpeechRecognizer_token.txt or SpeechTranscriber_token.txt file and try again. If the problem persists, contact technical support.

Run the following commands in the demo directory after setting parameters such as app_id in the corresponding Python files:

$ python recognizer_demo.py 
$ python transcriber_demo.py 

# After a successful run, the SpeechRecognizer_token.txt or SpeechTranscriber_token.txt file is generated in the directory where the demo was run.

Note: If "timestamp timeout" or "timestamp is greater than the current time" is displayed, the local time is inconsistent with the server time. Based on the time difference reported in the error message, modify the line timestamp = int(t) in the _token.py file of the speech-python-rec package, for example to timestamp = int(t) + 1 (or + 2, 3, 4, and so on) or timestamp = int(t) - 1 (or - 2, 3, 4, and so on).
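For example, if the error message indicates that the local clock is about 2 seconds behind the server clock, the change might look like this (illustrative only; use the offset that matches your error message):

# _token.py (excerpt)
# before
timestamp = int(t)
# after: add an offset matching the reported time difference
timestamp = int(t) + 2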

  • After _token.py is modified, the change takes effect only after the SDK is rebuilt and reinstalled. The specific steps are as follows:

Delete the build, dist, and speech_python_rec_sdk.egg-info directories generated in the SDK directory.

Uninstall the SDK by running $ pip uninstall speech-python-rec-sdk, then reinstall it by repeating steps 2, 3, and 4. A consolidated example is shown below.
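Put together, the rebuild might look like this on Linux or macOS (the uninstall command may ask for confirmation):

$ rm -rf build dist speech_python_rec_sdk.egg-info
$ pip uninstall speech-python-rec-sdk
$ python setup.py install
# then set the demo parameters (step 3) and run the demo again (step 4)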

4 Parameter description

4.1 Short Speech Recognition demo usage

speech_rec/demo/recognizer_demo.py is the one-sentence recognition demo; it can be run directly.

4.1.1 Key interface description

One-sentence recognition is performed mainly through the SpeechRecognizer class, and authorization is handled through the Token class. The typical call sequence is listed below; a minimal sketch follows the list.

  • Acquire the token by calling the get_token() method of the SpeechClient class.
  • Create a SpeechRecognizer instance.
  • Create a Callback instance.
  • Call the set_token() method of the SpeechRecognizer instance to set the token.
  • Connect to the server by calling the start() method of the SpeechRecognizer instance.
  • Call the send() method of the SpeechRecognizer instance to send audio.
  • Call the stop() method of the SpeechRecognizer instance to stop sending audio.
  • Disconnect from the server by calling the close() method of the SpeechRecognizer instance.
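A minimal sketch of this call sequence, following the full demo in 4.1.3 (the Callback class is the one defined there; token caching, payload configuration, and error handling are omitted):

import speech_rec

app_id, app_secret, audio_path = '', '', ''  # placeholders: fill in your own values

client = speech_rec.SpeechClient()
# get_token() writes the token information to a local file, as in judging_expire_time() in 4.1.3
client.get_token(app_id, app_secret, 'SpeechRecognizer_token.txt')
with open('SpeechRecognizer_token.txt', 'r', encoding='utf-8') as fr:
    token = eval(fr.read())['token']

recognizer = client.create_recognizer(Callback(audio_path))  # Callback as defined in 4.1.3
recognizer.set_app_id(app_id)
recognizer.set_token(token)

if recognizer.start() >= 0:          # connect to the server
    with open(audio_path, 'rb') as f:
        chunk = f.read(7680)
        while chunk:
            recognizer.send(chunk)   # send the audio in chunks
            chunk = f.read(7680)
    recognizer.stop()                # stop sending
recognizer.close()                   # disconnect from the server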

4.1.2 Parameter description

| Parameter | Type | Required | Description | Default Value |
| --- | --- | --- | --- | --- |
| lang_type | String | Yes | Language option | Required |
| format | String | No | Audio encoding format | pcm |
| sample_rate | Integer | No | Audio sampling rate. When sample_rate=8000, the field parameter is required and must be set to call-center | 16000 |
| enable_intermediate_result | Boolean | No | Whether to return intermediate recognition results | true |
| enable_punctuation_prediction | Boolean | No | Whether to add punctuation in post-processing | true |
| enable_inverse_text_normalization | Boolean | No | Whether to perform ITN in post-processing | true |
| max_sentence_silence | Integer | No | Sentence-break detection threshold. Silence longer than this threshold is treated as a sentence break. Valid range: 200 to 1200. Unit: milliseconds | sample_rate=16000: 800; sample_rate=8000: 250 |
| enable_words | Boolean | No | Whether to return word information | false |
| enable_intermediate_words | Boolean | No | Whether to return word information in intermediate results | false |
| enable_modal_particle_filter | Boolean | No | Whether to enable modal particle filtering | true |
| hotwords_list | List<String> | No | One-time hotwords list, effective only for the current connection. If both hotwords_list and hotwords_id are provided, hotwords_list is used. Up to 100 entries can be provided at a time | None |
| hotwords_id | String | No | Hotwords ID | None |
| hotwords_weight | Float | No | Hotwords weight, value range [0.1, 1.0] | 0.4 |
| correction_words_id | String | No | Forced-correction vocabulary ID. Supports multiple IDs separated by a vertical bar character; all indicates using all IDs | None |
| forbidden_words_id | String | No | Forbidden-words ID. Supports multiple IDs separated by a vertical bar character; all indicates using all IDs | None |
| field | String | No | Field. general: supports a sample_rate of 16000 Hz; call-center: supports a sample_rate of 8000 Hz | None |
| audio_url | String | No | Format of the returned audio URL (the audio is stored on the platform for only 30 days). mp3: returns a URL for the audio in MP3 format; pcm: returns a URL for the audio in PCM format; wav: returns a URL for the audio in WAV format | None |
| connect_timeout | Integer | No | Connection timeout (seconds), range: 5-60 | 10 |
| gain | Integer | No | Amplitude gain factor, range [1, 20]. 1 means no amplification, 2 means the original amplitude is doubled, and so on | sample_rate=16000: 1; sample_rate=8000: 2 |
| max_suffix_silence | Integer | No | Trailing-silence detection threshold (seconds), range 1 to 10. If the silence at the end of the speech exceeds this threshold, recognition stops automatically. When the value is 0 or the parameter is not provided, this feature is disabled. Special case: if set to -1, recognition stops immediately when the speech ends | 0 |
| user_id | String | No | Custom user information, returned unchanged in the response message; maximum length 36 characters | None |
| enable_save_log | Boolean | No | Whether to provide logs of audio data and recognition results to help improve product and service quality | true |
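These parameters are passed to the service through the recognizer's payload, as shown in the demo in 4.1.3. The sketch below is illustrative only: the parameter names come from the table above, the hotwords are placeholder values, and the exact value types (for example, booleans versus strings) should follow the platform documentation.

# Illustrative payload configuration (parameter names from the table above)
payload = {
    'lang_type': 'ja-JP',
    'format': 'mp3',
    'sample_rate': 16000,                     # 16000 (field=general) or 8000 (field=call-center)
    'field': 'general',
    'enable_punctuation_prediction': True,
    'enable_inverse_text_normalization': True,
    'hotwords_list': ['placeholder_word_1', 'placeholder_word_2'],  # up to 100 one-time hotwords
    'hotwords_weight': 0.4,
}
recognizer._payload.update(**payload)         # as done in recognizer_demo.py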

4.1.3 Short Speech Recognition sample code

For the full code, see the speech_python_rec/demo/recognizer_demo.py file in the SDK.

# -*- coding: utf-8 -*-
import os
import time
import threading
import speech_rec
from speech_rec.callbacks import SpeechRecognizerCallback
from speech_rec.parameters import DefaultParameters

token = None
expire_time = 7  # Token validity period (days)

class Callback(SpeechRecognizerCallback):
    """
    The parameters of the constructor are not required. You can add them as needed
    The name parameter in the example can be used as the audio file name to be recognized for distinguishing in multithreading
    """

    def __init__(self, name='SpeechRecognizer'):
        self._name = name

    def started(self, message):
        print('MyCallback.OnRecognitionStarted: %s' % message)

    def result_changed(self, message):
        print('MyCallback.OnRecognitionResultChanged: file: %s, task_id: %s, payload: %s' % (
            self._name, message['header']['task_id'], message['payload']))

    def completed(self, message):
        print('MyCallback.OnRecognitionCompleted: file: %s, task_id:%s, payload:%s' % (
            self._name, message['header']['task_id'], message['payload']))

    def task_failed(self, message):
        print(message)

    def warning_info(self, message):
        print(message)

    def channel_closed(self):
        print('MyCallback.OnRecognitionChannelClosed')

def solution(client, app_id, app_secret, audio_path, lang_type, kwargs):
    """
    Recognize speech,single thread
    :param client: SpeechClient
    :param app_id: Your app_id
    :param app_secret: Your app_secret
    :param audio_path: Audio path
    :param lang_type: Language type
    """
    assert os.path.exists(audio_path), "Audio file path error, please check your audio path."
    sample_rate = kwargs.get("sample_rate", DefaultParameters.SAMPLE_RATE_16K)
    each_audio_format = kwargs.get("audio_format", DefaultParameters.MP3)
    field_ = kwargs.get("field", DefaultParameters.FIELD)

    if judging_expire_time(app_id, app_secret, expire_time):
        callback = Callback(audio_path)
        recognizer = client.create_recognizer(callback)
        recognizer.set_app_id(app_id)
        recognizer.set_token(token)
        # fixme You can customize the configuration according to the official website documentation
        payload = {
            "lang_type": lang_type,
            "format": each_audio_format,
            "field": field_,
            "sample_rate": sample_rate,
        }
        recognizer._payload.update(**payload)
        try:
            ret = recognizer.start()
            if ret < 0:
                return ret
            print('sending audio...')
            cnt = 0
            with open(audio_path, 'rb') as f:
                audio = f.read(7680)
                while audio:
                    cnt += 0.24
                    ret = recognizer.send(audio)
                    if ret < 0:
                        break
                    time.sleep(0.24)
                    audio = f.read(7680)
            recognizer.stop()
        except Exception as ee:
            print(f"send ee:{ee}")
        finally:
            recognizer.close()
    else:
        print("token expired")


def judging_expire_time(app_id, app_secret, extime):
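    """
    Check whether the cached token is still valid and refresh it if needed.
    Reads SpeechRecognizer_token.txt (creating it on first use), updates the
    global token, and requests a new token when the cached one is older than
    (extime - 1) days, retrying up to 7 times on failure.
    :param app_id: Your app_id
    :param app_secret: Your app_secret
    :param extime: Token validity period in days
    :return: True if a usable token is available, otherwise a falsy value
    """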
    global token
    new_time = time.time()
    token_file = "SpeechRecognizer_token.txt"
    if not os.path.exists(token_file):
        client.get_token(app_id, app_secret, token_file)
    with open(token_file, "r", encoding="utf-8") as fr:
        token_info = eval(fr.read())
    old_time = token_info['time']
    token = token_info['token']
    flag = True
    if new_time - old_time > 60 * 60 * 24 * (extime - 1):
        flag, _ = client.get_token(app_id, app_secret, token_file)
        if flag:
            flag = True
            pass
        else:
            for i in range(7):
                flag, _ = client.get_token(app_id, app_secret, token_file)
                if flag is not None:
                    flag = True
                    break
    return flag


def channels_split_solution(audio_path, right_path, left_path, **kwargs):
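    """
    Split a dual-channel audio file into left and right channels and recognize
    each channel in its own thread; the split files are removed afterwards
    unless rm_audio is set to False.
    """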
    client = kwargs.get('client')
    appid = kwargs.get('app_id')
    appsecret = kwargs.get('app_secret')
    langtype = kwargs.get('lang_type')
    remove_audio = kwargs.get('rm_audio', True)
    client.auto_split_audio(audio_path, right_path, left_path)
    thread_list = []
    thread_r = threading.Thread(target=solution, args=(client, appid, appsecret, right_path, langtype, kwargs))
    thread_list.append(thread_r)
    thread_l = threading.Thread(target=solution, args=(client, appid, appsecret, left_path, langtype, kwargs))
    thread_list.append(thread_l)
    for thread in thread_list:
        thread.start()
    for thread in thread_list:
        thread.join()
    if remove_audio:
        os.remove(right_path)
        os.remove(left_path)
    pass


if __name__ == "__main__":
    client = speech_rec.SpeechClient()
    # Set the output log level: DEBUG, INFO, WARNING, ERROR
    client.set_log_level('INFO')
    # Type your app_id and app_secret
    app_id = ""  # your app id
    app_secret = ""  # your app secret
    audio_path = ""  # audio path
    lang_type = ""  # lang type
    field = ""  # field
    sample_rate = 16000  # sample rate [int] 16000 or 8000
    audio_format = ""  # audio format
    assert app_id and app_secret and audio_path and lang_type and field and sample_rate and audio_format, "Please check args"
    channel = client.get_audio_info(audio_path)['channel']
    # fixme This is just a simple example, please modify it according to your needs.
    multi = False
    process_num = 4
    if channel == 1:
        kwargs = {
            "field": field,
            "sample_rate": sample_rate,
            "audio_format":audio_format
        }
        solution(client, app_id, app_secret, audio_path, lang_type, kwargs)

    elif channel == 2:
        # Dual channel 8K audio solution
        channels_split_solution(audio_path=audio_path,
                                left_path=f"left.{audio_format}",
                                right_path=f"right.{audio_format}",
                                client=client,
                                app_id=app_id,
                                app_secret=app_secret,
                                lang_type=lang_type,
                                field=field,
                                sample_rate=sample_rate,
                                audio_format=audio_format)

5 SDK Download

Python SDK