Python SDK

1. SDK Integration Guide

Download the corresponding SDK file.

The current SDK only supports Python 3.4 and above.
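As a minimal sketch, a project can check the interpreter version before importing the SDK. The helper name python_is_supported is illustrative and not part of the SDK; the only fact taken from this guide is the 3.4 minimum.

```python
import sys

MIN_VERSION = (3, 4)  # minimum Python version stated in this guide


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version meets the stated minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION


# Fail early with a readable message instead of an obscure import error.
if not python_is_supported():
    raise SystemExit(
        "Python 3.4 or later is required, found %d.%d" % sys.version_info[:2]
    )
```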

1.1 SDK File Description

File/Directory	Description
speech_syn	SDK-related files
├─ demo.py	Example code
pythonSDK.md	Operation manual
README.rst	Documentation
setup.py	Installation file

1.2 SDK Installation

If you haven't installed the Python package management tool setuptools, you can install it using the following command.

pip install setuptools

In the SDK root directory, run the following command to package the files.

python setup.py bdist_egg

Run the following command to install the SDK.

python setup.py install

If the following message is displayed, the installation succeeded and you can call the synthesis classes in your project.

Finished processing dependencies for hy-python-syn-sdk==1.0.0

After installation, three items are generated: build, dist, and hy_python_syn_sdk.egg-info.
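One way to confirm the installation from Python is to check whether the module can be located on the import path. This is a sketch that uses only the standard library; the helper is_installed is illustrative, and the module name speech_syn is taken from the demo code in this guide.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "speech_syn" is the module name used by the demo code in this guide.
    print("speech_syn installed:", is_installed("speech_syn"))
```

If the check reports False after a successful install, the SDK was probably installed into a different Python environment than the one running the check.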

1.3 Using the Demo

speech_syn/demo/synthesizer_demo.py is a demo for real-time speech synthesis. You can run it directly.

2. Parameter Description and Code Examples

2.1 Key Interface Description

The real-time speech synthesis SDK relies mainly on the Transcriber class to do its work and on the Token class for authorization. Call the SDK in the following order:

Obtain a token by calling the get_token() method in the SpeechClient class.
Create an instance of SpeechTranscriber.
Create an instance of Callback.
Set parameters by calling methods such as set_token() on the SpeechTranscriber instance.
Establish a connection with the server by calling the start() method of the SpeechTranscriber instance.
Send audio by calling the send() method of the SpeechTranscriber instance.
Stop the transmission by calling the stop() method of the SpeechTranscriber instance.
Disconnect from the server by calling the close() method of the SpeechTranscriber instance.
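The call order above can be sketched as follows. This is not the SDK's own code: the method names come from the steps above, but their argument lists are assumptions, and RecordingTranscriber is a hypothetical stand-in that only records calls so the sketch runs without the SDK or a network connection. See demo.py for the real usage.

```python
def run_session(transcriber, token, chunks):
    """Drive one session in the documented order.

    `transcriber` is any object exposing set_token/start/send/stop/close.
    The argument lists used here are assumptions, not the SDK's signatures.
    """
    transcriber.set_token(token)
    transcriber.start()
    try:
        for chunk in chunks:
            transcriber.send(chunk)
        transcriber.stop()
    finally:
        # Always release the connection, even if send() or stop() fails.
        transcriber.close()


class RecordingTranscriber:
    """Hypothetical stand-in that records calls; not part of the SDK."""

    def __init__(self):
        self.calls = []

    def set_token(self, token):
        self.calls.append("set_token")

    def start(self):
        self.calls.append("start")

    def send(self, data):
        self.calls.append("send")

    def stop(self):
        self.calls.append("stop")

    def close(self):
        self.calls.append("close")


fake = RecordingTranscriber()
run_session(fake, token="dummy-token", chunks=[b"a", b"b"])
print(fake.calls)
```

Wrapping send() and stop() in try/finally is a design choice of this sketch: it keeps the connection from leaking if a step fails after start().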

2.2 Parameter Description

Parameter Name	Description	Default Value
text	Text to be synthesized, length limit: 1024 bytes (UTF-8 encoding)	Required
lang_type	Language option	Required
voice	Voice ID	Japanese: Yuko; English: Julie; Chinese: Xiaohui
format	Audio encoding format: wav, pcm, or mp3. Note: wav does not support streaming	pcm
sample_rate	Audio sample rate, options are 8000, 16000, 24000	24000
volume	Volume, range [0.1, 3]; one decimal place is usually sufficient	1
speech_rate	Speech rate, range [0.2, 3]; one decimal place is usually sufficient	1
pitch_ratio	Pitch ratio, range [0.1, 3]; one decimal place is usually sufficient	1
silence_duration	Silence duration at the end of the sentence, in ms	125
enable_timestamp	Whether to return timestamps. When set to true, timestamps for the original text are returned. Note: multiple consecutive punctuation marks or spaces in the original text are still processed, but this does not affect the continuity of the timestamps	false
emotion	Emotion/style	None

The SDK's return parameters are specified in the interface protocol.
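The limits in the parameter table can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the SDK: validate_params and its argument names are illustrative (audio_format stands for the format parameter, renamed to avoid shadowing Python's built-in), and the limits are copied from the table.

```python
ALLOWED_FORMATS = {"wav", "pcm", "mp3"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 24000}
RANGES = {
    "volume": (0.1, 3.0),
    "speech_rate": (0.2, 3.0),
    "pitch_ratio": (0.1, 3.0),
}
MAX_TEXT_BYTES = 1024  # limit applies to the UTF-8 encoded text


def validate_params(text, lang_type, audio_format="pcm", sample_rate=24000,
                    volume=1.0, speech_rate=1.0, pitch_ratio=1.0):
    """Return a list of problems; an empty list means the values fit the table."""
    problems = []
    if not text:
        problems.append("text is required")
    elif len(text.encode("utf-8")) > MAX_TEXT_BYTES:
        problems.append("text exceeds 1024 bytes in UTF-8")
    if not lang_type:
        problems.append("lang_type is required")
    if audio_format not in ALLOWED_FORMATS:
        problems.append("format must be wav, pcm, or mp3")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        problems.append("sample_rate must be 8000, 16000, or 24000")
    values = {"volume": volume, "speech_rate": speech_rate,
              "pitch_ratio": pitch_ratio}
    for name, (low, high) in RANGES.items():
        if not low <= values[name] <= high:
            problems.append("%s must be in [%s, %s]" % (name, low, high))
    return problems


print(validate_params("Today is sunny.", "en-US"))
print(validate_params("Today is sunny.", "en-US", volume=3.5))
```

Note that the text limit is measured in bytes, not characters, so text in Japanese or Chinese reaches it after far fewer characters than English text does.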

2.3 Example Code

The complete code can be found in the speech_syn/demo/demo.py file in the SDK.

# -*- coding: utf-8 -*-
import speech_syn

# solution() is defined in the complete demo.py; only the entry point is shown here.
if __name__ == "__main__":
    client = speech_syn.SpeechClient()
    # Log level: debug, info, warning, or error
    client.set_log_level('INFO')
    # Enter your app_id and app_secret
    app_id = "a8d54833-01a3-4451-93cb-3f2bf37911ff"  # your app id
    app_secret = "GxM30WM6qN"  # your app secret
    # Enter the text to synthesize and its lang_type
    text = "Today is sunny. Have you eaten?"
    # Options: zh-cmn-Hans-CN, en-US, ja-JP (must match the language of text)
    lang_type = 'en-US'

    # Optional synthesis parameters; an empty string keeps the default
    format = ''                         # Default: MP3, Optional: PCM
    voice = ''                          # Default: Xiaohui
    sample_rate = ''                    # Default: 24K, Optional: 16K, 8K
    volume = ''                         # Default: 1.0, Optional: 0.1-3.0
    speech_rate = ''                    # Default: 1.0, Optional: 0.2-3.0
    pitch_ratio = ''                    # Default: 1.0, Optional: 0.1-3.0
    emotion = ''                        # Default: None
    silence_duration = ''               # Default: 125
    enable_timestamp = False            # Default: False
    audio_name = 'syAudio.mp3'          # Output audio file name
    solution(client, app_id, app_secret, text, audio_name, lang_type, format,
             sample_rate, voice, volume, speech_rate, pitch_ratio, emotion,
             silence_duration, enable_timestamp)

3. SDK Download

Python SDK
