Short Text to Speech

TTS Python SDK

1. SDK Integration Guide

Download the corresponding SDK file.

The current SDK only supports Python 3.4 and above.
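
A script that depends on the SDK can check the interpreter version before importing anything else. The following is a minimal sketch; the helper name check_python_version is illustrative and is not part of the SDK.

```python
import sys

MIN_VERSION = (3, 4)  # minimum version stated in this guide


def check_python_version(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or running) interpreter meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the (major, minor) pair.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not check_python_version():
        sys.exit("Python 3.4 or later is required for the SDK")
```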

1.1 SDK File Description

| File/Directory | Description |
| --- | --- |
| speech_syn | SDK-related files |
| ├─ demo.py | Example code |
| pythonSDK.md | Operation manual |
| README.rst | Documentation |
| setup.py | Installation file |

1.2 SDK Installation

  1. If the Python package management tool setuptools is not installed yet, install it with the following command.
     pip install setuptools
  2. In the SDK root directory, run the following command to package the files.
     python setup.py bdist_egg
  3. Run the following command to install the SDK.
     python setup.py install
  4. If the following message is displayed, the installation succeeded and you can call the synthesis class in your project.

Finished processing dependencies for hy-python-syn-sdk==1.0.0

After installation, three items are generated: build, dist, and hy_python_syn_sdk.egg-info.
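
One way to confirm from Python that the package is visible to the interpreter is to ask the import system whether it can locate it. This is a rough sketch: the module name speech_syn is taken from the demo code in this guide, and the helper is_module_available is a hypothetical name, not an SDK function.

```python
import importlib.util


def is_module_available(module_name):
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "speech_syn" is the package name used by the demo code in this guide.
    if is_module_available("speech_syn"):
        print("speech_syn is importable")
    else:
        print("speech_syn not found; re-run the installation steps")
```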

1.3 Using the Demo

speech_syn/demo/synthesizer_demo.py is a demo for real-time speech synthesis. You can run it directly.

2. Parameter Description and Code Examples

2.1 Key Interface Description

The real-time speech synthesis SDK performs synthesis mainly through the Transcriber class and handles authorization through the Token class. The calling sequence is as follows:

  1. Obtain a token by calling the get_token() method in the SpeechClient class.
  2. Create an instance of SpeechTranscriber.
  3. Create an instance of Callback.
  4. Set parameters by calling methods such as set_token() on the SpeechTranscriber instance.
  5. Establish a connection with the server by calling the start() method of the SpeechTranscriber instance.
  6. Send audio by calling the send() method of the SpeechTranscriber instance.
  7. Stop the transmission by calling the stop() method of the SpeechTranscriber instance.
  8. Disconnect from the server by calling the close() method of the SpeechTranscriber instance.
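
The ordering of steps 4 to 8 can be sketched as below. This is not SDK code: RecordingTranscriber is a stand-in that only records which methods were called, and the argument lists of set_token() and send() are assumptions, since this guide does not give their signatures. The sketch only illustrates the call order and the use of try/finally so that close() runs even when an earlier step fails.

```python
class RecordingTranscriber:
    """Stand-in that records method calls; NOT the real SDK class."""

    def __init__(self):
        self.calls = []

    def set_token(self, token):  # assumed signature
        self.calls.append("set_token")

    def start(self):
        self.calls.append("start")

    def send(self, payload):  # assumed signature
        self.calls.append("send")

    def stop(self):
        self.calls.append("stop")

    def close(self):
        self.calls.append("close")


def run_session(transcriber, token, payload):
    """Drive one session in the order of steps 4 to 8."""
    transcriber.set_token(token)
    transcriber.start()
    try:
        transcriber.send(payload)
        transcriber.stop()
    finally:
        # Always release the connection, even if send() or stop() raises.
        transcriber.close()


if __name__ == "__main__":
    stub = RecordingTranscriber()
    run_session(stub, token="hypothetical-token", payload=b"data")
    print(stub.calls)
```

With the real SDK, the stand-in would be replaced by the instance created in step 2, and the token would come from get_token() in step 1.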

2.2 Parameter Description

| Parameter Name | Description | Default Value |
| --- | --- | --- |
| text | Text to be synthesized. Length limit: 1024 bytes (UTF-8 encoding) | Required |
| lang_type | Language option | Required |
| voice | Voice ID | Japanese: Yuko; English: Julie; Chinese: Xiaohui |
| format | Audio encoding format: wav / pcm / mp3. Note: wav does not support streaming | pcm |
| sample_rate | Audio sample rate. Options: 8000, 16000, 24000 | 24000 |
| volume | Volume, range [0.1, 3]. One decimal place is usually sufficient | 1 |
| speech_rate | Speech rate, range [0.2, 3]. One decimal place is usually sufficient | 1 |
| pitch_ratio | Pitch ratio, range [0.1, 3]. One decimal place is usually sufficient | 1 |
| silence_duration | Silence duration at the end of the sentence, in ms | 125 |
| enable_timestamp | Enables timestamps. When set to true, timestamps for the original text are returned. Note: multiple consecutive punctuation marks or spaces in the original text are still processed, but they do not affect the continuity of the timestamps | false |
| emotion | Emotion/style | None |
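
The limits in the table can be checked on the client side before a request is sent. The sketch below is an illustration under stated assumptions: validate_params and its keyword names are hypothetical, and the SDK may perform its own validation. Note that the text limit is counted in UTF-8 bytes, not characters, so non-ASCII text reaches the limit sooner.

```python
MAX_TEXT_BYTES = 1024
FORMATS = ("wav", "pcm", "mp3")
SAMPLE_RATES = (8000, 16000, 24000)
RANGES = {
    "volume": (0.1, 3.0),
    "speech_rate": (0.2, 3.0),
    "pitch_ratio": (0.1, 3.0),
}


def validate_params(text, audio_format="pcm", sample_rate=24000,
                    volume=1.0, speech_rate=1.0, pitch_ratio=1.0):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    size = len(text.encode("utf-8"))  # the limit applies to encoded bytes
    if size == 0:
        problems.append("text is required")
    elif size > MAX_TEXT_BYTES:
        problems.append(
            "text is {} bytes; limit is {}".format(size, MAX_TEXT_BYTES))
    if audio_format not in FORMATS:
        problems.append("unsupported format: {}".format(audio_format))
    if sample_rate not in SAMPLE_RATES:
        problems.append("unsupported sample_rate: {}".format(sample_rate))
    values = {"volume": volume, "speech_rate": speech_rate,
              "pitch_ratio": pitch_ratio}
    for name in sorted(RANGES):
        low, high = RANGES[name]
        if not low <= values[name] <= high:
            problems.append(
                "{} must be in [{}, {}]".format(name, low, high))
    return problems


if __name__ == "__main__":
    for problem in validate_params("Hello", volume=5):
        print(problem)
```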

The SDK's return parameters are specified in the interface protocol.

2.3 Example Code

The complete code can be found in the speech_syn/demo/demo.py file in the SDK.

# -*- coding: utf-8 -*-
import speech_syn

# solution() is defined in the complete demo file and is omitted here.

if __name__ == "__main__":
    client = speech_syn.SpeechClient()
    # Set the log level: debug, info, warning, error
    client.set_log_level('INFO')
    # Enter your app_id and app_secret
    app_id = "a8d54833-01a3-4451-93cb-3f2bf37911ff"  # your app id
    app_secret = "GxM30WM6qN"  # your app secret
    # Enter the text to synthesize and its language
    text = "Today is sunny. Have you eaten?"
    # Options: zh-cmn-Hans-CN, en-US, ja-JP
    lang_type = 'en-US'

    # Optional synthesis parameters
    format = ''                         # Default: MP3, Optional: PCM
    voice = ''                          # Default: Xiaohui
    sample_rate = ''                    # Default: 24K, Optional: 16K, 8K
    volume = ''                         # Default: 1.0, Optional: 0.1-3.0
    speech_rate = ''                    # Default: 1.0, Optional: 0.2-3.0
    pitch_ratio = ''                    # Default: 1.0, Optional: 0.1-3.0
    emotion = ''                        # Default: None
    silence_duration = ''               # Default: 125
    enable_timestamp = False            # Default: False
    audio_name = 'syAudio.mp3'          # Output file name
    solution(client, app_id, app_secret, text, audio_name, lang_type, format,
             sample_rate, voice, volume, speech_rate, pitch_ratio, emotion,
             silence_duration, enable_timestamp)
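
Rather than hard-coding app_id and app_secret in the script, they can be read from the environment. This is a sketch, not part of the SDK: the variable names TTS_APP_ID and TTS_APP_SECRET and the helper load_credentials are hypothetical.

```python
import os


def load_credentials(environ=None):
    """Read the app ID and secret from environment variables."""
    if environ is None:
        environ = os.environ
    # TTS_APP_ID and TTS_APP_SECRET are hypothetical variable names.
    app_id = environ.get("TTS_APP_ID")
    app_secret = environ.get("TTS_APP_SECRET")
    if not app_id or not app_secret:
        raise RuntimeError(
            "Set TTS_APP_ID and TTS_APP_SECRET before running")
    return app_id, app_secret
```

The returned pair can then be passed wherever the example passes app_id and app_secret.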

3. SDK Download

Python SDK