TTS Python SDK
1. SDK Integration Guide
Download the corresponding SDK file.
The current SDK only supports Python 3.4 and above.
1.1 SDK File Description
| File/Directory | Description |
|---|---|
| speech_syn | SDK-related files |
| ├─ demo.py | Example code |
| pythonSDK.md | Operation manual |
| README.rst | Documentation |
| setup.py | Installation file |
1.2 SDK Installation
- If the Python package management tool setuptools is not installed, install it with the following command: `pip install setuptools`
- In the SDK root directory, run the following command to package the files: `python setup.py bdist_egg`
- Run the following command to install the SDK: `python setup.py install`
- If the following message is displayed, the installation succeeded and you can call the synthesis class in your project: `Finished processing dependencies for hy-python-syn-sdk==1.0.0`
- After installation, three directories are generated: `build`, `dist`, and `hy_python_syn_sdk.egg-info`.
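Before running the demo, a quick environment check can confirm both the Python version requirement stated above and that the SDK can be imported. This is a minimal sketch: it assumes the SDK's import name is `speech_syn` (the name used in the example code in section 2.3), and the helper names `python_version_ok` and `module_available` are hypothetical.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 4)  # minimum Python version stated in this guide


def python_version_ok(version_info=sys.version_info):
    """Return True if the given (major, minor, ...) version meets MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def module_available(name="speech_syn"):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("speech_syn importable:", module_available())
```

If the second check prints `False` after a successful `python setup.py install`, make sure the demo runs with the same interpreter that installed the SDK.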
1.3 Using the Demo
speech_syn/demo/synthesizer_demo.py is a demo for real-time speech synthesis. You can run it directly.
2. Parameter Description and Code Examples
2.1 Key Interface Description
The real-time speech synthesis SDK performs its work mainly through the `SpeechTranscriber` class, and authorization uses a token obtained from the `SpeechClient` class. Call the interfaces in the following order:
- Obtain a token by calling the `get_token()` method of the `SpeechClient` class.
- Create an instance of `SpeechTranscriber`.
- Create an instance of `Callback`.
- Set parameters by calling methods such as `set_token()` on the `SpeechTranscriber` instance.
- Establish a connection with the server by calling the `start()` method of the `SpeechTranscriber` instance.
- Send audio by calling the `send()` method of the `SpeechTranscriber` instance.
- Stop the transmission by calling the `stop()` method of the `SpeechTranscriber` instance.
- Disconnect from the server by calling the `close()` method of the `SpeechTranscriber` instance.
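The call order above can be sketched as one helper function. This is a hypothetical illustration, not the SDK's API: `run_session` is an illustrative name, the arguments passed to `get_token()` are an assumption, and the `SpeechTranscriber` instance (already bound to your `Callback`) is passed in because its constructor is not documented here. The `try`/`finally` makes sure `close()` runs even if `start()`, `send()`, or `stop()` raises.

```python
def run_session(client, transcriber, app_id, app_secret, chunks):
    """Drive one session in the documented order (signatures are assumed).

    client      -- a SpeechClient-like object providing get_token()
    transcriber -- an already-constructed SpeechTranscriber-like object
    chunks      -- an iterable of data chunks to pass to send()
    """
    # Obtain an authorization token (argument list is hypothetical)
    token = client.get_token(app_id, app_secret)
    # Set parameters on the transcriber, starting with the token
    transcriber.set_token(token)
    try:
        # Establish the connection with the server
        transcriber.start()
        # Send the data chunk by chunk
        for chunk in chunks:
            transcriber.send(chunk)
        # Signal the end of transmission
        transcriber.stop()
    finally:
        # Always disconnect, even if an earlier step raised
        transcriber.close()
```

With the real SDK you would pass your `SpeechClient` and `SpeechTranscriber` instances; check `demo.py` for the actual method arguments.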
2.2 Parameter Description
| Parameter Name | Description | Default Value |
|---|---|---|
| text | Text to synthesize; maximum length 1024 bytes (UTF-8 encoded) | Required |
| lang_type | Language of the text, e.g. zh-cmn-Hans-CN, en-US, ja-JP | Required |
| voice | Voice ID | Depends on language: Yuko (Japanese), Julie (English), Xiaohui (Chinese) |
| format | Audio encoding format: wav, pcm, or mp3. Note: wav does not support streaming | pcm |
| sample_rate | Audio sample rate in Hz: 8000, 16000, or 24000 | 24000 |
| volume | Volume, range [0.1, 3]; one decimal place is usually sufficient | 1 |
| speech_rate | Speech rate, range [0.2, 3]; one decimal place is usually sufficient | 1 |
| pitch_ratio | Pitch ratio, range [0.1, 3]; one decimal place is usually sufficient | 1 |
| silence_duration | Duration of silence at the end of the sentence, in ms | 125 |
| enable_timestamp | Whether to return timestamps for the original text; set to true to enable. Note: multiple consecutive punctuation marks or spaces in the original text are still processed, but this does not affect the continuity of the timestamps | false |
| emotion | Emotion/style | None |
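Validating parameters on the client side can catch out-of-range values before a request is sent. The sketch below encodes only the constraints listed in the table; `validate_params` is a hypothetical helper, not part of the SDK. It treats a missing key or an empty string as "use the default" (matching the example code in section 2.3), and it expects numeric values for `sample_rate`, `volume`, `speech_rate`, and `pitch_ratio`, so convert string values first.

```python
MAX_TEXT_BYTES = 1024            # text limit, measured in UTF-8 bytes
FORMATS = ("wav", "pcm", "mp3")  # values as written in the table
SAMPLE_RATES = (8000, 16000, 24000)
# (parameter, minimum, maximum): inclusive ranges from the table
RANGES = (("volume", 0.1, 3.0),
          ("speech_rate", 0.2, 3.0),
          ("pitch_ratio", 0.1, 3.0))


def _is_set(params, name):
    """Treat a missing key or an empty string as 'use the default'."""
    return params.get(name, "") != ""


def validate_params(params):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if not _is_set(params, "text"):
        errors.append("text is required")
    elif len(params["text"].encode("utf-8")) > MAX_TEXT_BYTES:
        errors.append("text exceeds %d UTF-8 bytes" % MAX_TEXT_BYTES)
    if not _is_set(params, "lang_type"):
        errors.append("lang_type is required")
    if _is_set(params, "format") and params["format"] not in FORMATS:
        errors.append("format must be one of %s" % ", ".join(FORMATS))
    if _is_set(params, "sample_rate") and params["sample_rate"] not in SAMPLE_RATES:
        errors.append("sample_rate must be 8000, 16000 or 24000")
    for name, low, high in RANGES:
        if _is_set(params, name) and not low <= params[name] <= high:
            errors.append("%s must be within [%s, %s]" % (name, low, high))
    return errors
```

Because the text limit is in bytes rather than characters, Japanese or Chinese text reaches it sooner: most such characters take 3 bytes in UTF-8, so roughly 341 of them fit.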
For the SDK's return parameters, see the interface protocol.
2.3 Example Code
The complete code can be found in the speech_syn/demo/demo.py file in the SDK.
# -*- coding: utf-8 -*-
import speech_syn

# Note: solution() is defined in speech_syn/demo/demo.py and is not reproduced
# here; copy it from that file, or this snippet raises NameError.

if __name__ == "__main__":
    client = speech_syn.SpeechClient()
    # Set the log level: DEBUG, INFO, WARNING, or ERROR
    client.set_log_level('INFO')

    # Replace with your own app_id and app_secret
    app_id = "your_app_id"          # your app id
    app_secret = "your_app_secret"  # your app secret

    # Text to synthesize and its language
    text = "Today is sunny. Have you eaten?"
    # Options: zh-cmn-Hans-CN, en-US, ja-JP
    lang_type = 'ja-JP'

    # Optional synthesis parameters; leave a value as '' to use its default
    format = ''            # Options: wav, pcm, mp3 (wav does not support streaming)
    voice = ''             # Default depends on lang_type: Yuko (ja-JP), Julie (en-US), Xiaohui (zh-cmn-Hans-CN)
    sample_rate = ''       # Default: 24000; options: 16000, 8000
    volume = ''            # Default: 1.0; range: 0.1-3.0
    speech_rate = ''       # Default: 1.0; range: 0.2-3.0
    pitch_ratio = ''       # Default: 1.0; range: 0.1-3.0
    emotion = ''           # Default: None
    silence_duration = ''  # Default: 125 (ms)
    enable_timestamp = False  # Default: False

    # Output file name; make the extension match the audio format
    audio_name = 'syAudio.mp3'
    solution(client, app_id, app_secret, text, audio_name, lang_type, format,
             sample_rate, voice, volume, speech_rate, pitch_ratio, emotion,
             silence_duration, enable_timestamp)
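If you set `format` to `pcm` (the default in the parameter table), the output is raw samples with no file header, which many audio players cannot open directly. The following minimal sketch uses Python's standard `wave` module to wrap PCM bytes in a WAV container. It assumes 16-bit mono samples and that you already hold the raw PCM bytes; how the SDK hands audio back to your code is covered by the interface protocol and `demo.py`, not here. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container and return the WAV bytes.

    Defaults assume 16-bit (2-byte) mono samples at the table's default
    24000 Hz; pass the sample_rate you actually requested.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

For example, `open('syAudio.wav', 'wb').write(pcm_to_wav(pcm_data, 16000))` saves audio that was requested with `sample_rate = 16000`, where `pcm_data` stands for the bytes you received.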