Development Guidelines
Basic Knowledge
Basic Terminology
Language
The Speech Synthesis service supports multiple languages, and you can specify the language of the text in your request. Our platform currently supports 3 synthesis languages: Japanese, English, and Chinese.
Voice
Currently, the Speech Synthesis service supports multiple voices in different languages. You can view the supported voices in the section below titled “Language and Voice Support.”
Sample Rate (sample rate)
The audio sample rate is the number of times per second that an audio signal is sampled. A higher sample rate generally gives more realistic, natural-sounding audio. You can specify the sample rate of the synthesized audio. The Speech Synthesis service currently supports sample rates of 8000 Hz, 16000 Hz, and 24000 Hz.
Audio Format (format)
The service supports the PCM, WAV, and MP3 formats. Note: WAV does not support streaming.
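The sample rate and format rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service SDK: the function name, the argument names, and the lowercase format strings are assumptions made for illustration.

```python
# Values taken from the text above; names and string casing are illustrative.
SUPPORTED_SAMPLE_RATES_HZ = (8000, 16000, 24000)
SUPPORTED_FORMATS = ("pcm", "wav", "mp3")


def audio_setting_problems(sample_rate_hz, audio_format, streaming):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    fmt = audio_format.lower()
    if sample_rate_hz not in SUPPORTED_SAMPLE_RATES_HZ:
        problems.append(f"unsupported sample rate: {sample_rate_hz} Hz")
    if fmt not in SUPPORTED_FORMATS:
        problems.append(f"unsupported audio format: {audio_format}")
    # WAV is the only format that the text above excludes from streaming.
    if fmt == "wav" and streaming:
        problems.append("WAV cannot be used with streaming synthesis")
    return problems


if __name__ == "__main__":
    print(audio_setting_problems(16000, "mp3", streaming=True))
    print(audio_setting_problems(44100, "wav", streaming=True))
```

Returning a list rather than raising on the first failure lets a caller report every invalid setting at once.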
Practical Functions
Voice Selection
Offers multiple languages and a variety of voices to meet synthesis needs in different scenarios. For the full list of voices, see the "Voice List" below.
Speech Speed Adjustment
Supports custom playback speed, up to 4 times faster or slower than normal speed.
Pitch Adjustment
Supports setting the pitch of the synthesized audio, up to 20 semitones higher or lower than the default output.
Volume Adjustment
Supports adjusting the volume of the synthesized audio. The output volume can be increased by up to 16 dB or decreased to as low as -96 dB.
Emotion/Style Selection
Supports adjusting the emotion or style of the voice, such as happy or customer service. For the full range of emotion/style options, see “Emotion/Style Configuration” below.
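The numeric limits given above for speed, pitch, and volume can be enforced before a request is built. The sketch below is illustrative only: the field names are hypothetical, and it assumes speed is expressed as a ratio to normal speed (so "4 times slower" is 0.25), which this document does not confirm.

```python
# Limits taken from the text above. How the real API encodes each value
# (field names, units, scale) is an assumption of this sketch.
SPEED_RANGE = (0.25, 4.0)     # assumed: ratio to normal speed, 1.0 = normal
PITCH_RANGE = (-20.0, 20.0)   # semitones relative to the default output
VOLUME_RANGE = (-96.0, 16.0)  # dB relative to the default output


def clamp(value, bounds):
    """Limit value to the closed interval given by bounds = (low, high)."""
    low, high = bounds
    return max(low, min(high, value))


def clamp_prosody(speed=1.0, pitch=0.0, volume=0.0):
    """Return the three settings, each forced into its documented range."""
    return {
        "speed": clamp(speed, SPEED_RANGE),
        "pitch": clamp(pitch, PITCH_RANGE),
        "volume": clamp(volume, VOLUME_RANGE),
    }


if __name__ == "__main__":
    print(clamp_prosody(speed=10, pitch=-30, volume=20))
```

Clamping silently corrects out-of-range input; an application that prefers to reject bad input could compare the clamped value with the original and raise an error when they differ.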
Language and Voice Support
The language code uses the format language-variant-script-region.
- language: the language (ISO 639-1), all lowercase; for example, Chinese is zh and English is en.
- variant (optional): the pronunciation or dialect (ISO 639-3), all lowercase; for example, Mandarin is cmn and Cantonese is yue.
- script (optional): the writing system (ISO 15924), with the first letter capitalized; for example, Simplified Chinese is Hans and Traditional Chinese is Hant.
- region: the geographic region where the language is used (ISO 3166), all uppercase; for example, mainland China is CN, Hong Kong is HK, and the United States is US.
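The language-variant-script-region format described above can be split into its parts with a regular expression. This is a minimal sketch: the helper name and the returned type are hypothetical, and it assumes two-letter language codes, three-letter variants, four-letter scripts, and two-letter regions, as in the examples above.

```python
import re
from typing import NamedTuple, Optional


class LanguageCode(NamedTuple):
    language: str
    variant: Optional[str]
    script: Optional[str]
    region: str


# language and region are required; variant and script are optional.
_CODE_PATTERN = re.compile(
    r"^(?P<language>[a-z]{2})"
    r"(?:-(?P<variant>[a-z]{3}))?"
    r"(?:-(?P<script>[A-Z][a-z]{3}))?"
    r"-(?P<region>[A-Z]{2})$"
)


def parse_language_code(code: str) -> Optional[LanguageCode]:
    """Return the parts of a language code, or None if it is malformed."""
    match = _CODE_PATTERN.match(code)
    if match is None:
        return None
    return LanguageCode(**match.groupdict())


if __name__ == "__main__":
    print(parse_language_code("zh-cmn-Hans-CN"))
    print(parse_language_code("ja-JP"))
```

The pattern checks only the shape and letter case of each part; it does not verify that a part is a registered ISO code or that the service supports the resulting language.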
The Speech Synthesis service currently supports the following languages and voices.
Voice List
Note: The Chinese voices listed below support both Chinese and mixed Chinese-English text. For purely English content, an English voice is recommended.
| Language | Language Code | Scenario | Voice Name | VoiceID | Description | Emotion/Style Configuration |
|---|---|---|---|---|---|---|
| Japanese | ja-JP | General Scenario | Yuko | Yuko | General Female Voice | |
| Japanese | ja-JP | General Scenario | Norika | Norika | General Female Voice | |
| Japanese | ja-JP | General Scenario | Yosuke | Yosuke | General Male Voice | |
| American English | en-US | General Scenario | Julie | Julie | Vibrant Female Voice | |
| American English | en-US | General Scenario | John | John | Vibrant Male Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | General Scenario | Xiaohui | Xiaohui | General Female Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | General Scenario | Ruoxuan | Ruoxuan | General Female Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | General Scenario | Siyue | Siyue | General Male Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Audio Reading | Mingcheng | Mingcheng | General Male Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Audio Reading | Haoxuan | Haoxuan | General Male Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Audio Reading | Sida | Sida | Gentle Male Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Audio Reading | Ziyi | Ziyi | Gentle Male Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Audio Reading | Ziyang | Ziyang | Ancient-style Male Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Intelligent Assistant | Xiaoyue | Xiaoyue | Ancient-style Male Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Intelligent Assistant | Haoyu | Haoyu | Vibrant Male Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Intelligent Assistant | Mengqi | Mengqi | Vibrant Female Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Characterized Dubbing | Tongtong | Tongtong | Cute Childish (Boy) Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Characterized Dubbing | Huiya | Huiya | Gentle Female Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Characterized Dubbing | Mingyu | Mingyu | Gentle Male Voice | |
Emotion/Style Configuration
Only voices that support multiple emotions and styles can use this feature.
To use a specific emotion or style, set the emotion configuration in the request. The general (default) emotion requires no emotion configuration.
The Voice platform currently supports 11 emotions and 15 styles. The emotions and styles supported differ from voice to voice. For details, see the table below.
Emotion/Style:
pleased / sorry / annoyed / happy / sad / angry / scare (Scared) / hate (Disgusted) / surprise (Surprised) / tear (Crying) / novel_dialog (Peaceful) / customer_service / professional / serious / narrator (Narrator - Relaxed) / narrator_immersive / comfort / lovey-dovey / conniving / tsundere / charming / storytelling / radio (Emotional Radio) / yoga / advertising / assistant
| Language | VoiceID | Voice Name | Emotion/Style Configuration |
|---|---|---|---|
| Chinese (Mandarin) | Xiaohui | Xiaohui | pleased / sorry / annoyed / happy / sad / angry / scare / hate / surprise / tear / customer_service / professional / serious / comfort / lovey-dovey / conniving / tsundere / storytelling / radio / charming / yoga |
| Chinese (Mandarin) | Ruoxuan | Ruoxuan | happy / sad / angry / scare / hate / surprise / customer_service / comfort / storytelling / advertising / assistant |
| Chinese (Mandarin) | Mingcheng | Mingcheng | happy / sad / angry / scare / hate / surprise / tear / novel_dialog / narrator / narrator_immersive |
| Chinese (Mandarin) | Haoxuan | Haoxuan | happy / sad / angry / scare / hate / surprise / novel_dialog / narrator |
| Chinese (Mandarin) | Sida | Sida | happy / sad / angry / scare / hate / surprise / novel_dialog / narrator |
| Chinese (Mandarin) | Ziyi | Ziyi | happy / sad / angry / scare / hate / surprise / novel_dialog / narrator |
| Chinese (Mandarin) | Ziyang | Ziyang | happy / sad / angry / scare / hate / surprise / novel_dialog / narrator |
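Because the supported emotions and styles differ by voice, a client may want to check a combination before sending a request. The sketch below copies the table above into a dictionary keyed by VoiceID. The helper name is hypothetical, and how the emotion value is actually passed to the service is not shown here.

```python
# Emotions shared by every voice in the table above.
_COMMON = "happy sad angry scare hate surprise"

# VoiceID -> supported emotion/style values, transcribed from the table.
EMOTIONS_BY_VOICE = {
    "Xiaohui": set(
        (
            "pleased sorry annoyed " + _COMMON + " tear customer_service "
            "professional serious comfort lovey-dovey conniving tsundere "
            "storytelling radio charming yoga"
        ).split()
    ),
    "Ruoxuan": set(
        (
            _COMMON + " customer_service comfort storytelling "
            "advertising assistant"
        ).split()
    ),
    "Mingcheng": set(
        (_COMMON + " tear novel_dialog narrator narrator_immersive").split()
    ),
    "Haoxuan": set((_COMMON + " novel_dialog narrator").split()),
    "Sida": set((_COMMON + " novel_dialog narrator").split()),
    "Ziyi": set((_COMMON + " novel_dialog narrator").split()),
    "Ziyang": set((_COMMON + " novel_dialog narrator").split()),
}


def supports_emotion(voice_id: str, emotion: str) -> bool:
    """True if the table lists this emotion/style for the given VoiceID.

    Voices missing from the table (for example Siyue) support only the
    general emotion, so any explicit emotion returns False for them.
    """
    return emotion in EMOTIONS_BY_VOICE.get(voice_id, set())


if __name__ == "__main__":
    print(supports_emotion("Mingcheng", "narrator_immersive"))
    print(supports_emotion("Ruoxuan", "yoga"))
```

When the check fails, one reasonable fallback is to omit the emotion configuration entirely so that the voice uses its general emotion.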