Development Guidelines

The Speech Synthesis service supports multiple languages, and you can specify the language of the text in each request. The platform currently supports three synthesis languages: Japanese, English, and Chinese.

Voice

Currently, the Speech Synthesis service supports multiple voices in different languages. You can view the supported voices in the section below titled “Language and Voice Support.”

Sample Rate (sample rate)

The audio sample rate is the number of times per second that a recording device samples an audio signal. The higher the sample rate, the more realistic and natural the reproduced sound. You can specify the sample rate of the synthesized audio. The Speech Synthesis service currently supports sample rates of 8000 Hz, 16000 Hz, and 24000 Hz.

Audio Format (format)

The service supports the PCM, WAV, and MP3 formats. Note: WAV does not support streaming.
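
As a rough illustration, the sample rate and format constraints above can be checked on the client side before a request is sent. The sketch below is not part of any service SDK: the function name, the lowercase format strings, and the streaming flag are assumptions made for this example.

```python
from typing import Optional

# Values taken from the text above.
SUPPORTED_SAMPLE_RATES = {8000, 16000, 24000}  # Hz
SUPPORTED_FORMATS = {"pcm", "wav", "mp3"}


def audio_settings_error(sample_rate: int, audio_format: str,
                         streaming: bool = False) -> Optional[str]:
    """Return a description of the first problem found, or None if the settings are valid."""
    fmt = audio_format.lower()
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        return f"unsupported sample rate: {sample_rate} Hz"
    if fmt not in SUPPORTED_FORMATS:
        return f"unsupported audio format: {audio_format}"
    if fmt == "wav" and streaming:
        # The documentation states that WAV does not support streaming.
        return "WAV does not support streaming"
    return None
```

A caller could run this check first and only build the request when the function returns None.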

Practical Functions

Voice Selection

Offers multiple languages and voices to meet synthesis needs in different scenarios. For the full list of voices, please refer to the "Voice List" below.

Speech Speed Adjustment

Supports custom playback speed, up to 4 times faster or slower than normal speed.

Pitch Adjustment

Supports setting the pitch of the synthesized audio, up to 20 semitones higher or lower than the default output.

Volume Adjustment

Supports adjusting the volume of the synthesized audio. The output gain can be set anywhere from -96 dB to +16 dB relative to the default.

Emotion/Style Selection

Supports adjustment of the emotion/style characteristics of the voice, such as happy, customer service, etc. For the full range of emotion/style options, please refer to the “Emotion/Style Configuration” below.
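
The speed, pitch, and volume limits above can be sketched as a small range check. This is only an illustration: the class and field names are hypothetical, and expressing speed as a multiplier between 0.25 and 4.0 is one reading of "4 times faster or slower", not a documented parameter format.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the text above; names and units of the fields are assumptions.
SPEED_RANGE = (0.25, 4.0)     # assumed multiplier, 1.0 = normal speed
PITCH_RANGE = (-20.0, 20.0)   # semitones relative to the default output
VOLUME_RANGE = (-96.0, 16.0)  # gain in dB relative to the default output


@dataclass(frozen=True)
class ProsodySettings:
    speed: float = 1.0
    pitch_semitones: float = 0.0
    volume_gain_db: float = 0.0

    def problems(self) -> List[str]:
        """Return one message for each value that falls outside its range."""
        checks = [
            ("speed", self.speed, SPEED_RANGE),
            ("pitch_semitones", self.pitch_semitones, PITCH_RANGE),
            ("volume_gain_db", self.volume_gain_db, VOLUME_RANGE),
        ]
        return [
            f"{name}={value} is outside [{low}, {high}]"
            for name, value, (low, high) in checks
            if not low <= value <= high
        ]
```

An empty list from problems() means all three values are within the limits described above.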

Language and Voice Support

The language code uses the format language-variant-script-region.

language: Language (ISO 639-1), all lowercase; for example, Chinese is zh and English is en.
variant (optional): Pronunciation or dialect (ISO 639-3), all lowercase; for example, Mandarin is cmn and Cantonese is yue.
script (optional): Writing system (ISO 15924), with the first letter capitalized; for example, Simplified Chinese is Hans and Traditional Chinese is Hant.
region: Geographical region where the language is used (ISO 3166), all uppercase; for example, mainland China is CN, Hong Kong is HK, and the United States is US.
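
To make the format concrete, here is a minimal sketch that splits a language code into its four parts. The function name is hypothetical, and the pattern assumes a two-letter region code (ISO 3166-1 alpha-2), which matches the examples above but is not stated explicitly.

```python
import re
from typing import Dict, Optional

# language-variant-script-region, where variant and script are optional.
_LANGUAGE_CODE = re.compile(
    r"(?P<language>[a-z]{2})"         # ISO 639-1, lowercase
    r"(?:-(?P<variant>[a-z]{3}))?"    # ISO 639-3, lowercase (optional)
    r"(?:-(?P<script>[A-Z][a-z]{3}))?"  # ISO 15924, capitalized (optional)
    r"-(?P<region>[A-Z]{2})"          # ISO 3166, uppercase (assumed two letters)
)


def parse_language_code(code: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the parts of a language code, or None if it does not fit the format."""
    match = _LANGUAGE_CODE.fullmatch(code)
    return match.groupdict() if match else None
```

With this sketch, zh-cmn-Hans-CN yields all four parts, while ja-JP yields only language and region, with variant and script set to None.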

The Speech Synthesis service currently supports the following languages and voices.

Voice List

Note: The Chinese voices listed below support both Chinese and mixed Chinese-English text. For purely English text, an English voice is recommended.

Language	Language Code	Scenario	Voice Name	VoiceID	Description	Emotion/Style Configuration
Japanese	ja-JP	General Scenario	Yuko	Yuko	General Female Voice
Japanese	ja-JP	General Scenario	Norika	Norika	General Female Voice
Japanese	ja-JP	General Scenario	Yosuke	Yosuke	General Male Voice
American English	en-US	General Scenario	Julie	Julie	Vibrant Female Voice
American English	en-US	General Scenario	John	John	Vibrant Male Voice
Chinese (Mandarin)	zh-cmn-Hans-CN	General Scenario	Xiaohui	Xiaohui	General Female Voice	Supported
Chinese (Mandarin)	zh-cmn-Hans-CN	General Scenario	Ruoxuan	Ruoxuan	General Female Voice	Supported
Chinese (Mandarin)	zh-cmn-Hans-CN	General Scenario	Siyue	Siyue	General Male Voice
Chinese (Mandarin)	zh-cmn-Hans-CN	Audio Reading	Mingcheng	Mingcheng	General Male Voice	Supported
Chinese (Mandarin)	zh-cmn-Hans-CN	Audio Reading	Haoxuan	Haoxuan	General Male Voice	Supported
Chinese (Mandarin)	zh-cmn-Hans-CN	Audio Reading	Sida	Sida	Gentle Male Voice	Supported
Chinese (Mandarin)	zh-cmn-Hans-CN	Audio Reading	Ziyi	Ziyi	Gentle Male Voice	Supported
Chinese (Mandarin)	zh-cmn-Hans-CN	Audio Reading	Ziyang	Ziyang	Ancient-style Male Voice	Supported
Chinese (Mandarin)	zh-cmn-Hans-CN	Intelligent Assistant	Xiaoyue	Xiaoyue	Ancient-style Male Voice
Chinese (Mandarin)	zh-cmn-Hans-CN	Intelligent Assistant	Haoyu	Haoyu	Vibrant Male Voice
Chinese (Mandarin)	zh-cmn-Hans-CN	Intelligent Assistant	Mengqi	Mengqi	Vibrant Female Voice
Chinese (Mandarin)	zh-cmn-Hans-CN	Characterized Dubbing	Tongtong	Tongtong	Cute Childish (Boy) Voice
Chinese (Mandarin)	zh-cmn-Hans-CN	Characterized Dubbing	Huiya	Huiya	Gentle Female Voice
Chinese (Mandarin)	zh-cmn-Hans-CN	Characterized Dubbing	Mingyu	Mingyu	Gentle Male Voice

Emotion/Style Configuration

This capability is available only for voices that support multiple emotions and styles.

To use an emotion or style, set the emotion configuration in the request. The general (default) emotion does not require any emotion configuration.

The Voice platform currently supports 11 emotions and 15 styles. Not every voice supports the same set of emotions and styles. For details, see the table below.

Emotion/Style:

pleased / sorry / annoyed / happy / sad / angry / scare (Scared) / hate (Disgusted) / surprise (Surprised) / tear (Crying) / novel_dialog (Peaceful) / customer_service / professional / serious / narrator (Narrator - Relaxed) / narrator_immersive / comfort / lovey-dovey / conniving / tsundere / charming / storytelling / radio (Emotional Radio) / yoga / advertising / assistant

Language	VoiceID	Voice Name	Emotion/Style Configuration
Chinese (Mandarin)	Xiaohui	Xiaohui	pleased / sorry / annoyed / happy / sad / angry / scare / hate / surprise / tear / customer_service / professional / serious / comfort / lovey-dovey / conniving / tsundere / storytelling / radio / charming / yoga
Chinese (Mandarin)	Ruoxuan	Ruoxuan	happy / sad / angry / scare / hate / surprise / customer_service / comfort / storytelling / advertising / assistant
Chinese (Mandarin)	Mingcheng	Mingcheng	happy / sad / angry / scare / hate / surprise / tear / novel_dialog / narrator / narrator_immersive
Chinese (Mandarin)	Haoxuan	Haoxuan	happy / sad / angry / scare / hate / surprise / novel_dialog / narrator
Chinese (Mandarin)	Sida	Sida	happy / sad / angry / scare / hate / surprise / novel_dialog / narrator
Chinese (Mandarin)	Ziyi	Ziyi	happy / sad / angry / scare / hate / surprise / novel_dialog / narrator
Chinese (Mandarin)	Ziyang	Ziyang	happy / sad / angry / scare / hate / surprise / novel_dialog / narrator
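
Because each voice supports a different subset, a client may want to check the table before setting an emotion or style. The sketch below copies the table into a lookup; the function name and the choice to return None for unsupported combinations (so that the caller omits the emotion configuration and gets the general voice) are assumptions for illustration only.

```python
from typing import Dict, FrozenSet, Optional

_READING = "happy / sad / angry / scare / hate / surprise / novel_dialog / narrator"

# Copied from the table above, keyed by VoiceID.
_TABLE = {
    "Xiaohui": ("pleased / sorry / annoyed / happy / sad / angry / scare / hate / "
                "surprise / tear / customer_service / professional / serious / "
                "comfort / lovey-dovey / conniving / tsundere / storytelling / "
                "radio / charming / yoga"),
    "Ruoxuan": ("happy / sad / angry / scare / hate / surprise / customer_service / "
                "comfort / storytelling / advertising / assistant"),
    "Mingcheng": ("happy / sad / angry / scare / hate / surprise / tear / "
                  "novel_dialog / narrator / narrator_immersive"),
    "Haoxuan": _READING,
    "Sida": _READING,
    "Ziyi": _READING,
    "Ziyang": _READING,
}

SUPPORTED: Dict[str, FrozenSet[str]] = {
    voice_id: frozenset(cell.split(" / ")) for voice_id, cell in _TABLE.items()
}


def emotion_or_none(voice_id: str, emotion: str) -> Optional[str]:
    """Return the emotion if the voice supports it; otherwise None (omit the setting)."""
    return emotion if emotion in SUPPORTED.get(voice_id, frozenset()) else None
```

Voices that do not appear in the table, such as Siyue, always yield None here, which corresponds to using the general voice without any emotion configuration.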

Basic Knowledge

Basic Terminology

Language