Development Guidelines

Basic Knowledge

Basic Terminology

Language

The Speech Synthesis service supports multiple languages, and you can specify the language of the text in your request. Our platform currently supports 3 synthesis languages: Japanese, English, and Chinese.

Voice

Currently, the Speech Synthesis service supports multiple voices in different languages. You can view the supported voices in the section below titled “Language and Voice Support.”

Sample Rate (sample rate)

The audio sample rate is the number of times per second that an audio signal is sampled. The higher the sample rate, the more faithfully the sound is reproduced. You can specify the sample rate of the synthesized audio. The Speech Synthesis service currently supports sample rates of 8000 Hz, 16000 Hz, and 24000 Hz.

Audio Format (format)

The service supports the PCM, WAV, and MP3 audio formats. Note: WAV does not support streaming.
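
The constraints above can be checked on the client side before a request is sent. The following is a minimal sketch only: the function name, argument names, and the lowercase format strings are assumptions made for illustration and are not part of the service's documented API. Only the allowed values and the WAV streaming restriction come from the text above.

```python
# Values taken from the documentation above.
SUPPORTED_SAMPLE_RATES = {8000, 16000, 24000}  # Hz
SUPPORTED_FORMATS = {"pcm", "wav", "mp3"}      # lowercase spelling is an assumption


def check_audio_options(sample_rate: int, audio_format: str,
                        streaming: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the options look valid.

    Hypothetical helper: names and signature are illustrative only.
    """
    problems = []
    fmt = audio_format.lower()
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        problems.append(f"unsupported sample rate: {sample_rate} Hz")
    if fmt not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {audio_format}")
    if fmt == "wav" and streaming:
        problems.append("WAV does not support streaming")
    return problems


if __name__ == "__main__":
    print(check_audio_options(16000, "mp3", streaming=True))
    print(check_audio_options(44100, "wav", streaming=True))
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid option at once.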

Practical Functions

Voice Selection

The service offers multiple languages and a variety of voices to meet synthesis needs in different scenarios. For the full list of voices, see "Voice List" below.

Speech Speed Adjustment

Supports custom playback speed, up to 4 times faster or 4 times slower than normal speed.

Pitch Adjustment

Supports setting the pitch of the synthesized audio, up to 20 semitones above or below the default output.
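
To give a sense of scale for a semitone offset, the sketch below converts an offset into a frequency ratio using the standard equal-temperament relation, ratio = 2^(n/12). The function name is hypothetical, and how the service itself applies pitch changes is not documented here; only the limit of 20 semitones comes from the text above.

```python
MAX_SEMITONES = 20  # limit stated in the documentation above


def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift, assuming 12-tone equal temperament.

    Hypothetical helper for illustration; not part of the service API.
    """
    if not -MAX_SEMITONES <= semitones <= MAX_SEMITONES:
        raise ValueError(f"pitch shift must be within +/-{MAX_SEMITONES} semitones")
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    print(semitones_to_ratio(12))             # one octave up: 2.0
    print(round(semitones_to_ratio(20), 2))   # upper limit: 3.17
    print(round(semitones_to_ratio(-20), 3))  # lower limit: 0.315
```

Under this assumption, the full adjustment range corresponds to roughly one third to three times the default pitch frequency.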

Volume Adjustment

Supports adjusting the volume of the synthesized audio. The adjustment ranges from -96 dB to +16 dB relative to the default output volume.
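
For intuition about the decibel range, the sketch below converts a volume adjustment into a linear amplitude factor using the usual relation gain = 10^(dB/20). This assumes the adjustment is an amplitude gain in decibels, which the text above does not state explicitly; the function name is hypothetical.

```python
MIN_DB = -96.0  # lower limit stated in the documentation above
MAX_DB = 16.0   # upper limit stated in the documentation above


def db_to_gain(db: float) -> float:
    """Linear amplitude factor for a volume change given in decibels.

    Hypothetical helper; assumes the amplitude convention (20 * log10).
    """
    if not MIN_DB <= db <= MAX_DB:
        raise ValueError(f"volume adjustment must be within [{MIN_DB}, {MAX_DB}] dB")
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    print(db_to_gain(0))              # unchanged: 1.0
    print(round(db_to_gain(16), 2))   # loudest setting: 6.31
    print(db_to_gain(-96))            # quietest setting, about 1.6e-05
```

Under this convention, +16 dB is a bit more than six times the default amplitude, while -96 dB is effectively silence.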

Emotion/Style Selection

Supports adjusting the emotion or style of the voice, for example happy or customer service. For the full range of emotion/style options, see “Emotion/Style Configuration” below.

Language and Voice Support

The language code uses the format language-variant-script-region.

  • language: the language (ISO 639-1), all lowercase; for example, Chinese is zh and English is en.
  • variant (optional): the pronunciation or dialect (ISO 639-3), all lowercase; for example, Mandarin is cmn and Cantonese is yue.
  • script (optional): the writing system (ISO 15924), with the first letter capitalized; for example, Simplified Chinese is Hans and Traditional Chinese is Hant.
  • region: the geographical region where the language is used (ISO 3166), all uppercase; for example, mainland China is CN, Hong Kong is HK, and the United States is US.
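
The format described above can be split into its four parts with a small parser. This is a sketch under stated assumptions: the class and function names are hypothetical, and the pattern assumes two-letter language codes, three-letter variants, four-letter scripts, and two-letter region codes, as in the examples given above.

```python
import re
from typing import NamedTuple, Optional


class LanguageCode(NamedTuple):
    """Parts of a language-variant-script-region code (illustrative type)."""
    language: str
    variant: Optional[str]
    script: Optional[str]
    region: str


# language (2 lowercase) [-variant (3 lowercase)] [-script (Xxxx)] -region (2 uppercase)
_PATTERN = re.compile(
    r"(?P<language>[a-z]{2})"
    r"(?:-(?P<variant>[a-z]{3}))?"
    r"(?:-(?P<script>[A-Z][a-z]{3}))?"
    r"-(?P<region>[A-Z]{2})"
)


def parse_language_code(code: str) -> LanguageCode:
    """Split a language code into its parts; optional parts become None."""
    match = _PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid language code: {code!r}")
    return LanguageCode(**match.groupdict())


if __name__ == "__main__":
    print(parse_language_code("zh-cmn-Hans-CN"))
    print(parse_language_code("ja-JP"))
```

Because each part has a distinct length and letter case, the optional variant and script can be told apart without ambiguity.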

The Speech Synthesis service currently supports the following languages and voices.

Voice List

Note: The Chinese voices in this list support both Chinese text and mixed Chinese-English text. For purely English text, an English voice is recommended.

| Language | Language Code | Scenario | Voice Name | VoiceID | Description | Emotion/Style Configuration |
| --- | --- | --- | --- | --- | --- | --- |
| Japanese | ja-JP | General Scenario | Yuko | Yuko | General Female Voice | |
| Japanese | ja-JP | General Scenario | Norika | Norika | General Female Voice | |
| Japanese | ja-JP | General Scenario | Yosuke | Yosuke | General Male Voice | |
| American English | en-US | General Scenario | Julie | Julie | Vibrant Female Voice | |
| American English | en-US | General Scenario | John | John | Vibrant Male Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | General Scenario | Xiaohui | Xiaohui | General Female Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | General Scenario | Ruoxuan | Ruoxuan | General Female Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | General Scenario | Siyue | Siyue | General Male Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Audio Reading | Mingcheng | Mingcheng | General Male Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Audio Reading | Haoxuan | Haoxuan | General Male Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Audio Reading | Sida | Sida | Gentle Male Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Audio Reading | Ziyi | Ziyi | Gentle Male Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Audio Reading | Ziyang | Ziyang | Ancient-style Male Voice | Supported |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Intelligent Assistant | Xiaoyue | Xiaoyue | Ancient-style Male Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Intelligent Assistant | Haoyu | Haoyu | Vibrant Male Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Intelligent Assistant | Mengqi | Mengqi | Vibrant Female Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Characterized Dubbing | Tongtong | Tongtong | Cute Childish (Boy) Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Characterized Dubbing | Huiya | Huiya | Gentle Female Voice | |
| Chinese (Mandarin) | zh-cmn-Hans-CN | Characterized Dubbing | Mingyu | Mingyu | Gentle Male Voice | |

Emotion/Style Configuration

Only voices that support multiple emotions and styles can use this feature.

To use a specific emotion or style, configure emotion in the request. The general (default) emotion does not require any emotion configuration.

The Voice platform currently supports 11 emotions and 15 styles. The supported emotions and styles differ from voice to voice. For details, see the table below.

Emotion/Style:

  • pleased / sorry / annoyed / happy / sad / angry / scare (Scared) / hate (Disgusted) / surprise (Surprised) / tear (Crying) / novel_dialog (Peaceful) / customer_service / professional / serious / narrator (Narrator - Relaxed) / narrator_immersive / comfort / lovey-dovey / conniving / tsundere / charming / storytelling / radio (Emotional Radio) / yoga / advertising / assistant

| Language | VoiceID | Voice Name | Emotion/Style Configuration |
| --- | --- | --- | --- |
| Chinese (Mandarin) | Xiaohui | Xiaohui | pleased / sorry / annoyed / happy / sad / angry / scare / hate / surprise / tear / customer_service / professional / serious / comfort / lovey-dovey / conniving / tsundere / storytelling / radio / charming / yoga |
| Chinese (Mandarin) | Ruoxuan | Ruoxuan | happy / sad / angry / scare / hate / surprise / customer_service / comfort / storytelling / advertising / assistant |
| Chinese (Mandarin) | Mingcheng | Mingcheng | happy / sad / angry / scare / hate / surprise / tear / novel_dialog / narrator / narrator_immersive |
| Chinese (Mandarin) | Haoxuan | Haoxuan | happy / sad / angry / scare / hate / surprise / novel_dialog / narrator |
| Chinese (Mandarin) | Sida | Sida | happy / sad / angry / scare / hate / surprise / novel_dialog / narrator |
| Chinese (Mandarin) | Ziyi | Ziyi | happy / sad / angry / scare / hate / surprise / novel_dialog / narrator |
| Chinese (Mandarin) | Ziyang | Ziyang | happy / sad / angry / scare / hate / surprise / novel_dialog / narrator |
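
Because the supported emotions and styles differ by voice, a client may want to check a combination before sending a request. The sketch below encodes the table above as a lookup. The dictionary and function names are hypothetical, and the data is copied from this page, so it would need updating whenever the table changes.

```python
# Emotion/style support per VoiceID, transcribed from the table above.
_BASIC = "happy sad angry scare hate surprise"
_READING = _BASIC + " novel_dialog narrator"

VOICE_EMOTIONS = {
    "Xiaohui": set((
        "pleased sorry annoyed " + _BASIC + " tear customer_service professional "
        "serious comfort lovey-dovey conniving tsundere storytelling radio "
        "charming yoga"
    ).split()),
    "Ruoxuan": set((
        _BASIC + " customer_service comfort storytelling advertising assistant"
    ).split()),
    "Mingcheng": set((_READING + " tear narrator_immersive").split()),
    "Haoxuan": set(_READING.split()),
    "Sida": set(_READING.split()),
    "Ziyi": set(_READING.split()),
    "Ziyang": set(_READING.split()),
}


def supports_emotion(voice_id: str, emotion: str) -> bool:
    """True if the table lists this emotion/style for the given VoiceID.

    Voices absent from the table (for example Siyue) support none.
    """
    return emotion in VOICE_EMOTIONS.get(voice_id, set())


if __name__ == "__main__":
    print(supports_emotion("Xiaohui", "yoga"))     # True
    print(supports_emotion("Ruoxuan", "yoga"))     # False
    print(supports_emotion("Siyue", "happy"))      # False
```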