User Manual

Basic Knowledge

Products and Services

Voiceprint Recognition (VPR) is a service that analyzes audio to identify the speaker. With the VPR service, you can quickly add secure, efficient voice-based authentication and identity recognition to your applications.

The Voiceprint Recognition service is described as follows:

VPR: Provides voiceprint registration, verification, and deregistration. You upload a segment of audio, and the service compares it with the voiceprint records in the database to identify the speaker in the audio. Deployment is supported only on the Linux x86 platform.

Basic Terms

Sample Rate

The audio sample rate is the number of times per second that the recording device samples the sound signal. The higher the sample rate, the more realistic and natural the reproduced sound.

When using the VPR service, you can specify the audio sample rate as a parameter, and the actual sample rate of the audio you send must match it. The Voiceprint Recognition service currently supports audio with a 16000 Hz (16 kHz) sample rate.

Sample Size

Also known as sample bit depth, the sample size is the number of binary bits used to represent each audio sample, that is, the number of bits the sound card uses when capturing and playing back sound. A larger sample size resolves changes in sound amplitude more finely. The Voiceprint Recognition service currently supports only 16-bit audio.

Sound Channel

A sound channel (or simply channel) is an audio signal that is recorded or played back independently, typically at a distinct spatial position. The number of channels is the number of sound sources during recording or the number of corresponding speakers during playback. The Voiceprint Recognition service currently supports only mono (single-channel) audio.
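The three requirements above (16000 Hz, 16-bit, mono) can be checked locally before audio is sent to the service. The following is a minimal sketch using Python's standard `wave` module; the function names `check_wav_format` and `make_test_wav` are illustrative and are not part of the VPR service.

```python
import io
import wave

# Values the VPR service currently accepts, as stated in this section.
REQUIRED_RATE_HZ = 16000
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16-bit
REQUIRED_CHANNELS = 1            # mono


def check_wav_format(source):
    """Return a list of format problems; an empty list means the audio conforms.

    `source` is a file path string or a binary file-like object holding WAV data.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != REQUIRED_RATE_HZ:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {REQUIRED_RATE_HZ} Hz"
            )
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH_BYTES:
            problems.append(
                f"sample size is {wav.getsampwidth() * 8}-bit, expected 16-bit"
            )
        if wav.getnchannels() != REQUIRED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
    return problems


def make_test_wav(rate, width, channels, frames=160):
    """Build a silent in-memory WAV file, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * frames * width * channels)
    buffer.seek(0)
    return buffer
```

Calling `check_wav_format(make_test_wav(16000, 2, 1))` returns an empty list, while a 44100 Hz stereo file yields one message for the sample rate and one for the channel count.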

Voiceprint Registration

Upload a segment of audio and provide a username as the ID for that voiceprint. After three successful registrations, the voiceprint is recorded in the voiceprint database. When registering a voiceprint, you can optionally assign it to a specific group so that 1-v-N verification can be restricted to that group.
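The registration rule can be pictured with a small client-side model. This is only a sketch of the bookkeeping described above, assuming each username needs three successful registrations; the class `EnrollmentTracker` and its methods are hypothetical and do not represent the VPR service interface.

```python
from collections import defaultdict

# From the text: a voiceprint is recorded after three successful registrations.
REQUIRED_REGISTRATIONS = 3


class EnrollmentTracker:
    """Hypothetical client-side model of registration progress (not the VPR API)."""

    def __init__(self):
        self._successes = defaultdict(int)  # username -> successful registrations
        self._groups = {}                   # username -> optional group name

    def record_success(self, username, group=None):
        """Count one successful registration; return True once the voiceprint is recorded."""
        self._successes[username] += 1
        if group is not None:
            self._groups[username] = group
        return self.is_recorded(username)

    def is_recorded(self, username):
        """True if the username has reached the required number of registrations."""
        return self._successes.get(username, 0) >= REQUIRED_REGISTRATIONS

    def members(self, group):
        """Usernames in the given group whose voiceprint has been recorded."""
        return sorted(
            user
            for user, user_group in self._groups.items()
            if user_group == group and self.is_recorded(user)
        )
```

In this model, the first two calls to `record_success("alice", group="staff")` return `False` and the third returns `True`, after which `members("staff")` includes `alice`.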

Voiceprint Verification

Upload a segment of audio; the service compares it with the voiceprint records in the database to identify the speaker in the audio. Voiceprint verification has two modes:

1-v-1 Verification: Compares one audio segment with a specified voiceprint record in the database and returns the match score.
1-v-N Verification: Compares one audio segment with all voiceprint records in the database and returns information on the 3 records with the highest match scores.

In 1-v-N verification, you can also specify voiceprint records in one or more groups within the database for comparison.
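The difference between the two modes, and the effect of restricting 1-v-N to groups, can be illustrated with a toy model. This sketch assumes each voiceprint is stored as a numeric vector and that the match score is a cosine similarity; both are assumptions for illustration, since this manual does not describe how the service computes scores. All names below are hypothetical.

```python
import math

TOP_N = 3  # 1-v-N returns the 3 highest-scoring records, as stated above


def cosine_score(a, b):
    """Cosine similarity of two vectors (assumed scoring method, for illustration)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_one_to_one(probe, records, username):
    """1-v-1: score the probe against one specified record."""
    return cosine_score(probe, records[username]["embedding"])


def verify_one_to_n(probe, records, groups=None):
    """1-v-N: score the probe against all records, or only those in `groups`."""
    candidates = [
        (name, cosine_score(probe, record["embedding"]))
        for name, record in records.items()
        if groups is None or record["group"] in groups
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:TOP_N]


# Made-up demonstration data: username -> group and stored voiceprint vector.
records = {
    "alice": {"group": "staff", "embedding": [1.0, 0.0]},
    "bob": {"group": "staff", "embedding": [0.0, 1.0]},
    "carol": {"group": "guests", "embedding": [1.0, 1.0]},
    "dave": {"group": "guests", "embedding": [-1.0, 0.0]},
}
probe = [1.0, 0.0]
```

With this data, `verify_one_to_n(probe, records)` ranks `alice`, `carol`, and `bob` in that order, and `verify_one_to_n(probe, records, groups={"guests"})` considers only `carol` and `dave`.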

Encoding Format	Description	Voiceprint Recognition
pcm	Uncompressed audio with a 16-bit sample width in little-endian byte order; supports a 16 kHz sample rate. This is the audio data of a standard uncompressed WAV file with the first 44 bytes (the WAV header) removed.	√
wav	Audio with a 16-bit sample width and a 16 kHz sample rate.	√
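The relationship between the two formats in the table can be shown with a short sketch that extracts the raw PCM data from a WAV file. It uses Python's standard `wave` module to parse the header instead of slicing off 44 bytes, because WAV files that carry extra chunks have longer headers. The function name `wav_to_pcm` is illustrative.

```python
import io
import wave


def wav_to_pcm(source):
    """Return the raw PCM sample bytes of a WAV file, without its header.

    `source` is a file path string or a binary file-like object holding WAV data.
    """
    with wave.open(source, "rb") as wav:
        return wav.readframes(wav.getnframes())


# Demonstration with a tiny in-memory WAV file (4 samples, 16 kHz, 16-bit, mono).
samples = b"\x01\x00\x02\x00\x03\x00\x04\x00"
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(samples)
buffer.seek(0)

pcm = wav_to_pcm(buffer)  # the same 8 bytes that were written as samples
```

The WAV file built here consists of a 44-byte header followed by the 8 bytes of sample data, which matches the table's description of pcm as WAV data without the header.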