User Manual
Basic Knowledge
Products and Services
Voiceprint Recognition (VPR) is a service that analyzes audio to identify the speaker. With the VPR service, you can quickly add secure and efficient voice-based identity authentication to your applications.
The Voiceprint Recognition service is described as follows:
- VPR: Provides voiceprint registration, verification, and deregistration. You upload a segment of audio, and the service compares it with the voiceprint records in the database to identify the speaker. Deployment is supported only on the Linux x86 platform.
Basic Terms
Sample Rate
The audio sample rate is the number of times per second that the recording device samples the sound signal. The higher the sample rate, the more faithfully the sound is reproduced.
When using the VPR service, you can specify the audio sample rate, and the actual sample rate of the audio you send must match this parameter. The Voiceprint Recognition service currently supports audio with a 16000 Hz sample rate.
Sample Size
Sample size, also known as bit depth, is the number of binary bits used to represent each audio sample when sound is captured and played back. It determines how finely variations in the sound's amplitude can be recorded. The Voiceprint Recognition service currently supports only 16-bit audio.
Sound Channel
A sound channel (or simply channel) is an audio signal that is recorded or played back independently, typically at a distinct spatial position. The number of channels is the number of sound sources during recording or the number of loudspeakers during playback. The Voiceprint Recognition service currently supports only mono (single-channel) audio.
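The three requirements above (16000 Hz, 16-bit, mono) can be checked locally before audio is sent to the service. The following is a minimal sketch using Python's standard `wave` module; the function names `check_wav` and `make_wav` are hypothetical helpers introduced for illustration and are not part of the VPR service.

```python
import io
import wave

# Requirements stated in this manual: 16000 Hz, 16-bit, mono.
REQUIRED_RATE_HZ = 16000
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
REQUIRED_CHANNELS = 1


def check_wav(source):
    """Return a list of problems; an empty list means the audio conforms.

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {REQUIRED_RATE_HZ} Hz")
    if width != REQUIRED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample size is {width * 8} bits, expected 16 bits")
    if channels != REQUIRED_CHANNELS:
        problems.append(f"audio has {channels} channels, expected 1 (mono)")
    return problems


def make_wav(rate, width, channels, num_frames=160):
    """Build a silent in-memory WAV file (demonstration data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (num_frames * width * channels))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_wav(make_wav(16000, 2, 1)))  # conforming audio
    print(check_wav(make_wav(8000, 2, 2)))   # wrong rate and channel count
```

Running the check on the client side gives a clear error message before upload, instead of relying on the service to reject mismatched audio.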
Voiceprint Registration
Upload a segment of audio and provide a username as the ID for that voiceprint. After three successful registrations, the voiceprint is recorded in the voiceprint database. When registering a voiceprint, you can optionally assign it to a group so that 1-v-N verification can be restricted to that group.
Voiceprint Verification
Upload a segment of audio, and compare the audio with the voiceprint records in the database to identify the speaker in the audio. Voiceprint verification includes two modes:
- 1-v-1 Verification: Compares one audio segment with a specified voiceprint record in the database and returns the match score.
- 1-v-N Verification: Compares one audio segment with all voiceprint records in the database and returns information on the three records with the highest match scores.
In 1-v-N verification, you can also specify voiceprint records in one or more groups within the database for comparison.
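The selection logic of 1-v-N verification can be illustrated with a small sketch. It is not the service's implementation: it assumes the match scores have already been computed, that a higher score means a closer match, and that each voiceprint belongs to at most one group. The function `top_matches`, the usernames, the group names, and the score values are all hypothetical.

```python
import heapq


def top_matches(scores, groups_by_user=None, groups=None, n=3):
    """Return the n (user_name, score) pairs with the highest scores.

    scores:         dict mapping user_name -> match score (assumed precomputed)
    groups_by_user: dict mapping user_name -> group name (hypothetical layout)
    groups:         optional iterable of group names; when given, only
                    voiceprints in those groups are compared
    """
    if groups is not None:
        allowed = set(groups)
        membership = groups_by_user or {}
        candidates = {
            user: score
            for user, score in scores.items()
            if membership.get(user) in allowed
        }
    else:
        candidates = scores
    return heapq.nlargest(n, candidates.items(), key=lambda item: item[1])


# Made-up data for illustration only.
example_scores = {"alice": 0.91, "bob": 0.42, "carol": 0.77, "dave": 0.65}
example_groups = {"alice": "staff", "bob": "staff", "carol": "guests", "dave": "guests"}

if __name__ == "__main__":
    print(top_matches(example_scores))                                    # whole database
    print(top_matches(example_scores, example_groups, groups=["guests"]))  # one group only
```

When a group filter leaves fewer than three candidates, the sketch simply returns all of them; how the actual service behaves in that case is not specified in this manual.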
Voiceprint Deregistration
Send the user_name of the voiceprint to be deregistered to remove the record from the voiceprint database.
Voiceprint Grouping
When registering a voiceprint, you can specify the group to which the voiceprint belongs. During 1-v-N verification, the comparison can then be restricted to one or more specified groups.
Audio Encoding
Currently, the VPR service supports the following audio encoding formats. Set the format field to the corresponding encoding format.
| Encoding Format | Description | Supported by VPR |
|---|---|---|
| pcm | Uncompressed audio with a 16-bit sample width in little-endian byte order and a 16 kHz sample rate. This is the data typically found in a standard uncompressed WAV file after its 44-byte header is removed. | √ |
| wav | Audio with a 16-bit sample width and a 16 kHz sample rate. | √ |
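To send audio with the format field set to pcm, the raw samples can be extracted from an existing WAV file. The sketch below uses Python's standard `wave` module to parse the header rather than slicing off a fixed 44 bytes, because some WAV files carry extra chunks that make the header longer. The names `wav_to_pcm` and `make_demo_wav` are hypothetical helpers, not part of the VPR service.

```python
import io
import wave


def wav_to_pcm(source):
    """Return the raw sample bytes of a WAV file, without its header.

    `source` may be a file path or a binary file-like object.
    On little-endian hosts such as x86, the returned bytes are the
    little-endian samples exactly as stored in the file.
    """
    with wave.open(source, "rb") as wav:
        return wav.readframes(wav.getnframes())


def make_demo_wav(num_frames=1600):
    """Build a silent 16 kHz, 16-bit, mono WAV in memory (demo data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * num_frames)
    return buf.getvalue()


if __name__ == "__main__":
    wav_bytes = make_demo_wav()
    pcm_bytes = wav_to_pcm(io.BytesIO(wav_bytes))
    # For a canonical WAV file the difference is the 44-byte header.
    print(len(wav_bytes) - len(pcm_bytes))
```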