# DolphinVoice JavaScript SDK
The DolphinVoice SDK provides speech recognition and speech synthesis. It includes three main modules:
- Real-time Speech Recognition (ASR)
- Audio File Transcription (FileAsr)
- Text-to-Speech (TTS)
## Documentation
More detailed documentation and guides for the DolphinVoice SDK are available in the DolphinVoice API Documentation.

For technical support or any questions, please contact our [developer support team](mailto:voice.support@dolphin-ai.jp).
## Installation
You can install the SDK directly from npm:

```bash
npm install @dolphinvoice/sdk
```

or

```bash
pnpm install @dolphinvoice/sdk
```

or

```bash
yarn add @dolphinvoice/sdk
```

## Usage
### Real-time Speech Recognition (ASR)

```ts
// Import ASR module and required type definitions
import { RealTimeAsrSDK } from '@dolphinvoice/sdk';
import type { RealTimeAsrOptions, RealTimeAsrEventData } from '@dolphinvoice/sdk';

// Initialize the SDK with your credentials
const app_id = 'YOUR_APP_ID';

// Method 1: Use a server-generated signature (recommended)
// To generate the signature, refer to https://developers.dolphinvoice.ai/en/docs/api/start/auth#15-signature-calculation-method
const authOptions = {
  signature: 'SERVER_GENERATED_SIGNATURE',
  timestamp: 1712345678,
};

// Create a new instance of the real-time speech recognition SDK
const sdk = new RealTimeAsrSDK(app_id, authOptions);

// Method 2: Use the appSecret directly (unsafe, as it exposes the secret to the client)
// const appSecret = 'YOUR_APP_SECRET';
// const sdk = new RealTimeAsrSDK(app_id, appSecret);

// Store the microphone stream so it can be released later
let micStream: MediaStream | null = null;

// Start microphone-based speech recognition
const startMicRecognition = async () => {
  // Configure recognition parameters
  const options: RealTimeAsrOptions = {
    lang_type: 'en-US', // Language code
  };

  // Request microphone access with audio settings suited to recognition
  micStream = await navigator.mediaDevices.getUserMedia({
    audio: {
      echoCancellation: true, // Reduce echo interference
      noiseSuppression: true, // Reduce background noise interference
      sampleRate: 16000, // 16 kHz sample rate; match your recognition parameters for best results
    },
  });

  // Initialize the recognition session
  await sdk.start(options);

  // Event handling for the different recognition statuses

  // Recognition session started
  sdk.on('TranscriptionStarted', (data: RealTimeAsrEventData) => {
    console.log('TranscriptionStarted:', data);
    // Recognition has started, so send the microphone stream to the SDK
    if (micStream) {
      sdk.sendStream(micStream);
    }
  });

  // A new sentence begins
  sdk.on('SentenceBegin', (data: RealTimeAsrEventData) => {
    console.log('SentenceBegin:', (data as any).payload?.result);
  });

  // Intermediate recognition results
  sdk.on('TranscriptionResultChanged', (data: RealTimeAsrEventData) => {
    console.log('TranscriptionResultChanged:', (data as any).payload?.result);
  });

  // A sentence is completed
  sdk.on('SentenceEnd', (data: RealTimeAsrEventData) => {
    console.log('SentenceEnd:', (data as any).payload?.result);
  });

  // The session is completed
  sdk.on('TranscriptionCompleted', (data: RealTimeAsrEventData) => {
    console.log('TranscriptionCompleted:', (data as any).payload?.result);
    // Release microphone resources after completion
    micStream?.getTracks().forEach((track) => track.stop());
  });

  // Warning event
  sdk.on('Warning', (data: RealTimeAsrEventData) => {
    console.warn('Warning:', (data as any).header);
  });

  // Error event
  sdk.on('Error', (data: RealTimeAsrEventData) => {
    console.error('Error:', (data as any).header);
  });
};

// Stop recognition and clean up resources
const stopMicRecognition = () => {
  sdk.stop(); // Stop the recognition session
  // Release microphone resources
  micStream?.getTracks().forEach((track) => track.stop());
};

// Start recognition
startMicRecognition();

// Stop recognition
stopMicRecognition();
```
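Both auth methods above rely on a signature generated on your server. Purely as an illustration of that server-side step, here is a hypothetical Node.js sketch that signs the app ID and timestamp with HMAC-SHA256; the real inputs, encoding, and algorithm are defined by the signature calculation guide linked above, so treat every detail here as an assumption:

```ts
import { createHmac } from 'node:crypto';

// Hypothetical signature generation: HMAC-SHA256 over app_id + timestamp.
// The actual DolphinVoice scheme may differ; follow the auth documentation linked above.
function makeAuthOptions(appId: string, appSecret: string) {
  const timestamp = Math.floor(Date.now() / 1000); // Unix time in seconds
  const signature = createHmac('sha256', appSecret)
    .update(`${appId}${timestamp}`)
    .digest('hex');
  return { signature, timestamp };
}

// The returned object has the same shape as the authOptions used in the examples
const authOptions = makeAuthOptions('YOUR_APP_ID', 'YOUR_APP_SECRET');
```

Keeping the secret and the signing step on the server is the point of Method 1: the browser only ever sees the short-lived signature.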
### Audio File Transcription (FileAsr)

```ts
// Import FileAsr module and required type definitions
import { FileAsr } from '@dolphinvoice/sdk';
import type { FileAsrParams } from '@dolphinvoice/sdk';

// Initialize the SDK with your credentials
const app_id = 'YOUR_APP_ID';

// Set baseOptions if using the VIP version of Audio File Transcription
// const baseOptions = {
//   interface_version: 'vip',
// };

// Method 1: Use a server-generated signature (recommended)
// To generate the signature, refer to https://developers.dolphinvoice.ai/en/docs/api/start/auth#15-signature-calculation-method
const authOptions = {
  signature: 'SERVER_GENERATED_SIGNATURE',
  timestamp: 1712345678,
};

// Create a new instance of the audio file transcription SDK
// (pass baseOptions as an optional third argument if you defined it above)
const sdk = new FileAsr(app_id, authOptions);

// Method 2: Use the appSecret directly (unsafe, as it exposes the secret to the client)
// const appSecret = 'YOUR_APP_SECRET';
// const sdk = new FileAsr(app_id, appSecret);

// Transcribe an audio file
const transcribeAudioFile = () => {
  // Configure transcription parameters
  const params: FileAsrParams = {
    lang_type: 'en-US', // Language code
    format: 'mp3', // Audio file format
  };

  // Upload and transcribe the file
  sdk
    .upload(params, (progress, result) => {
      // Progress callback, invoked with updates during transcription
      console.log(`Progress: ${progress}%`, result);
    })
    .then((finalResult) => {
      // Success callback with the final transcription result
      console.log('Result:', finalResult);
    })
    .catch((err) => {
      // Error handling
      console.error('Error:', err);
    });
};

// Start transcription
transcribeAudioFile();
```
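The `format` parameter has to match the uploaded audio. A small helper that derives it from a filename extension; the list of accepted formats below is illustrative, so check the FileAsr parameter documentation for the formats actually supported:

```ts
// Derive the FileAsr `format` parameter from a filename extension.
// The supported list here is an assumption; consult the API docs for the real one.
const KNOWN_FORMATS = ['mp3', 'wav', 'm4a', 'ogg', 'flac'];

function formatFromFilename(filename: string): string {
  const ext = filename.split('.').pop()?.toLowerCase() ?? '';
  if (!KNOWN_FORMATS.includes(ext)) {
    throw new Error(`Unsupported audio format: "${ext}"`);
  }
  return ext;
}
```

For example, `formatFromFilename('meeting.MP3')` yields `'mp3'`, which can be passed straight into `FileAsrParams.format`.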
### Text-to-Speech (TTS)

```ts
// Import TTS module and required type definitions
import { ShortTtsSDK } from '@dolphinvoice/sdk';
import type { ShortTtsEventData, ShortTtsOptions } from '@dolphinvoice/sdk';

// Initialize the SDK with your credentials
const app_id = 'YOUR_APP_ID';

// Set baseOptions if using the non-streaming API
// const baseOptions = {
//   interface_mode: 'http',
// };

// Method 1: Use a server-generated signature (recommended)
// To generate the signature, refer to https://developers.dolphinvoice.ai/en/docs/api/start/auth#15-signature-calculation-method
const authOptions = {
  signature: 'SERVER_GENERATED_SIGNATURE',
  timestamp: 1712345678,
};

// Create a new instance of the short text-to-speech SDK
// (pass baseOptions as an optional third argument if you defined it above)
const sdk = new ShortTtsSDK(app_id, authOptions);

// Method 2: Use the appSecret directly (unsafe, as it exposes the secret to the client)
// const appSecret = 'YOUR_APP_SECRET';
// const sdk = new ShortTtsSDK(app_id, appSecret);

// Synthesize speech from text
const synthesizeSpeech = () => {
  // Configure synthesis parameters
  const options: ShortTtsOptions = {
    text: "The weather is nice, let's go for a walk.", // Text to synthesize
    lang_type: 'en-US', // Language code
  };

  // Initialize the speech synthesis session
  sdk.start(options);

  // Event handling for the different synthesis statuses

  // Synthesis started
  sdk.on('SynthesisStarted', (data: ShortTtsEventData) => {
    console.log('SynthesisStarted:', data);
  });

  // Total duration of the synthesized audio
  sdk.on('SynthesisDuration', (data: ShortTtsEventData) => {
    console.log('SynthesisDuration:', data);
  });

  // Timestamp information for the synthesized text
  sdk.on('SynthesisTimestamp', (data: ShortTtsEventData) => {
    console.log('SynthesisTimestamp:', data);
  });

  // Synthesized audio data chunk
  sdk.on('BinaryData', (data: ShortTtsEventData) => {
    console.log('BinaryData:', data);
    // This is typically where the audio data would be processed,
    // e.g. played back or saved to a file
  });

  // Synthesis completed
  sdk.on('SynthesisCompleted', (data: ShortTtsEventData) => {
    console.log('SynthesisCompleted:', data);
  });

  // Warning event
  sdk.on('Warning', (data: ShortTtsEventData) => {
    console.warn('Warning:', data.header);
  });

  // Error event
  sdk.on('Error', (data: ShortTtsEventData) => {
    console.error('Error:', data.header);
  });
};

// Start speech synthesis
synthesizeSpeech();
```
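The `BinaryData` handler above only logs each chunk; to play back or save the audio you typically accumulate the chunks and merge them once synthesis completes. A minimal sketch, assuming each event's audio payload can be read as a `Uint8Array` (the real event shape may differ, so check the API documentation):

```ts
// Accumulate audio chunks emitted by BinaryData events, then merge them
// into one contiguous buffer after SynthesisCompleted fires.
const chunks: Uint8Array[] = [];

function addChunk(chunk: Uint8Array): void {
  chunks.push(chunk);
}

function mergeChunks(parts: Uint8Array[]): Uint8Array {
  // Total size of all chunks
  const total = parts.reduce((n, p) => n + p.length, 0);
  const merged = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    merged.set(p, offset);
    offset += p.length;
  }
  return merged;
}
```

The merged buffer can then be wrapped in a `Blob` for playback via an `<audio>` element or downloaded as a file.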
## API Reference

### Real-time Speech Recognition (ASR)

The real-time speech recognition module processes live audio streams.

#### Methods

- `on(event: RealTimeAsrEventType, callback: RealTimeAsrEventCallback)` - Register event handlers for recognition events
- `start(params: RealTimeAsrOptions)` - Start a new recognition session with the specified parameters
- `stop()` - Stop the current recognition session and release resources
- `sendStream(stream: MediaStream)` - Send an audio stream to the recognition service

For complete API documentation, refer to the DolphinVoice API Documentation.

#### Events

- `TranscriptionStarted` - Triggered when the recognition session starts
- `SentenceBegin` - Triggered when a new sentence is detected
- `TranscriptionResultChanged` - Triggered when intermediate results are updated
- `SentenceEnd` - Triggered when a sentence is completed
- `TranscriptionCompleted` - Triggered when the entire recognition session is completed
- `Warning` - Triggered when a non-fatal warning occurs
- `Error` - Triggered when an error occurs
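Because the recognition API is event-driven, a common pattern is to wrap a session in a Promise that settles when it finishes. A sketch against a minimal emitter interface; only the `on` signature mirrors the method listed above, everything else here is illustrative:

```ts
// Minimal shape of the event API described above
interface AsrEmitter {
  on(event: string, callback: (data: unknown) => void): void;
}

// Resolve with the completion payload, reject with the error payload
function waitForCompletion(sdk: AsrEmitter): Promise<unknown> {
  return new Promise((resolve, reject) => {
    sdk.on('TranscriptionCompleted', (data) => resolve(data));
    sdk.on('Error', (data) => reject(data));
  });
}
```

This lets calling code `await waitForCompletion(sdk)` after `sdk.start(...)` instead of juggling callbacks.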
### Audio File Transcription (FileAsr)

The audio file transcription module processes pre-recorded audio files.

#### Methods

- `upload(params: FileAsrParams, onProgress: (progress: number, result?: any) => void)` - Upload and transcribe an audio file, with progress updates via the callback

For complete API documentation, refer to the DolphinVoice API Documentation.
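The `upload` progress callback may fire more often than a UI needs. A small wrapper, independent of the SDK itself, that forwards updates only when the integer percentage changes:

```ts
// Wrap a (progress, result) callback so it only fires when
// the whole-number percentage advances.
function throttleProgress(
  fn: (progress: number, result?: unknown) => void
): (progress: number, result?: unknown) => void {
  let last = -1;
  return (progress, result) => {
    const pct = Math.floor(progress);
    if (pct !== last) {
      last = pct;
      fn(pct, result);
    }
  };
}
```

It can be passed directly as the second argument to `upload`, e.g. `sdk.upload(params, throttleProgress((p) => console.log(p)))`.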
### Text-to-Speech (TTS)

The text-to-speech module converts text into natural-sounding speech.

#### Methods

- `on(event: ShortTtsEventType, callback: ShortTtsEventCallback)` - Register event handlers for synthesis events
- `start(params: ShortTtsOptions)` - Start a new synthesis session with the specified parameters

For complete API documentation, refer to the DolphinVoice API Documentation.

#### Events

- `SynthesisStarted` - Triggered when the synthesis process starts
- `SynthesisDuration` - Provides the total duration of the synthesized audio
- `SynthesisTimestamp` - Provides timestamp information for the synthesized text
- `BinaryData` - Audio data received during synthesis
- `SynthesisCompleted` - Triggered when synthesis is completed
- `Warning` - Triggered when a non-fatal warning occurs
- `Error` - Triggered when an error occurs
## License
MIT