DolphinVoice JavaScript SDK

The DolphinVoice SDK provides speech recognition and speech synthesis through three main modules:

  • Real-time Speech Recognition (ASR)
  • Audio File Transcription (FileAsr)
  • Text-to-Speech (TTS)

Documentation

Detailed documentation and guides for the DolphinVoice SDK are available in the [DolphinVoice API Documentation](https://developers.dolphinvoice.ai/).

For technical support or any questions, please contact our [developer support team](mailto:voice.support@dolphin-ai.jp).

Installation

You can install the SDK directly from npm.

npm install @dolphinvoice/sdk

or

pnpm install @dolphinvoice/sdk

or

yarn add @dolphinvoice/sdk

Usage

Real-time Speech Recognition (ASR)

// Import ASR module and required type definitions
import { RealTimeAsrSDK } from '@dolphinvoice/sdk';
import type { RealTimeAsrOptions, RealTimeAsrEventData } from '@dolphinvoice/sdk';

// Initialize SDK with your credentials
const app_id = 'YOUR_APP_ID';

// Method 1: Use signature (recommended)
// To generate the signature, refer to https://developers.dolphinvoice.ai/en/docs/api/start/auth#15-signature-calculation-method
const authOptions = {
  signature: 'SERVER_GENERATED_SIGNATURE',
  timestamp: 1712345678,  // Unix timestamp in seconds
};
// Create a new instance of the real-time speech recognition SDK
const sdk = new RealTimeAsrSDK(app_id, authOptions);

//// Method 2: Use appSecret directly (unsafe: exposes the secret in client-side code)
//const appSecret = 'YOUR_APP_SECRET'; 
//// Create a new instance of the real-time speech recognition SDK
//const sdk = new RealTimeAsrSDK(app_id, appSecret);

// Store microphone stream reference for later use
let micStream: MediaStream | null = null;

// Function to activate microphone-based speech recognition
const startMicRecognition = async () => {
  // Configure recognition parameters
  const options: RealTimeAsrOptions = {
    lang_type: 'en-US',  // Language code
  };

  // Request microphone access and set optimal audio settings
  micStream = await navigator.mediaDevices.getUserMedia({
    audio: {
      echoCancellation: true,     // Reduce echo interference
      noiseSuppression: true,     // Reduce background noise interference
      sampleRate: 16000,          // 16kHz sample rate, adjust according to recognition parameters for optimal results
    },
  });

  // Register event handlers for the different recognition statuses before
  // starting the session so the first events are not missed

  // Recognition session started
  sdk.on('TranscriptionStarted', (data: RealTimeAsrEventData) => {
    console.log('TranscriptionStarted:', data);
    // Recognition started, send microphone stream to SDK
    if (micStream) {
      sdk.sendStream(micStream);
    }
  });

  // New sentence begins
  sdk.on('SentenceBegin', (data: RealTimeAsrEventData) => {
    console.log('SentenceBegin:', (data as any).payload?.result);
  });

  // Intermediate recognition results
  sdk.on('TranscriptionResultChanged', (data: RealTimeAsrEventData) => {
    console.log('TranscriptionResultChanged:', (data as any).payload?.result);
  });

  // Completed sentences
  sdk.on('SentenceEnd', (data: RealTimeAsrEventData) => {
    console.log('SentenceEnd:', (data as any).payload?.result);
  });

  // Session completed
  sdk.on('TranscriptionCompleted', (data: RealTimeAsrEventData) => {
    console.log('TranscriptionCompleted:', (data as any).payload?.result);
    // Clean up microphone resources after completion
    micStream?.getTracks().forEach((track) => track.stop());
  });

  // Warning event
  sdk.on('Warning', (data: RealTimeAsrEventData) => {
    console.log('Warning:', (data as any).header);
  });

  // Error event
  sdk.on('Error', (data: RealTimeAsrEventData) => {
    console.error('Error:', (data as any).header);
  });

  // Start the recognition session now that handlers are in place
  await sdk.start(options);
};

// Stop recognition and clean up resources
const stopMicRecognition = () => {
  sdk.stop();  // Stop recognition session
  // Release microphone resources
  micStream?.getTracks().forEach((track) => track.stop());
};

// In a real app, wire these calls to UI controls such as start/stop buttons;
// calling them back-to-back would end the session immediately
startMicRecognition();
// ...later, when the user finishes speaking:
stopMicRecognition();
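
All three modules accept the same signature-based authOptions shown above. As a minimal server-side sketch, assuming an HMAC-SHA256 signature over the app ID and timestamp (the actual algorithm is defined in the signature calculation guide linked in the comments above), generating the credentials might look like this:

// Hypothetical server-side signature generation (Node.js)
// ASSUMPTION: HMAC-SHA256 over `${appId}${timestamp}` is illustrative only;
// follow https://developers.dolphinvoice.ai/en/docs/api/start/auth#15-signature-calculation-method
import { createHmac } from 'node:crypto';

const generateAuthOptions = (appId: string, appSecret: string) => {
  const timestamp = Math.floor(Date.now() / 1000);   // Unix timestamp in seconds
  const signature = createHmac('sha256', appSecret)  // keep appSecret on the server only
    .update(`${appId}${timestamp}`)
    .digest('hex');
  return { signature, timestamp };
};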

Audio File Transcription (FileAsr)

// Import FileAsr module and required type definitions
import { FileAsr } from '@dolphinvoice/sdk'; 
import type { FileAsrParams } from '@dolphinvoice/sdk'; 

// Initialize SDK with your credentials
const app_id = 'YOUR_APP_ID'; 

//// Set baseOptions if using VIP version of Audio File Transcription
//const baseOptions = {
//  interface_version: 'vip',
//}

// Method 1: Use signature (recommended)
// To generate the signature, refer to https://developers.dolphinvoice.ai/en/docs/api/start/auth#15-signature-calculation-method
const authOptions = {
  signature: 'SERVER_GENERATED_SIGNATURE',
  timestamp: 1712345678,  // Unix timestamp in seconds
};
// Create a new instance of the audio file transcription SDK
const sdk = new FileAsr(app_id, authOptions);  // add baseOptions as an optional third argument

//// Method 2: Use appSecret directly (unsafe: exposes the secret in client-side code)
//const appSecret = 'YOUR_APP_SECRET'; 
//// Create a new instance of the audio file transcription SDK
//const sdk = new FileAsr(app_id, appSecret, baseOptions);  // baseOptions is optional

// Function to transcribe audio file
const transcribeAudioFile = () => { 
  // Configure transcription parameters
  const params: FileAsrParams = {
    lang_type: 'en-US',  // Language code
    format: 'mp3',       // Audio file format
  };

  // Upload and transcribe file
  sdk.upload(params, (progress, result) => { 
    // Progress callback, provides updates during transcription process
    console.log(`Progress: ${progress}%`, result); 
  }).then(finalResult => { 
    // Success callback, includes final transcription result
    console.log('Result:', finalResult); 
  }).catch(err => { 
    // Error handling
    console.error('Error:', err);   
  }); 
};

// Start transcription
transcribeAudioFile();
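
The example above configures the transcription but does not show how the audio file itself is supplied. As a hypothetical sketch, assuming FileAsrParams accepts the audio under a file field (verify the actual field name in the DolphinVoice API documentation), wiring upload() to a file input might look like this:

// Hypothetical: pick an audio File from an <input type="file"> element
// ASSUMPTION: the `file` field is illustrative; check the DolphinVoice
// API documentation for the actual way to attach the audio data
const input = document.querySelector<HTMLInputElement>('#audio-input');
const file = input?.files?.[0];

if (file) {
  sdk
    .upload({ lang_type: 'en-US', format: 'mp3', file } as FileAsrParams, (progress) => {
      console.log(`Progress: ${progress}%`);
    })
    .then((result) => console.log('Result:', result))
    .catch((err) => console.error('Error:', err));
}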

Text-to-Speech (TTS)

// Import TTS module and required type definitions
import { ShortTtsSDK } from '@dolphinvoice/sdk'; 
import type { ShortTtsEventData, ShortTtsOptions } from '@dolphinvoice/sdk'; 

// Initialize SDK with your credentials
const app_id = 'YOUR_APP_ID'; 

//// Set baseOptions if using non-streaming API
//const baseOptions = {
//  interface_mode: 'http',
//}

// Method 1: Use signature (recommended)
// To generate the signature, refer to https://developers.dolphinvoice.ai/en/docs/api/start/auth#15-signature-calculation-method
const authOptions = {
  signature: 'SERVER_GENERATED_SIGNATURE',
  timestamp: 1712345678,  // Unix timestamp in seconds
};
// Create a new instance of the short text to speech SDK
const sdk = new ShortTtsSDK(app_id, authOptions);   // add baseOptions as an optional third argument

//// Method 2: Use appSecret directly (unsafe: exposes the secret in client-side code)
//const appSecret = 'YOUR_APP_SECRET'; 
//// Create a new instance of the short text to speech SDK
//const sdk = new ShortTtsSDK(app_id, appSecret, baseOptions);   // baseOptions is optional

// Function to synthesize speech from text
const synthesizeSpeech = () => { 
  // Configure synthesis parameters
  const options: ShortTtsOptions = {
    text: "The weather is nice, let's go for a walk.",  // Text to synthesize
    lang_type: 'en-US',                                 // Language code
  };

  // Register event handlers for the different synthesis statuses before
  // starting the session so the first events are not missed

  // Synthesis started
  sdk.on('SynthesisStarted', (data: ShortTtsEventData) => {
    console.log('SynthesisStarted:', data);
  });

  // Synthesis duration
  sdk.on('SynthesisDuration', (data: ShortTtsEventData) => {
    // Provides the total duration of the synthesized audio
    console.log('SynthesisDuration:', data);
  });

  // Synthesis timestamp
  sdk.on('SynthesisTimestamp', (data: ShortTtsEventData) => {
    // Provides timestamp information for the synthesized text
    console.log('SynthesisTimestamp:', data);
  });

  // Synthesized audio data chunk
  sdk.on('BinaryData', (data: ShortTtsEventData) => {
    // Audio data received during synthesis
    console.log('BinaryData:', data);
    // This is typically where the audio data would be processed,
    // e.g. played back or saved to a file
  });

  // Synthesis completed
  sdk.on('SynthesisCompleted', (data: ShortTtsEventData) => {
    console.log('SynthesisCompleted:', data);
  });

  // Warning event
  sdk.on('Warning', (data: ShortTtsEventData) => { 
    console.log('Warning:', data.header); 
  }); 

  // Error event
  sdk.on('Error', (data: ShortTtsEventData) => { 
    console.error('Error:', data.header); 
  }); 

  // Start the synthesis session now that handlers are in place
  sdk.start(options);
};

// Start speech synthesis
synthesizeSpeech();
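
The BinaryData handler above is where audio chunks arrive. As a minimal browser-side sketch, assuming each event carries an ArrayBuffer of encoded audio in a payload field (check the SDK documentation for the actual payload shape and audio format), buffering the chunks and playing them back on completion might look like this:

// Hypothetical playback: buffer BinaryData chunks, then play on completion
// ASSUMPTION: `(data as any).payload` as an ArrayBuffer and the 'audio/mpeg'
// MIME type are illustrative; verify both against the SDK documentation
const chunks: BlobPart[] = [];

sdk.on('BinaryData', (data: ShortTtsEventData) => {
  chunks.push((data as any).payload as ArrayBuffer);
});

sdk.on('SynthesisCompleted', () => {
  const blob = new Blob(chunks, { type: 'audio/mpeg' });
  const audio = new Audio(URL.createObjectURL(blob));
  audio.play();
});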

API Reference

RealTimeAsrSDK

The real-time speech recognition module processes live audio streams.

Methods

  • on(event: RealTimeAsrEventType, callback: RealTimeAsrEventCallback) - Register event handlers for recognition events
  • start(params: RealTimeAsrOptions) - Start a new recognition session with specified parameters
  • stop() - Stop the current recognition session and release resources
  • sendStream(stream: MediaStream) - Send audio stream to recognition service

For complete API documentation, refer to the [DolphinVoice API Documentation](https://developers.dolphinvoice.ai/).

Events

  • TranscriptionStarted - Triggered when recognition session starts
  • SentenceBegin - Triggered when a new sentence is detected
  • TranscriptionResultChanged - Triggered when intermediate results are updated
  • SentenceEnd - Triggered when a sentence is completed
  • TranscriptionCompleted - Triggered when the entire recognition session is completed
  • Warning - Triggered when a non-fatal warning occurs
  • Error - Triggered when an error occurs

FileAsr

The audio file transcription module processes pre-recorded audio files.

Methods

  • upload(params: FileAsrParams, onProgress: (progress: number, result?: any) => void) - Upload and transcribe an audio file, receiving progress updates through the callback; returns a Promise that resolves with the final transcription result

For complete API documentation, refer to the [DolphinVoice API Documentation](https://developers.dolphinvoice.ai/).

ShortTtsSDK

The text-to-speech module converts text into natural-sounding speech.

Methods

  • on(event: ShortTtsEventType, callback: ShortTtsEventCallback) - Register event handlers for synthesis events
  • start(params: ShortTtsOptions) - Start a new synthesis session with specified parameters

For complete API documentation, refer to the [DolphinVoice API Documentation](https://developers.dolphinvoice.ai/).

Events

  • SynthesisStarted - Triggered when the synthesis process starts
  • SynthesisDuration - Triggered to report the total duration of the synthesized audio
  • SynthesisTimestamp - Triggered to provide timestamp information for the synthesized text
  • BinaryData - Triggered when a chunk of synthesized audio data is received
  • SynthesisCompleted - Triggered when the synthesis process is completed
  • Warning - Triggered when a non-fatal warning occurs
  • Error - Triggered when an error occurs

License

MIT