
DolphinVoice Python SDK

The DolphinVoice SDK provides speech recognition and speech synthesis. It includes three main modules:

  • Real-time Speech Recognition (ASR)
  • Audio File Transcription (FileAsr)
  • Text to Speech (TTS)

Documentation

Find more detailed documentation and guides about the DolphinVoice SDK in the DolphinVoice API Documentation.

For technical support or any questions, please contact our [developer support team](mailto:voice.support@dolphin-ai.jp).

Installation

You can install the SDK directly with pip.

pip install dolphinvoice
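
To confirm the installation, you can run a quick import check. Note that the package installs as dolphinvoice but is imported as dolphin_voice, matching the examples below:

# A quick import check after installation
import dolphin_voice
print('DolphinVoice SDK imported:', dolphin_voice.__name__)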

Usage

Real-time Speech Recognition (ASR)

from dolphin_voice.speech_rec.callbacks import SpeechTranscriberCallback
from dolphin_voice import speech_rec
import time

class Callback(SpeechTranscriberCallback):
    def started(self, message):
        print('TranscriptionStarted: %s' % message)

    def result_changed(self, message):
        print('TranscriptionResultChanged: %s' % message)

    def sentence_begin(self, message):
        print('SentenceBegin: %s' % message)

    def sentence_end(self, message):
        print('SentenceEnd: %s' % message)

    def completed(self, message):
        print('TranscriptionCompleted: %s' % message)

    def task_failed(self, message):
        print('TaskFailed: %s' % message)

    def warning_info(self, message):
        print('Warning: %s' % message)

    def channel_closed(self):
        print('TranscriptionChannelClosed')

audio_path = 'demo.mp3'
client = speech_rec.SpeechClient(app_id='YOUR_APP_ID', app_secret='YOUR_APP_SECRET')

with client.create_transcriber(Callback()) as transcriber:
    # Language, audio format, and sample rate of the incoming stream
    transcriber.set_parameter({
        "lang_type": "en-US",
        "format": "mp3",
        "sample_rate": 16000,
    })
    transcriber.start()
    with open(audio_path, 'rb') as f:
        audio = f.read(7680)
        while audio:
            transcriber.send(audio)
            time.sleep(0.24)
            audio = f.read(7680)
    transcriber.stop()
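
The chunk size and sleep interval above approximate real-time streaming: for 16 kHz, 16-bit, mono PCM, 7680 bytes correspond to about 0.24 seconds of audio, which matches the time.sleep(0.24) between sends. For compressed formats such as the MP3 file used here the mapping is only approximate, and the sleep simply paces the uploads. The arithmetic:

bytes_per_second = 16000 * 2 * 1            # sample_rate * bytes per sample * channels (16-bit mono PCM)
chunk_seconds = 7680 / bytes_per_second     # 0.24 s per 7680-byte chunk
print(chunk_seconds)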

Audio File Transcription (FileAsr)

from dolphin_voice import speech_rec

client = speech_rec.SpeechClient(app_id='YOUR_APP_ID', app_secret='YOUR_APP_SECRET')

asrfile = client.create_asrfile()

audio = 'demo.mp3'
# Transcription parameters: language, audio container format, and sample rate
data = {
    "lang_type": "en-US",
    "format": "mp3",
    "sample_rate": 16000
}
result = asrfile.transcribe_file(audio, data)
print(result)
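
If you want to keep the raw output, you can persist whatever transcribe_file returns. The exact shape of result depends on the service response, so this sketch simply handles both JSON-serializable objects and plain strings (the output file name is illustrative):

import json

with open('demo_transcript.json', 'w', encoding='utf-8') as f:
    if isinstance(result, (dict, list)):
        json.dump(result, f, ensure_ascii=False, indent=2)
    else:
        f.write(str(result))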

Text to Speech (TTS)

from dolphin_voice.speech_syn.callbacks import SpeechSynthesizerCallback
from dolphin_voice import speech_syn

class MyCallback(SpeechSynthesizerCallback):
    def __init__(self, name):
        self._name = name
        # Output file that receives the synthesized audio
        self._fout = open(name, 'wb')

    def binary_data_received(self, raw):
        # Raw audio chunks are appended to the output file as they arrive
        self._fout.write(raw)

    def on_message(self, message):
        print('Received : %s' % message)

    def started(self, message):
        print('MyCallback.OnSynthesizerStarted: %s' % message)

    def get_Timestamp(self, message):
        print('MyCallback.OnSynthesizerGetTimestamp: %s' % message)

    def get_Duration(self, message):
        print('MyCallback.OnSynthesizerGetDuration: %s' % message)

    def completed(self, message):
        print('MyCallback.OnSynthesizerCompleted: %s' % message)
        self._fout.close()

    def channel_closed(self):
        print('MyCallback.OnSynthesizerChannelClosed')

audio_name = 'syAudio.mp3'
client = speech_syn.SpeechClient(app_id='YOUR_APP_ID', app_secret='YOUR_APP_SECRET')
callback = MyCallback(audio_name)

with client.create_synthesizer(callback) as synthesizer:
    synthesizer.set_parameter({
        "text": "The weather is nice, let's go for a walk.",
        "lang_type": "en-US",
        "format": "mp3"
    })
    synthesizer.start()
    synthesizer.wait_completed()
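
The same pattern extends to synthesizing several texts in a row. The sketch below opens one synthesizer session per text and writes each result to its own file (the list of texts and the file naming are illustrative, not part of the SDK):

texts = [
    "The weather is nice, let's go for a walk.",
    "See you tomorrow.",
]

for i, text in enumerate(texts):
    callback = MyCallback('syAudio_%d.mp3' % i)
    with client.create_synthesizer(callback) as synthesizer:
        synthesizer.set_parameter({
            "text": text,
            "lang_type": "en-US",
            "format": "mp3"
        })
        synthesizer.start()
        synthesizer.wait_completed()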

API Reference

Real-time Speech Recognition (ASR)

The real-time speech recognition module processes live audio streams.

Methods

  • create_transcriber(callback: SpeechTranscriberCallback) - Creates a transcriber and registers event handlers for recognition events
  • set_parameter(params: Json) - Specifies parameters
  • start() - Starts a new recognition session
  • send(stream: Bytes) - Sends audio stream to the recognition service
  • stop() - Stops the current recognition session and releases resources

For complete API documentation, refer to DolphinVoice API Documentation.

Events

  • TranscriptionStarted - Triggered when recognition session starts
  • SentenceBegin - Triggered when a new sentence is detected
  • TranscriptionResultChanged - Triggered when intermediate results are updated
  • SentenceEnd - Triggered when a sentence is completed
  • TranscriptionCompleted - Triggered when the entire recognition session is completed
  • Warning - Triggered when a non-fatal warning occurs
  • TaskFailed - Triggered when the recognition task fails
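
As a sketch of how these events are typically consumed, the callback below collects the raw SentenceEnd payloads so they can be inspected after the session ends. Only the handlers used here are shown; depending on the base class, you may need to implement the full set as in the Usage example, and the payload format is whatever the service sends (no parsing is assumed):

from dolphin_voice.speech_rec.callbacks import SpeechTranscriberCallback

class CollectingCallback(SpeechTranscriberCallback):
    def __init__(self):
        self.sentences = []

    def sentence_end(self, message):
        # Keep each completed-sentence message for later inspection
        self.sentences.append(message)

    def completed(self, message):
        print('Transcription completed, %d sentences collected' % len(self.sentences))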

Audio File Transcription (FileAsr)

The audio file transcription module is for processing pre-recorded audio files.

Methods

  • transcribe_file(audio: String, params: Json) - Uploads and transcribes the audio file

For complete API documentation, refer to DolphinVoice API Documentation.
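
For batch jobs, transcribe_file can be called in a loop over a directory of audio files (a sketch; the directory layout and parameters here are illustrative):

from pathlib import Path
from dolphin_voice import speech_rec

client = speech_rec.SpeechClient(app_id='YOUR_APP_ID', app_secret='YOUR_APP_SECRET')
asrfile = client.create_asrfile()

params = {"lang_type": "en-US", "format": "mp3", "sample_rate": 16000}

for audio_path in sorted(Path('recordings').glob('*.mp3')):
    result = asrfile.transcribe_file(str(audio_path), params)
    print(audio_path.name, result)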

Text to Speech (TTS)

The text-to-speech synthesis module is used to convert text into natural speech.

Methods

  • create_synthesizer(callback: SpeechSynthesizerCallback) - Registers event handlers for synthesis events
  • set_parameter(params: Json) - Specifies parameters
  • start() - Starts a new synthesis session
  • wait_completed() - Blocks until the current synthesis session has completed

For complete API documentation, refer to DolphinVoice API Documentation.

Events

  • OnSynthesizerStarted - Triggered when synthesis process starts
  • OnSynthesizerGetDuration - Provides the total duration of the synthesized audio
  • OnSynthesizerGetTimestamp - Provides timestamp information for the synthesized text
  • OnSynthesizerCompleted - Triggered when synthesis process is completed
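
If you would rather keep the synthesized audio in memory instead of writing it to disk, binary_data_received can append to an in-memory buffer. This is a sketch built on the callback interface shown in the Usage example; only the handlers used here are shown, and the remaining ones can be implemented as in that example:

import io
from dolphin_voice.speech_syn.callbacks import SpeechSynthesizerCallback

class BufferCallback(SpeechSynthesizerCallback):
    def __init__(self):
        self._buffer = io.BytesIO()

    def binary_data_received(self, raw):
        # Accumulate audio chunks in memory instead of writing to a file
        self._buffer.write(raw)

    def completed(self, message):
        print('Synthesis finished, %d bytes of audio buffered' % self._buffer.tell())

    def get_audio(self):
        return self._buffer.getvalue()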

License

MIT