DolphinVoice Python SDK
DolphinVoice SDK is used for speech recognition and synthesis. This SDK provides three main modules:
- Real-time Speech Recognition (ASR)
- Audio File Transcription (FileAsr)
- Text to Speech (TTS)
Documentation
Find more detailed documentation and guides about the DolphinVoice SDK in the following resources:
For technical support or any questions, please contact our [developer support team](mailto:voice.support@dolphin-ai.jp).
Installation
You can install the SDK directly with pip.
pip install dolphinvoice
Usage
# Real-time speech recognition (ASR): stream an audio file in chunks
from dolphin_voice.speech_rec.callbacks import SpeechTranscriberCallback
from dolphin_voice import speech_rec
import time


class Callback(SpeechTranscriberCallback):
    # Each handler prints the message it receives from the service.
    def started(self, message):
        print('TranscriptionStarted: %s' % message)

    def result_changed(self, message):
        print('TranscriptionResultChanged: %s' % message)

    def sentence_begin(self, message):
        print('SentenceBegin: %s' % message)

    def sentence_end(self, message):
        print('SentenceEnd: %s' % message)

    def completed(self, message):
        print('TranscriptionCompleted: %s' % message)

    def task_failed(self, message):
        print('TaskFailed: %s' % message)

    def warning_info(self, message):
        print('Warning: %s' % message)

    def channel_closed(self):
        print('TranslationChannelClosed')


audio_path = 'demo.mp3'
client = speech_rec.SpeechClient(app_id='YOUR_APP_ID', app_secret='YOUR_APP_SECRET')
with client.create_transcriber(Callback()) as transcriber:
    transcriber.set_parameter({
        "lang_type": "en-US",
        "format": "mp3",
        "sample_rate": 16000,
    })
    transcriber.start()
    # Send 7680-byte chunks, pausing 0.24 s between sends.
    with open(audio_path, 'rb') as f:
        audio = f.read(7680)
        while audio:
            transcriber.send(audio)
            time.sleep(0.24)
            audio = f.read(7680)
    transcriber.stop()

# Audio file transcription (FileAsr): transcribe a pre-recorded file
from dolphin_voice import speech_rec

client = speech_rec.SpeechClient(app_id='YOUR_APP_ID', app_secret='YOUR_APP_SECRET')
asrfile = client.create_asrfile()
audio = 'demo.mp3'
data = {
    "lang_type": "en-US",
    "format": "mp3",
    "sample_rate": 16000
}
result = asrfile.transcribe_file(audio, data)
print(result)

# Text to speech (TTS): synthesize text and write the audio to a file
from dolphin_voice.speech_syn.callbacks import SpeechSynthesizerCallback
from dolphin_voice import speech_syn


class MyCallback(SpeechSynthesizerCallback):
    def __init__(self, name):
        self._name = name
        # Output file for the synthesized audio; closed in completed().
        self._fout = open(name, 'wb')

    def binary_data_received(self, raw):
        self._fout.write(raw)

    def on_message(self, message):
        print('Received : %s' % message)

    def started(self, message):
        print('MyCallback.OnSynthesizerStarted: %s' % message)

    def get_Timestamp(self, message):
        print('MyCallback.OnSynthesizerGetTimestamp: %s' % message)

    def get_Duration(self, message):
        print('MyCallback.OnSynthesizerGetDuration: %s' % message)

    def completed(self, message):
        print('MyCallback.OnSynthesizerCompleted: %s' % message)
        self._fout.close()

    def channel_closed(self):
        print('MyCallback.OnSynthesizerChannelClosed')


audio_name = 'syAudio.mp3'
client = speech_syn.SpeechClient(app_id='YOUR_APP_ID', app_secret='YOUR_APP_SECRET')
callback = MyCallback(audio_name)
with client.create_synthesizer(callback) as synthesizer:
    synthesizer.set_parameter({
        "text": "The weather is nice, let's go for a walk.",
        "lang_type": "en-US",
        "format": "mp3"
    })
    synthesizer.start()
    synthesizer.wait_completed()
API Reference
The real-time speech recognition module is for processing real-time audio streams.
Methods
- create_transcriber(callback: SpeechTranscriberCallback) - Registers event handlers for recognition events
- set_parameter(params: Json) - Specifies parameters
- start() - Starts a new recognition session
- send(stream: Bytes) - Sends audio stream to the recognition service
- stop() - Stops the current recognition session and releases resources
For complete API documentation, refer to DolphinVoice API Documentation.
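The usage example sends 7680-byte chunks with a 0.24 s pause between calls to send(). For uncompressed 16-bit mono PCM at 16 kHz, 7680 bytes is exactly 0.24 s of audio (16000 × 2 × 0.24), so that pacing roughly matches real time. The sketch below shows the arithmetic and a chunked reader. `chunk_size_bytes` and `iter_chunks` are hypothetical helpers, not part of the SDK, and the 16-bit mono assumption is ours; it does not hold for compressed formats such as mp3.

```python
import io
from typing import BinaryIO, Iterator


def chunk_size_bytes(sample_rate: int, seconds: float,
                     bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes of uncompressed PCM audio covering `seconds` of playback."""
    return int(round(sample_rate * bytes_per_sample * channels * seconds))


def iter_chunks(stream: BinaryIO, size: int) -> Iterator[bytes]:
    """Yield chunks of at most `size` bytes until the stream is exhausted."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            break
        yield chunk


# 16 kHz, 16-bit mono, 0.24 s per chunk
size = chunk_size_bytes(16000, 0.24)

# Stand-in for an opened audio file: 20000 bytes of silence
fake_audio = io.BytesIO(b"\x00" * 20000)
lengths = [len(chunk) for chunk in iter_chunks(fake_audio, size)]
print(size, lengths)
```

With a real session, each chunk yielded by `iter_chunks(f, size)` would be passed to `transcriber.send(chunk)`, followed by `time.sleep(0.24)`, as in the usage example.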
Events
- TranscriptionStarted - Triggered when the recognition session starts
- SentenceBegin - Triggered when a new sentence is detected
- TranscriptionResultChanged - Triggered when intermediate results are updated
- SentenceEnd - Triggered when a sentence is completed
- TranscriptionCompleted - Triggered when the entire recognition session is completed
- Warning - Triggered when a non-fatal warning occurs
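These events correspond to the callback methods in the usage example (started, sentence_begin, result_changed, sentence_end, completed, warning_info). As a rough sketch, a callback can record events rather than print them, which makes them easier to inspect after the session ends. `RecordingCallback`, `events`, and `final_sentences` are hypothetical names. Messages are stored exactly as received because their payload format is not described here, and the fallback to `object` exists only so the sketch can run without the SDK installed.

```python
try:
    # Base class used in the usage example; present only when the SDK is installed.
    from dolphin_voice.speech_rec.callbacks import SpeechTranscriberCallback as _Base
except ImportError:
    _Base = object  # fallback so the sketch can be exercised without the SDK


class RecordingCallback(_Base):
    """Collects (event_name, message) pairs instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def _record(self, name, message=None):
        self.events.append((name, message))

    def started(self, message):
        self._record("TranscriptionStarted", message)

    def result_changed(self, message):
        self._record("TranscriptionResultChanged", message)

    def sentence_begin(self, message):
        self._record("SentenceBegin", message)

    def sentence_end(self, message):
        self._record("SentenceEnd", message)

    def completed(self, message):
        self._record("TranscriptionCompleted", message)

    def task_failed(self, message):
        self._record("TaskFailed", message)

    def warning_info(self, message):
        self._record("Warning", message)

    def channel_closed(self):
        self._record("ChannelClosed")

    def final_sentences(self):
        """Messages delivered with SentenceEnd, in arrival order."""
        return [msg for name, msg in self.events if name == "SentenceEnd"]
```

An instance can be passed to `client.create_transcriber(...)` in place of `Callback()`; after `stop()`, `final_sentences()` returns the completed-sentence messages.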
The audio file transcription module is for processing pre-recorded audio files.
Methods
- transcribe_file(audio: String, params: Json) - Uploads and transcribes the audio file
For complete API documentation, refer to DolphinVoice API Documentation.
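The params argument in the usage example carries lang_type, format, and sample_rate. One way to keep "format" in step with the file being uploaded is to derive it from the file extension, as sketched below. `build_params` is a hypothetical helper, not an SDK function; only "mp3" appears in this document, so whether the service accepts other extensions as format values is an assumption to check against the API documentation.

```python
from pathlib import Path


def build_params(audio_path: str, lang_type: str = "en-US",
                 sample_rate: int = 16000) -> dict:
    """Build a params dict, deriving "format" from the file extension."""
    suffix = Path(audio_path).suffix.lower().lstrip(".")
    if not suffix:
        raise ValueError(f"cannot infer audio format from {audio_path!r}")
    return {"lang_type": lang_type, "format": suffix, "sample_rate": sample_rate}


params = build_params("demo.mp3")
print(params)
```

The resulting dict has the same keys as `data` in the usage example and could be passed as `asrfile.transcribe_file("demo.mp3", params)`.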
The text-to-speech synthesis module is used to convert text into natural speech.
Methods
- create_synthesizer(callback: SpeechSynthesizerCallback) - Registers event handlers for synthesis events
- set_parameter(params: Json) - Specifies parameters
- start() - Starts a new synthesis session
For complete API documentation, refer to DolphinVoice API Documentation.
Events
- OnSynthesizerStarted - Triggered when the synthesis process starts
- OnSynthesizerGetDuration - Provides the total duration of the synthesized audio
- OnSynthesizerGetTimestamp - Provides timestamp information for the synthesized text
- OnSynthesizerCompleted - Triggered when the synthesis process is completed
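The callback in the usage example opens its output file in the constructor and closes it only in completed(), so a session that never completes leaves the file open. A possible alternative, sketched here, buffers the audio in memory and writes the file once OnSynthesizerCompleted arrives. `BufferedAudioCallback` and `duration_message` are hypothetical names. The handler names are taken from the usage example, the fallback to `object` is only so the sketch runs without the SDK, and the no-op handlers are included in case the base class does not supply defaults.

```python
import io

try:
    # Base class used in the usage example; present only when the SDK is installed.
    from dolphin_voice.speech_syn.callbacks import SpeechSynthesizerCallback as _Base
except ImportError:
    _Base = object  # fallback so the sketch can be exercised without the SDK


class BufferedAudioCallback(_Base):
    """Accumulates synthesized audio in memory and writes it on completion."""

    def __init__(self, path):
        self._path = path
        self._buffer = io.BytesIO()
        self.duration_message = None  # raw OnSynthesizerGetDuration message

    def binary_data_received(self, raw):
        self._buffer.write(raw)

    def get_Duration(self, message):
        self.duration_message = message

    def completed(self, message):
        # Write everything at once so no file handle stays open mid-session.
        with open(self._path, "wb") as fout:
            fout.write(self._buffer.getvalue())

    # Remaining handlers are no-ops in this sketch.
    def on_message(self, message):
        pass

    def started(self, message):
        pass

    def get_Timestamp(self, message):
        pass

    def channel_closed(self):
        pass
```

An instance can be passed to `client.create_synthesizer(...)` in place of `MyCallback(audio_name)`. The trade-off is that the whole audio clip is held in memory until synthesis finishes.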
License
MIT