Python API reference
Installing
Running pip install -U lmnt will install the latest LMNT SDK.
Speech
The Speech class is your primary touch-point with the LMNT API. Import it into your module with from lmnt.api import Speech.
When you're done with the Speech instance, make sure to clean up by calling its close() method. Alternatively, you can use this class as an async context manager, which will call close() for you.
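The two cleanup styles might look like the following. This is a minimal sketch: the helper names fetch_voices_explicit and fetch_voices_managed are hypothetical, and speech is assumed to be an already-constructed Speech instance passed in by the caller.

```python
async def fetch_voices_explicit(speech):
    """Call close() by hand, even if list_voices() raises."""
    try:
        return await speech.list_voices()
    finally:
        await speech.close()


async def fetch_voices_managed(speech):
    """Let the async context manager call close() on exit."""
    async with speech:
        return await speech.list_voices()
```

With the SDK installed, either helper would be awaited with a real instance, for example fetch_voices_managed(Speech(api_key)).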
What's new
Date | Description |
---|---|
Sep 15, 2023 | synthesize now accepts a length parameter to adjust audio output duration to the specified length |
__init__
__init__(api_key)
Constructing a Speech object requires an API key. Create an API key by visiting your account page in our speech playground and signing up for a (free) plan.
close
async close()
Releases resources associated with this instance.
list_voices
async list_voices()
Returns the voices available for use in speech synthesis calls.
Return value
A dictionary of dictionaries (Dict[str, Dict[str, str]]) that describes the available voices. Here's a sample object:
{
  "shanti": {
    "name": "Shanti",
    "gender": "female",
    "imageUrl": "https://api.lmnt.com/img/voice/shanti.webp"
  }
}
Throws
lmnt.api.SpeechError
Notes
- The keys of the dictionary (e.g. "shanti") are used to specify the voice in the synthesize speech request below.
- Some voices may also have an id field; please ignore that field, as it will be removed soon.
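As an illustration of working with the returned dictionary, the sketch below picks out the keys that would be passed as the voice argument. The helper name voice_keys and its gender filter are hypothetical conveniences, not part of the SDK, and the second sample entry is made up for the example.

```python
from typing import Dict, List, Optional


def voice_keys(voices: Dict[str, Dict[str, str]],
               gender: Optional[str] = None) -> List[str]:
    """Return voice keys, optionally filtered by the 'gender' field."""
    return sorted(
        key for key, info in voices.items()
        if gender is None or info.get("gender") == gender
    )


# Shaped like the list_voices() return value; the second entry is made up.
sample = {
    "shanti": {
        "name": "Shanti",
        "gender": "female",
        "imageUrl": "https://api.lmnt.com/img/voice/shanti.webp",
    },
    "example-voice": {"name": "Example", "gender": "male"},
}

print(voice_keys(sample))            # all keys, sorted
print(voice_keys(sample, "female"))  # only entries whose gender matches
```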
synthesize
async synthesize(text, voice, **kwargs)
Synthesizes speech for a supplied text string. Returns binary audio data in one of the supported audio formats.
Parameters
- text: the text to synthesize
- voice: which voice to render; id is found using the list_voices call
- **kwargs
  - format (optional): aac, mp3, wav; defaults to wav (24kHz 16-bit mono)
  - speed (optional): floating point value between 0.25 (slow) and 2.0 (fast); defaults to 1.0
  - seed (optional): random seed used to specify a different take; defaults to 0
  - durations (optional): whether to include word durations detail; defaults to false
  - length (optional): produce speech of this length in seconds; maximum 300.0 (5 minutes)
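The documented ranges can be checked on the client before a request is made. The sketch below is a hypothetical helper (build_synthesize_kwargs is not part of the SDK) that validates the optional arguments against the limits listed above and returns a dictionary suitable for passing as **kwargs. That length must be strictly positive is an assumption; only the 300.0 second maximum is documented.

```python
from typing import Any, Dict, Optional

SUPPORTED_FORMATS = ("aac", "mp3", "wav")


def build_synthesize_kwargs(format: str = "wav",
                            speed: float = 1.0,
                            seed: int = 0,
                            durations: bool = False,
                            length: Optional[float] = None) -> Dict[str, Any]:
    """Validate optional synthesize() arguments against the documented limits."""
    if format not in SUPPORTED_FORMATS:
        raise ValueError(f"format must be one of {SUPPORTED_FORMATS}, got {format!r}")
    if not 0.25 <= speed <= 2.0:
        raise ValueError("speed must be between 0.25 and 2.0")
    kwargs: Dict[str, Any] = {
        "format": format, "speed": speed, "seed": seed, "durations": durations,
    }
    if length is not None:
        # Assumption: length must be positive; the documented maximum is 300.0 s.
        if not 0 < length <= 300.0:
            raise ValueError("length must be in (0, 300.0] seconds")
        kwargs["length"] = length
    return kwargs
```

A call would then read along the lines of await speech.synthesize(text, voice, **build_synthesize_kwargs(format='mp3', speed=1.25)).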
Return value
Unless otherwise specified below, the return value is a binary string containing the synthesized audio file.
Duration details
If durations=True is specified, the response will be a dictionary with the following keys:
- durations: a list of dictionaries with keys as below
- audio: the binary audio data containing the synthesized audio file
Each durations entry is a dictionary with the following keys:
- phonemes: a list of the phonemes comprising the word
- phoneme_durations: a list of the durations of each phoneme
- start: the starting duration of the word
- duration: the overall duration of the word
Duration units are given in sample counts at 24kHz.
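Because durations are reported as sample counts at 24kHz, dividing by 24000 converts them to seconds. The sketch below shows that conversion. The helper name word_timings_seconds is hypothetical, the sample entry's values (including the phoneme strings) are made up for illustration, and start is assumed to be the word's offset from the beginning of the audio.

```python
from typing import Dict, List

SAMPLE_RATE_HZ = 24_000  # durations are sample counts at 24kHz


def word_timings_seconds(durations: List[Dict]) -> List[Dict[str, float]]:
    """Convert each entry's 'start' and 'duration' from samples to seconds."""
    return [
        {
            "start": entry["start"] / SAMPLE_RATE_HZ,
            "duration": entry["duration"] / SAMPLE_RATE_HZ,
        }
        for entry in durations
    ]


# Made-up entry with the documented keys; values are illustrative only.
example = [{
    "phonemes": ["T", "EH", "S", "T"],
    "phoneme_durations": [1200, 2400, 2400, 1200],
    "start": 12000,
    "duration": 7200,
}]

print(word_timings_seconds(example))
```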
Notes
- The mp3 bitrate is 96kbps.
- The length parameter specifies how long you want the output speech to be. We will automatically speed up / slow down the speech as needed to fit this length.
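For a rough sense of output size, the documented formats imply fixed data rates: 96kbps MP3 works out to 12,000 bytes per second, and 24kHz 16-bit mono WAV to 48,000 bytes per second of sample data. The sketch below works through that arithmetic for a requested length. It assumes a constant bitrate and ignores container headers and metadata, so real files will differ somewhat; the function names are hypothetical.

```python
MP3_BITRATE_BPS = 96_000     # 96kbps
WAV_SAMPLE_RATE_HZ = 24_000  # 24kHz
WAV_BYTES_PER_SAMPLE = 2     # 16-bit mono


def approx_mp3_bytes(length_seconds: float) -> int:
    """Approximate MP3 payload size: bits per second / 8 * seconds."""
    return round(MP3_BITRATE_BPS / 8 * length_seconds)


def approx_wav_bytes(length_seconds: float) -> int:
    """Approximate WAV sample data size: samples per second * bytes per sample * seconds."""
    return round(WAV_SAMPLE_RATE_HZ * WAV_BYTES_PER_SAMPLE * length_seconds)


print(approx_mp3_bytes(300.0))  # 3600000
print(approx_wav_bytes(300.0))  # 14400000
```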
synthesize_streaming
async synthesize_streaming(voice)
Creates a new, full-duplex streaming session. You can use the returned session object to concurrently stream text content to the server and receive speech data from the server.
Parameters
- voice: which voice to render; id is found using the list_voices call
Return value
A StreamingSynthesisConnection instance, which you can use to stream data.
StreamingSynthesisConnection
This class represents a full-duplex streaming connection with the server. The expected use is to call append_text as text is produced and to iterate over the object to read audio. Make sure to call finish() when you're done submitting the entire text snippet.
append_text
async append_text(text)
Sends additional text to synthesize to the server. The text can be split at any point. For example, the two snippets below are semantically equivalent:
# Split between words
await conn.append_text('This is a test of ')
await conn.append_text('the emergency broadcast system.')

# Split mid-word
await conn.append_text('This is a test of the eme')
await conn.append_text('rgency broadcast system.')
Parameters
- text: some or all of the text to synthesize
__aiter__ / __anext__
async __aiter__()
async __anext__()
Iterates over audio data from the server. See the notes below for details on the audio format. Here's a short snippet that shows how to iterate over the data:
from aiohttp import WSMsgType

conn = await speech.synthesize_streaming('mara-wilson')
async for msg in conn:
    if msg.type == WSMsgType.CLOSE:
        break
    msg.data  # contains a binary string with the audio data
Notes
- audio is returned as a 96kbps mono MP3 stream with a sampling rate of 24kHz
finish
async finish()
Call this function when you've written all the text you're expecting to submit. It will flush any remaining data on the server and return the last chunks of audio as described above.
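Putting append_text and finish together, a writer might forward text in arbitrary pieces and then signal completion. The sketch below assumes conn is a StreamingSynthesisConnection obtained from synthesize_streaming; the helper name stream_text and the fixed chunk size are illustrative choices only. Reading the audio (by iterating over conn, typically in a separate task) is left out.

```python
async def stream_text(conn, text: str, chunk_size: int = 16) -> int:
    """Send text in fixed-size pieces, then call finish(). Returns the chunk count."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    sent = 0
    for start in range(0, len(text), chunk_size):
        # Splits may fall mid-word; the text can be split at any point.
        await conn.append_text(text[start:start + chunk_size])
        sent += 1
    # Tell the server no more text is coming so it flushes the remaining audio.
    await conn.finish()
    return sent
```

Fixed-size chunks are used here only to show that split points do not matter; in practice the pieces would more likely be whatever an upstream producer emits.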
Sample code
Standard synthesis
import asyncio
from lmnt.api import Speech

LMNT_API_KEY = ...  # Your API key here
TEXT = 'This is a test of the emergency broadcast system.'


async def main():
    async with Speech(LMNT_API_KEY) as speech:
        voices = await speech.list_voices()
        first_voice = list(voices.keys())[0]
        wav_file = await speech.synthesize(TEXT, first_voice)
    with open('output.wav', 'wb') as f:
        f.write(wav_file)


if __name__ == '__main__':
    asyncio.run(main())
Streaming synthesis + ChatGPT
import asyncio
import openai  # uses openai.ChatCompletion, which requires openai<1.0
from argparse import ArgumentParser
from aiohttp import WSMsgType
from lmnt.api import Speech

LMNT_API_KEY = ...  # Your LMNT API key here
openai.api_key = ...  # Your OpenAI API key here
MODEL = 'gpt-3.5-turbo'
DEFAULT_PROMPT = "Tell me a story about the people of Foundation."


async def reader_task(conn):
    """Streams audio data from the server and writes it to `output.raw`."""
    with open('output.raw', 'wb') as f:
        async for msg in conn:
            if msg.type == WSMsgType.CLOSE:
                break
            f.write(msg.data)


async def writer_task(conn, prompt):
    """Streams text from ChatGPT to LMNT."""
    response = await openai.ChatCompletion.acreate(
        model=MODEL,
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0,
        stream=True
    )
    async for chunk in response:
        # Skip chunks that carry no text content.
        if 'choices' not in chunk:
            continue
        choice = chunk['choices'][0]
        if 'delta' not in choice or 'content' not in choice['delta']:
            continue
        await conn.append_text(choice['delta']['content'])
        print(choice['delta']['content'], end='', flush=True)
    # Signal that all text has been submitted.
    await conn.finish()


async def main(args):
    s = Speech(LMNT_API_KEY)
    conn = await s.synthesize_streaming('mara-wilson')
    # Read audio and write text concurrently over the same connection.
    t1 = asyncio.create_task(reader_task(conn))
    t2 = asyncio.create_task(writer_task(conn, args.prompt))
    await t1
    await t2
    await s.close()


if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('prompt', default=DEFAULT_PROMPT, nargs='?')
    asyncio.run(main(parser.parse_args()))