REST API reference
The LMNT Speech REST API synthesizes emotive, human-like speech audio from text input. It also supports creating custom voices from a user-provided set of audio recordings. Those voices can then be used for speech synthesis, in addition to the voices already provided as part of the LMNT system.
If you are interested in accessing other functionality via an API (e.g. alignment data, bulk synthesis), please contact us at feedback@lmnt.com.
What's new
Date | Description |
---|---|
Aug 28, 2023 | synthesize now accepts a length parameter to adjust audio output duration to the specified length |
Jul 13, 2023 | custom voice creation/cancellation API |
REST API
Authentication
All requests need an API key. The API key is sent in each request through the X-API-Key header. New API keys can be obtained at the LMNT Account page.
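As a minimal sketch of the header described above, the following Python attaches the key to a request object using only the standard library. The helper name `authenticated_request` is hypothetical and not part of any LMNT client; the URL is the List voices endpoint from this reference. Nothing is sent over the network here.

```python
import urllib.request


def authenticated_request(url: str, api_key: str) -> urllib.request.Request:
    """Build a request that carries the API key in the X-API-Key header.

    Hypothetical helper for illustration only.
    """
    if not api_key:
        raise ValueError("an API key is required for every request")
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


request = authenticated_request(
    "https://api.lmnt.com/speech/beta/voices", "<your-api-key-here>"
)
# Passing `request` to urllib.request.urlopen would send it; that call is
# deliberately left out of this sketch.
```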
List voices
Returns a list of the voices available for use in speech synthesis calls.
GET https://api.lmnt.com/speech/beta/voices
Parameters
- none
Sample request
Sample response (JSON)
{
"voices": {
"shanti": {
"name": "Shanti",
"gender": "female",
"imageUrl": "https://api.lmnt.com/img/voice/shanti.webp"
}
}
}
Notes
- The keys of the voices dictionary (e.g. "shanti") are used to specify the voice in the synthesize speech request below.
- Some voices may also have an id field; please ignore that field, it will be removed soon.
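To illustrate the first note, here is a small sketch that pulls the usable voice keys out of a response shaped like the sample above. The function name `voice_keys` is hypothetical, and the code assumes the body is exactly the JSON structure shown in the sample response.

```python
import json
from typing import List


def voice_keys(response_body: str) -> List[str]:
    """Return the keys of the "voices" dictionary, sorted for stable output.

    Any per-voice "id" field is ignored, as the notes above recommend.
    """
    payload = json.loads(response_body)
    return sorted(payload["voices"].keys())


sample = (
    '{"voices": {"shanti": {"name": "Shanti", "gender": "female", '
    '"imageUrl": "https://api.lmnt.com/img/voice/shanti.webp"}}}'
)
print(voice_keys(sample))  # ['shanti']
```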
Synthesize speech
Synthesizes speech for a supplied text string. Returns binary audio data in one of the supported audio formats.
POST https://api.lmnt.com/speech/beta/synthesize
Parameters
- voice (required): which voice to render; id is found using the "List voices" API call
- text (required): the text to synthesize
- format (optional): either mp3 or wav; defaults to wav (16-bit mono)
- speed (optional): floating point value between 0.25 (slow) and 2.0 (fast); defaults to 1.0
- seed (optional): random seed used to specify a different take; defaults to 0
- length (optional): produce speech of this length in seconds; maximum 300.0 (5 minutes)
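The parameter rules above can be sketched as a small client-side check that builds the form-encoded body, in the same shape that curl's -d flags produce. This is an illustration under assumptions: the function name `build_synthesize_form` is hypothetical, the requirement that length be greater than zero is assumed rather than documented, and optional fields are sent only when the caller supplies them.

```python
from urllib.parse import urlencode


def build_synthesize_form(voice, text, audio_format=None, speed=None,
                          seed=None, length=None) -> bytes:
    """Validate synthesize parameters and return a form-encoded request body."""
    if not voice or not text:
        raise ValueError("voice and text are required")
    fields = {"voice": voice, "text": text}
    if audio_format is not None:
        if audio_format not in ("mp3", "wav"):
            raise ValueError("format must be 'mp3' or 'wav'")
        fields["format"] = audio_format
    if speed is not None:
        if not 0.25 <= speed <= 2.0:
            raise ValueError("speed must be between 0.25 and 2.0")
        fields["speed"] = speed
    if seed is not None:
        fields["seed"] = seed
    if length is not None:
        # Lower bound is an assumption; only the 300.0 maximum is documented.
        if not 0 < length <= 300.0:
            raise ValueError("length must be at most 300.0 seconds")
        fields["length"] = length
    return urlencode(fields).encode("utf-8")


body = build_synthesize_form("shanti", "Hello world", speed=1.5)
print(body)  # b'voice=shanti&text=Hello+world&speed=1.5'
```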
Sample request
curl https://api.lmnt.com/speech/beta/synthesize -X POST \
-H "X-API-Key: <your-api-key-here>" \
-d "voice=shanti" \
-d "text=This is a test of the LMNT speech API." \
-o sample.wav
Sample response (binary)
HTTP/1.1 200 OK
Content-Type: audio/wav
X-Sample-Rate: 24000
X-Duration-Samples: 57000
<binary data>
Notes
- The content type will be "audio/mpeg" for mp3 output
- The returned sample rate is in Hz
- The duration is in number of samples, so you can divide x-duration-samples by x-sample-rate to get the duration in seconds
- The length parameter specifies how long you want the output speech to be. We will automatically speed up / slow down the speech as needed to fit this length.
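The duration arithmetic from the notes above can be sketched as follows, using the header values from the sample response. The function name `duration_seconds` is hypothetical; header names are lowercased first because HTTP header names are case-insensitive.

```python
def duration_seconds(headers: dict) -> float:
    """Divide X-Duration-Samples by X-Sample-Rate to get seconds of audio."""
    lowered = {name.lower(): value for name, value in headers.items()}
    samples = int(lowered["x-duration-samples"])
    rate_hz = int(lowered["x-sample-rate"])
    return samples / rate_hz


sample_headers = {"X-Sample-Rate": "24000", "X-Duration-Samples": "57000"}
print(duration_seconds(sample_headers))  # 2.375
```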
Create voice
Submits a request to create a new custom voice with a supplied voice configuration and a batch of input audio data.
POST https://api.lmnt.com/voice/beta/clone
Parameters
- metadata (required): described below; must be the first field in the multipart/form-data request
- files (required): binary; one or more .wav or .mp3 file attachments
Metadata
The metadata field is a JSON object containing the following fields:
- name (required): string; a unique display name for this voice
- accent (required): string; specifies the accent corresponding to this voice
- filter (required): bool; run noise reduction on submitted files before cloning
Sample request
curl https://api.lmnt.com/voice/beta/clone -X POST \
-H "X-API-Key: <your_api_key>" \
-F 'metadata={"name": "Shanti", "accent": "Indian", "filter": false }' \
-F files=@filename1.wav \
-F files=@filename2.mp3
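For clients that do not use curl, the ordering requirement (metadata first, then one or more files parts) can be sketched by assembling the multipart/form-data body by hand. This is only an illustration: `build_clone_body` is a hypothetical helper, and the application/octet-stream content type on each file part is an assumption, not something this reference specifies.

```python
import json
import uuid


def build_clone_body(name, accent, filter_noise, files):
    """Return (content_type, body) for a create-voice request.

    `files` is a list of (filename, bytes) pairs. The metadata part is
    written first, as the parameter description requires.
    """
    if not files:
        raise ValueError("at least one .wav or .mp3 file is required")
    boundary = uuid.uuid4().hex
    metadata = json.dumps({"name": name, "accent": accent, "filter": filter_noise})
    parts = [
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="metadata"\r\n\r\n'
            f"{metadata}\r\n"
        ).encode("utf-8")
    ]
    for filename, data in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"  # assumed type
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


content_type, body = build_clone_body(
    "Shanti", "Indian", False, [("filename1.wav", b"RIFF"), ("filename2.mp3", b"ID3")]
)
```

The returned content type would go in the Content-Type header of the POST, next to the X-API-Key header.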
Sample response (JSON)
Cancel create voice
Cancels training a voice that is currently queued for custom voice creation. Once processing has begun for a voice, it can no longer be cancelled and this call will return a 400 error.
POST https://api.lmnt.com/voice/beta/cancel_clone
Parameters
- voice-id (required): a voice ID returned from the create voice request
Sample request
curl https://api.lmnt.com/voice/beta/cancel_clone -X POST \
-H "X-API-Key: <your_api_key>" \
-d 'voice-id=<your_voice_id>'
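The same request, together with the documented 400 case, can be sketched in Python. The names `build_cancel_request` and `try_cancel` are hypothetical, and treating any non-error response as a successful cancellation is an assumption, since this reference does not show the response body.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode

CANCEL_URL = "https://api.lmnt.com/voice/beta/cancel_clone"


def build_cancel_request(api_key: str, voice_id: str) -> urllib.request.Request:
    """Build the POST request with a form-encoded voice-id field."""
    body = urlencode({"voice-id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        CANCEL_URL, data=body, headers={"X-API-Key": api_key}, method="POST"
    )


def try_cancel(request, opener=urllib.request.urlopen) -> bool:
    """Return True if the call succeeds, False on the documented 400 error.

    `opener` is injectable so the logic can be exercised without a network.
    """
    try:
        with opener(request):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 400:
            # Processing has already begun, so the voice can no longer be cancelled.
            return False
        raise
```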
Sample response
JavaScript API
There is a JavaScript wrapper around the REST API that can be a good way to integrate quickly in the browser. It extends the HTML5 audio tag to provide an extra synthesize method. Here's an example of how to use it:
<html>
<head>
<script src="https://api.lmnt.com/js/lmnt-audio.js"></script>
</head>
<body>
<button id="play" onclick="play()">Play</button>
<audio is="lmnt-audio" api-key="<your-api-key-here>" autoplay></audio>
<script>
function play() {
const audio = document.querySelector('audio');
audio.synthesize("This is a test of the LMNT speech API.", "shanti");
}
</script>
</body>
</html>