REST API reference

The LMNT Speech REST API synthesizes emotive, human-like speech audio from text input. It also supports creating custom voices from a user-provided set of audio recordings. Those voices can then be used for speech synthesis alongside the voices already provided by LMNT.

If you are interested in accessing other functionality via an API (e.g. alignment data, bulk synthesis) please contact us at feedback@lmnt.com.

What's new

Date          Description
Aug 28, 2023  synthesize now accepts a length parameter to adjust audio output duration to the specified length
Jul 13, 2023  custom voice creation/cancellation API

REST API

Authentication

All requests need an API key. The API key is sent in each request through the X-API-Key header. New API keys can be obtained at the LMNT Account page.
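
As a minimal sketch of the header-based authentication described above, the following Python snippet builds (but does not send) a request that carries the X-API-Key header. The helper name build_request is a hypothetical name chosen for illustration; it is not part of the LMNT API.

```python
import urllib.request

API_KEY_HEADER = "X-API-Key"  # header name taken from the text above


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Return a GET request for `url` with the API key header attached.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    if not api_key:
        raise ValueError("an API key is required for every request")
    return urllib.request.Request(url, headers={API_KEY_HEADER: api_key})


if __name__ == "__main__":
    req = build_request("https://api.lmnt.com/speech/beta/voices",
                        "<your-api-key-here>")
    # Passing `req` to urllib.request.urlopen would perform the actual call.
    print(req.get_method(), req.full_url)
```

HTTP header names are case-insensitive, so it does not matter if a client library changes the capitalization of the header name before sending it.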

List voices

Returns a list of the voices available for use in speech synthesis calls.

GET https://api.lmnt.com/speech/beta/voices

Parameters

  • none

Sample request

curl https://api.lmnt.com/speech/beta/voices -H "X-API-Key: <your-api-key-here>"

Sample response (JSON)

{
  "voices": {
    "shanti": {
      "name": "Shanti",
      "gender": "female",
      "imageUrl": "https://api.lmnt.com/img/voice/shanti.webp"
    }
  }
}

Notes

  • The keys of the voices dictionary (e.g. "shanti") are used to specify the voice in the synthesize speech request below.
  • Some voices may also have an id field; please ignore that field, as it will be removed soon.
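
The notes above can be illustrated with a short Python sketch that parses the sample response and collects the dictionary keys used to select a voice. The function name voice_keys is a hypothetical helper, and the JSON is the sample response shown above rather than live data.

```python
import json
from typing import List

# The sample response from this section, copied verbatim.
SAMPLE_RESPONSE = """
{
  "voices": {
    "shanti": {
      "name": "Shanti",
      "gender": "female",
      "imageUrl": "https://api.lmnt.com/img/voice/shanti.webp"
    }
  }
}
"""


def voice_keys(response_text: str) -> List[str]:
    """Return the keys of the voices dictionary, sorted alphabetically."""
    payload = json.loads(response_text)
    # Use the dictionary keys; any per-voice "id" field is ignored.
    return sorted(payload["voices"].keys())


if __name__ == "__main__":
    print(voice_keys(SAMPLE_RESPONSE))
```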

Synthesize speech

Synthesizes speech for a supplied text string. Returns binary audio data in one of the supported audio formats.

POST https://api.lmnt.com/speech/beta/synthesize

Parameters

  • voice (required): which voice to render; use a key of the voices dictionary returned by the "List voices" API call
  • text (required): the text to synthesize
  • format (optional): either mp3 or wav; defaults to wav (16-bit mono)
  • speed (optional): floating point value between 0.25 (slow) and 2.0 (fast); defaults to 1.0
  • seed (optional): random seed used to specify a different take; defaults to 0
  • 🆕 length (optional): produce speech of this length in seconds; maximum 300.0 (5 minutes)
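
As a rough sketch of how a client might check these parameters before sending them, the Python function below validates the documented ranges and produces a form-encoded body like the one curl -d sends. The function name build_synthesize_form is hypothetical, and the requirement that length be greater than zero is an assumption; the text above only states the maximum.

```python
from typing import Optional
from urllib.parse import urlencode


def build_synthesize_form(voice: str, text: str, format: str = "wav",
                          speed: float = 1.0, seed: int = 0,
                          length: Optional[float] = None) -> str:
    """Validate synthesize parameters and return a form-encoded body."""
    if not voice or not text:
        raise ValueError("voice and text are required")
    if format not in ("mp3", "wav"):
        raise ValueError("format must be 'mp3' or 'wav'")
    if not 0.25 <= speed <= 2.0:
        raise ValueError("speed must be between 0.25 and 2.0")
    # Assumption: a length must be positive; only the 300.0 cap is documented.
    if length is not None and not 0 < length <= 300.0:
        raise ValueError("length must be at most 300.0 seconds")

    fields = {"voice": voice, "text": text, "format": format,
              "speed": speed, "seed": seed}
    if length is not None:
        fields["length"] = length  # only sent when explicitly requested
    return urlencode(fields)


if __name__ == "__main__":
    print(build_synthesize_form("shanti",
                                "This is a test of the LMNT speech API."))
```

The optional fields are sent with their documented defaults here for clarity; omitting them entirely, as the curl sample does, should be equivalent.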

Sample request

curl https://api.lmnt.com/speech/beta/synthesize -X POST \
    -H "X-API-Key: <your-api-key-here>" \
    -d "voice=shanti" \
    -d "text=This is a test of the LMNT speech API." \
    -o sample.wav

Sample response (binary)

HTTP/1.1 200 OK
Content-Type: audio/wav
X-Sample-Rate: 24000
X-Duration-Samples: 57000

<binary data>

Notes

  • The content type will be "audio/mpeg" for mp3 output
  • The returned sample rate is in Hz
  • The duration is given as a number of samples; divide X-Duration-Samples by X-Sample-Rate to get the duration in seconds
  • 🆕 The length parameter specifies how long you want the output speech to be. We will automatically speed up / slow down the speech as needed to fit this length.
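
The duration calculation from the notes above can be sketched in Python as follows. The function name duration_seconds is a hypothetical helper, and the header values are those of the sample response in this section.

```python
from typing import Mapping


def duration_seconds(headers: Mapping[str, str]) -> float:
    """Compute the audio duration in seconds from the response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    samples = int(lowered["x-duration-samples"])
    sample_rate_hz = int(lowered["x-sample-rate"])
    return samples / sample_rate_hz


if __name__ == "__main__":
    sample_headers = {
        "Content-Type": "audio/wav",
        "X-Sample-Rate": "24000",
        "X-Duration-Samples": "57000",
    }
    print(duration_seconds(sample_headers))  # 57000 / 24000 = 2.375
```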

Create voice

Submits a request to create a new custom voice with a supplied voice configuration and a batch of input audio data.

POST https://api.lmnt.com/voice/beta/clone

Parameters

  • metadata (required): described below; must be the first field in the multipart/form-data request
  • files (required): binary; one or more .wav or .mp3 file attachments

Metadata

The metadata field is a JSON object containing the following fields:

  • name (required): string; a unique display name for this voice
  • accent (required): string; specifies the accent corresponding to this voice
  • filter (required): bool; run noise reduction on submitted files before cloning

Sample request

curl https://api.lmnt.com/voice/beta/clone -X POST \
    -H "X-API-Key: <your_api_key>" \
    -F 'metadata={"name": "Shanti", "accent": "Indian", "filter": false }' \
    -F files=@filename1.wav \
    -F files=@filename2.mp3

Sample response (JSON)

{
  "id": "your_voice_id"
}
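
To illustrate the ordering requirement for the metadata field, here is a minimal Python sketch that assembles a multipart/form-data body by hand with the metadata part written first. The function name encode_clone_request, the application/octet-stream part type, and the in-memory file contents are assumptions made for illustration; an HTTP client library would normally build this body for you.

```python
import json
import uuid
from typing import Dict, Optional, Tuple

REQUIRED_METADATA = ("name", "accent", "filter")  # fields listed above


def encode_clone_request(metadata: dict, files: Dict[str, bytes],
                         boundary: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (Content-Type header value, body) for a create voice request."""
    missing = [key for key in REQUIRED_METADATA if key not in metadata]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    if not files:
        raise ValueError("at least one audio file is required")

    boundary = boundary or uuid.uuid4().hex
    # The metadata part is written first, as the parameter list requires.
    header = (f'--{boundary}\r\n'
              'Content-Disposition: form-data; name="metadata"\r\n\r\n')
    chunks = [header.encode("utf-8"),
              json.dumps(metadata).encode("utf-8"), b"\r\n"]

    # One "files" part per attachment, mirroring the repeated -F flags.
    for filename, data in files.items():
        header = (f'--{boundary}\r\n'
                  'Content-Disposition: form-data; name="files"; '
                  f'filename="{filename}"\r\n'
                  'Content-Type: application/octet-stream\r\n\r\n')
        chunks += [header.encode("utf-8"), data, b"\r\n"]

    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


if __name__ == "__main__":
    content_type, body = encode_clone_request(
        {"name": "Shanti", "accent": "Indian", "filter": False},
        {"filename1.wav": b"<wav bytes>"},
    )
    print(content_type, len(body))
```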

Cancel create voice

Cancels training of a voice that is currently queued for custom voice creation. Once processing has begun for a voice, it can no longer be cancelled, and this call returns a 400 error.

POST https://api.lmnt.com/voice/beta/cancel_clone

Parameters

  • voice-id (required): a voice ID returned from the create voice request

Sample request

curl https://api.lmnt.com/voice/beta/cancel_clone -X POST \
    -H "X-API-Key: <your_api_key>" \
    -d 'voice-id=<your_voice_id>'

Sample response (JSON)

{
  "success": true
}
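
A client may want to tell a successful cancellation apart from the 400 error described above. The Python sketch below shows one way to interpret the status code and body. The function name cancel_succeeded is hypothetical, and treating every 400 as "processing has already begun" is an assumption, since a 400 could also indicate a malformed request.

```python
import json


def cancel_succeeded(status_code: int, body: str) -> bool:
    """Interpret the response of a cancel create voice request."""
    if status_code == 400:
        # Assumed meaning: processing has begun, so the voice can no
        # longer be cancelled.
        return False
    if status_code != 200:
        raise RuntimeError(f"unexpected status code: {status_code}")
    return bool(json.loads(body).get("success", False))


if __name__ == "__main__":
    print(cancel_succeeded(200, '{"success": true}'))
    print(cancel_succeeded(400, ""))
```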

JavaScript API

A JavaScript wrapper around the REST API offers a quick way to add speech synthesis to a web page. It extends the HTML5 audio tag with an extra synthesize method. Here's an example of how to use it:

<html>
  <head>
    <script src="https://api.lmnt.com/js/lmnt-audio.js"></script>
  </head>
  <body>
    <button id="play" onclick="play()">Play</button>
    <audio is="lmnt-audio" api-key="<your-api-key-here>" autoplay></audio>
    <script>
      function play() {
        const audio = document.querySelector('audio');
        audio.synthesize("This is a test of the LMNT speech API.", "shanti");
      }
    </script>
  </body>
</html>