Neuphonic’s primary offering is its text-to-speech technology, which serves as the foundation for various features, including Agents. Visit our Playground to experiment with different models and voices, and then continue reading below to learn how to implement this in code.

Don’t forget to visit the Quickstart guide to obtain your API Key and get your environment set up.

Speech Synthesis

You can generate speech using the API in two ways: Server-Sent Events (SSE) and WebSockets.

SSE is a streaming protocol in which you send a single request to our API to convert text into speech. The API then streams the generated audio back to you in real time, providing the lowest possible latency. Below are some examples of how to use this endpoint.
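To make the streaming model concrete, here is a minimal sketch of how a client can split a text/event-stream response into individual events. It follows the generic SSE wire format only (data: fields, comment lines, and a blank line ending each event). The function name iter_sse_data is hypothetical, and the contents of each event's payload are not described on this page, so the sketch returns them as raw strings.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an SSE stream.

    `lines` is any iterable of decoded text lines, for example the lines
    of an HTTP response body. Hypothetical helper, not part of any SDK.
    """
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
        elif line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The format allows one optional space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.
    # An event that is not closed by a blank line is discarded.


if __name__ == "__main__":
    sample = ["data: first\n", "\n", ": keep-alive\n", "data: second\n", "\n"]
    for payload in iter_sse_data(sample):
        print(payload)
```

Because the helper only consumes an iterable of lines, it can be exercised with a plain list, as in the sample above, before being pointed at a live response.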

The SDK examples demonstrate how to send a message to the API, receive the audio stream from the server, and play it through your device’s speaker.

# Replace <API_KEY> with your actual API key.
# To switch languages, replace the lang_code in the path parameter (e.g., /en) with the desired language code.
curl -N --request POST \
  --url https://eu-west-1.api.neuphonic.com/sse/speak/en \
  --header 'Content-Type: application/json' \
  --header 'X-API-KEY: <API_KEY>' \
  --header 'Accept: text/event-stream' \
  --data '{
  "text": "Hello, world!"
}'
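
For readers who prefer Python over curl, the same request can be sketched with the standard library alone. This mirrors the URL, headers, and body of the curl command above and nothing more. The names build_speak_request and stream_lines are hypothetical helpers, not part of the Neuphonic SDK, and the format of the streamed events is not assumed here.

```python
import json
import urllib.request
from typing import Iterator
from urllib.parse import quote

# Host and path are taken from the curl example above.
BASE_URL = "https://eu-west-1.api.neuphonic.com"


def build_speak_request(
    api_key: str, text: str, lang_code: str = "en"
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/sse/speak/{quote(lang_code, safe='')}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-KEY": api_key,
            "Accept": "text/event-stream",
        },
    )


def stream_lines(
    request: urllib.request.Request, timeout: float = 30.0
) -> Iterator[str]:
    """Send the request and yield decoded response lines as they arrive."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for raw in response:
            yield raw.decode("utf-8")
```

Calling stream_lines(build_speak_request("<API_KEY>", "Hello, world!")) performs the network request; building the request separately makes it easy to inspect the URL and headers without contacting the server.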

The chosen voice must be available for the chosen model and language. See Voices for the available options.

Text-to-Speech Configuration

The settings for Text-to-Speech generation can include the following parameters.

lang_code
string
default:"en"
required

Language code for the desired language. Examples: 'en', 'es', 'de', 'nl', 'hi'

voice_id
string

The ID of the desired voice. The model used for synthesis depends on the voice_id you choose. Find all available voices on the Voices page. Example: '8e9c4bc8-3979-48ab-8626-df53befc2090'

speed
float
default:1

Playback speed of the audio. Must be in the range [0.7, 2.0]. Examples: 0.7, 1.0, 1.5

sampling_rate
int
default:22050

Sampling rate, in Hz, of the audio returned from the server. Options: 8000, 16000, 22050

encoding
string
default:"pcm_linear"

Encoding of the audio returned from the server. Options: 'pcm_linear', 'pcm_mulaw'
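
The parameters above can be checked on the client before a request is sent. The following is a minimal sketch that encodes the documented defaults, ranges, and options in a dataclass. SpeechSettings is a hypothetical name used for illustration only and is not the SDK's own configuration class; how these settings are transmitted to the API is not covered by this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values as listed in the parameter reference above.
SAMPLING_RATES = (8000, 16000, 22050)
ENCODINGS = ("pcm_linear", "pcm_mulaw")
MIN_SPEED, MAX_SPEED = 0.7, 2.0


@dataclass(frozen=True)
class SpeechSettings:
    """Hypothetical container for the documented text-to-speech settings."""

    lang_code: str = "en"
    voice_id: Optional[str] = None
    speed: float = 1.0
    sampling_rate: int = 22050
    encoding: str = "pcm_linear"

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges and options.
        if not self.lang_code:
            raise ValueError("lang_code is required")
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(f"speed must be in [{MIN_SPEED}, {MAX_SPEED}]")
        if self.sampling_rate not in SAMPLING_RATES:
            raise ValueError(f"sampling_rate must be one of {SAMPLING_RATES}")
        if self.encoding not in ENCODINGS:
            raise ValueError(f"encoding must be one of {ENCODINGS}")


if __name__ == "__main__":
    print(SpeechSettings(speed=1.5, sampling_rate=16000))
```

Validating locally gives an immediate, descriptive error for an out-of-range value instead of relying on the server's response.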

More Examples

To see more examples, check out our Python SDK examples and JavaScript SDK examples on GitHub.