How to Use Agents
Create and manage intelligent multilingual AI assistants.
Agents allow you to create, manage, and interact with intelligent multilingual AI assistants. Follow the example below to create a simple agent and start speaking with it. Navigate the tabs to see how easily you can switch from English-speaking agents to any available language.
Arabic (ar) is currently not available in Agents.
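For reference, a minimal sketch of such an example with the Python SDK is shown below. The constructor arguments, environment variable name, response shape, and start/stop method names are assumptions based on the description that follows, not the confirmed SDK API; the SDK's own examples are the definitive reference.

```python
import os
import asyncio

from pyneuphonic import Neuphonic, Agent  # Agent import path is an assumption

async def main():
    # Authenticate with your Neuphonic API key (environment variable name is an assumption).
    client = Neuphonic(api_key=os.environ['NEUPHONIC_API_KEY'])

    # Create the agent with a system prompt and an opening greeting.
    response = client.agents.create(
        name='my-first-agent',                                  # hypothetical name
        prompt='You are a friendly, concise voice assistant.',  # used as the GPT-4o system prompt
        greeting='Hi, how can I help you today?',               # spoken when the session starts
    )
    agent_id = response['data']['agent_id']  # response shape is an assumption

    # Open the real-time session: microphone audio in, spoken responses out.
    agent = Agent(client, agent_id=agent_id)
    await agent.start()  # start/stop method names are assumptions

    # Keep the conversation going for as long as you want; stop when you are done.
    await asyncio.sleep(60)
    await agent.stop()

if __name__ == '__main__':
    asyncio.run(main())
```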
When you run the example, the agent will greet you with “Hi, how can I help you today?”. Your spoken input is then converted into text by a speech recognition system. The prompt you set above is used as the system prompt for a GPT-4o model, which generates the responses. These responses are then converted to audio using Neuphonic’s TTS and played back to you. You can keep the conversation going for as long as you want!
After creating an agent with client.agents.create, you can access and manage it in the Voice Agents dashboard. The returned agent_id allows you to reuse this agent configuration across multiple sessions or applications. You can also modify the agent’s prompt, greeting, and other settings directly through the dashboard interface for quick iterations without changing your code.
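As a rough illustration, a later session could simply attach to the stored agent_id instead of creating a new agent. The names below reuse the assumptions from the sketch above.

```python
import os
import asyncio

from pyneuphonic import Neuphonic, Agent  # import path is an assumption

async def main():
    client = Neuphonic(api_key=os.environ['NEUPHONIC_API_KEY'])

    # Attach to an agent created earlier; replace the placeholder with the
    # agent_id returned by client.agents.create.
    agent = Agent(client, agent_id='<your-stored-agent-id>')
    await agent.start()  # method names are assumptions

    await asyncio.sleep(60)
    await agent.stop()

asyncio.run(main())
```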
Editing the Callback
The Agent interface described above utilizes a WebSocket connection to facilitate real-time audio streaming between your microphone and the server. The server processes this audio and sends back responses, which can be categorized into the following types:
- user_transcript: This message contains the text transcription of your spoken input. It is sent by the server once the speech recognition system detects a pause long enough to consider your turn complete. You will receive one of these messages for each turn in the conversation.
- Stop signal: This message is sent by the server when it detects that the user has started speaking. You will always receive exactly one of these messages before every user_transcript message; it indicates that you should stop any playback from the agent.
- llm_response: This message includes the response generated by the language model based on your transcribed input. You will receive one of these messages for each turn in the conversation.
- audio_response: This message provides the audio bytes of the language model’s response. Since the audio is streamed, you might receive multiple messages of this type for a single turn in the conversation.
The Agent class, by default, plays audio_response messages and prints llm_response and user_transcript messages to the console. To add custom behavior, you can attach a custom event handler that performs extra actions on each event, as shown below.
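One way such a handler might look is sketched below. The on_message keyword and the message fields (type, text) are assumptions rather than the confirmed SDK API; the pyneuphonic examples show the exact hook exposed by Agent.

```python
import os
import asyncio

from pyneuphonic import Neuphonic, Agent  # import path is an assumption

def handle_event(message: dict) -> None:
    """Hypothetical handler: log each event the server sends back."""
    msg_type = message.get('type')
    if msg_type == 'user_transcript':
        print(f'[you said]   {message.get("text")}')
    elif msg_type == 'llm_response':
        print(f'[agent said] {message.get("text")}')
    elif msg_type == 'audio_response':
        # Audio arrives in (possibly many) chunks per turn; the Agent class
        # already plays these, so here we only note their arrival.
        print('[audio chunk received]')

async def main():
    client = Neuphonic(api_key=os.environ['NEUPHONIC_API_KEY'])
    # Passing the handler via an on_message keyword is an assumption.
    agent = Agent(client, agent_id='<your-agent-id>', on_message=handle_event)
    await agent.start()
    await asyncio.sleep(60)
    await agent.stop()

asyncio.run(main())
```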
If you need more control than the Agent class provides, you can implement a custom solution using the underlying WebSocket API. This approach gives you direct control over audio input and output through custom WebSocket event handlers, allowing for more advanced or specialized implementations. For an example of how to do this, see the pyneuphonic GitHub examples section.
View Agents
To retrieve a list of all your existing agents, make a request like the one sketched below.
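Only client.agents.create is documented above; the list method name and the response shape in this sketch are assumptions.

```python
import os

from pyneuphonic import Neuphonic

client = Neuphonic(api_key=os.environ['NEUPHONIC_API_KEY'])

# List every agent on the account (method name and response shape are assumptions).
response = client.agents.list()
for agent in response['data']['agents']:
    print(agent)  # e.g. the agent_id and name, but not prompt or greeting
```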
The previous response does not include prompt and greeting because they can be quite lengthy, especially for complex agents with extensive system prompts.
The following request will return all details for the specified agent_id.
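A corresponding sketch with the Python SDK might look like this; the get method name and response shape are assumptions.

```python
import os

from pyneuphonic import Neuphonic

client = Neuphonic(api_key=os.environ['NEUPHONIC_API_KEY'])

# Fetch the full configuration, including prompt and greeting, for one agent.
response = client.agents.get(agent_id='<your-agent-id>')
print(response['data'])
```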
More Examples
To see more examples with our Python SDK, head over to the agents examples section of the GitHub repo.