Realtime API
Realtime transport for Siesta AI agents
The Realtime API lets an external application create a short-lived Siesta AI agent session, open a WebSocket, and stream provider events through Siesta AI. Use it for voice-style experiences, live agent interfaces, public widgets, and clients that need to react to transcripts, tool execution, approval states, and persisted conversation events as they happen.
Public API v1 reference
- POST /api/v1/Agent/{agentId}/realtime-session: Create a one-time realtime session.
- GET /api/v1/Agent/{agentId}/realtime: Open the WebSocket with conversationId and sessionId.
- Use X-Api-Key and X-Org-Id only on a trusted server.
- sessionId is short-lived and consumed when the socket connects.
- Defaults: gpt-realtime-2 and pcm16.
What This API Does
The API creates a realtime session for a specific Siesta AI agent conversation and proxies traffic between your client and the configured realtime provider. Siesta AI injects the agent system instructions, backend tools, optional client tools, audio settings, transcription settings, and conversation persistence.
- Realtime conversation: stream audio and provider events over WebSocket after creating a one-time session.
- Backend tools: tools configured on the Siesta AI agent are executed by Siesta AI and reported through custom events.
- Client tools: your application can expose local functions to the model and execute them inside the client.
- Conversation persistence: transcripts, assistant responses, tool status, and sub-agent status can be persisted back to the Siesta AI conversation.
The realtime session token is short-lived and one-time use. Create the session immediately before opening the WebSocket.
Quick Start Flow
- Create a realtime session with POST /api/v1/Agent/{agentId}/realtime-session.
- Connect to wss://{api-host}{webSocketPath}.
- Send standard realtime provider client events.
- Route incoming messages by top-level type.
- Execute declared client tools locally.
- For backend tool approvals, send approval.approve or approval.reject.
- If the session expires, is consumed, or disconnects, create a new session.
Authentication

The HTTP session creation endpoint is authenticated with external API headers:
X-Api-Key: <external-api-key>
X-Org-Id: <organization-id>
The WebSocket endpoint does not use X-Api-Key. It is authorized by the short-lived one-time sessionId embedded in the returned webSocketPath.
Do not expose X-Api-Key in browser code. Browser clients should call a trusted backend, and that backend should create the realtime session. The browser should receive only the returned webSocketPath.
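The split between trusted backend and browser can be sketched as follows. This is a minimal illustration, not the official integration: buildSessionRequest, toBrowserPayload, and createSessionForBrowser are hypothetical helper names, and only the endpoint path and the two header names come from this page.

```javascript
// Hypothetical server-side helpers. Only the path and header names are taken from this page.
function buildSessionRequest({ apiBaseUrl, agentId, apiKey, orgId, body = {} }) {
  return {
    url: `${apiBaseUrl}/api/v1/Agent/${encodeURIComponent(agentId)}/realtime-session`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-Api-Key": apiKey,
        "X-Org-Id": orgId
      },
      body: JSON.stringify(body)
    }
  };
}

// Keep only the field the browser needs to open the socket.
function toBrowserPayload(session) {
  return { webSocketPath: session.webSocketPath };
}

// Runs on the trusted backend; the browser calls your own endpoint instead.
async function createSessionForBrowser(config, fetchImpl = fetch) {
  const { url, init } = buildSessionRequest(config);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`Session creation failed with HTTP ${response.status}`);
  }
  return toBrowserPayload(await response.json());
}
```

Because the session is short-lived, the backend would call createSessionForBrowser per connection attempt rather than caching the result.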
Create Realtime Session
POST /api/v1/Agent/{agentId}/realtime-session
Returns sessionId, conversationId, expiry metadata, supported formats, and webSocketPath.
POST /api/v1/Agent/{agentId}/realtime-session
Content-Type: application/json
X-Api-Key: <external-api-key>
X-Org-Id: <organization-id>
If conversationId is omitted, Siesta AI creates a new conversation for the agent. If it is provided, it must belong to the same agent.
Request Body
| Property | Type | Default | Description |
|---|---|---|---|
| conversationId | uuid or null | null | Existing conversation id. Omit to create a new conversation. |
| inputAudioFormat | string | pcm16 | Requested input audio format. |
| outputAudioFormat | string | pcm16 | Requested output audio format. |
| voice | string | alloy | Provider voice used for output audio. |
| additionalInstructions | string or null | null | Extra instructions appended after the agent system message for this session. |
| clientTools | array | [] | Tools executed by your client, not by Siesta AI. |
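A request body can be assembled from these fields with a small helper. The sketch below is an assumption about how a client might do it: buildSessionBody is a hypothetical name, the defaults are copied from the table, and rejecting unknown keys is a local convenience rather than documented server behavior.

```javascript
// Defaults copied from the request body table.
const SESSION_DEFAULTS = Object.freeze({
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
  voice: "alloy"
});

const KNOWN_FIELDS = new Set([
  "conversationId",
  "inputAudioFormat",
  "outputAudioFormat",
  "voice",
  "additionalInstructions",
  "clientTools"
]);

function buildSessionBody(options = {}) {
  const body = { ...SESSION_DEFAULTS, clientTools: [] };
  for (const [key, value] of Object.entries(options)) {
    // Catch typos locally instead of sending a field the API does not list.
    if (!KNOWN_FIELDS.has(key)) {
      throw new Error(`Unknown realtime-session field: ${key}`);
    }
    // Leave null or undefined values out so the field counts as omitted.
    if (value !== null && value !== undefined) {
      body[key] = value;
    }
  }
  return body;
}
```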
Example Request
POST /api/v1/Agent/3f67ef24-3f96-4c20-a3b3-5fd0abef15a1/realtime-session HTTP/1.1
Host: api.siesta.ai
Content-Type: application/json
X-Api-Key: YOUR_EXTERNAL_API_KEY
X-Org-Id: 4bdaed95-19f8-47c2-bbf0-8a476cf0a527

{
  "inputAudioFormat": "pcm16",
  "outputAudioFormat": "pcm16",
  "voice": "alloy",
  "additionalInstructions": "Keep answers short and ask one question at a time.",
  "clientTools": [
    {
      "name": "open_booking_calendar",
      "description": "Open the booking calendar for a requested date.",
      "parameters": {
        "type": "object",
        "properties": {
          "date": {
            "type": "string",
            "description": "Date in YYYY-MM-DD format."
          }
        },
        "required": ["date"]
      }
    }
  ]
}
Response Body
| Property | Type | Description |
|---|---|---|
| sessionId | string | One-time token used to connect to the realtime WebSocket. |
| conversationId | uuid | Conversation id used by this realtime session. |
| expiresAt | datetime | UTC expiration time for opening the WebSocket. Default TTL is 60 seconds. |
| webSocketPath | string | Relative WebSocket path to connect to. |
| supportedInputAudioFormats | string[] | Input formats supported by the deployment. |
| supportedOutputAudioFormats | string[] | Output formats supported by the deployment. |
| maxSessionSeconds | number | Maximum realtime session duration returned to clients. Default: 1800. |
| idleTimeoutSeconds | number | Idle timeout returned to clients. Default: 60. |
| providerModelName | string | Provider realtime model name, usually gpt-realtime-2. |
{
  "sessionId": "15f3c5c7b61c4a16893621b3ec969962",
  "conversationId": "6ad53918-7055-4f10-bd81-b06f8fcfae2a",
  "expiresAt": "2026-06-09T12:00:45.1234567Z",
  "webSocketPath": "/api/v1/Agent/3f67ef24-3f96-4c20-a3b3-5fd0abef15a1/realtime?conversationId=6ad53918-7055-4f10-bd81-b06f8fcfae2a&sessionId=15f3c5c7b61c4a16893621b3ec969962",
  "supportedInputAudioFormats": ["pcm16"],
  "supportedOutputAudioFormats": ["pcm16"],
  "maxSessionSeconds": 1800,
  "idleTimeoutSeconds": 60,
  "providerModelName": "gpt-realtime-2"
}
The generated session is consumed when the WebSocket connects. In multi-instance deployments, use sticky routing or shared session storage.
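Since the session must be used before expiresAt, a client may want to check the remaining time before connecting. The following is a sketch under assumptions: parseExpiresAt and isSessionUsable are hypothetical helpers, the 5-second safety margin is an arbitrary choice, and the check is only as accurate as the local clock. The example timestamp carries seven fractional digits, so the helper trims it to milliseconds before parsing.

```javascript
// Trim sub-millisecond digits so Date.parse receives a plain ISO 8601 timestamp.
function parseExpiresAt(expiresAt) {
  return Date.parse(expiresAt.replace(/(\.\d{3})\d+/, "$1"));
}

// Returns true when there is still enough time left to open the WebSocket.
function isSessionUsable(session, nowMs = Date.now(), safetyMarginMs = 5000) {
  const expiresMs = parseExpiresAt(session.expiresAt);
  if (Number.isNaN(expiresMs)) {
    return false;
  }
  return expiresMs - nowMs > safetyMarginMs;
}
```

If isSessionUsable returns false, requesting a fresh session is cheaper than attempting a connection that is likely to fail.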
Realtime WebSocket
GET /api/v1/Agent/{agentId}/realtime?conversationId={conversationId}&sessionId={sessionId}
Upgrade to WebSocket on the same API host using the returned one-time session token.
Connect to the returned webSocketPath on the same API host:
wss://{api-host}{webSocketPath}
The path has this shape:
GET /api/v1/Agent/{agentId}/realtime?conversationId={conversationId}&sessionId={sessionId}
After the socket is accepted, Siesta AI connects to the provider, sends session.update, and starts bidirectional proxying. Your client receives provider events and Siesta AI custom events on the same socket.
Route by top-level type. Raw provider events and Siesta AI custom events share the same socket, so keep event routing explicit.
Backend Session Configuration
Clients do not send this configuration. Siesta AI sends it internally after connecting to the provider:
{
  "type": "session.update",
  "session": {
    "type": "realtime",
    "model": "gpt-realtime-2",
    "output_modalities": ["audio"],
    "instructions": "{agent system message}\n\n{additionalInstructions}",
    "audio": {
      "input": {
        "format": {
          "type": "audio/pcm",
          "rate": 24000
        },
        "transcription": {
          "model": "gpt-realtime-whisper"
        },
        "turn_detection": {
          "type": "semantic_vad"
        }
      },
      "output": {
        "format": {
          "type": "audio/pcm",
          "rate": 24000
        },
        "voice": "alloy"
      }
    },
    "tools": [
      "{backend agent tools}",
      "{client tools from realtime-session request}"
    ],
    "tool_choice": "auto"
  }
}
Client To Server Messages
The backend forwards most client WebSocket messages to the provider unchanged. Use standard realtime provider client event shapes.
| Message | Behavior |
|---|---|
| Standard realtime client events | Forwarded to provider unchanged. |
| Binary frames | Forwarded to provider unchanged, subject to max frame size. |
| { "type": "approval.approve", "callId": "..." } | Consumed by Siesta AI if the call is waiting for approval. |
| { "type": "approval.reject", "call_id": "..." } | Consumed by Siesta AI if the call is waiting for approval. Both callId and call_id are accepted. |
Incoming Events
Your client receives two categories of messages:
- raw provider events,
- Siesta AI custom events.
Route by the top-level type property.
Raw Provider Events
Provider messages are forwarded first and unchanged. Examples include:
- response.created
- response.done
- response.audio.delta
- response.audio_transcript.done
- response.output_audio_transcript.done
- response.output_text.done
- response.content.done
- conversation.item.input_audio_transcription.completed
- input_audio.transcript.done
- response.function_call_arguments.done
- error
Siesta AI listens to some provider events to persist conversation messages and execute backend tools, but the raw events still reach your client.
Persistence Trigger Events
| Provider event | Persisted role | Custom events |
|---|---|---|
| conversation.item.input_audio_transcription.completed | User | message.created |
| input_audio.transcript.done | User | message.created |
| response.audio_transcript.done | Assistant | response.id, then response.completed |
| response.output_audio_transcript.done | Assistant | response.id, then response.completed |
| response.output_text.done | Assistant | response.id, then response.completed |
| response.content.done | Assistant | response.id, then response.completed |
Siesta AI Custom Event Envelope
{
  "type": "event.name",
  "data": {
    "property": "value"
  }
}
Custom Event Reference
| Event | Data | When it is sent |
|---|---|---|
| message.created | { chatbotId, id, role, content, createdAt } | User transcript was persisted as a conversation message. |
| response.id | { chatbotId, id } | Assistant transcript was persisted. |
| response.completed | { chatbotId } | Assistant response persistence completed. |
| response.function_invocation.start | { id, callId, title, imageUrl, chatbotId, arguments, approvalRequired, functionName } | Backend tool execution started or is waiting for approval. |
| response.function_invocation.done | { id, callId, text, title, imageUrl, button, buttonLabel, buttonLink, chatbotId, status, executionTimeSeconds } | Backend tool execution finished, failed, was rejected, or timed out. |
| approval.waiting | { callId, messageId, timeoutSeconds } | Backend tool requires user approval before execution. |
| approval.approved | { callId, messageId } | Approval was accepted and the backend is executing the tool. |
| approval.expired | { callId, messageId } | Approval timeout elapsed. |
| subagent.start | { id, name, icon, iconColor, callId } | Sub-agent invocation started. |
| subagent.done | { chatbotId } | Sub-agent invocation completed. |
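Because both event families arrive on one socket, routing by the top-level type can be made explicit with a lookup of the custom event names. This is a sketch, assuming the table above is the complete list of custom events; classifyMessage and routeMessage are hypothetical names, and any type not in the set is treated as a raw provider event.

```javascript
// Event names copied from the custom event reference table.
const CUSTOM_EVENT_TYPES = new Set([
  "message.created",
  "response.id",
  "response.completed",
  "response.function_invocation.start",
  "response.function_invocation.done",
  "approval.waiting",
  "approval.approved",
  "approval.expired",
  "subagent.start",
  "subagent.done"
]);

function classifyMessage(message) {
  if (!message || typeof message.type !== "string") {
    return "unknown";
  }
  return CUSTOM_EVENT_TYPES.has(message.type) ? "custom" : "provider";
}

// handlers: { onCustom(type, data), onProvider(message), onUnknown?(message) }
function routeMessage(message, handlers) {
  const kind = classifyMessage(message);
  if (kind === "custom") {
    return handlers.onCustom(message.type, message.data ?? {});
  }
  if (kind === "provider") {
    return handlers.onProvider(message);
  }
  return handlers.onUnknown ? handlers.onUnknown(message) : undefined;
}
```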
Tool execution status is serialized as a number by the current custom WebSocket serializer:
| Status | Meaning |
|---|---|
| 0 | Pending |
| 1 | Success |
| 2 | Failed |
| 3 | Pending approval |
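A client that displays tool results may want to turn the numeric status back into a label. The mapping below is copied from the table; toolStatusLabel and summarizeInvocationDone are hypothetical helpers, and the fallback for unlisted numbers is an assumption in case the serializer adds values later.

```javascript
// Numeric status values as listed in the status table.
const TOOL_STATUS_LABELS = Object.freeze({
  0: "Pending",
  1: "Success",
  2: "Failed",
  3: "Pending approval"
});

function toolStatusLabel(status) {
  return TOOL_STATUS_LABELS[status] ?? `Unknown (${status})`;
}

// Reduce a response.function_invocation.done payload to what a UI row needs.
function summarizeInvocationDone(data) {
  return {
    callId: data.callId,
    label: toolStatusLabel(data.status),
    succeeded: data.status === 1
  };
}
```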
Client Tools
Client tools are functions declared by your client during session creation. The model sees them as function tools, but Siesta AI does not execute or persist them. Your application must listen for tool calls, execute the local function, send the function output to the provider, and request the next response.
Declaration
{
  "clientTools": [
    {
      "name": "open_booking_calendar",
      "description": "Open the booking calendar for a requested date.",
      "parameters": {
        "type": "object",
        "properties": {
          "date": {
            "type": "string",
            "description": "Date in YYYY-MM-DD format."
          },
          "durationMinutes": {
            "type": "integer",
            "description": "Requested meeting duration in minutes."
          }
        },
        "required": ["date"]
      }
    }
  ]
}
Rules
- name and description are required and cannot be empty.
- Tool names are case-sensitive.
- Client tool names must be unique.
- Client tool names must not conflict with backend agent tool names.
- parameters must be an object. If omitted, an empty object schema is used.
- Client tool execution is not persisted by Siesta AI.
- Client tools do not emit custom response.function_invocation.* events.
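These rules can be checked locally before the session request is sent. The sketch below is illustrative only: normalizeClientTools is a hypothetical helper, the backend tool names must be supplied by the caller if known, treating whitespace-only strings as empty is an assumption, and the exact shape of the empty object schema ({ type: "object", properties: {} }) is also an assumption.

```javascript
function normalizeClientTools(tools, backendToolNames = []) {
  const reserved = new Set(backendToolNames);
  const seen = new Set(); // Set lookups are case-sensitive, matching the naming rule.
  return tools.map((tool, index) => {
    const { name, description, parameters } = tool;
    if (typeof name !== "string" || name.trim() === "") {
      throw new Error(`clientTools[${index}]: name is required`);
    }
    if (typeof description !== "string" || description.trim() === "") {
      throw new Error(`clientTools[${index}]: description is required`);
    }
    if (seen.has(name)) {
      throw new Error(`Duplicate client tool name: ${name}`);
    }
    if (reserved.has(name)) {
      throw new Error(`Client tool name conflicts with a backend tool: ${name}`);
    }
    seen.add(name);
    if (parameters === undefined) {
      // Assumed shape of the empty object schema used when parameters is omitted.
      return { name, description, parameters: { type: "object", properties: {} } };
    }
    if (parameters === null || typeof parameters !== "object" || Array.isArray(parameters)) {
      throw new Error(`clientTools[${index}]: parameters must be an object`);
    }
    return { name, description, parameters };
  });
}
```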
Handling A Client Tool Call
When the provider calls a client tool, your client receives a raw provider event:
{
  "type": "response.function_call_arguments.done",
  "call_id": "call_open_calendar_01",
  "name": "open_booking_calendar",
  "arguments": "{\"date\":\"2026-06-10\",\"durationMinutes\":30}"
}
If name matches a tool you declared, execute it locally and send the function output:
{
  "type": "conversation.item.create",
  "item": {
    "type": "function_call_output",
    "call_id": "call_open_calendar_01",
    "output": "{\"status\":\"success\",\"result\":\"Calendar opened for 2026-06-10.\"}"
  }
}
Then request the model to continue:
{
  "type": "response.create"
}
If the function call is not your declared client tool, wait for Siesta AI custom backend-tool events instead.
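With more than one client tool, the same sequence can be driven from a registry. The following is a sketch rather than a prescribed pattern: handleClientToolCall is a hypothetical helper, the registry is a Map from tool name to a local function, and the error output shape ({ status: "error", message }) is an assumption modeled on the success example above. The function returns the messages to send, or null when the call is not a declared client tool.

```javascript
async function handleClientToolCall(event, registry) {
  const tool = registry.get(event.name);
  if (typeof tool !== "function") {
    // Not a declared client tool: leave it to Siesta AI backend-tool events.
    return null;
  }
  let output;
  try {
    const args = JSON.parse(event.arguments || "{}");
    output = { status: "success", result: await tool(args) };
  } catch (error) {
    // Report the failure to the model instead of leaving the call unanswered.
    const message = error instanceof Error ? error.message : String(error);
    output = { status: "error", message };
  }
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: event.call_id,
        output: JSON.stringify(output)
      }
    },
    { type: "response.create" }
  ];
}
```

A caller would send each returned message with socket.send(JSON.stringify(message)), in order.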
Approval Flow For Backend Tools
Some backend tools may require user approval. The model calls the backend tool, Siesta AI persists a pending approval, and your client must ask the user to approve or reject it.
- Client receives response.function_invocation.start with approvalRequired: true.
- Client receives approval.waiting with callId, messageId, and timeoutSeconds.
- User approves or rejects.
- Client sends approval.approve or approval.reject with the same callId.
- Approved tools emit approval.approved and response.function_invocation.done.
- Timed-out approvals emit approval.expired.
{
  "type": "approval.waiting",
  "data": {
    "callId": "call_send_email_01",
    "messageId": "da3ba88e-22f5-49de-8421-e4f43834ba42",
    "timeoutSeconds": 300
  }
}
Approve:
{
  "type": "approval.approve",
  "callId": "call_send_email_01"
}
Reject:
{
  "type": "approval.reject",
  "call_id": "call_send_email_01"
}
Default approval timeout is 300 seconds. If approval expires, Siesta AI returns a timeout error result to the model.
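Client-side bookkeeping for this flow can be sketched with a small tracker. ApprovalTracker is a hypothetical class, not part of the API. It assumes the timeout starts counting when approval.waiting is received, so deadlineMs is only a local estimate, and it refuses to build a second decision for the same callId.

```javascript
class ApprovalTracker {
  constructor() {
    this.pending = new Map(); // callId -> { messageId, deadlineMs }
  }

  // Feed every Siesta AI custom event (type and data) through this method.
  onCustomEvent(type, data, nowMs = Date.now()) {
    if (type === "approval.waiting") {
      this.pending.set(data.callId, {
        messageId: data.messageId,
        deadlineMs: nowMs + data.timeoutSeconds * 1000
      });
    } else if (
      type === "approval.approved" ||
      type === "approval.expired" ||
      type === "response.function_invocation.done"
    ) {
      this.pending.delete(data.callId);
    }
  }

  // Returns the message to send, or null if nothing is waiting for this callId.
  decide(callId, approved) {
    if (!this.pending.has(callId)) {
      return null;
    }
    this.pending.delete(callId);
    return { type: approved ? "approval.approve" : "approval.reject", callId };
  }
}
```

An approval dialog could use deadlineMs to show a countdown and call decide when the user clicks a button.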
Audio And Transport
| Setting | Default | Notes |
|---|---|---|
| Input format | pcm16 | Mapped to provider audio/pcm with sample rate 24000. |
| Output format | pcm16 | Mapped to provider audio/pcm with sample rate 24000. |
| Voice | alloy | Passed to provider session configuration. |
| Turn detection | semantic_vad | Configured by backend in session.update. |
| Input transcription model | gpt-realtime-whisper | Configured by backend in session.update. |
| Max WebSocket message size | 65536 bytes | Larger messages can close the target socket with MessageTooBig. |
Siesta AI does not transcode client audio. Use provider-compatible realtime event payloads for the selected format.
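Since no transcoding happens on the server, the client has to produce pcm16 payloads itself and keep each message under the size limit from the table. The sketch below rests on several assumptions: samples are mono Float32 values in the range -1 to 1 already sampled at 24000 Hz, the provider expects little-endian 16-bit PCM, and base64 encoding uses Node's Buffer (a browser would need a different base64 step). floatToPcm16Bytes and buildAppendEvents are hypothetical helper names.

```javascript
// Convert Float32 samples in [-1, 1] to 16-bit little-endian PCM bytes.
function floatToPcm16Bytes(samples) {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i += 1) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true); // true = little-endian
  }
  return bytes;
}

// Split PCM bytes into append events whose JSON text stays within the message limit.
function buildAppendEvents(pcmBytes, maxMessageBytes = 65536) {
  const overhead = JSON.stringify({ type: "input_audio_buffer.append", audio: "" }).length;
  const maxBase64Chars = maxMessageBytes - overhead;
  // Base64 maps 3 raw bytes to 4 characters. A multiple of 6 keeps chunks
  // free of padding and aligned to whole 16-bit samples.
  let chunkBytes = Math.floor(maxBase64Chars / 4) * 3;
  chunkBytes -= chunkBytes % 6;
  if (chunkBytes <= 0) {
    throw new Error("maxMessageBytes is too small for an audio chunk");
  }
  const events = [];
  for (let offset = 0; offset < pcmBytes.length; offset += chunkBytes) {
    const chunk = pcmBytes.subarray(offset, offset + chunkBytes);
    events.push({
      type: "input_audio_buffer.append",
      audio: Buffer.from(chunk).toString("base64")
    });
  }
  return events;
}
```

In practice a client would usually send much smaller chunks for latency reasons; the limit only sets the upper bound.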
Implementation Example
In production, create the session on a trusted backend so your external API key is not exposed in a browser.
const apiBaseUrl = "https://api.siesta.ai";
const agentId = "3f67ef24-3f96-4c20-a3b3-5fd0abef15a1";

// Run this on a trusted backend: it sends the external API key.
async function createRealtimeSession() {
  const response = await fetch(`${apiBaseUrl}/api/v1/Agent/${agentId}/realtime-session`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Api-Key": "YOUR_EXTERNAL_API_KEY",
      "X-Org-Id": "4bdaed95-19f8-47c2-bbf0-8a476cf0a527"
    },
    body: JSON.stringify({
      inputAudioFormat: "pcm16",
      outputAudioFormat: "pcm16",
      voice: "alloy",
      clientTools: [
        {
          name: "open_booking_calendar",
          description: "Open the booking calendar for a requested date.",
          parameters: {
            type: "object",
            properties: {
              date: { type: "string" }
            },
            required: ["date"]
          }
        }
      ]
    })
  });
  if (!response.ok) {
    throw new Error(await response.text());
  }
  return response.json();
}

// showApprovalDialog, handleRealtimeEvent, and openBookingCalendar are provided by your application.
function openRealtimeSocket(session) {
  // Swap the scheme so https:// becomes wss:// (and http:// becomes ws://).
  const wsUrl = new URL(session.webSocketPath, apiBaseUrl.replace(/^http/, "ws"));
  const socket = new WebSocket(wsUrl);
  socket.addEventListener("message", async event => {
    // Only text frames carry JSON events; ignore anything else here.
    if (typeof event.data !== "string") {
      return;
    }
    const message = JSON.parse(event.data);
    if (message.type === "response.function_call_arguments.done") {
      await maybeHandleClientTool(socket, message);
      return;
    }
    if (message.type === "approval.waiting") {
      showApprovalDialog(socket, message.data);
      return;
    }
    handleRealtimeEvent(message);
  });
  return socket;
}

async function maybeHandleClientTool(socket, event) {
  // Calls for other tools are backend tools; Siesta AI handles those.
  if (event.name !== "open_booking_calendar") {
    return;
  }
  const args = JSON.parse(event.arguments || "{}");
  const result = await openBookingCalendar(args.date);
  // Return the function output to the provider, then ask the model to continue.
  socket.send(JSON.stringify({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: event.call_id,
      output: JSON.stringify({ status: "success", result })
    }
  }));
  socket.send(JSON.stringify({ type: "response.create" }));
}

function approve(socket, callId) {
  socket.send(JSON.stringify({ type: "approval.approve", callId }));
}

function reject(socket, callId) {
  socket.send(JSON.stringify({ type: "approval.reject", callId }));
}

// Top-level await requires an ES module context.
const session = await createRealtimeSession();
const socket = openRealtimeSocket(session);
Sending Audio Or Text
Use the provider realtime event format. Siesta AI forwards these events unchanged.
{
  "type": "input_audio_buffer.append",
  "audio": "BASE64_PCM16_AUDIO_CHUNK"
}

{
  "type": "input_audio_buffer.commit"
}

{
  "type": "response.create"
}
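The events above cover audio input. For a text turn, the sketch below assumes the configured provider follows the OpenAI Realtime event shapes, where a user message is created with conversation.item.create and an input_text content part. That shape is an assumption about the provider, not something this page specifies, and buildTextTurn and sendAll are hypothetical helper names.

```javascript
// Build the two events for one text turn: add the user message, then request a response.
function buildTextTurn(text) {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [{ type: "input_text", text }]
      }
    },
    { type: "response.create" }
  ];
}

// Send events in order over an open WebSocket (or any object with a send method).
function sendAll(socket, events) {
  for (const event of events) {
    socket.send(JSON.stringify(event));
  }
}
```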
Errors And Limits
Realtime domain errors return HTTP 400 with this shape:
{
  "status": 400,
  "detail": "Realtime session has expired.",
  "errorCode": "RealtimeSessionExpired"
}
Error Codes
| Error code | Meaning | Client action |
|---|---|---|
| RealtimeNotEnabled | Realtime is disabled for the deployment or access mode. | Disable realtime UI or contact the API owner. |
| RealtimeUnsupportedConnection | The agent connection does not support realtime audio or is disabled by governance. | Use an agent with a supported OpenAI connection. |
| RealtimeUnsupportedModel | The selected agent model does not support realtime audio or does not match the configured provider model. | Use an agent configured with a realtime-capable model. |
| RealtimeAudioFormatUnsupported | Requested input or output audio format is not supported. | Use a format returned by session creation, usually pcm16. |
| RealtimeSessionInvalid | Missing, unknown, mismatched, or otherwise invalid session token. | Create a new realtime session and reconnect. |
| RealtimeSessionExpired | Session token expired before WebSocket connection. | Create a new realtime session and connect immediately. |
| RealtimeSessionAlreadyUsed | One-time session token was already consumed. | Create a new realtime session. Do not retry the same token. |
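The client actions in this table can be collapsed into a small decision helper. This is a sketch: the error codes are copied from the table, while the action labels ("new-session", "change-format", and so on) and the planRecovery name are hypothetical choices for illustration.

```javascript
// Error codes from the table, mapped to illustrative client actions.
const RECOVERY_BY_ERROR_CODE = new Map([
  ["RealtimeNotEnabled", "disable-realtime"],
  ["RealtimeUnsupportedConnection", "reconfigure-agent"],
  ["RealtimeUnsupportedModel", "reconfigure-agent"],
  ["RealtimeAudioFormatUnsupported", "change-format"],
  ["RealtimeSessionInvalid", "new-session"],
  ["RealtimeSessionExpired", "new-session"],
  ["RealtimeSessionAlreadyUsed", "new-session"]
]);

// errorBody is the parsed JSON error response, for example { status, detail, errorCode }.
function planRecovery(errorBody) {
  const code = errorBody ? errorBody.errorCode : undefined;
  return RECOVERY_BY_ERROR_CODE.get(code) ?? "report";
}
```

Only the "new-session" outcome is worth retrying automatically; the others point at configuration that a retry will not change.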
Default Limits
| Limit | Default |
|---|---|
| Session TTL before WebSocket connect | 60 seconds |
| Max session duration | 1800 seconds |
| Idle timeout | 60 seconds |
| Approval timeout | 300 seconds |
| Max WebSocket message size | 65536 bytes |
| Provider connect timeout | 15 seconds |
If the WebSocket is opened with the wrong agentId, the one-time session may already be consumed before the mismatch is reported. Create a new session instead of retrying the same token.
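One way to avoid burning a session on a mismatched agent is to check the returned path before connecting. The sketch below parses webSocketPath against the documented path shape; describeWebSocketPath and assertPathMatchesAgent are hypothetical helpers, the placeholder host exists only so the relative path can be parsed, and comparing ids case-insensitively is an assumption.

```javascript
// Extract agentId, conversationId, and sessionId from a relative webSocketPath.
function describeWebSocketPath(webSocketPath) {
  // The base host is a placeholder; only the path and query are inspected.
  const url = new URL(webSocketPath, "wss://placeholder.invalid");
  const match = url.pathname.match(/^\/api\/v1\/Agent\/([^/]+)\/realtime$/);
  return {
    agentId: match ? decodeURIComponent(match[1]) : null,
    conversationId: url.searchParams.get("conversationId"),
    sessionId: url.searchParams.get("sessionId")
  };
}

// Throws before any connection attempt if the path does not target the expected agent.
function assertPathMatchesAgent(webSocketPath, expectedAgentId) {
  const parts = describeWebSocketPath(webSocketPath);
  if (!parts.agentId || parts.agentId.toLowerCase() !== expectedAgentId.toLowerCase()) {
    throw new Error("webSocketPath does not target the expected agent");
  }
  if (!parts.sessionId || !parts.conversationId) {
    throw new Error("webSocketPath is missing conversationId or sessionId");
  }
  return parts;
}
```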
Implementation Checklist
- Store apiBaseUrl, agentId, and organization credentials in trusted server-side configuration.
- Call POST /api/v1/Agent/{agentId}/realtime-session immediately before opening a WebSocket.
- Build the WebSocket URL as wss://{host}{webSocketPath}.
- Do not send X-Api-Key to the WebSocket.
- Parse every incoming text frame as JSON and route by top-level type.
- Handle raw provider audio and response events according to the provider realtime protocol.
- Handle Siesta AI custom events with the { type, data } envelope.
- For backend tool events, update UI state but do not send function outputs yourself.
- For declared client tools, listen for response.function_call_arguments.done, execute locally, send conversation.item.create, then send response.create.
- For approval.waiting, show approval UI and send approval.approve or approval.reject with the same callId.
- On RealtimeSessionExpired or RealtimeSessionAlreadyUsed, create a new session instead of retrying the old one.
- Keep individual WebSocket messages under 65536 bytes unless your deployment config says otherwise.