Realtime API

Realtime transport for Siesta AI agents

The Realtime API lets an external application create a short-lived Siesta AI agent session, open a WebSocket, and stream provider events through Siesta AI. Use it for voice-style experiences, live agent interfaces, public widgets, and clients that need to react to transcripts, tool execution, approval states, and persisted conversation events as they happen.

Public API v1 reference

  • POST /api/v1/Agent/{agentId}/realtime-session: create a one-time realtime session.
  • GET /api/v1/Agent/{agentId}/realtime: open the WebSocket with conversationId and sessionId.
  • Server contract (backend-created session): use X-Api-Key and X-Org-Id only on a trusted server.
  • Session lifecycle (one-time token): the returned sessionId is short-lived and consumed when the socket connects.
  • Realtime defaults (model and audio): realtime sessions default to gpt-realtime-2 and pcm16.

What This API Does

The API creates a realtime session for a specific Siesta AI agent conversation and proxies traffic between your client and the configured realtime provider. Siesta AI injects the agent system instructions, backend tools, optional client tools, audio settings, transcription settings, and conversation persistence.

  • Realtime conversation: stream audio and provider events over WebSocket after creating a one-time session.
  • Backend tools: tools configured on the Siesta AI agent are executed by Siesta AI and reported through custom events.
  • Client tools: your application can expose local functions to the model and execute them inside the client.
  • Conversation persistence: transcripts, assistant responses, tool status, and sub-agent status can be persisted back to the Siesta AI conversation.

The realtime session token is short-lived and one-time use. Create the session immediately before opening the WebSocket.

Quick Start Flow

  1. Create a realtime session with POST /api/v1/Agent/{agentId}/realtime-session.
  2. Connect to wss://{api-host}{webSocketPath}.
  3. Send standard realtime provider client events.
  4. Route incoming messages by top-level type.
  5. Execute declared client tools locally.
  6. For backend tool approvals, send approval.approve or approval.reject.
  7. If the session expires, is consumed, or disconnects, create a new session.

Authentication

Keep the external API key server-side. Create realtime sessions from your backend and return only webSocketPath to the browser.

The HTTP session creation endpoint is authenticated with external API headers:

X-Api-Key: <external-api-key>
X-Org-Id: <organization-id>

The WebSocket endpoint does not use X-Api-Key. It is authorized by the short-lived one-time sessionId embedded in the returned webSocketPath.

Do not expose X-Api-Key in browser code. Browser clients should call a trusted backend, and that backend should create the realtime session. The browser should receive only the returned webSocketPath.
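The split between backend and browser can be sketched as two small helpers. This is a minimal sketch, not a definitive implementation: the names buildSessionRequest and toBrowserPayload and the shape of the config object are hypothetical, and only the endpoint path, the two header names, and webSocketPath come from this page.

```javascript
// Hypothetical helpers; only the path, header names, and webSocketPath
// are taken from this page.

// Builds the arguments for fetch(url, init) on the trusted backend.
function buildSessionRequest(config, body = {}) {
  return {
    url: `${config.apiBaseUrl}/api/v1/Agent/${config.agentId}/realtime-session`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-Api-Key": config.apiKey, // stays on the server
        "X-Org-Id": config.orgId
      },
      body: JSON.stringify(body)
    }
  };
}

// Reduces the session response to the only field the browser needs.
function toBrowserPayload(session) {
  return { webSocketPath: session.webSocketPath };
}
```

The backend would pass url and init to fetch, then send the result of toBrowserPayload to the browser, so neither header value reaches client code.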

Create Realtime Session

Returns sessionId, conversationId, expiry metadata, supported formats, and webSocketPath.

POST /api/v1/Agent/{agentId}/realtime-session
Content-Type: application/json
X-Api-Key: <external-api-key>
X-Org-Id: <organization-id>

If conversationId is omitted, Siesta AI creates a new conversation for the agent. If it is provided, it must belong to the same agent.

Request Body

Property | Type | Default | Description
conversationId | uuid or null | null | Existing conversation id. Omit to create a new conversation.
inputAudioFormat | string | pcm16 | Requested input audio format.
outputAudioFormat | string | pcm16 | Requested output audio format.
voice | string | alloy | Provider voice used for output audio.
additionalInstructions | string or null | null | Extra instructions appended after the agent system message for this session.
clientTools | array | [] | Tools executed by your client, not by Siesta AI.

Example Request

POST /api/v1/Agent/3f67ef24-3f96-4c20-a3b3-5fd0abef15a1/realtime-session HTTP/1.1
Host: api.siesta.ai
Content-Type: application/json
X-Api-Key: YOUR_EXTERNAL_API_KEY
X-Org-Id: 4bdaed95-19f8-47c2-bbf0-8a476cf0a527

{
  "inputAudioFormat": "pcm16",
  "outputAudioFormat": "pcm16",
  "voice": "alloy",
  "additionalInstructions": "Keep answers short and ask one question at a time.",
  "clientTools": [
    {
      "name": "open_booking_calendar",
      "description": "Open the booking calendar for a requested date.",
      "parameters": {
        "type": "object",
        "properties": {
          "date": {
            "type": "string",
            "description": "Date in YYYY-MM-DD format."
          }
        },
        "required": ["date"]
      }
    }
  ]
}

Response Body

Property | Type | Description
sessionId | string | One-time token used to connect to the realtime WebSocket.
conversationId | uuid | Conversation id used by this realtime session.
expiresAt | datetime | UTC expiration time for opening the WebSocket. Default TTL is 60 seconds.
webSocketPath | string | Relative WebSocket path to connect to.
supportedInputAudioFormats | string[] | Input formats supported by the deployment.
supportedOutputAudioFormats | string[] | Output formats supported by the deployment.
maxSessionSeconds | number | Maximum realtime session duration returned to clients. Default: 1800.
idleTimeoutSeconds | number | Idle timeout returned to clients. Default: 60.
providerModelName | string | Provider realtime model name, usually gpt-realtime-2.

{
  "sessionId": "15f3c5c7b61c4a16893621b3ec969962",
  "conversationId": "6ad53918-7055-4f10-bd81-b06f8fcfae2a",
  "expiresAt": "2026-06-09T12:00:45.1234567Z",
  "webSocketPath": "/api/v1/Agent/3f67ef24-3f96-4c20-a3b3-5fd0abef15a1/realtime?conversationId=6ad53918-7055-4f10-bd81-b06f8fcfae2a&sessionId=15f3c5c7b61c4a16893621b3ec969962",
  "supportedInputAudioFormats": ["pcm16"],
  "supportedOutputAudioFormats": ["pcm16"],
  "maxSessionSeconds": 1800,
  "idleTimeoutSeconds": 60,
  "providerModelName": "gpt-realtime-2"
}

The generated session is consumed when the WebSocket connects. In multi-instance deployments, use sticky routing or shared session storage.
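Because the token must be used before expiresAt, a client may want to check the remaining time before it tries to connect. The sketch below is illustrative only: parseExpiresAt and canStillConnect are hypothetical names, and the 5 second safety margin is an assumption, not a documented value. The sample expiresAt carries seven fractional digits, so the fraction is trimmed to milliseconds before parsing.

```javascript
// Hypothetical helper names; expiresAt is the field from the response body.

// Parses expiresAt into epoch milliseconds. The fraction is trimmed to three
// digits so the string matches the ECMAScript date-time string format.
function parseExpiresAt(expiresAt) {
  const trimmed = expiresAt.replace(/(\.\d{3})\d+/, "$1");
  const ms = Date.parse(trimmed);
  if (Number.isNaN(ms)) {
    throw new Error(`Unparseable expiresAt: ${expiresAt}`);
  }
  return ms;
}

// Returns true when the token should still be usable at nowMs. marginMs leaves
// room for clock skew and connection setup (the default is an assumption).
function canStillConnect(session, nowMs = Date.now(), marginMs = 5000) {
  return parseExpiresAt(session.expiresAt) - nowMs > marginMs;
}
```

If canStillConnect returns false, creating a fresh session is cheaper than attempting a connection that is likely to fail with RealtimeSessionExpired.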

Realtime WebSocket

GET /api/v1/Agent/{agentId}/realtime?conversationId={conversationId}&sessionId={sessionId}

Upgrades to a WebSocket on the same API host using the returned one-time session token.

Connect to the returned webSocketPath on the same API host:

wss://{api-host}{webSocketPath}

The path has this shape:

GET /api/v1/Agent/{agentId}/realtime?conversationId={conversationId}&sessionId={sessionId}
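Building the URL from the returned path can be sketched as follows. buildWebSocketUrl is a hypothetical helper; the check for conversationId and sessionId is an extra defensive step based on the path shape shown here, not a documented requirement.

```javascript
// Hypothetical helper that mirrors wss://{api-host}{webSocketPath}.
function buildWebSocketUrl(apiBaseUrl, webSocketPath) {
  // https://host becomes wss://host, http://host becomes ws://host.
  const wsBase = apiBaseUrl.replace(/^http/, "ws");
  const url = new URL(webSocketPath, wsBase);

  // Fail early if the path does not carry both query parameters.
  for (const name of ["conversationId", "sessionId"]) {
    if (!url.searchParams.get(name)) {
      throw new Error(`webSocketPath is missing ${name}`);
    }
  }
  return url.toString();
}
```

Using the URL constructor rather than string concatenation keeps the query string intact and avoids doubled slashes when the base URL ends with a slash.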

After the socket is accepted, Siesta AI connects to the provider, sends session.update, and starts bidirectional proxying. Your client receives provider events and Siesta AI custom events on the same socket.

Route incoming frames by type. Raw provider events and Siesta AI custom events share the same socket, so keep event routing explicit.

Backend Session Configuration

Clients do not send this configuration. Siesta AI sends it internally after connecting to the provider:

{
  "type": "session.update",
  "session": {
    "type": "realtime",
    "model": "gpt-realtime-2",
    "output_modalities": ["audio"],
    "instructions": "{agent system message}\n\n{additionalInstructions}",
    "audio": {
      "input": {
        "format": {
          "type": "audio/pcm",
          "rate": 24000
        },
        "transcription": {
          "model": "gpt-realtime-whisper"
        },
        "turn_detection": {
          "type": "semantic_vad"
        }
      },
      "output": {
        "format": {
          "type": "audio/pcm",
          "rate": 24000
        },
        "voice": "alloy"
      }
    },
    "tools": [
      "{backend agent tools}",
      "{client tools from realtime-session request}"
    ],
    "tool_choice": "auto"
  }
}

Client To Server Messages

The backend forwards most client WebSocket messages to the provider unchanged. Use standard realtime provider client event shapes.

Message | Behavior
Standard realtime client events | Forwarded to provider unchanged.
Binary frames | Forwarded to provider unchanged, subject to max frame size.
{ "type": "approval.approve", "callId": "..." } | Consumed by Siesta AI if the call is waiting for approval.
{ "type": "approval.reject", "call_id": "..." } | Consumed by Siesta AI if the call is waiting for approval. Both callId and call_id are accepted.

Incoming Events

Your client receives two categories of messages:

  • raw provider events,
  • Siesta AI custom events.

Route by the top-level type property.

Raw Provider Events

Provider messages are forwarded first and unchanged. Examples include:

  • response.created
  • response.done
  • response.audio.delta
  • response.audio_transcript.done
  • response.output_audio_transcript.done
  • response.output_text.done
  • response.content.done
  • conversation.item.input_audio_transcription.completed
  • input_audio.transcript.done
  • response.function_call_arguments.done
  • error

Siesta AI listens to some provider events to persist conversation messages and execute backend tools, but the raw events still reach your client.

Persistence Trigger Events

Provider event | Persisted role | Custom events
conversation.item.input_audio_transcription.completed | User | message.created
input_audio.transcript.done | User | message.created
response.audio_transcript.done | Assistant | response.id, then response.completed
response.output_audio_transcript.done | Assistant | response.id, then response.completed
response.output_text.done | Assistant | response.id, then response.completed
response.content.done | Assistant | response.id, then response.completed

Siesta AI Custom Event Envelope

{
  "type": "event.name",
  "data": {
    "property": "value"
  }
}

Custom Event Reference

Event | Data | When it is sent
message.created | { chatbotId, id, role, content, createdAt } | User transcript was persisted as a conversation message.
response.id | { chatbotId, id } | Assistant transcript was persisted.
response.completed | { chatbotId } | Assistant response persistence completed.
response.function_invocation.start | { id, callId, title, imageUrl, chatbotId, arguments, approvalRequired, functionName } | Backend tool execution started or is waiting for approval.
response.function_invocation.done | { id, callId, text, title, imageUrl, button, buttonLabel, buttonLink, chatbotId, status, executionTimeSeconds } | Backend tool execution finished, failed, was rejected, or timed out.
approval.waiting | { callId, messageId, timeoutSeconds } | Backend tool requires user approval before execution.
approval.approved | { callId, messageId } | Approval was accepted and the backend is executing the tool.
approval.expired | { callId, messageId } | Approval timeout elapsed.
subagent.start | { id, name, icon, iconColor, callId } | Sub-agent invocation started.
subagent.done | { chatbotId } | Sub-agent invocation completed.
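Since several custom event names (for example response.id and response.completed) look like provider events, an explicit allow-list makes routing easier to reason about. The sketch below is one possible approach: classifyFrame is a hypothetical name, and requiring a data object in addition to a known type is an assumption based on the envelope shown earlier.

```javascript
// Event names copied from the custom event reference.
const CUSTOM_EVENT_TYPES = new Set([
  "message.created",
  "response.id",
  "response.completed",
  "response.function_invocation.start",
  "response.function_invocation.done",
  "approval.waiting",
  "approval.approved",
  "approval.expired",
  "subagent.start",
  "subagent.done"
]);

// Parses one text frame and tags it as a Siesta AI custom event or a raw
// provider event. A frame counts as custom only if its type is on the list
// and it carries the { type, data } envelope.
function classifyFrame(text) {
  const message = JSON.parse(text);
  const hasEnvelope = typeof message.data === "object" && message.data !== null;
  const isCustom = CUSTOM_EVENT_TYPES.has(message.type) && hasEnvelope;
  return { kind: isCustom ? "custom" : "provider", message };
}
```

Anything not on the list falls through to the provider path, so new provider event types keep flowing to the provider handler without code changes.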

Tool execution status is serialized as a number by the current custom WebSocket serializer:

Status | Meaning
0 | Pending
1 | Success
2 | Failed
3 | Pending approval
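A client that shows tool status in its UI can map the numeric value to a label. This is a small sketch using the four values listed here; describeToolStatus is a hypothetical name, and the fallback for unknown numbers is an assumption in case the serializer gains new values.

```javascript
// Labels copied from the status table.
const TOOL_STATUS_LABELS = Object.freeze({
  0: "Pending",
  1: "Success",
  2: "Failed",
  3: "Pending approval"
});

// Returns the label for a numeric status, or a fallback for unknown values.
function describeToolStatus(status) {
  return TOOL_STATUS_LABELS[status] ?? `Unknown (${status})`;
}
```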

Client Tools

Client tools are functions declared by your client during session creation. The model sees them as function tools, but Siesta AI does not execute or persist them. Your application must listen for tool calls, execute the local function, send the function output to the provider, and request the next response.

Declaration

{
  "clientTools": [
    {
      "name": "open_booking_calendar",
      "description": "Open the booking calendar for a requested date.",
      "parameters": {
        "type": "object",
        "properties": {
          "date": {
            "type": "string",
            "description": "Date in YYYY-MM-DD format."
          },
          "durationMinutes": {
            "type": "integer",
            "description": "Requested meeting duration in minutes."
          }
        },
        "required": ["date"]
      }
    }
  ]
}

Rules

  • name and description are required and cannot be empty.
  • Tool names are case-sensitive.
  • Client tool names must be unique.
  • Client tool names must not conflict with backend agent tool names.
  • parameters must be an object. If omitted, an empty object schema is used.
  • Client tool execution is not persisted by Siesta AI.
  • Client tools do not emit custom response.function_invocation.* events.
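The declaration rules can be checked locally before the session request is sent. The following is a sketch under stated assumptions: validateClientTools is a hypothetical helper, whitespace-only strings are treated as empty, and backendToolNames must be supplied by the caller because this page does not describe a way to list backend tools.

```javascript
// Returns a list of human-readable problems; an empty list means the
// declarations satisfy the rules. backendToolNames is caller-supplied.
function validateClientTools(clientTools, backendToolNames = []) {
  const problems = [];
  const seen = new Set();
  const backend = new Set(backendToolNames);
  const hasText = value => typeof value === "string" && value.trim() !== "";

  clientTools.forEach((tool, index) => {
    const label = `clientTools[${index}]`;
    if (!hasText(tool.name)) problems.push(`${label}: name is required`);
    if (!hasText(tool.description)) problems.push(`${label}: description is required`);

    if (hasText(tool.name)) {
      // Names are case-sensitive, so "Open" and "open" are different tools.
      if (seen.has(tool.name)) problems.push(`${label}: duplicate name "${tool.name}"`);
      if (backend.has(tool.name)) problems.push(`${label}: "${tool.name}" conflicts with a backend tool`);
      seen.add(tool.name);
    }

    // parameters may be omitted, but if present it must be a plain object.
    const params = tool.parameters;
    if (params !== undefined && (typeof params !== "object" || params === null || Array.isArray(params))) {
      problems.push(`${label}: parameters must be an object`);
    }
  });

  return problems;
}
```

The server remains the authority on these rules; a local check only surfaces mistakes earlier and with clearer messages.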

Handling A Client Tool Call

When the provider calls a client tool, your client receives a raw provider event:

{
  "type": "response.function_call_arguments.done",
  "call_id": "call_open_calendar_01",
  "name": "open_booking_calendar",
  "arguments": "{\"date\":\"2026-06-10\",\"durationMinutes\":30}"
}

If name matches a tool you declared, execute it locally and send the function output:

{
  "type": "conversation.item.create",
  "item": {
    "type": "function_call_output",
    "call_id": "call_open_calendar_01",
    "output": "{\"status\":\"success\",\"result\":\"Calendar opened for 2026-06-10.\"}"
  }
}

Then request the model to continue:

{
  "type": "response.create"
}

If the function call is not your declared client tool, wait for Siesta AI custom backend-tool events instead.

Approval Flow For Backend Tools

Some backend tools may require user approval. The model calls the backend tool, Siesta AI persists a pending approval, and your client must ask the user to approve or reject it.

  1. Client receives response.function_invocation.start with approvalRequired: true.
  2. Client receives approval.waiting with callId, messageId, and timeoutSeconds.
  3. User approves or rejects.
  4. Client sends approval.approve or approval.reject with the same callId.
  5. Approved tools emit approval.approved and response.function_invocation.done.
  6. Timed-out approvals emit approval.expired.

Example approval.waiting event:

{
  "type": "approval.waiting",
  "data": {
    "callId": "call_send_email_01",
    "messageId": "da3ba88e-22f5-49de-8421-e4f43834ba42",
    "timeoutSeconds": 300
  }
}

Approve:

{
  "type": "approval.approve",
  "callId": "call_send_email_01"
}

Reject:

{
  "type": "approval.reject",
  "call_id": "call_send_email_01"
}

Default approval timeout is 300 seconds. If approval expires, Siesta AI returns a timeout error result to the model.
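Client-side bookkeeping for the approval flow can be sketched as a small tracker keyed by callId. createApprovalTracker and its method names are hypothetical, and deadlineMs is only a local estimate derived from timeoutSeconds; Siesta AI remains the authority on when an approval expires.

```javascript
// Tracks backend tool calls that are waiting for user approval.
function createApprovalTracker() {
  const pending = new Map();

  return {
    // Feed Siesta AI custom events here; returns true if the event was
    // approval-related and has been handled.
    handle(event, nowMs = Date.now()) {
      const data = event.data || {};
      switch (event.type) {
        case "approval.waiting":
          pending.set(data.callId, {
            messageId: data.messageId,
            deadlineMs: nowMs + data.timeoutSeconds * 1000 // local estimate
          });
          return true;
        case "approval.approved":
        case "approval.expired":
          pending.delete(data.callId);
          return true;
        default:
          return false;
      }
    },

    // Builds the client message for a user decision, or returns null when the
    // call is no longer pending (already decided or expired).
    decide(callId, approved) {
      if (!pending.has(callId)) return null;
      pending.delete(callId);
      return { type: approved ? "approval.approve" : "approval.reject", callId };
    },

    pendingCallIds() {
      return [...pending.keys()];
    }
  };
}
```

A caller would serialize the object returned by decide with JSON.stringify and send it over the open socket. Returning null for unknown calls keeps the UI from sending a decision after approval.expired has arrived.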

Audio And Transport

Setting | Default | Notes
Input format | pcm16 | Mapped to provider audio/pcm with sample rate 24000.
Output format | pcm16 | Mapped to provider audio/pcm with sample rate 24000.
Voice | alloy | Passed to provider session configuration.
Turn detection | semantic_vad | Configured by backend in session.update.
Input transcription model | gpt-realtime-whisper | Configured by backend in session.update.
Max WebSocket message size | 65536 bytes | Larger messages can close the target socket with MessageTooBig.

Siesta AI does not transcode client audio. Use provider-compatible realtime event payloads for the selected format.
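Because base64 inflates audio by a third, raw PCM16 has to be split into chunks noticeably smaller than the message limit. The sketch below shows one way to size the chunks so each serialized input_audio_buffer.append event stays under 65536 bytes. buildAppendEvents is a hypothetical helper, and it uses Node's Buffer for base64, so a browser build would need a different encoder.

```javascript
const MAX_MESSAGE_BYTES = 65536; // default limit from the transport table

// Splits raw PCM16 bytes (Uint8Array or Buffer) into serialized
// input_audio_buffer.append events that each stay under maxMessageBytes.
function buildAppendEvents(pcmBytes, maxMessageBytes = MAX_MESSAGE_BYTES) {
  // Size of the JSON wrapper around the base64 payload.
  const envelope = JSON.stringify({ type: "input_audio_buffer.append", audio: "" });

  // Base64 turns every 3 raw bytes into 4 characters.
  let rawPerChunk = Math.floor((maxMessageBytes - 1 - envelope.length) / 4) * 3;
  rawPerChunk -= rawPerChunk % 2; // keep 16-bit samples intact
  if (rawPerChunk <= 0) {
    throw new Error("maxMessageBytes is too small for any audio");
  }

  const events = [];
  for (let offset = 0; offset < pcmBytes.length; offset += rawPerChunk) {
    const chunk = pcmBytes.subarray(offset, offset + rawPerChunk);
    events.push(JSON.stringify({
      type: "input_audio_buffer.append",
      audio: Buffer.from(chunk).toString("base64")
    }));
  }
  return events;
}
```

Each returned string can be passed to socket.send as is. Real-time capture usually produces much smaller chunks than the limit, so this matters mostly when sending pre-recorded audio.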

Implementation Example

In production, create the session on a trusted backend so your external API key is not exposed in a browser.

// Placeholder values; load real ones from trusted server-side configuration.
const apiBaseUrl = "https://api.siesta.ai";
const agentId = "3f67ef24-3f96-4c20-a3b3-5fd0abef15a1";

// Runs on the trusted backend: creates the one-time realtime session.
async function createRealtimeSession() {
  const response = await fetch(`${apiBaseUrl}/api/v1/Agent/${agentId}/realtime-session`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Api-Key": "YOUR_EXTERNAL_API_KEY",
      "X-Org-Id": "4bdaed95-19f8-47c2-bbf0-8a476cf0a527"
    },
    body: JSON.stringify({
      inputAudioFormat: "pcm16",
      outputAudioFormat: "pcm16",
      voice: "alloy",
      clientTools: [
        {
          name: "open_booking_calendar",
          description: "Open the booking calendar for a requested date.",
          parameters: {
            type: "object",
            properties: {
              date: { type: "string" }
            },
            required: ["date"]
          }
        }
      ]
    })
  });

  if (!response.ok) {
    throw new Error(await response.text());
  }

  return response.json();
}

// Runs on the client: opens the socket and routes incoming frames by type.
// showApprovalDialog, handleRealtimeEvent, and openBookingCalendar are
// application-defined functions that are not shown here.
function openRealtimeSocket(session) {
  // https://host becomes wss://host.
  const wsUrl = new URL(session.webSocketPath, apiBaseUrl.replace(/^http/, "ws"));
  const socket = new WebSocket(wsUrl);

  socket.addEventListener("message", async event => {
    // Only text frames carry JSON events; skip anything else.
    if (typeof event.data !== "string") {
      return;
    }

    const message = JSON.parse(event.data);

    if (message.type === "response.function_call_arguments.done") {
      await maybeHandleClientTool(socket, message);
      return;
    }

    if (message.type === "approval.waiting") {
      showApprovalDialog(socket, message.data);
      return;
    }

    handleRealtimeEvent(message);
  });

  return socket;
}

// Executes the declared client tool and returns its output to the provider.
async function maybeHandleClientTool(socket, event) {
  // Not a declared client tool: Siesta AI handles it as a backend tool.
  if (event.name !== "open_booking_calendar") {
    return;
  }

  const args = JSON.parse(event.arguments || "{}");
  const result = await openBookingCalendar(args.date);

  socket.send(JSON.stringify({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: event.call_id,
      output: JSON.stringify({ status: "success", result })
    }
  }));

  // Ask the model to continue now that the tool output is available.
  socket.send(JSON.stringify({ type: "response.create" }));
}

function approve(socket, callId) {
  socket.send(JSON.stringify({ type: "approval.approve", callId }));
}

function reject(socket, callId) {
  socket.send(JSON.stringify({ type: "approval.reject", callId }));
}

// Top-level await requires an ES module context.
const session = await createRealtimeSession();
const socket = openRealtimeSocket(session);

Sending Audio Or Text

Use the provider realtime event format. Siesta AI forwards these events unchanged.

{
  "type": "input_audio_buffer.append",
  "audio": "BASE64_PCM16_AUDIO_CHUNK"
}

{
  "type": "input_audio_buffer.commit"
}

{
  "type": "response.create"
}

Errors And Limits

Realtime domain errors return HTTP 400 with this shape:

{
  "status": 400,
  "detail": "Realtime session has expired.",
  "errorCode": "RealtimeSessionExpired"
}

Error Codes

Error code | Meaning | Client action
RealtimeNotEnabled | Realtime is disabled for the deployment or access mode. | Disable realtime UI or contact the API owner.
RealtimeUnsupportedConnection | The agent connection does not support realtime audio or is disabled by governance. | Use an agent with a supported OpenAI connection.
RealtimeUnsupportedModel | The selected agent model does not support realtime audio or does not match the configured provider model. | Use an agent configured with a realtime-capable model.
RealtimeAudioFormatUnsupported | Requested input or output audio format is not supported. | Use a format returned by session creation, usually pcm16.
RealtimeSessionInvalid | Missing, unknown, mismatched, or otherwise invalid session token. | Create a new realtime session and reconnect.
RealtimeSessionExpired | Session token expired before WebSocket connection. | Create a new realtime session and connect immediately.
RealtimeSessionAlreadyUsed | One-time session token was already consumed. | Create a new realtime session. Do not retry the same token.
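The client actions in this table can be condensed into a small decision helper. This is a sketch only: planRecovery and the action labels it returns are hypothetical names chosen for illustration, while the error codes come from the table.

```javascript
// Codes for which the table recommends creating a fresh session.
const NEW_SESSION_CODES = new Set([
  "RealtimeSessionInvalid",
  "RealtimeSessionExpired",
  "RealtimeSessionAlreadyUsed"
]);

// Codes that point at the agent's connection or model configuration.
const AGENT_CONFIG_CODES = new Set([
  "RealtimeUnsupportedConnection",
  "RealtimeUnsupportedModel"
]);

// Maps an error response body to a coarse recovery action (labels are hypothetical).
function planRecovery(errorBody) {
  const code = errorBody && errorBody.errorCode;
  if (NEW_SESSION_CODES.has(code)) return "create-new-session";
  if (code === "RealtimeAudioFormatUnsupported") return "change-audio-format";
  if (AGENT_CONFIG_CODES.has(code)) return "use-different-agent";
  if (code === "RealtimeNotEnabled") return "disable-realtime-ui";
  return "report-error";
}
```

Only the first group is safe to retry automatically, and always with a newly created session rather than the old token.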

Default Limits

Limit | Default
Session TTL before WebSocket connect | 60 seconds
Max session duration | 1800 seconds
Idle timeout | 60 seconds
Approval timeout | 300 seconds
Max WebSocket message size | 65536 bytes
Provider connect timeout | 15 seconds

If the WebSocket is opened with the wrong agentId, the one-time session may already be consumed before the mismatch is reported. Create a new session instead of retrying the same token.

Implementation Checklist

  • Store apiBaseUrl, agentId, and organization credentials in trusted server-side configuration.
  • Call POST /api/v1/Agent/{agentId}/realtime-session immediately before opening a WebSocket.
  • Build the WebSocket URL as wss://{host}{webSocketPath}.
  • Do not send X-Api-Key to the WebSocket.
  • Parse every incoming text frame as JSON and route by top-level type.
  • Handle raw provider audio and response events according to the provider realtime protocol.
  • Handle Siesta AI custom events with the { type, data } envelope.
  • For backend tool events, update UI state but do not send function outputs yourself.
  • For declared client tools, listen for response.function_call_arguments.done, execute locally, send conversation.item.create, then send response.create.
  • For approval.waiting, show approval UI and send approval.approve or approval.reject with the same callId.
  • On RealtimeSessionExpired or RealtimeSessionAlreadyUsed, create a new session instead of retrying the old one.
  • Keep individual WebSocket messages under 65536 bytes unless your deployment config says otherwise.