Realtime API

Realtime transport for Siesta AI agents

The Realtime API lets an external application create a short-lived Siesta AI agent session, open a WebSocket, and stream provider events through Siesta AI. Use it for voice-style experiences, live agent interfaces, public widgets, and clients that need to react to transcripts, tool execution, approval states, and persisted conversation events as they happen.

Public API v1 reference

POST /api/v1/Agent/{agentId}/realtime-session: create a one-time realtime session.
GET /api/v1/Agent/{agentId}/realtime: open the WebSocket with conversationId and sessionId.

Server contract: the session is created by your backend. Use X-Api-Key and X-Org-Id only on a trusted server.
Session lifecycle: the returned sessionId is a short-lived one-time token, consumed when the socket connects.
Realtime defaults: sessions default to the gpt-realtime-2 model and pcm16 audio.

What This API Does

The API creates a realtime session for a specific Siesta AI agent conversation and proxies traffic between your client and the configured realtime provider. Siesta AI injects the agent system instructions, backend tools, optional client tools, audio settings, transcription settings, and conversation persistence.

Realtime conversation: stream audio and provider events over WebSocket after creating a one-time session.
Backend tools: tools configured on the Siesta AI agent are executed by Siesta AI and reported through custom events.
Client tools: your application can expose local functions to the model and execute them inside the client.
Conversation persistence: transcripts, assistant responses, tool status, and sub-agent status can be persisted back to the Siesta AI conversation.

The realtime session token is short-lived and one-time use. Create the session immediately before opening the WebSocket.

Quick Start Flow

Create a realtime session with POST /api/v1/Agent/{agentId}/realtime-session.
Connect to wss://{api-host}{webSocketPath}.
Send standard realtime provider client events.
Route incoming messages by top-level type.
Execute declared client tools locally.
For backend tool approvals, send approval.approve or approval.reject.
If the session expires, is consumed, or disconnects, create a new session.

Authentication

Keep the external API key server-side. Create realtime sessions from your backend and return only webSocketPath to the browser.

The HTTP session creation endpoint is authenticated with external API headers:

X-Api-Key: <external-api-key>
X-Org-Id: <organization-id>

The WebSocket endpoint does not use X-Api-Key. It is authorized by the short-lived one-time sessionId embedded in the returned webSocketPath.

Do not expose X-Api-Key in browser code. Browser clients should call a trusted backend, and that backend should create the realtime session. The browser should receive only the returned webSocketPath.
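
As a rough illustration of that split, the sketch below builds the session request on the server and reduces the response to the one field the browser needs. The function names and the shape of the credentials object are hypothetical; only the endpoint path and header names come from this page.

```javascript
// Hypothetical server-side helpers; not part of the Siesta AI API.

// Build the fetch() arguments for session creation. Run this only on a trusted server.
function buildSessionRequest(apiBaseUrl, agentId, credentials, body = {}) {
  return {
    url: `${apiBaseUrl}/api/v1/Agent/${encodeURIComponent(agentId)}/realtime-session`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-Api-Key": credentials.apiKey, // never sent to the browser
        "X-Org-Id": credentials.orgId
      },
      body: JSON.stringify(body)
    }
  };
}

// Keep only the field the browser needs to open the socket.
function toBrowserPayload(session) {
  return { webSocketPath: session.webSocketPath };
}
```

A backend route would call fetch(request.url, request.init), parse the JSON response, and reply to the browser with toBrowserPayload(session).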

Create Realtime Session

The endpoint returns sessionId, conversationId, expiry metadata, supported formats, and webSocketPath.

POST /api/v1/Agent/{agentId}/realtime-session
Content-Type: application/json
X-Api-Key: <external-api-key>
X-Org-Id: <organization-id>

If conversationId is omitted, Siesta AI creates a new conversation for the agent. If it is provided, it must belong to the same agent.

Request Body

Property	Type	Default	Description
`conversationId`	`uuid | null`	`null`	Existing conversation id. Omit to create a new conversation.
`inputAudioFormat`	`string`	`pcm16`	Requested input audio format.
`outputAudioFormat`	`string`	`pcm16`	Requested output audio format.
`voice`	`string`	`alloy`	Provider voice used for output audio.
`additionalInstructions`	`string | null`	`null`	Extra instructions appended after the agent system message for this session.
`clientTools`	`array`	`[]`	Tools executed by your client, not by Siesta AI.

Example Request

POST /api/v1/Agent/3f67ef24-3f96-4c20-a3b3-5fd0abef15a1/realtime-session HTTP/1.1
Host: api.siesta.ai
Content-Type: application/json
X-Api-Key: YOUR_EXTERNAL_API_KEY
X-Org-Id: 4bdaed95-19f8-47c2-bbf0-8a476cf0a527

{
  "inputAudioFormat": "pcm16",
  "outputAudioFormat": "pcm16",
  "voice": "alloy",
  "additionalInstructions": "Keep answers short and ask one question at a time.",
  "clientTools": [
    {
      "name": "open_booking_calendar",
      "description": "Open the booking calendar for a requested date.",
      "parameters": {
        "type": "object",
        "properties": {
          "date": {
            "type": "string",
            "description": "Date in YYYY-MM-DD format."
          }
        },
        "required": ["date"]
      }
    }
  ]
}

Response Body

Property	Type	Description
`sessionId`	`string`	One-time token used to connect to the realtime WebSocket.
`conversationId`	`uuid`	Conversation id used by this realtime session.
`expiresAt`	`datetime`	UTC expiration time for opening the WebSocket. Default TTL is 60 seconds.
`webSocketPath`	`string`	Relative WebSocket path to connect to.
`supportedInputAudioFormats`	`string[]`	Input formats supported by the deployment.
`supportedOutputAudioFormats`	`string[]`	Output formats supported by the deployment.
`maxSessionSeconds`	`number`	Maximum realtime session duration returned to clients. Default: `1800`.
`idleTimeoutSeconds`	`number`	Idle timeout returned to clients. Default: `60`.
`providerModelName`	`string`	Provider realtime model name, usually `gpt-realtime-2`.

{
  "sessionId": "15f3c5c7b61c4a16893621b3ec969962",
  "conversationId": "6ad53918-7055-4f10-bd81-b06f8fcfae2a",
  "expiresAt": "2026-06-09T12:00:45.1234567Z",
  "webSocketPath": "/api/v1/Agent/3f67ef24-3f96-4c20-a3b3-5fd0abef15a1/realtime?conversationId=6ad53918-7055-4f10-bd81-b06f8fcfae2a&sessionId=15f3c5c7b61c4a16893621b3ec969962",
  "supportedInputAudioFormats": ["pcm16"],
  "supportedOutputAudioFormats": ["pcm16"],
  "maxSessionSeconds": 1800,
  "idleTimeoutSeconds": 60,
  "providerModelName": "gpt-realtime-2"
}

The generated session is consumed when the WebSocket connects. In multi-instance deployments, use sticky routing or shared session storage so that the instance receiving the WebSocket upgrade can find the session.
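
Because the token has a short TTL, a client may want to check expiresAt before attempting to connect. The following is a minimal sketch, assuming the client clock is roughly in sync with the server; the function names and the two-second safety margin are illustrative choices, not API behavior.

```javascript
// Parse an ISO-8601 UTC timestamp. Fractional seconds are trimmed to milliseconds
// because the ECMAScript date format only guarantees three fractional digits.
function parseExpiresAt(expiresAt) {
  const trimmed = expiresAt.replace(/(\.\d{3})\d+/, "$1");
  const ms = Date.parse(trimmed);
  if (Number.isNaN(ms)) {
    throw new Error(`Unparseable expiresAt: ${expiresAt}`);
  }
  return ms;
}

// True when more than safetyMarginMs remains before the token expires.
function canStillConnect(session, nowMs = Date.now(), safetyMarginMs = 2000) {
  return parseExpiresAt(session.expiresAt) - nowMs > safetyMarginMs;
}
```

If canStillConnect returns false, request a fresh session rather than trying the old token.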

Realtime WebSocket

The WebSocket endpoint upgrades a GET request on the same API host and is authorized by the returned one-time session token.

Connect to the returned webSocketPath on the same API host:

wss://{api-host}{webSocketPath}

The path has this shape:

GET /api/v1/Agent/{agentId}/realtime?conversationId={conversationId}&sessionId={sessionId}
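
One way to derive the full URL is to swap the scheme of the API base URL and resolve the returned path against it. This is a sketch with a hypothetical helper name; it assumes the API base URL uses http or https.

```javascript
// Turn an http(s) API base URL plus the returned webSocketPath into a ws(s) URL.
function buildWebSocketUrl(apiBaseUrl, webSocketPath) {
  // https: becomes wss:, http: becomes ws:
  const wsBase = apiBaseUrl.replace(/^http(s?):/i, "ws$1:");
  if (!/^wss?:/i.test(wsBase)) {
    throw new Error(`Expected an http(s) base URL, got: ${apiBaseUrl}`);
  }
  // The query string in webSocketPath (conversationId, sessionId) is preserved.
  return new URL(webSocketPath, wsBase).toString();
}
```

The result can be passed directly to the WebSocket constructor.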

After the socket is accepted, Siesta AI connects to the provider, sends session.update, and starts bidirectional proxying. Your client receives provider events and Siesta AI custom events on the same socket.

Route incoming frames by type. Raw provider events and Siesta AI custom events share the same socket, so keep event routing explicit.

Backend Session Configuration

Clients do not send this configuration. Siesta AI sends it internally after connecting to the provider:

{
  "type": "session.update",
  "session": {
    "type": "realtime",
    "model": "gpt-realtime-2",
    "output_modalities": ["audio"],
    "instructions": "{agent system message}\n\n{additionalInstructions}",
    "audio": {
      "input": {
        "format": {
          "type": "audio/pcm",
          "rate": 24000
        },
        "transcription": {
          "model": "gpt-realtime-whisper"
        },
        "turn_detection": {
          "type": "semantic_vad"
        }
      },
      "output": {
        "format": {
          "type": "audio/pcm",
          "rate": 24000
        },
        "voice": "alloy"
      }
    },
    "tools": [
      "{backend agent tools}",
      "{client tools from realtime-session request}"
    ],
    "tool_choice": "auto"
  }
}

Client To Server Messages

The backend forwards most client WebSocket messages to the provider unchanged. Use standard realtime provider client event shapes.

Message	Behavior
Standard realtime client events	Forwarded to provider unchanged.
Binary frames	Forwarded to provider unchanged, subject to max frame size.
`{ "type": "approval.approve", "callId": "..." }`	Consumed by Siesta AI if the call is waiting for approval.
`{ "type": "approval.reject", "call_id": "..." }`	Consumed by Siesta AI if the call is waiting for approval. Both `callId` and `call_id` are accepted.

Incoming Events

Your client receives two categories of messages:

raw provider events,
Siesta AI custom events.

Route by the top-level type property.
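
A small classifier can make that routing explicit. The sketch below is one possible approach: the set of custom event names is copied from the Custom Event Reference on this page, and the function name and returned shape are hypothetical.

```javascript
// Custom event names as listed in the Custom Event Reference on this page.
const CUSTOM_EVENT_TYPES = new Set([
  "message.created",
  "response.id",
  "response.completed",
  "response.function_invocation.start",
  "response.function_invocation.done",
  "approval.waiting",
  "approval.approved",
  "approval.expired",
  "subagent.start",
  "subagent.done"
]);

// Classify one WebSocket frame as binary, invalid, custom, or provider.
function classifyFrame(data) {
  if (typeof data !== "string") {
    return { kind: "binary", payload: data };
  }
  let message;
  try {
    message = JSON.parse(data);
  } catch {
    return { kind: "invalid", raw: data };
  }
  if (message === null || typeof message !== "object" || typeof message.type !== "string") {
    return { kind: "invalid", raw: data };
  }
  // Anything not in the custom set is treated as a raw provider event.
  const kind = CUSTOM_EVENT_TYPES.has(message.type) ? "custom" : "provider";
  return { kind, type: message.type, message };
}
```

Custom events carry their payload under message.data, while provider events keep the provider's own shape.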

Raw Provider Events

Provider messages are forwarded first and unchanged. Examples include:

response.created
response.done
response.audio.delta
response.audio_transcript.done
response.output_audio_transcript.done
response.output_text.done
response.content.done
conversation.item.input_audio_transcription.completed
input_audio.transcript.done
response.function_call_arguments.done
error

Siesta AI listens to some provider events to persist conversation messages and execute backend tools, but the raw events still reach your client.

Persistence Trigger Events

Provider event	Persisted role	Custom events
`conversation.item.input_audio_transcription.completed`	User	`message.created`
`input_audio.transcript.done`	User	`message.created`
`response.audio_transcript.done`	Assistant	`response.id`, then `response.completed`
`response.output_audio_transcript.done`	Assistant	`response.id`, then `response.completed`
`response.output_text.done`	Assistant	`response.id`, then `response.completed`
`response.content.done`	Assistant	`response.id`, then `response.completed`

Siesta AI Custom Event Envelope

{
  "type": "event.name",
  "data": {
    "property": "value"
  }
}

Custom Event Reference

Event	Data	When it is sent
`message.created`	`{ chatbotId, id, role, content, createdAt }`	User transcript was persisted as a conversation message.
`response.id`	`{ chatbotId, id }`	Assistant transcript was persisted.
`response.completed`	`{ chatbotId }`	Assistant response persistence completed.
`response.function_invocation.start`	`{ id, callId, title, imageUrl, chatbotId, arguments, approvalRequired, functionName }`	Backend tool execution started or is waiting for approval.
`response.function_invocation.done`	`{ id, callId, text, title, imageUrl, button, buttonLabel, buttonLink, chatbotId, status, executionTimeSeconds }`	Backend tool execution finished, failed, was rejected, or timed out.
`approval.waiting`	`{ callId, messageId, timeoutSeconds }`	Backend tool requires user approval before execution.
`approval.approved`	`{ callId, messageId }`	Approval was accepted and the backend is executing the tool.
`approval.expired`	`{ callId, messageId }`	Approval timeout elapsed.
`subagent.start`	`{ id, name, icon, iconColor, callId }`	Sub-agent invocation started.
`subagent.done`	`{ chatbotId }`	Sub-agent invocation completed.

Tool execution status is serialized as a number by the current custom WebSocket serializer:

Status	Meaning
`0`	Pending
`1`	Success
`2`	Failed
`3`	Pending approval

Client Tools

Client tools are functions declared by your client during session creation. The model sees them as function tools, but Siesta AI does not execute or persist them. Your application must listen for tool calls, execute the local function, send the function output to the provider, and request the next response.

Declaration

{
  "clientTools": [
    {
      "name": "open_booking_calendar",
      "description": "Open the booking calendar for a requested date.",
      "parameters": {
        "type": "object",
        "properties": {
          "date": {
            "type": "string",
            "description": "Date in YYYY-MM-DD format."
          },
          "durationMinutes": {
            "type": "integer",
            "description": "Requested meeting duration in minutes."
          }
        },
        "required": ["date"]
      }
    }
  ]
}

Rules

name and description are required and cannot be empty.
Tool names are case-sensitive.
Client tool names must be unique.
Client tool names must not conflict with backend agent tool names.
parameters must be an object. If omitted, an empty object schema is used.
Client tool execution is not persisted by Siesta AI.
Client tools do not emit custom response.function_invocation.* events.
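
These rules can be checked before calling the session endpoint. The sketch below is a client-side pre-check under a few assumptions: the function name is hypothetical, whitespace-only strings are treated as empty, and backendToolNames is an optional list you would have to supply yourself. The server remains the authority on what it accepts.

```javascript
// Return a list of human-readable rule violations; an empty list means no issues found.
function validateClientTools(clientTools, backendToolNames = []) {
  const errors = [];
  const seen = new Set();               // exact-match set, so names stay case-sensitive
  const reserved = new Set(backendToolNames);

  clientTools.forEach((tool, index) => {
    const label = `clientTools[${index}]`;
    const name = typeof tool.name === "string" ? tool.name.trim() : "";
    const description = typeof tool.description === "string" ? tool.description.trim() : "";

    if (!name) errors.push(`${label}: name is required`);
    if (!description) errors.push(`${label}: description is required`);
    if (name && seen.has(tool.name)) errors.push(`${label}: duplicate name "${tool.name}"`);
    if (name && reserved.has(tool.name)) errors.push(`${label}: "${tool.name}" conflicts with a backend tool`);
    if (name) seen.add(tool.name);

    // parameters may be omitted, but when present it must be a plain object.
    const params = tool.parameters;
    if (params !== undefined && (params === null || typeof params !== "object" || Array.isArray(params))) {
      errors.push(`${label}: parameters must be an object`);
    }
  });

  return errors;
}
```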

Handling A Client Tool Call

When the provider calls a client tool, your client receives a raw provider event:

{
  "type": "response.function_call_arguments.done",
  "call_id": "call_open_calendar_01",
  "name": "open_booking_calendar",
  "arguments": "{\"date\":\"2026-06-10\",\"durationMinutes\":30}"
}

If name matches a tool you declared, execute it locally and send the function output:

{
  "type": "conversation.item.create",
  "item": {
    "type": "function_call_output",
    "call_id": "call_open_calendar_01",
    "output": "{\"status\":\"success\",\"result\":\"Calendar opened for 2026-06-10.\"}"
  }
}

Then request the model to continue:

{
  "type": "response.create"
}

If the function call is not your declared client tool, wait for Siesta AI custom backend-tool events instead.
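
With more than one client tool, a registry keeps this decision in one place. The following sketch uses hypothetical names and synchronous handlers for brevity; real handlers are often asynchronous and would need to be awaited. The "error" output shape is an assumption, since this page only shows a success payload.

```javascript
// Given a response.function_call_arguments.done event and a Map of tool name -> handler,
// return the messages to send back, or null when the call is not a declared client tool.
function replyToFunctionCall(event, handlers) {
  const handler = handlers.get(event.name);
  if (!handler) {
    return null; // not a client tool: wait for Siesta AI backend-tool events instead
  }

  let output;
  try {
    const args = JSON.parse(event.arguments || "{}");
    output = { status: "success", result: handler(args) };
  } catch (err) {
    // Assumed error shape; report the failure to the model instead of leaving the call open.
    output = { status: "error", error: err instanceof Error ? err.message : String(err) };
  }

  return [
    {
      type: "conversation.item.create",
      item: { type: "function_call_output", call_id: event.call_id, output: JSON.stringify(output) }
    },
    { type: "response.create" }
  ];
}
```

Each returned message would be serialized with JSON.stringify and sent over the socket in order.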

Approval Flow For Backend Tools

Some backend tools may require user approval. The model calls the backend tool, Siesta AI persists a pending approval, and your client must ask the user to approve or reject it.

Client receives response.function_invocation.start with approvalRequired: true.
Client receives approval.waiting with callId, messageId, and timeoutSeconds.
User approves or rejects.
Client sends approval.approve or approval.reject with the same callId.
Approved tools emit approval.approved and response.function_invocation.done.
Timed-out approvals emit approval.expired.

{
  "type": "approval.waiting",
  "data": {
    "callId": "call_send_email_01",
    "messageId": "da3ba88e-22f5-49de-8421-e4f43834ba42",
    "timeoutSeconds": 300
  }
}

Approve:

{
  "type": "approval.approve",
  "callId": "call_send_email_01"
}

Reject:

{
  "type": "approval.reject",
  "call_id": "call_send_email_01"
}

Default approval timeout is 300 seconds. If approval expires, Siesta AI returns a timeout error result to the model.
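
To drive an approval dialog, a client can track pending approvals by callId. The reducer below is a sketch of one possible state model; the function name and status strings are illustrative, and it assumes response.function_invocation.done carries the same callId as the approval events, as the event reference suggests.

```javascript
// Fold one custom event into a Map of callId -> approval state. Returns a new Map on
// change and the same Map instance when the event is unrelated to approvals.
function reduceApprovals(state, message) {
  const data = message.data || {};
  const next = new Map(state);

  switch (message.type) {
    case "approval.waiting":
      next.set(data.callId, {
        status: "waiting",
        messageId: data.messageId,
        timeoutSeconds: data.timeoutSeconds
      });
      return next;
    case "approval.approved":
      next.set(data.callId, { ...next.get(data.callId), status: "approved" });
      return next;
    case "approval.expired":
      next.set(data.callId, { ...next.get(data.callId), status: "expired" });
      return next;
    case "response.function_invocation.done":
      next.delete(data.callId); // the tool call is finished, rejected, or timed out
      return next;
    default:
      return state;
  }
}
```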

Audio And Transport

Setting	Default	Notes
Input format	`pcm16`	Mapped to provider `audio/pcm` with sample rate `24000`.
Output format	`pcm16`	Mapped to provider `audio/pcm` with sample rate `24000`.
Voice	`alloy`	Passed to provider session configuration.
Turn detection	`semantic_vad`	Configured by backend in `session.update`.
Input transcription model	`gpt-realtime-whisper`	Configured by backend in `session.update`.
Max WebSocket message size	`65536` bytes	Larger messages can close the target socket with `MessageTooBig`.

Siesta AI does not transcode client audio. Use provider-compatible realtime event payloads for the selected format.
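
Browser audio APIs typically produce Float32 samples, so the client has to convert them itself. The sketch below assumes mono samples that have already been resampled to 24000 Hz and encodes them as little-endian 16-bit PCM; the helper names are hypothetical, and the byte order should be confirmed against the provider's documentation.

```javascript
// Convert Float32 samples in [-1, 1] to little-endian 16-bit PCM bytes.
function floatToPcm16Bytes(samples) {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically so that -1 maps to -32768 and 1 maps to 32767.
    const value = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(value), true); // true = little-endian
  }
  return bytes;
}

// Base64-encode raw bytes for the "audio" field of input_audio_buffer.append.
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}
```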

Implementation Example

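In production, create the session on a trusted backend so your external API key is not exposed in a browser. The example inlines the key and runs session creation and the socket client in one place only for brevity.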

const apiBaseUrl = "https://api.siesta.ai";
const agentId = "3f67ef24-3f96-4c20-a3b3-5fd0abef15a1";

async function createRealtimeSession() {
  const response = await fetch(`${apiBaseUrl}/api/v1/Agent/${agentId}/realtime-session`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Api-Key": "YOUR_EXTERNAL_API_KEY",
      "X-Org-Id": "4bdaed95-19f8-47c2-bbf0-8a476cf0a527"
    },
    body: JSON.stringify({
      inputAudioFormat: "pcm16",
      outputAudioFormat: "pcm16",
      voice: "alloy",
      clientTools: [
        {
          name: "open_booking_calendar",
          description: "Open the booking calendar for a requested date.",
          parameters: {
            type: "object",
            properties: {
              date: { type: "string" }
            },
            required: ["date"]
          }
        }
      ]
    })
  });

  if (!response.ok) {
    throw new Error(await response.text());
  }

  return response.json();
}

function openRealtimeSocket(session) {
  const wsUrl = new URL(session.webSocketPath, apiBaseUrl.replace(/^http/, "ws"));
  const socket = new WebSocket(wsUrl);

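  socket.addEventListener("message", async event => {
    // Only text frames carry JSON; skip binary frames instead of throwing in JSON.parse.
    if (typeof event.data !== "string") {
      return;
    }

    const message = JSON.parse(event.data);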

    if (message.type === "response.function_call_arguments.done") {
      await maybeHandleClientTool(socket, message);
      return;
    }

    if (message.type === "approval.waiting") {
      showApprovalDialog(socket, message.data);
      return;
    }

    handleRealtimeEvent(message);
  });

  return socket;
}

async function maybeHandleClientTool(socket, event) {
  if (event.name !== "open_booking_calendar") {
    return;
  }

  const args = JSON.parse(event.arguments || "{}");
  const result = await openBookingCalendar(args.date);

  socket.send(JSON.stringify({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: event.call_id,
      output: JSON.stringify({ status: "success", result })
    }
  }));

  socket.send(JSON.stringify({ type: "response.create" }));
}

function approve(socket, callId) {
  socket.send(JSON.stringify({ type: "approval.approve", callId }));
}

function reject(socket, callId) {
  socket.send(JSON.stringify({ type: "approval.reject", callId }));
}

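// showApprovalDialog, handleRealtimeEvent, and openBookingCalendar are application-defined.
// Top-level await requires an ES module; otherwise wrap these lines in an async function.
const session = await createRealtimeSession();
const socket = openRealtimeSocket(session);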

Sending Audio Or Text

Use the provider realtime event format. Siesta AI forwards these events unchanged.

{
  "type": "input_audio_buffer.append",
  "audio": "BASE64_PCM16_AUDIO_CHUNK"
}

{
  "type": "input_audio_buffer.commit"
}

{
  "type": "response.create"
}
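
Longer recordings have to be split so that each message stays under the WebSocket size limit. The sketch below builds the append, commit, and response.create sequence shown above from raw PCM16 bytes, assuming the documented default limit of 65536 bytes; the function names are hypothetical.

```javascript
const DEFAULT_MAX_MESSAGE_BYTES = 65536; // default limit from this page

// Base64-encode raw bytes.
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Split PCM16 bytes into append messages that fit the limit, then commit and request a response.
function buildAudioTurn(pcmBytes, maxMessageBytes = DEFAULT_MAX_MESSAGE_BYTES) {
  // Size of the JSON envelope around the base64 payload (all ASCII, so chars equal bytes).
  const envelopeBytes = JSON.stringify({ type: "input_audio_buffer.append", audio: "" }).length;
  const maxBase64Chars = maxMessageBytes - envelopeBytes;
  // Base64 turns every 3 raw bytes into 4 characters.
  let chunkBytes = Math.floor(maxBase64Chars / 4) * 3;
  chunkBytes -= chunkBytes % 2; // keep whole 16-bit samples in each chunk
  if (chunkBytes <= 0) {
    throw new Error("maxMessageBytes is too small for an audio chunk");
  }

  const messages = [];
  for (let offset = 0; offset < pcmBytes.length; offset += chunkBytes) {
    const chunk = pcmBytes.subarray(offset, offset + chunkBytes);
    messages.push(JSON.stringify({ type: "input_audio_buffer.append", audio: bytesToBase64(chunk) }));
  }
  messages.push(JSON.stringify({ type: "input_audio_buffer.commit" }));
  messages.push(JSON.stringify({ type: "response.create" }));
  return messages;
}
```

Each returned string is ready for socket.send. For live microphone input you would normally send append messages as audio arrives instead of batching a whole turn.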

Errors And Limits

Realtime domain errors return HTTP 400 with this shape:

{
  "status": 400,
  "detail": "Realtime session has expired.",
  "errorCode": "RealtimeSessionExpired"
}

Error Codes

Error code	Meaning	Client action
`RealtimeNotEnabled`	Realtime is disabled for the deployment or access mode.	Disable realtime UI or contact the API owner.
`RealtimeUnsupportedConnection`	The agent connection does not support realtime audio or is disabled by governance.	Use an agent with a supported OpenAI connection.
`RealtimeUnsupportedModel`	The selected agent model does not support realtime audio or does not match the configured provider model.	Use an agent configured with a realtime-capable model.
`RealtimeAudioFormatUnsupported`	Requested input or output audio format is not supported.	Use a format returned by session creation, usually `pcm16`.
`RealtimeSessionInvalid`	Missing, unknown, mismatched, or otherwise invalid session token.	Create a new realtime session and reconnect.
`RealtimeSessionExpired`	Session token expired before WebSocket connection.	Create a new realtime session and connect immediately.
`RealtimeSessionAlreadyUsed`	One-time session token was already consumed.	Create a new realtime session. Do not retry the same token.
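
The client actions in this table can be encoded as a lookup so that recovery logic stays in one place. This is a sketch: the action strings and the function name are illustrative, and unknown codes fall back to a generic "report" action.

```javascript
// Error codes from the table above mapped to illustrative action names.
const RECOVERY_BY_ERROR_CODE = Object.freeze({
  RealtimeNotEnabled: "disable-realtime",
  RealtimeUnsupportedConnection: "change-agent",
  RealtimeUnsupportedModel: "change-agent",
  RealtimeAudioFormatUnsupported: "change-audio-format",
  RealtimeSessionInvalid: "create-new-session",
  RealtimeSessionExpired: "create-new-session",
  RealtimeSessionAlreadyUsed: "create-new-session"
});

// Pick a recovery action from a parsed error body of the shape { status, detail, errorCode }.
function recoveryFor(errorBody) {
  const code = errorBody && errorBody.errorCode;
  return Object.prototype.hasOwnProperty.call(RECOVERY_BY_ERROR_CODE, code)
    ? RECOVERY_BY_ERROR_CODE[code]
    : "report";
}
```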

Default Limits

Limit	Default
Session TTL before WebSocket connect	60 seconds
Max session duration	1800 seconds
Idle timeout	60 seconds
Approval timeout	300 seconds
Max WebSocket message size	65536 bytes
Provider connect timeout	15 seconds

If the WebSocket is opened with the wrong agentId, the one-time session may already be consumed before the mismatch is reported. Create a new session instead of retrying the same token.

Implementation Checklist

Store apiBaseUrl, agentId, and organization credentials in trusted server-side configuration.
Call POST /api/v1/Agent/{agentId}/realtime-session immediately before opening a WebSocket.
Build the WebSocket URL as wss://{host}{webSocketPath}.
Do not send X-Api-Key to the WebSocket.
Parse every incoming text frame as JSON and route by top-level type.
Handle raw provider audio and response events according to the provider realtime protocol.
Handle Siesta AI custom events with the { type, data } envelope.
For backend tool events, update UI state but do not send function outputs yourself.
For declared client tools, listen for response.function_call_arguments.done, execute locally, send conversation.item.create, then send response.create.
For approval.waiting, show approval UI and send approval.approve or approval.reject with the same callId.
On RealtimeSessionExpired or RealtimeSessionAlreadyUsed, create a new session instead of retrying the old one.
Keep individual WebSocket messages under 65536 bytes unless your deployment config says otherwise.
