Data

Overview

The Data module consolidates multiple data sources into a single logical unit. A data collection is a central container of data that can then be used by AI agents, workflows, or analytical tools in Siesta AI.

Each collection:

  • has its own name and description,
  • contains one or more data sources,
  • lets you manage and organize data according to its purpose.
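The structure described above can be pictured as a small data model. This is only an illustrative sketch: the class and field names (`DataCollection`, `DataSource`, `file_count`, `size_bytes`) are assumptions, not the Siesta AI schema, and the two totals are assumed to be simple sums over the connected sources.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """One connected source; all field names are hypothetical."""
    name: str
    source_type: str      # e.g. "Jira" or "Manual Upload"
    file_count: int = 0
    size_bytes: int = 0


@dataclass
class DataCollection:
    """A named container that groups one or more data sources."""
    name: str
    description: str
    sources: list[DataSource] = field(default_factory=list)

    @property
    def total_files(self) -> int:
        # Assumption: the total is the sum over all connected sources.
        return sum(s.file_count for s in self.sources)

    @property
    def total_size_bytes(self) -> int:
        return sum(s.size_bytes for s in self.sources)


collection = DataCollection(
    name="Neurology Research Results",
    description="Study reports and raw exports",
    sources=[
        DataSource("Lab uploads", "Manual Upload", file_count=12, size_bytes=4_000_000),
        DataSource("Project tickets", "Jira", file_count=30, size_bytes=1_500_000),
    ],
)
print(collection.total_files, collection.total_size_bytes)
```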

Overview of Data Collections

The main Data Collections screen lists all created collections in a table.

Displayed columns:

  • Name – the name of the data collection
  • Description – a brief description of the collection's purpose
  • Data Sources – icons of connected data sources
  • Total files – total number of connected files
  • Total size – total size of all connected files
  • Created – date and time of creation
  • Actions – additional options for working with the collection, including viewing details and deleting the collection

The top of the screen provides:

  • collection search,
  • Create Collection button.

Creating a New Data Collection

Clicking Create Collection opens a dialog for creating a new collection.

Mandatory Fields

  • Name – a unique name for the data collection (e.g., Neurology Research Results – Big Data Collection).
  • Description – a brief description of the content and purpose of the collection.
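The two mandatory fields suggest a simple validation step before a collection is created. The sketch below is hypothetical: the function name, the error messages, and the choice to compare names case-insensitively are assumptions, since the dialog's actual rules are not documented here.

```python
def validate_new_collection(
    name: str, description: str, existing_names: set[str]
) -> list[str]:
    """Return a list of validation errors; an empty list means the input is valid."""
    errors: list[str] = []
    cleaned = name.strip()
    if not cleaned:
        errors.append("Name is required.")
    # Assumption: uniqueness is checked without regard to letter case.
    elif cleaned.casefold() in {n.casefold() for n in existing_names}:
        errors.append("Name must be unique.")
    if not description.strip():
        errors.append("Description is required.")
    return errors


existing = {"Neurology Research Results"}
print(validate_new_collection("neurology research results", "Duplicate", existing))
print(validate_new_collection("Cardiology Trials", "Trial protocols", existing))
```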

Actions

  • Cancel – closes the dialog without saving
  • Create – creates a new data collection

Create Collection Dialog

Data Collection Detail

When a specific collection is opened, its detail page is displayed.

Displayed information:

  • collection name,
  • overview of connected data sources.

The page includes the Edit Data Collection and Add Data Source buttons.

Adding a Data Source to the Collection

Clicking Add Data Source opens a selection of data source types.

Available Options

  • Manual Upload – manual file upload
  • Azure Storage Account
  • Jira
  • Firecrawl
  • Google Drive
  • Confluence

Data Source Type Selection

Configuration of Individual Data Sources

Jira

For the Jira data source, you will set:

  • Name and Description,
  • Connection ID (connected Jira connector),
  • Sync schedule,
  • Data collection (option to move the source to another collection),
  • Project Key,
  • the Retriever and Processing sections.

Direct link to the connector: Jira.

Jira Data Source Configuration

Azure Storage Account

For the Azure Storage Account data source, you will set:

  • Name and Description,
  • Connection ID (connected Azure Storage connector),
  • Sync schedule,
  • Data collection (option to move the source to another collection),
  • list of Blobs (including adding/deleting items),
  • the Retriever and Processing sections.

Direct link to the connector: Azure Storage Account.

Azure Storage Account Data Source Configuration

Confluence

For the Confluence data source, you will set:

  • Name and Description,
  • Connection ID (connected Confluence connector),
  • Sync schedule,
  • Data collection (option to move the source to another collection),
  • Space Key.

Direct link to the connector: Confluence.

Confluence Data Source Configuration

Firecrawl

For the Firecrawl data source, you will set:

  • Name and Description,
  • Connection ID (connected Firecrawl connector),
  • Sync schedule,
  • Data collection (option to move the source to another collection),
  • Scrape type,
  • Scrape URL,
  • Limit and optional regex filters (e.g., Include paths regex).
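To illustrate how a Limit and an Include paths regex might narrow a crawl, here is a minimal sketch using Python's standard `re` module. The function `select_urls`, the sample patterns, and the matching rule (a regex search against the URL path) are assumptions for illustration; Firecrawl's own matching behaviour may differ.

```python
import re
from urllib.parse import urlparse


def select_urls(urls: list[str], include_patterns: list[str], limit: int) -> list[str]:
    """Keep URLs whose path matches any include pattern, up to `limit` results."""
    compiled = [re.compile(p) for p in include_patterns]
    selected: list[str] = []
    for url in urls:
        if len(selected) >= limit:
            break
        path = urlparse(url).path
        if any(rx.search(path) for rx in compiled):
            selected.append(url)
    return selected


# Hypothetical settings: documentation pages and dated blog posts only.
include_paths = [r"^/docs/.*", r"^/blog/\d{4}/"]
candidates = [
    "https://example.com/docs/intro",
    "https://example.com/pricing",
    "https://example.com/blog/2024/launch",
    "https://example.com/blog/archive",
    "https://example.com/docs/api/auth",
    "https://example.com/docs/api/errors",
]
picked = select_urls(candidates, include_paths, limit=3)
print(picked)
```

A tight include pattern combined with a low limit is a cheap way to test a configuration before letting a full crawl run.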

Direct link to the connector: Firecrawl.

Firecrawl Data Source Configuration

Google Drive

For the Google Drive data source, you will set:

  • Name and Description,
  • Connection ID (connected Google Drive connector),
  • Sync schedule,
  • list of Folders (including adding/deleting items),
  • Data collection (option to move the source to another collection),
  • the Retriever and Processing sections.

Direct link to the connector: Google Drive.

Google Drive Data Source Configuration

Manual Upload

For the Manual Upload data source, you will set:

  • Name and Description,
  • the Upload files section for adding files,
  • Data collection (option to move the source to another collection),
  • the Retriever and Processing sections.

Manual Upload Data Source Configuration

Retriever and Processing

In these sections, you can control how data is retrieved and prepared for AI:

  • Retriever: Skip LLM query rewrite, Skip LLM re-ranking, Max results count
  • Processing: Chunking strategy, Advanced content extraction, JSON Features, JSON Metadata Definitions

According to the configuration lists above, the Retriever and Processing sections are available for the Jira, Azure Storage Account, Google Drive, and Manual Upload sources.
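One way to read the three Retriever options is as switches in a retrieval pipeline: an optional query rewrite, a search, an optional re-ranking, and a cap on the number of results. The sketch below assumes exactly that order. `RetrieverSettings`, `retrieve`, and the toy `rewrite`, `search`, and `rerank` functions are hypothetical stand-ins (no LLM is called), not the platform's implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetrieverSettings:
    skip_llm_query_rewrite: bool = False
    skip_llm_reranking: bool = False
    max_results_count: int = 5


def retrieve(
    query: str,
    settings: RetrieverSettings,
    search: Callable[[str], list[str]],
    rewrite: Callable[[str], str],
    rerank: Callable[[str, list[str]], list[str]],
) -> list[str]:
    """Run the assumed pipeline: rewrite -> search -> re-rank -> truncate."""
    if not settings.skip_llm_query_rewrite:
        query = rewrite(query)
    hits = search(query)
    if not settings.skip_llm_reranking:
        hits = rerank(query, hits)
    return hits[: settings.max_results_count]


# Toy stand-ins so the sketch runs without any external service.
corpus = [
    "Jira sync runs nightly",
    "Upload files manually",
    "Sync schedule for Google Drive",
    "Chunking strategy overview",
]


def rewrite(query: str) -> str:
    return " ".join(query.split()).lower()


def search(query: str) -> list[str]:
    words = query.lower().split()
    return [doc for doc in corpus if any(w in doc.lower() for w in words)]


def rerank(query: str, hits: list[str]) -> list[str]:
    words = query.lower().split()
    return sorted(hits, key=lambda doc: -sum(w in doc.lower() for w in words))


top = retrieve("Sync Schedule", RetrieverSettings(max_results_count=1), search, rewrite, rerank)
print(top)
```

Skipping the rewrite or re-ranking step removes a model call from the pipeline, which in this sketch changes only the order of the hits before the cut-off is applied.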

Retriever and Processing Settings

Data Source Detail (Overview, Configuration, Files, Logs)

Each data source has its own tabs in the detail view:

  • Overview
  • Configuration
  • Files
  • Logs

Data Source Detail - Overview

Overview

The Overview tab provides a quick summary:

  • type of source,
  • status of the latest synchronizations,
  • number of files and total size,
  • basic statistics by status and type of files.

Configuration

In the Configuration tab, you can modify the data source settings (e.g., name, description, sync schedule, assigned collection). The Data collection field lets you move the data source to another collection at any time. There is also a Danger Zone for destructive actions such as re-ingesting all files.

Data Source Detail - Configuration

Files

The Files tab lists the files of the data source and offers:

  • filtering by status,
  • file metadata (type, size, status, indexed/readable),
  • option to open the detail of a specific file.
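Filtering by status and counting files per status or type can be sketched as follows. The record fields and the status values `"indexed"` and `"failed"` are hypothetical; the actual status names used in the Files tab are not listed in this document.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    file_type: str
    size_bytes: int
    status: str     # hypothetical values: "indexed", "failed"
    indexed: bool


def filter_by_status(files: list[FileRecord], status: str) -> list[FileRecord]:
    """Return only the files with the given status."""
    return [f for f in files if f.status == status]


files = [
    FileRecord("report.pdf", "pdf", 120_000, "indexed", True),
    FileRecord("notes.docx", "docx", 45_000, "indexed", True),
    FileRecord("scan.png", "png", 900_000, "failed", False),
]

failed = filter_by_status(files, "failed")
status_counts = Counter(f.status for f in files)
type_counts = Counter(f.file_type for f in files)
print([f.name for f in failed], dict(status_counts), dict(type_counts))
```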

Data Source Detail - Files

File Detail and Chunking

In the file detail, you can see how the document was chunked for AI indexing. The file status, metadata, and individual chunks that enter the retrieval pipeline are displayed.
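As a rough picture of what chunking means, the sketch below splits text into fixed-size character windows that overlap slightly, so that content cut at a boundary still appears whole in one chunk. This is only one common strategy and an assumption here: the Chunking strategy options offered by the platform are not specified in this document, and `chunk_text` and its parameters are illustrative names.

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` characters sharing `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("chunk_size must be positive and overlap must be in [0, chunk_size)")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last window is not fully contained in the previous one.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


document = "abcdefghij" * 5          # 50 characters of sample text
chunks = chunk_text(document, chunk_size=40, overlap=10)
for number, chunk in enumerate(chunks, start=1):
    print(number, len(chunk), chunk)
```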

File Detail - Chunks

Logs

In the Logs tab, you will find the history of synchronizations and processing:

  • start and completion dates,
  • counts of processed/skipped items,
  • trigger for synchronization (e.g., scheduled).
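A single log entry with the fields listed above could be modelled like this. The class `SyncLogEntry` and its field names are hypothetical, and the sample values are made up for illustration; only the kinds of information (start, completion, counts, trigger) come from the Logs tab description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SyncLogEntry:
    started_at: datetime
    completed_at: datetime
    processed: int
    skipped: int
    trigger: str    # e.g. "scheduled"

    @property
    def duration(self) -> timedelta:
        """Elapsed time between start and completion."""
        return self.completed_at - self.started_at


entry = SyncLogEntry(
    started_at=datetime(2024, 5, 1, 2, 0, 0),
    completed_at=datetime(2024, 5, 1, 2, 3, 30),
    processed=48,
    skipped=2,
    trigger="scheduled",
)
print(f"{entry.trigger}: {entry.processed} processed, "
      f"{entry.skipped} skipped in {entry.duration}")
```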

Data Source Detail - Logs

Connecting Data Collections to Agents

Data collections are assigned to agents in the agent settings. For details, see the Agent Configuration section.

Typical Uses of Data Collections

Data collections are primarily used for:

  • organizing a larger number of files,
  • consolidating data by topic or project,
  • creating a single source of truth for AI agents,
  • reusing the same data in different workflows,
  • scaling data work without the need for duplication.

Summary

Data collections in Siesta AI enable clear data management and effective utilization across the entire platform. Well-structured collections are the foundation for quality outputs from AI agents and automated workflows.