Data
Overview
The Data module consolidates multiple data sources into a single logical unit. A data collection is a central container of data that can then be used by AI agents, workflows, or analytical tools in Siesta AI.
Each collection:
- has its own name and description,
- contains one or more data sources,
- allows for the management and organization of data according to its purpose.
Overview of Data Collections
The main Data Collections screen displays a table of all created collections.
Displayed columns:
- Name – the name of the data collection
- Description – a brief description of the collection's purpose
- Data Sources – icons of connected data sources
- Total files – total number of connected files
- Total size – total size of all connected files
- Created – date and time of creation
- Actions – additional options for working with the collection, including viewing details and deleting the collection
At the top of the screen, the following are available:
- collection search,
- Create Collection button.

Creating a New Data Collection
After clicking on Create Collection, a dialog for creating a new collection will open.
Mandatory Fields
- Name – a unique name for the data collection (e.g., Neurology Research Results – Big Data Collection).
- Description – a brief description of the content and purpose of the collection.
Actions
- Cancel – closes the dialog without saving
- Create – creates a new data collection

Data Collection Detail
When a specific collection is opened, its detail page is displayed.
Displayed information:
- collection name,
- overview of connected data sources.
The page includes the Edit Data Collection and Add Data Source buttons.

Adding a Data Source to the Collection
Clicking Add Data Source opens a selection of data source types.
Available Options
- Manual Upload – manual file upload
- Azure Storage Account
- Jira
- Firecrawl
- Google Drive
- Confluence

Configuration of Individual Data Sources
Jira
For the Jira data source, you will set:
- Name and Description,
- Connection ID (connected Jira connector),
- Sync schedule,
- Data collection (option to move the source to another collection),
- Project Key,
- sections Retriever and Processing.
Direct link to the connector: Jira.

Azure Storage Account
For the Azure Storage Account data source, you will set:
- Name and Description,
- Connection ID (connected Azure Storage connector),
- Sync schedule,
- Data collection (option to move the source to another collection),
- list of Blobs (including adding/deleting items),
- sections Retriever and Processing.
Direct link to the connector: Azure Storage Account.

Confluence
For the Confluence data source, you will set:
- Name and Description,
- Connection ID (connected Confluence connector),
- Sync schedule,
- Data collection (option to move the source to another collection),
- Space Key.
Direct link to the connector: Confluence.

Firecrawl
For the Firecrawl data source, you will set:
- Name and Description,
- Connection ID (connected Firecrawl connector),
- Sync schedule,
- Data collection (option to move the source to another collection),
- Scrape type,
- Scrape URL,
- Limit and optional regex filters (e.g., Include paths regex).
Direct link to the connector: Firecrawl.
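The Include paths regex filter restricts which crawled URLs are ingested. The exact matching semantics are Firecrawl's; the patterns and URLs below are hypothetical examples meant only to illustrate how such a filter typically behaves:

```python
import re

# Hypothetical include-paths pattern: keep only pages under /docs/ or /blog/2024/
include_paths = re.compile(r"^/(docs|blog/2024)(/|$)")

urls = [
    "/docs/getting-started",
    "/blog/2024/release-notes",
    "/pricing",
]

# Keep only URLs whose path matches the include pattern
kept = [u for u in urls if include_paths.match(u)]
print(kept)  # ['/docs/getting-started', '/blog/2024/release-notes']
```

Anchoring the pattern with `^` prevents it from matching substrings deeper in the path (e.g., `/archive/docs/old`), which is usually the intended behavior for an include filter.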

Google Drive
For the Google Drive data source, you will set:
- Name and Description,
- Connection ID (connected Google Drive connector),
- Sync schedule,
- list of Folders (including adding/deleting items),
- Data collection (option to move the source to another collection),
- sections Retriever and Processing.
Direct link to the connector: Google Drive.

Manual Upload
For the Manual Upload data source, you will set:
- Name and Description,
- the Upload files section,
- Data collection (option to move the source to another collection),
- sections Retriever and Processing.

Retriever and Processing
In these sections, you can control the way data is retrieved and prepared for AI:
- Retriever: Skip LLM query rewrite, Skip LLM re-ranking, Max results count
- Processing: Chunking strategy, Advanced content extraction, JSON Features, JSON Metadata Definitions
The Retriever and Processing settings can also be changed for the Azure Storage Account, Manual Upload, and Google Drive sources.
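JSON Metadata Definitions typically describe which fields of an ingested JSON document should be exposed as metadata for retrieval. Siesta AI's actual schema is not documented here; the field names, paths, and extraction helper below are a hypothetical sketch of the general idea:

```python
def extract(doc, path):
    """Follow a dotted path through a nested dict (illustrative helper)."""
    value = doc
    for key in path.split("."):
        value = value[key]
    return value

# A sample ingested JSON document (hypothetical content)
doc = {"document": {"author": "J. Novak", "title": "Neurology notes"}}

# Hypothetical metadata definitions: a name for each field plus the
# dotted path pointing at it inside the source document
definitions = [
    {"name": "author", "path": "document.author"},
    {"name": "title",  "path": "document.title"},
]

metadata = {d["name"]: extract(doc, d["path"]) for d in definitions}
print(metadata)  # {'author': 'J. Novak', 'title': 'Neurology notes'}
```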

Data Source Detail (Overview, Configuration, Files, Logs)
Each data source has its own tabs in the detail view:
- Overview
- Configuration
- Files
- Logs

Overview
The Overview tab provides a quick summary:
- type of source,
- status of the latest synchronizations,
- number of files and total size,
- basic statistics by status and type of files.
Configuration
In the Configuration tab, you can modify the settings of the data source (e.g., name, description, sync frequency, assigned collection). The Data collection field allows you to move the data source between collections at any time. There is also a Danger Zone for destructive actions such as re-ingesting all files.

Files
In the Files tab, there is an overview of the files of the given source:
- filtering by status,
- file metadata (type, size, status, indexed/readable),
- option to open the detail of a specific file.

File Detail and Chunking
In the file detail, you can see how the document was chunked for AI indexing. The file status, metadata, and individual chunks that enter the retrieval pipeline are displayed.
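The chunks shown in the file detail are produced by the source's configured Chunking strategy. The strategies Siesta AI offers are not listed here, but as a general illustration, fixed-size chunking with overlap is one common approach:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size chunks with overlap (a common chunking strategy).

    Overlapping chunks help preserve context that would otherwise be
    cut at a chunk boundary.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# A 500-character document splits into 3 overlapping chunks of up to 200 chars
chunks = chunk_text("A" * 500)
print(len(chunks))  # 3
```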

Logs
In the Logs tab, you will find the history of synchronizations and processing:
- start and completion dates,
- counts of processed/skipped items,
- trigger for synchronization (e.g., scheduled).

Connecting Data Collections to Agents
Data collections are subsequently assigned to agents in their settings. Details can be found in the section Agent Configuration.
Typical Uses of Data Collections
Data collections are primarily used for:
- organizing a larger number of files,
- consolidating data by topic or project,
- creating a single source of truth for AI agents,
- reusing the same data in different workflows,
- scaling data work without the need for duplication.
Summary
Data collections in Siesta AI enable clear data management and effective utilization across the entire platform. Well-structured collections are the foundation for quality outputs from AI agents and automated workflows.