Increase of Azure AI Foundry Quota

If you need to increase your Azure AI Foundry quota, use this document: it summarizes the required information and links to the quota increase request form and to the documentation on models and regions.

Why an Increase is Needed

Your AI agents run (or will run) directly in your Azure AI Foundry environment, so all AI workloads are subject to your Azure subscription's rate limits (tokens per minute, TPM, and requests per minute, RPM).

Default quotas are sized mainly for testing and proof-of-concept work. In production deployments, especially during document ingestion and embedding generation, these limits often become a bottleneck and significantly slow down processing.

Increasing the quota will allow for:

  • faster document ingestion and re-indexing,
  • higher throughput for embedding generation,
  • stable performance under concurrent user load,
  • lower latency and less throttling,
  • production scale and reliability.
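The throughput gains above follow from simple arithmetic: bulk embedding time is roughly total tokens divided by the TPM limit. A minimal sketch of the best-case estimate; the corpus size and quota figures below are illustrative assumptions, not measured values:

```python
def ingestion_minutes(total_tokens: int, tpm_quota: int) -> float:
    """Best-case wall-clock minutes to push total_tokens through a TPM limit."""
    return total_tokens / tpm_quota

# Illustrative numbers (assumptions): a corpus of ~5M tokens, e.g. a few
# thousand PDF pages, embedded under two different TPM quotas.
corpus_tokens = 5_000_000
default_quota = 10_000    # typical default-tier TPM (illustrative)
raised_quota = 100_000    # after a quota increase (illustrative)

print(f"default quota: {ingestion_minutes(corpus_tokens, default_quota):.0f} min")
print(f"raised quota:  {ingestion_minutes(corpus_tokens, raised_quota):.0f} min")
```

The real pipeline will be slower than this best case (request overhead, batching, retries), but the ratio between the two quotas carries over directly.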

Important: Increasing the quota does not change the price. It only increases throughput. Billing remains strictly based on consumed tokens — the price per token is the same.

This is a standard Azure process for production AI deployment. We will provide you with pre-filled parameters and a justification template to make the request quick and easy.

Data for Quota Increase Request

  1. Name (Authorized Representative of the Applicant): [CLIENT]
  2. Surname: [CLIENT]
  3. Company Email (on company domain): [CLIENT]
  4. Company Name: [CLIENT]
  5. Company Address: [CLIENT]
  6. City: [CLIENT]
  7. ZIP Code: [CLIENT]
  8. Country: [CLIENT]
  9. Subscription ID: [CLIENT], or [SIESTA.AI] if we have access to your Azure subscription
  10. Justification: example below
  11. Model Type: Azure OpenAI
  12. Model Deployment Quota: Model Deployment (PTU/RPM/TPM)
  13. (Azure OpenAI) Quota Request Type: Global Standard
  14. Global Standard Region: East US 2 or Sweden Central
  15. (Azure OpenAI) Global Standard Model: text-embedding-3-large
  16. Quota: 10000

Example Justification

We are building and operating a production AI SaaS platform focused on enterprise automation (document analysis, RAG agents, email triage, CRM integration, and internal-process automation for B2B clients). We are currently running pilot and production deployments across industries (manufacturing, real estate, insurance, enterprise services). Typical workloads include:

  • high-frequency chat and API inference,
  • large pipelines for document ingestion and vectorization (PDF, DOCX, web crawling),
  • contextually demanding prompts with multi-step reasoning,
  • concurrent use by multiple enterprise users and teams.

Current quotas are already a bottleneck during peak load and testing. As we onboard new customers and introduce additional agents and integrations (HubSpot, Gmail, Google Drive, Azure Storage, internal CRM), we expect a significant increase in token throughput. We need the quota increase to:

  • maintain stable latency during concurrent enterprise operations,
  • support batch processing of documents and continuous ingestion pipelines,
  • ensure production reliability and SLA,
  • eliminate throttling during load spikes from real business workflows.
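Until a higher quota is granted, throttling during load spikes typically surfaces as HTTP 429 responses and is handled with retries. A minimal sketch of jittered exponential backoff; `ThrottledError` and the caller-supplied request function are stand-ins, not part of any Azure SDK:

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for a 429 'Too Many Requests' response (assumption)."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn on throttling, doubling the delay each attempt with jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except ThrottledError:
            # jittered exponential backoff: ~1s, ~2s, ~4s, ...
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
    return fn()  # final attempt; let any error propagate to the caller
```

Backoff only smooths over brief spikes: if the workload is persistently above the TPM/RPM limit, retries add latency without adding throughput, which is why the quota increase itself is the actual fix.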

This quota increase is critical for upcoming production deployments and commercial rollouts. Without higher capacity, our ability to scale customers and ensure consistent service quality will be limited. We are committed to responsible usage, cost monitoring, and effective optimization of prompts and tokens in accordance with Azure OpenAI best practices.