# Increase of Azure AI Foundry Quota
This document summarizes the information you need to request an Azure AI Foundry quota increase, together with links to the request form and to the documentation on available models and regions.
## Why an Increase is Needed
Your AI agents run (or will run) directly in your Azure AI Foundry environment, so all AI workloads are subject to your Azure subscription's rate limits: tokens per minute (TPM) and requests per minute (RPM).
Default quotas are sized mainly for testing and proof-of-concept work. In production, especially during document ingestion and embedding generation, these limits often become a bottleneck and significantly slow down processing.
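As a rough illustration of how quickly an ingestion pipeline can exhaust a TPM limit, consider the back-of-the-envelope estimate below. All figures (documents per minute, tokens per document, the default limit) are hypothetical assumptions for the sketch, not Azure defaults or measurements from our platform:

```python
# Hypothetical estimate of embedding throughput needs during re-indexing.
# All numbers below are illustrative assumptions, not Azure defaults.

docs_per_minute = 300        # target ingestion rate during re-indexing
avg_tokens_per_doc = 4_000   # average document size after chunking

required_tpm = docs_per_minute * avg_tokens_per_doc
print(f"Required throughput: {required_tpm:,} tokens per minute")

default_tpm = 350_000        # assumed default limit for an embedding deployment
if required_tpm > default_tpm:
    slowdown = required_tpm / default_tpm
    print(f"Ingestion would run roughly {slowdown:.1f}x slower than the target rate")
```

With these assumed numbers, the pipeline needs 1.2M TPM while the deployment allows 350K, so ingestion is throttled to roughly a third of the intended speed; this is the bottleneck a quota increase removes.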
Increasing the quota will allow for:
- faster document ingestion and re-indexing,
- higher throughput for embedding generation,
- stable performance under concurrent user load,
- lower latency and less throttling,
- production scale and reliability.
Important: Increasing the quota does not change the price; it only increases throughput. Billing remains strictly based on consumed tokens, and the price per token stays the same.
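Even with a higher quota, production clients should tolerate occasional throttling (HTTP 429) gracefully. Below is a minimal retry-with-exponential-backoff sketch; `fake_embed`, `RateLimitError`, and all other names are hypothetical stand-ins for illustration, not the Azure OpenAI SDK:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the HTTP 429 error a real SDK would raise."""

def fake_embed(text: str, pending_failures: list) -> list:
    # Simulated embedding call: raises 429 while pending_failures is non-empty,
    # then returns a placeholder vector. Swap in your real client call here.
    if pending_failures:
        pending_failures.pop()
        raise RateLimitError("429: Too Many Requests")
    return [0.0] * 8  # placeholder embedding vector

def embed_with_backoff(text: str, pending_failures: list, max_retries: int = 5) -> list:
    for attempt in range(max_retries):
        try:
            return fake_embed(text, pending_failures)
        except RateLimitError:
            # Exponential backoff with jitter; the base delay is kept tiny
            # for this demo (use on the order of 1 s in production).
            delay = 0.01 * (2 ** attempt) + random.uniform(0, 0.005)
            time.sleep(delay)
    raise RuntimeError("rate limit not lifted after retries")
```

The backoff doubles the wait after each 429 and adds jitter so that many workers do not retry in lockstep; a quota increase makes these retries rare rather than making them unnecessary.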
This is a standard Azure process for production AI deployment. We will provide you with pre-filled parameters and a justification template to make the request quick and easy.
## Data for the Quota Increase Request
| # | Field | Value / Note |
|---|---|---|
| 1 | Name (Authorized Representative of the Applicant) | [CLIENT] |
| 2 | Surname | [CLIENT] |
| 3 | Company Email (on company domain) | [CLIENT] |
| 4 | Company Name | [CLIENT] |
| 5 | Company Address | [CLIENT] |
| 6 | City | [CLIENT] |
| 7 | ZIP Code | [CLIENT] |
| 8 | Country | [CLIENT] |
| 9 | Subscription ID | [CLIENT] or [SIESTA.AI], if we have access to your Azure subscription |
| 10 | Justification (EXAMPLE) | Below |
| 11 | Model Type | Azure OpenAI |
| 12 | Model Deployment Quota | Model Deployment (PTU/RPM/TPM) |
| 13 | (Azure OpenAI) Quota Request Type | Global Standard |
| 14 | Global Standard Region | East US 2 or Sweden Central |
| 15 | (Azure OpenAI) Global Standard Model | text-embedding-3-large |
| 16 | Quota | 10000 |
## Example Justification
We are building and operating a production AI SaaS platform focused on enterprise automation (document analysis, RAG agents, email triage, CRM integration, and internal process automation for B2B clients). We currently run pilot and production deployments across industries (manufacturing, real estate, insurance, enterprise services). Typical workloads include:
- high-frequency chat and API inference,
- large pipelines for document ingestion and vectorization (PDF, DOCX, web crawling),
- contextually demanding prompts with multi-step reasoning,
- concurrent use by multiple enterprise users and teams.
Current quotas are already a bottleneck during peak load and testing. As we onboard new customers and introduce additional agents and integrations (HubSpot, Gmail, Google Drive, Azure Storage, internal CRM), we expect a significant increase in token throughput. We need the quota increase to:
- maintain stable latency during concurrent enterprise operations,
- support batch processing of documents and continuous ingestion pipelines,
- ensure production reliability and SLA compliance,
- eliminate throttling during load spikes from real business workflows.
This quota increase is critical for upcoming production deployments and commercial rollouts. Without higher capacity, our ability to scale customers and ensure consistent service quality will be limited. We are committed to responsible usage, cost monitoring, and effective optimization of prompts and tokens in accordance with Azure OpenAI best practices.