Your AI bill is out of control. Cloudflare can fix it now.

AI summary · key takeaways

• Many enterprises have experienced runaway AI costs due to shared API keys that provide no visibility into who is consuming tokens or which models are being used
• Cloudflare AI Gateway now offers spend limits based on dollar budgets (not just token limits) that can be scoped by model, provider, user, team, or application with configurable time windows
• Identity-driven budgets integrate with existing identity providers through Cloudflare Access to automatically attribute AI usage to specific users and enforce per-user or per-team spending policies
• Organizations can implement intelligent routing strategies that automatically downgrade to cheaper models when budgets are exceeded, preventing workflow disruption while controlling costs
• Most AI tasks don't require frontier models, but without proper controls and visibility, users default to the most expensive options available

There isn't a CIO on the planet not worried about AI spend right now. CFOs are increasingly nervous, too. For fear of falling behind, many companies have pushed their employees to use AI as aggressively as possible.

The edict was clear: "Move fast, we'll figure out the bill later." And for the most part, it worked: AI has been genuinely transformational for the teams that leaned in. But the costs are real: we’ve heard countless horror stories of huge bills and painful overages on token spend.

Today, we're announcing spend controls in Cloudflare AI Gateway, and a closed beta for identity-driven budgets and routing using Cloudflare Access and your existing identity provider. As we’ve spoken with hundreds of companies about their AI strategy, we’ve seen a common story: The company gives every engineer access to frontier models through a shared API key.

Usage takes off. At the end of the month, finance pulls the invoice and nobody can explain where the money went. Was it the machine learning team training a new pipeline? Was it an intern running Claude Opus on email triage? Was it a runaway continuous integration job that burned through 50 million tokens in a weekend? Nobody knows, because the API key doesn't tell you who used it.

Without guidelines, staff will generally reach for the biggest model available. And why wouldn't they? If there's no budget, no visibility, and no routing logic, the rational move is to use the most powerful model for everything.

The problem is that most tasks don't need a frontier model. A code review summary doesn't need the same model as a complex architecture refactor. A log parser doesn't need the same model as a customer-facing content generator.

It should be easy to select the right tool for the job, rather than defaulting to the most powerful and expensive one. And it should be simple to see where the spend is going. You can't calculate ROI on your AI spend without visibility on what you're spending, and you can't protect that ROI without controls.

Every other line item in a business has a budget and per-team attribution, and AI spend should be no different.

What AI Gateway is

AI Gateway sits between your applications and AI providers. Instead of calling OpenAI, Anthropic, Google, or any other provider directly, your requests route through AI Gateway first.

This immediately gives you several useful tools:

• Unified billing to easily switch between different providers and models
• Logging across all providers: every request, token count, and cost in one place
• Response caching
• Rate limiting
• Content guardrails and the ability to block Personally Identifiable Information (PII) and secrets before they reach the model

However, AI Gateway didn’t have an easy way to answer who is spending what or how you might set limits on AI spend.

You could see aggregate usage across your account. But you couldn't see that Jane from engineering burned through $2,000 on Claude this month while the entire data science team only used $400. You couldn't set a budget that said "engineering gets $5,000/month on frontier models, interns get $200/month on Kimi K2.6."

That changes today.

Spend limits: budgets for AI usage

AI Gateway now supports spend limits as a core feature.

These are true cost control measures in the form of budgets set in dollars, not tokens, that track cumulative spend across all requests, operating independently of traditional rate limiting. You can scope limits to any combination of dimensions: model, provider, or admin-defined custom attributes like user, team, or application. Windows can be fixed (resets on the first of the month, Monday, or midnight) or rolling, and set to daily, weekly, or monthly.
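To make the difference between fixed and rolling windows concrete, here is a minimal sketch of how spend could be summed for a scoped limit. It is an illustration only, not AI Gateway's implementation: the event shape, the `in_scope` matching rule, and the choice to treat a rolling month as 30 days are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a rolling "monthly" window is approximated as 30 days.
ROLLING_LENGTHS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(days=7),
    "monthly": timedelta(days=30),
}


def window_start(now, period, mode):
    """Return the start of the budget window that contains `now`."""
    if period not in ROLLING_LENGTHS:
        raise ValueError(f"unknown period: {period}")
    if mode == "rolling":
        # Rolling windows look back a fixed duration from the current moment.
        return now - ROLLING_LENGTHS[period]
    if mode != "fixed":
        raise ValueError(f"unknown mode: {mode}")
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight  # resets at midnight
    if period == "weekly":
        return midnight - timedelta(days=now.weekday())  # resets on Monday
    return midnight.replace(day=1)  # resets on the first of the month


def in_scope(limit_scope, request_attrs):
    """A limit applies when every dimension it names matches the request."""
    return all(request_attrs.get(k) == v for k, v in limit_scope.items())


def spend_in_window(events, limit_scope, now, period, mode):
    """Sum the cost of in-scope events that fall inside the current window."""
    start = window_start(now, period, mode)
    return sum(
        e["cost_usd"]
        for e in events
        if start <= e["at"] <= now and in_scope(limit_scope, e["attrs"])
    )
```

With a fixed weekly window, a request made on Sunday stops counting once Monday starts. With a rolling weekly window, the same request keeps counting until it is seven days old.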

AI Gateway calculates cost per request based on the model's pricing, and tracks cumulative spend against your limit in real time. You can easily track your model spend on our analytics dashboard and filter by model, provider, or any custom attribute. You have options for what happens when the budget limit is reached.
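As a rough sketch of the arithmetic behind per-request cost and cumulative tracking, the example below prices a request from its token counts and adds it to a running total. The model names and per-million-token prices are invented for illustration, and the `SpendTracker` class is hypothetical rather than part of any Cloudflare API.

```python
from decimal import Decimal

# Hypothetical prices in USD per million tokens: (input, output).
PRICES = {
    "frontier-model": (Decimal("15.00"), Decimal("75.00")),
    "small-model": (Decimal("0.50"), Decimal("1.50")),
}


def request_cost(model, input_tokens, output_tokens):
    """Cost of one request in USD, computed from the model's token prices."""
    input_price, output_price = PRICES[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / Decimal(1_000_000)


class SpendTracker:
    """Accumulates request costs against a dollar limit."""

    def __init__(self, limit_usd):
        self.limit_usd = Decimal(limit_usd)
        self.spent_usd = Decimal("0")

    def record(self, model, input_tokens, output_tokens):
        """Add one request's cost and return the new cumulative spend."""
        self.spent_usd += request_cost(model, input_tokens, output_tokens)
        return self.spent_usd

    @property
    def exceeded(self):
        return self.spent_usd >= self.limit_usd
```

Using `Decimal` rather than floats keeps small per-request amounts from drifting as they accumulate, which matters when the total is compared against a hard limit.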

AI Gateway will block further requests by default. Or you can set up rules through Dynamic Routes to route requests to a fallback model after you’ve hit a spend limit, so that a hard spending cap won’t kill your engineers’ workflow. We’re working to add the capability for you to also send alerts when a limit is reached.
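The block-or-fallback behavior described above can be summarized as a small decision function. This is a conceptual sketch only: it does not use Dynamic Routes configuration syntax, and the `Decision` type, the `route` function, and the model names in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    action: str  # "allow", "fallback", or "block"
    model: Optional[str]  # model that will serve the request, if any


def route(requested_model, spent_usd, limit_usd, fallback_model=None):
    """Decide what happens to a request given cumulative spend and a limit."""
    if spent_usd < limit_usd:
        return Decision("allow", requested_model)
    if fallback_model is not None:
        # Over budget, but a cheaper model is configured: keep work moving.
        return Decision("fallback", fallback_model)
    # Over budget with no fallback configured: block, matching the default.
    return Decision("block", None)
```

For example, `route("frontier-model", 5200, 5000, fallback_model="small-model")` returns a fallback decision for `small-model`, while the same call without `fallback_model` returns a block.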

Spend limits are available in open beta today for all AI Gateway users across all plans. Configure them in your gateway settings in the dashboard or via the API.

We use this ourselves

We're tracking token costs inside Cloudflare already.

Every Cloudflare employee uses AI tools daily, routing millions of requests and billions of tokens per month through AI Gateway. We faced the same question every company faces at this scale: who's using what, and how do we budget for it? We solved this by enabling AI Gateway to add identity to every request.

When an employee authenticates via Cloudflare Access, we extract their identity from the JSON Web Token (JWT) and attach it as metadata on the AI Gateway request. This makes per-user token consumption, team-level usage breakdowns, and cost attribution across the organization all visible in one place.

Identity-driven budgets and policies (closed beta)

In addition to spend limits, today we’re also announcing identity-driven budgets and policies as a closed beta.

Spend limits in AI Gateway let you set budgets by model, provider, or custom attributes. But your application has to pass that metadata, and AI Gateway trusts whatever it receives. For verified, automatic attribution, you need identity.
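For the application-supplied case, here is a minimal sketch of reading identity claims from a JWT and packaging them as request metadata. Several things here are assumptions to check against the current documentation: that the token carries an `email` claim, that metadata is sent as JSON in a `cf-aig-metadata` header, and the `user` and `team` key names. The decoder deliberately skips signature verification to stay short, so it is not safe for production; a real service must verify the token against the issuer's public keys first.

```python
import base64
import json


def jwt_claims_unverified(token):
    """Decode a JWT payload WITHOUT verifying its signature (illustration only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # base64url segments omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def gateway_metadata_header(claims, team):
    """Build a metadata header from identity claims.

    Assumption: the header name and the metadata keys are illustrative.
    """
    metadata = {"user": claims["email"], "team": team}
    return {"cf-aig-metadata": json.dumps(metadata, sort_keys=True)}
```

Because the application builds this header itself, the gateway has no way to confirm the values are honest, which is the gap that verified identity is meant to close.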

When combined with Cloudflare Access, AI Gateway can see who is making each request: not just which account, but which employee, which identity provider (IdP) group, which service, and so on. Here's what that looks like in practice. You can set per-user budgets, say $500/month for individual contributors and $2,000/month for senior engineers.
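One way to picture a per-user policy like this is a lookup from IdP group to monthly budget. The sketch below is hypothetical: the group names, the zero default for users in no known group, and the rule that a user in several groups gets the highest applicable budget are assumptions, not documented behavior.

```python
# Hypothetical IdP group names mapped to monthly budgets in USD.
GROUP_BUDGETS_USD = {
    "individual-contributors": 500,
    "senior-engineers": 2000,
}

# Assumption: users in no recognized group get no budget.
DEFAULT_BUDGET_USD = 0


def monthly_budget(groups):
    """Return the highest budget among the user's recognized groups."""
    return max(
        (GROUP_BUDGETS_USD[g] for g in groups if g in GROUP_BUDGETS_USD),
        default=DEFAULT_BUDGET_USD,
    )
```

Choosing the highest applicable budget is only one possible policy; an organization could equally pick the lowest or require exactly one matching group.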

Originally published at blog.cloudflare.com

