Audit Your Claude Rate Limits in 90 Minutes With Anthropic's API

Anthropic's Rate Limits API launched Apr 25. Here are the exact endpoints, the curl walkthrough, and the 3-step audit to stop surprise throttling.

For three weeks, the most viral piece of content in Claude developer circles wasn’t a model launch. It was a Reddit thread titled “I tracked my actual API cost on a $100/month Max plan. $565 in 7 days.” Theo Browne shipped a 123,000-view YouTube video called “We need to talk about the Claude Code rate limits.” MindStudio published a piece titled “Anthropic’s Compute Shortage” that got passed around DM groups all month.

And then on April 25, 2026, Anthropic shipped a Rate Limits API — quietly, without a launch tweet, just a new section in the Claude Platform docs. It’s a small piece of plumbing. But it’s the first thing Anthropic has shipped that actually closes the loop on the trust gap that started with the April 23 Claude Code postmortem. For the first time, you can ask Anthropic, in code, “what limits am I actually working with?” — and get a structured answer back instead of having to scrape the Console.

This guide is the 3-step Monday-morning audit every Claude customer should run before they hit another 429. We’ll cover what the API actually returns (the surprises), the exact curl and Python you need, how to cross-reference against your real usage, and the alert recipe that catches throttling before it eats your sprint.

What Is the Rate Limits API, Really?

In one sentence: it’s a read-only Admin API that returns the same data you’d see on the Limits page in the Claude Console — but in JSON, callable from your gateway, your CI, your Slackbot, anywhere.

In plainer English for anyone who hasn’t lived inside the Anthropic Console: every Claude organization has a set of rate limits — caps on how many requests, input tokens, and output tokens you can push through per minute. Hit the cap, get a 429 error, wait, retry. Until April 25, the only way to know your numbers was to log into the Console and read them off a page. If you ran a multi-tenant gateway, you hardcoded the values. When Anthropic adjusted them — which has been happening more often in 2026 — your hardcoded values silently drifted out of sync with reality.

The Rate Limits API fixes the drift. Two endpoints, both GET:

| Endpoint | What it returns |
| --- | --- |
| `/v1/organizations/rate_limits` | Org-level limits across all rate-limit groups |
| `/v1/organizations/workspaces/{workspace_id}/rate_limits` | Workspace-level overrides (anything not overridden inherits the org limits) |

Both accept three optional query params: model (org endpoint only — pulls the entry for a specific model ID or alias), group_type (filters to one of model_group, batch, token_count, files, skills, web_search), and page (pagination — currently always one page, but loop on next_page so you don’t have to refactor later).

The catch every dev finds in the first ten seconds: this API requires an Admin API key (the kind that starts with sk-ant-admin...), not your standard sk-ant-... key. Only org members with the admin role can provision admin keys, and individual accounts can’t use the Admin API at all. If you’re a solo developer on a personal account, this guide isn’t for you yet — though the underlying rate limit headers on every Messages API response give you a similar (less structured) view.

If you’re an admin in a team or enterprise org, provision the admin key first (Console → Settings → Admin Keys), store it in your secret manager, and read on.

Why Anthropic Shipped This Now

The Rate Limits API is the second piece of a two-step trust restoration after a brutal six weeks. Step one was the April 23 postmortem: Anthropic publicly named three product changes that hurt Claude Code quality (default reasoning dropped from “high” to “medium” on March 4; a bug discarding reasoning history mid-session on March 26; a 25-word response cap between tool calls on April 16) and reset usage limits for every Claude Code subscriber. Step two, two days later, is the Rate Limits API: instead of asking customers to trust that limits behave the way the Console says, give them the introspection tool to verify it programmatically.

Both moves matter because the underlying complaint was never “limits are too tight” — it was “limits are unpredictable.” A user-trust crisis is rarely about the number; it’s about the gap between expected and actual. The Rate Limits API doesn’t make limits more generous. It makes them legible.

That’s a real fix. It’s also incomplete — we’ll get to what’s still missing in a moment.

The 3-Step Audit Every Claude Customer Should Run This Week

Block 90 minutes on a Monday. By the end of it, you’ll have a JSON snapshot of your actual limits, a rough comparison against your real usage, and an alert that fires before you cross 75% of any limit. You’ll also have a clearer view of whether you should stay on Claude, downgrade your tier, upgrade, or migrate part of your workload to a cheaper alternative.

Step 1 — Pull Your Current Limits (15 min)

Get your admin key into an env var, then list everything:

export ANTHROPIC_ADMIN_KEY="sk-ant-admin-..."

curl "https://api.anthropic.com/v1/organizations/rate_limits" \
  --header "anthropic-version: 2023-06-01" \
  --header "x-api-key: $ANTHROPIC_ADMIN_KEY"

A typical response looks like this (real shape, illustrative numbers):

{
  "data": [
    {
      "type": "rate_limit",
      "group_type": "model_group",
      "models": [
        "claude-opus-4-5",
        "claude-opus-4-6",
        "claude-opus-4-7"
      ],
      "limits": [
        { "type": "requests_per_minute", "value": 4000 },
        { "type": "input_tokens_per_minute", "value": 2000000 },
        { "type": "output_tokens_per_minute", "value": 400000 }
      ]
    },
    {
      "type": "rate_limit",
      "group_type": "batch",
      "models": null,
      "limits": [{ "type": "enqueued_batch_requests", "value": 500000 }]
    }
  ],
  "next_page": null
}

Two things to notice in your own response:

One — every Opus version shares one rate limit. This is the single most-misunderstood piece of Claude rate limiting. Whether you call claude-opus-4-7 or claude-opus-4-5, every request and token counts against the same model_group bucket. If half your traffic is on 4.7 and half on 4.5 because you’re A/B testing, you’re sharing one limit, not two. Same is true for the Sonnet 4.x family across 4, 4.5, and 4.6.

Two — your tier is whatever the numbers in the response say it is. Tier 1 caps Opus at 50 RPM and 30,000 ITPM; Tier 4 hits 4,000 RPM and 2,000,000 ITPM. The response above is Tier 4. Check yours against the official tier table so you know whether you’re closer to Tier 1’s runway or Tier 4’s.

If you run multiple workspaces, pull each one too:

curl "https://api.anthropic.com/v1/organizations/workspaces/wrkspc_01JwQvzr7rXLA5AGx3HKfFUJ/rate_limits" \
  --header "anthropic-version: 2023-06-01" \
  --header "x-api-key: $ANTHROPIC_ADMIN_KEY"

Workspace responses only include overrides. If a group is missing from data, the workspace inherits the org-level limit — it’s not unlimited. This is the second-most-misunderstood thing about Claude rate limits, and the one that bites enterprise teams most often: a missing entry doesn’t mean “no cap,” it means “use the org cap.”
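
A tiny pure function makes the inheritance rule concrete. Entry shapes match the org response above; merging by `group_type` plus model list is my assumption about how overrides line up, so adjust the key if your payloads differ:

```python
def merge_limits(org_entries, ws_entries):
    """Overlay workspace overrides on org defaults. A group absent from
    the workspace response inherits the org entry; it is never unlimited."""
    def key(entry):
        # Identify a group by its type plus the model list it covers.
        return (entry["group_type"], tuple(entry.get("models") or []))

    merged = {key(e): e for e in org_entries}        # org defaults first
    merged.update({key(e): e for e in ws_entries})   # overrides win
    return list(merged.values())
```

Feed it the `data` arrays from both endpoints and you get the effective limits for that workspace.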

In Python (drop into a claude_limits.py you can rerun on a cron):

import os, requests

API = "https://api.anthropic.com/v1/organizations/rate_limits"

headers = {
    "x-api-key": os.environ["ANTHROPIC_ADMIN_KEY"],
    "anthropic-version": "2023-06-01",
}

def org_limits():
    # Follow next_page (currently always null) so a future second page
    # doesn't force a refactor.
    page, entries = None, []
    while True:
        params = {"page": page} if page else {}
        r = requests.get(API, headers=headers, params=params, timeout=10)
        r.raise_for_status()
        body = r.json()
        entries.extend(body["data"])
        page = body.get("next_page")
        if not page:
            return entries

if __name__ == "__main__":
    for entry in org_limits():
        models = entry.get("models") or [entry["group_type"]]
        for limit in entry["limits"]:
            print(f"{models[0]:30s}  {limit['type']:30s}  {limit['value']:>12,}")

Run it. Save the output. That’s your current ground truth. Diff it against last week’s run on a weekly cadence — if Anthropic adjusts your tier, you’ll see it in the diff, not in a 429.
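
One way to automate that weekly diff, as a sketch (the dated-file layout in `limit_snapshots/` is my own convention, not anything Anthropic prescribes):

```python
import json, pathlib, datetime

def snapshot(entries, directory="limit_snapshots"):
    """Write today's limits to a dated JSON file and report drift
    against the most recent previous snapshot."""
    d = pathlib.Path(directory)
    d.mkdir(exist_ok=True)
    today = d / f"{datetime.date.today()}.json"
    today.write_text(json.dumps(entries, indent=2, sort_keys=True))
    previous = sorted(p for p in d.glob("*.json") if p != today)
    if previous and previous[-1].read_text() != today.read_text():
        return f"limits changed since {previous[-1].stem}"
    return "no drift"
```

Call it with the output of `org_limits()` from the cron script and alert on anything other than "no drift".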

Step 2 — Cross-Reference Against Your Actual Usage (30 min)

Knowing your limit is half the audit. The other half is knowing what fraction of it you actually consume on a peak day.

Anthropic gives you two complementary tools:

Tool A — the Usage and Cost API returns historical token usage in structured JSON. Pull a week’s worth, group by minute, and find your peak input-tokens-per-minute and peak output-tokens-per-minute across the seven days. Compare those peaks against the limits you just pulled in Step 1.
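
Once you've flattened the Usage and Cost API output into per-minute rows, the peak computation is a few lines. The `(minute, input_tokens, output_tokens)` row shape here is an assumption; adapt the field names to the real payload:

```python
from collections import defaultdict

def peak_tokens_per_minute(events):
    """events: iterable of (minute_key, input_tokens, output_tokens) rows.
    Returns the peak input and output tokens seen in any single minute."""
    per_min_in = defaultdict(int)
    per_min_out = defaultdict(int)
    for minute, tok_in, tok_out in events:
        per_min_in[minute] += tok_in
        per_min_out[minute] += tok_out
    return max(per_min_in.values()), max(per_min_out.values())
```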

Tool B — the Usage page in the Console has two charts that do this for you visually: Rate Limit – Input Tokens and Rate Limit – Output Tokens. Both show your hourly maximum vs. your configured limit, plus your cache-read percentage on the input chart.

The cache percentage matters more than most teams realize. For most current Claude models, cache_read_input_tokens does not count against your ITPM rate limit. That’s a structural advantage other AI APIs don’t share — and it means a team with a 70% cache hit rate is effectively running on a 3.3× higher ITPM ceiling than the raw number suggests. If your audit shows you’re at 90% of your ITPM cap, the first move isn’t to ask sales for a higher tier. It’s to add cache breakpoints to your system prompts and tool definitions.
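
The arithmetic behind that 3.3× figure: if cache reads are free against ITPM, only the uncached fraction of your input counts, so the effective ceiling on total input tokens is the limit divided by (1 − hit rate).

```python
def effective_itpm(itpm_limit, cache_hit_rate):
    """Effective ceiling on *total* input tokens per minute when
    cache-read tokens don't count against the ITPM limit."""
    if not 0 <= cache_hit_rate < 1:
        raise ValueError("cache_hit_rate must be in [0, 1)")
    return itpm_limit / (1 - cache_hit_rate)
```

At a 2,000,000 ITPM cap and a 70% cache hit rate, the effective ceiling is about 6.67M total input tokens per minute.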

Set yourself a two-column scorecard:

| Limit | Your peak (last 7 days) | Headroom |
| --- | --- | --- |
| Opus RPM | ______ | ______ |
| Opus ITPM (uncached) | ______ | ______ |
| Opus OTPM | ______ | ______ |
| Sonnet RPM / ITPM / OTPM | ______ | ______ |
| Batch RPM | ______ | ______ |
| Files / Skills / Web search | ______ | ______ |

Anything below 50% of a limit is fine. The 50–75% band is a watch-list item. Anything above 75% is a budget item — and where Step 3 kicks in.
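
The banding rule is trivial to encode if you want the scorecard generated by the same cron job:

```python
def headroom_band(peak, limit):
    """Classify peak usage vs. the configured limit into the bands above."""
    ratio = peak / limit
    if ratio < 0.50:
        return "fine"
    if ratio < 0.75:
        return "watch-list"
    return "budget item"
```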

Step 3 — Set Up an Alert Before You Hit 75% (45 min)

The cheapest version of this is a daily cron job that pulls usage from the Cost API, computes peak-minute figures, divides by the org limit pulled from the Rate Limits API, and posts to Slack if any ratio crosses 0.75. The whole thing fits in roughly 80 lines of Python and a GitHub Actions workflow:

# .github/workflows/claude-rate-limit-audit.yml
name: Claude rate limit audit
on:
  schedule:
    - cron: "0 14 * * *"   # 2pm UTC daily
  workflow_dispatch:
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install requests
      - env:
          ANTHROPIC_ADMIN_KEY: ${{ secrets.ANTHROPIC_ADMIN_KEY }}
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: python scripts/claude_limit_audit.py

The Python script pseudo-code:

limits = pull_rate_limits_api()
usage = pull_usage_cost_api(start=yesterday, granularity="1m")

for group, group_limit in limits.items():
    peak = max_per_minute(usage, group)
    ratio = peak / group_limit
    if ratio >= 0.75:
        slack_post(
            f":rotating_light: {group} at {ratio:.0%} of limit "
            f"(peak {peak:,} vs cap {group_limit:,}) on {yesterday}"
        )

Two numbers to tune for your team:

  • Threshold (0.75 default). Lower if you have spiky traffic; higher if your usage is steady. Most teams settle on 0.75 within a month.
  • Window (peak-per-minute by default). If your bursts last seconds, swap to peak-per-second by oversampling. If you’re steady-state, peak-per-hour is plenty and quieter.
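
The two helpers the pseudo-code leans on are small. This is a sketch: `check` works on plain `{group: number}` dicts you build from the two APIs, and `slack_post` uses the standard Slack incoming-webhook payload:

```python
import os, json, urllib.request

def slack_post(text):
    """Post a plain-text message to the webhook in SLACK_WEBHOOK_URL."""
    req = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)

def check(limits, peaks, threshold=0.75):
    """Return the alert lines for every group whose peak/limit ratio
    crosses the threshold; post each with slack_post in the cron job."""
    alerts = []
    for group, cap in limits.items():
        peak = peaks.get(group, 0)
        if cap and peak / cap >= threshold:
            alerts.append(f":rotating_light: {group} at {peak / cap:.0%} "
                          f"of limit (peak {peak:,} vs cap {cap:,})")
    return alerts
```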

For runtime self-throttling, you don’t even need this script — every Messages API response carries an anthropic-ratelimit-input-tokens-remaining header (and friends) that tells you, request-by-request, how close you are to the cap. Wire those into your existing tracing, and you’ll see throttling forming before it lands.
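
A minimal backoff check against those headers might look like this (the `-limit` and `-remaining` header pair is what Anthropic documents; the 10% floor is an arbitrary starting point):

```python
def should_backoff(response_headers, floor=0.10):
    """True when remaining input-token budget drops below `floor` of the
    limit, per the anthropic-ratelimit-* headers on a Messages response.
    Missing headers are treated as 'no signal'."""
    remaining = response_headers.get("anthropic-ratelimit-input-tokens-remaining")
    limit = response_headers.get("anthropic-ratelimit-input-tokens-limit")
    if remaining is None or limit is None:
        return False
    return int(remaining) / int(limit) < floor
```

Call it on every response and sleep (or shed load) when it returns True.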

What the Rate Limits API Doesn’t Do

This is where honesty about scope matters more than enthusiasm about the launch.

It doesn’t include consumer plans. Pro, Max, and Team subscribers — the tens of thousands of users on $20/month and $100-200/month plans whose rate-limit complaints fueled the trust crisis in the first place — can’t read their limits from this API. The endpoint is Admin-only and lives on the developer platform side. If you’re paying $200/month and hitting your weekly cap inside 20 minutes, the Rate Limits API is not (yet) for you.

It doesn’t include Managed Agents. Limits for Claude Managed Agents (300 RPM creates, 600 RPM reads, per org) are documented but not exposed through this API.

It doesn’t let you change limits. Read-only by design. To set workspace overrides, open the workspace in the Console and use the Limits tab. To raise an org-level limit, you advance tiers (automatic at deposit thresholds) or contact sales.

It doesn’t predict the future. The numbers it returns are the configured ceilings. If Anthropic adjusts them — and 2026 has shown they sometimes do — the API reflects the new value the moment it changes, but it can’t warn you that a change is coming.

For now, the pragmatic combo is Rate Limits API + Usage and Cost API + your own alerting. That stack covers about 90% of what teams actually wanted when they started writing those Reddit threads.

What This Means for You

If you’re a solo developer on a personal Claude API account: This API isn’t reachable from your account, but the per-request anthropic-ratelimit-* response headers give you almost the same visibility — cheaper to wire up, no admin role needed. Log them, graph them in any dashboard tool, and you’ll see saturation forming before it bites. If you’re frustrated with cost on a Pro or Max plan, the API doesn’t help — but our Claude Code Reliability Audit course walks through the broader audit framework.

If you run a team gateway in front of Claude: Pull the API on startup and on a 6-hour schedule. Stop hardcoding limits in config — they drift. Read the org and per-workspace endpoints, build a per-tenant rate budget, and route traffic with awareness of which workspace is closest to its cap. This is the highest-leverage single use case for the API.
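
One way to carve a per-tenant budget out of the org cap, as a sketch — the weight-based split and the 10% reserve are my assumptions, not a documented pattern:

```python
def per_tenant_budgets(org_itpm, tenant_weights):
    """Split the org ITPM cap into per-tenant budgets by weight, keeping
    a safety reserve so one bursty tenant can't starve the others."""
    reserve = 0.10                        # keep 10% unallocated
    allocatable = org_itpm * (1 - reserve)
    total = sum(tenant_weights.values())
    return {t: int(allocatable * w / total) for t, w in tenant_weights.items()}
```

Refresh `org_itpm` from the Rate Limits API on your 6-hour schedule and the budgets track tier changes automatically.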

If you’re an enterprise admin evaluating whether to keep paying: The combination of Apr 23 postmortem + Apr 25 Rate Limits API is the most concrete signal Anthropic has given that they’re treating reliability as a roadmap line item, not just a marketing line. If the trust gap was the reason you were considering migrating part of your workload, this is reasonable evidence to wait one quarter and re-evaluate. Pair the audit above with a quarterly cost-and-headroom review and you’ll have a defensible decision either way.

If you’ve never used Claude’s API and you’re wondering why this matters: Rate limits are the way every hosted AI provider — OpenAI, Anthropic, Google — protects shared infrastructure from one customer eating everyone else’s capacity. They become a problem the day you scale past prototyping. The fact that Anthropic shipped this API at all, and shipped it during an active trust crisis, is the kind of vendor behavior that should make Claude more attractive for new builds, not less. Just budget headroom and instrument from day one.

The Bottom Line

Anthropic’s Rate Limits API isn’t a victory lap. It’s a half-loaf — and that’s fine, because the half-loaf is the one that was missing.

For the first time, every team paying for Claude can run a structured, scriptable audit of what they’re actually working with, cross-reference it against what they’re actually using, and set up alerts that fire before throttling hits. The whole audit fits in a Monday morning. If you’re a Claude customer and you don’t run it this week, you’re choosing to keep flying blind on a cost line that’s been the loudest user-feedback signal in your inbox for a month.

The other half of the loaf — visibility for Pro and Max consumer plans, predictive signals about upcoming limit changes, programmatic limit updates — is the work to watch for next. The Apr 23 → Apr 25 cadence suggests Anthropic is shipping reliability infrastructure faster than they were six weeks ago. Whether that pace holds is the question every customer should be asking by the end of May.

In the meantime: get your admin key, run the audit, file the headroom report, and stop arguing with your gateway logs.

