At a glance: $1–$25 per million tokens · 50% Batch API discount · 90% cache read savings · 5 rate limit tiers
📘 This guide is part of our GenAI Licensing Knowledge Hub — your comprehensive resource for enterprise AI licensing, contract negotiation, and cost optimization.

Most Anthropic API pricing guides list the per-token rates, show a comparison table, and call it a day. That approach is useless for enterprise engineering teams, because the listed rate is almost never what you actually pay. What you pay depends on how you call the API — which model you route to, whether you cache your prompts, whether you batch your requests, how you structure your context windows, and whether you have negotiated volume commitments. The difference between a naïve implementation and an optimised one is routinely 60–80% on the same workload. This guide is written from the perspective of an engineering team that has already decided to use Claude and now needs to understand the full mechanics of how Anthropic charges, where the cost levers sit, and how to architect for minimum spend at maximum capability.

The Pricing Table Everyone Has — And What It Actually Means

Anthropic’s API pricing is structured around three model tiers, each priced per million tokens (MTok) for both input and output:

Claude Opus 4.5 / 4.6: $5.00 input / $25.00 output per MTok. The flagship reasoning model. Opus 4.6 (February 2026) added a 1-million-token context window, agent teams, 128K output capacity, and adaptive thinking — all at the same price as Opus 4.5. Use Opus for complex reasoning chains, nuanced document analysis, high-stakes code generation, and tasks where output quality justifies a 5× premium over Sonnet.

Claude Sonnet 4.5: $3.00 input / $15.00 output per MTok. The workhorse model for production applications. Sonnet delivers 80–90% of Opus’s quality at 60% of the cost on most enterprise workloads. Unless your benchmarking demonstrates that Opus produces materially better results for a specific task, Sonnet should be your default production model.

Claude Haiku 4.5: $1.00 input / $5.00 output per MTok. The speed and efficiency model. Haiku is designed for high-volume, low-latency tasks: classification, routing, entity extraction, simple Q&A, preprocessing, and any pipeline stage where sub-second response time matters more than peak reasoning depth. At $1/$5, Haiku is 5× cheaper than Opus on input and 5× cheaper on output.

The critical observation: output tokens are five times more expensive than input tokens on every model tier. This asymmetry is the single most important architectural consideration for cost optimisation. Every design decision that reduces output token count — shorter response templates, structured JSON outputs, constrained generation, stop sequences — has an outsized impact on your bill.
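To make the asymmetry concrete, here is a minimal cost helper using the published rates above (the model names are shorthand for this sketch, not API identifiers):

```python
# Sketch: per-request cost at the standard (non-batch, non-cached) rates
# quoted in this guide. Keys are shorthand, not official model ids.
RATES = {  # (input $/MTok, output $/MTok)
    "opus-4.5":   (5.00, 25.00),
    "sonnet-4.5": (3.00, 15.00),
    "haiku-4.5":  (1.00, 5.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at standard rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# The 1:5 input/output asymmetry: trimming 100 output tokens saves as much
# as trimming 500 input tokens on Sonnet.
print(request_cost("sonnet-4.5", 10_000, 1_000))  # 0.045
```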

Legacy Models: The Migration You Should Have Done Already

Anthropic’s previous-generation models remain available but carry significantly higher pricing. Claude Opus 4.1 is priced at $15.00 input / $75.00 output per MTok — three times the cost of Opus 4.5/4.6 for input and three times for output. Claude Sonnet 4 matches the current Sonnet 4.5 at $3/$15. Claude Haiku 3.5 is priced at $0.80/$4.00, marginally cheaper than Haiku 4.5.

If your production systems still call Opus 4.1, you are paying a 200% premium for a model that is objectively less capable than its successor. The 4.5 series represents a generational leap in both capability and cost efficiency. Migration should be treated as an urgent cost optimisation project, not a roadmap item. For a workload consuming 100 million output tokens per month on Opus 4.1, switching to Opus 4.5 saves $5,000 per month — $60,000 per year — with better results.

Compare AI Token Costs

Estimate token consumption and cost across Claude, GPT-4, Gemini, and open-source models with our free calculator.

Launch the token pricing calculator →

Batch API: The 50% Discount Most Teams Ignore

Anthropic’s Batch API processes requests asynchronously within a 24-hour window and charges exactly half the standard per-token rate across all models:

Opus 4.5/4.6 Batch: $2.50 input / $12.50 output per MTok. Sonnet 4.5 Batch: $1.50 input / $7.50 output per MTok. Haiku 4.5 Batch: $0.50 input / $2.50 output per MTok.

The Batch API is the single largest cost reduction mechanism available to any Anthropic API customer, and it is dramatically underutilised. In our experience advising enterprises on AI spend, fewer than 20% of organisations with significant Claude API consumption have implemented batch processing for eligible workloads.

What Qualifies for Batch Processing

Any workload that does not require real-time response is a batch candidate. This includes: document processing and summarisation pipelines, bulk classification and tagging, report generation, data extraction from structured and unstructured sources, test generation for QA, content moderation at scale, embedding preparation, and overnight analytics jobs. In a typical enterprise AI deployment, 30–50% of total API token consumption can be shifted to batch processing with no user-facing impact.
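As a sketch of what migration involves, the following builds the request payload shape used by the Message Batches API, where each entry carries a custom_id so results can be matched after completion. With the official Python SDK, the list would be submitted via client.messages.batches.create(requests=...). The model id and prompts here are placeholders; check the current API documentation for exact identifiers.

```python
# Sketch: building Message Batches entries (one per document).
# Submit with the official SDK:  client.messages.batches.create(requests=requests)
# Model id and prompts are placeholders.

def batch_request(custom_id: str, prompt: str, model: str = "claude-sonnet-4-5") -> dict:
    """One batch entry: a custom_id plus standard Messages API params."""
    return {
        "custom_id": custom_id,
        "params": {
            "model": model,
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

docs = ["contract_a.txt", "contract_b.txt", "contract_c.txt"]
requests = [batch_request(f"doc-{i}", f"Summarise {name}.") for i, name in enumerate(docs)]
```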

The Compounding Effect

Batch pricing can be combined with prompt caching. If you batch-process 1,000 documents that share a common system prompt and few-shot examples, you pay batch rates (50% off) on the unique tokens per document and cache-read rates (90% off) on the shared context. The combined discount on the cached portion is approximately 95% off standard pricing. On a workload processing 500 million input tokens per month with 60% cacheable context, the input bill at Sonnet rates drops from roughly $1,500 to $345 per month, a 77% reduction that scales linearly: workloads in the tens of billions of tokens per month save seven figures annually.
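The stacking arithmetic, assuming cache-read pricing applies multiplicatively on top of batch rates as described above:

```python
# Sketch: effective input rate when batch (50% off) stacks with cache reads
# (0.1x the input price). Assumes the multipliers compound, per the text above.
SONNET_INPUT = 3.00  # $/MTok, standard

batch_rate = SONNET_INPUT * 0.5        # 1.50 $/MTok
cached_batch_read = batch_rate * 0.1   # 0.15 $/MTok, i.e. 95% off standard

monthly_mtok = 500    # 500M input tokens per month
cacheable = 0.6       # 60% of context is shared across documents

standard = monthly_mtok * SONNET_INPUT
optimised = monthly_mtok * (cacheable * cached_batch_read + (1 - cacheable) * batch_rate)
print(standard, optimised)  # ~1500 vs ~345 per month on input alone
```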

Prompt Caching: The Architecture Decision That Changes Your Unit Economics

Prompt caching is Anthropic’s mechanism for reducing the cost of repetitive context. When your application sends the same prefix — system prompt, few-shot examples, knowledge base content, or document context — across multiple requests, caching allows subsequent requests to reference the cached prefix at a fraction of the original cost.

How Caching Is Priced

Anthropic offers two cache duration tiers:

5-Minute Cache (Default): Cache writes are charged at 1.25× the base input token price. Cache reads are charged at 0.1× the base input token price. For Sonnet 4.5, this means cache writes cost $3.75 per MTok and cache reads cost $0.30 per MTok — a 90% discount on input tokens for every request that hits the cache after the initial write.

1-Hour Extended Cache: Cache writes are charged at 2.0× the base input token price. Cache reads remain at 0.1×. The extended cache costs more to write but provides a longer window for read savings, making it cost-effective for workloads with lower request frequency but consistent context.

When Caching Pays for Itself

The economics are straightforward. A 5-minute cache write on Sonnet costs 1.25× base ($3.75/MTok) compared to 1.0× base ($3.00/MTok) for a standard request — a 25% premium on the first request. Every subsequent cache read costs 0.1× ($0.30/MTok), a saving of 0.9× base per hit. The 0.25× write premium is therefore recovered after roughly 0.28 cache reads — meaning that a single cache hit within the five-minute window already saves money.

For the 1-hour cache, the write premium is 2.0× ($6.00/MTok for Sonnet) versus 1.0× standard ($3.00). The extra 1.0× write cost is recovered after roughly 1.1 cache reads, so the second cache hit within the hour puts you ahead. Any application making three or more requests per hour against the same context benefits from extended caching.
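The break-even logic generalises to any write multiplier:

```python
# Sketch: cache break-even as a function of the write multiplier.
# write_mult x base is paid once; each read costs read_mult x instead of
# 1.0x, saving (1 - read_mult) x base per hit.
def breakeven_reads(write_mult: float, read_mult: float = 0.1) -> float:
    """Number of cache reads needed before caching beats no caching."""
    extra_write_cost = write_mult - 1.0
    saving_per_read = 1.0 - read_mult
    return extra_write_cost / saving_per_read

print(breakeven_reads(1.25))  # ~0.28: the first hit already saves money (5-min cache)
print(breakeven_reads(2.0))   # ~1.11: the second hit puts you ahead (1-hour cache)
```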

Architectural Implications

Caching fundamentally changes how you should structure your prompts. The optimal architecture front-loads all static content — system instructions, persona definitions, few-shot examples, reference documents, and knowledge base content — into the prompt prefix (which gets cached), and places the variable, per-request content at the end (which is charged at standard rates). Applications that interleave static and dynamic content throughout the prompt defeat caching and pay full input rates on every request.

For RAG (Retrieval-Augmented Generation) applications, this means restructuring your prompt template so that retrieved context chunks are appended after the cached prefix, not embedded within it. For multi-turn conversations, it means maintaining a stable system prompt prefix and appending conversation history at the end rather than regenerating the full prompt each turn.
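A sketch of a cache-friendly payload, following Anthropic's prompt-caching convention of marking the end of the reusable prefix with a cache_control block. The model id, system text, and examples are placeholders:

```python
# Sketch: structuring a Messages payload so the static prefix is cacheable.
# Static content (system prompt, few-shot examples) goes first; the final
# static block carries the cache_control marker; variable content goes last.
SYSTEM_PROMPT = "You are a contract-analysis assistant."  # stable across requests
FEW_SHOT = "Example 1: ...\nExample 2: ..."               # stable across requests

def build_payload(retrieved_chunks: list[str], question: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # placeholder id
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": SYSTEM_PROMPT},
            # Cache breakpoint: everything up to and including this block is cached.
            {"type": "text", "text": FEW_SHOT, "cache_control": {"type": "ephemeral"}},
        ],
        # Variable content (RAG chunks, the user question) comes AFTER the prefix.
        "messages": [
            {"role": "user", "content": "\n\n".join(retrieved_chunks) + "\n\n" + question}
        ],
    }
```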

Need help benchmarking your Anthropic API costs?

Our team provides independent token pricing analysis, committed-spend optimisation, and contract negotiation for enterprise Claude deployments. Fixed-fee, vendor-neutral advisory.

Learn about our GenAI advisory services →

Extended Thinking: The Hidden Output Token Multiplier

Extended thinking allows Claude to perform internal chain-of-thought reasoning before generating its final response. When enabled, Claude produces “thinking tokens” — internal reasoning content that is billed as standard output tokens, not at a premium rate.

This is important for cost modelling: extended thinking can multiply your output token consumption by 2–10× per request, depending on the thinking budget you set and the complexity of the task. A request that generates 500 output tokens without extended thinking might generate 3,000–5,000 output tokens with thinking enabled (500 response tokens plus 2,500–4,500 thinking tokens).

Since output tokens are five times more expensive than input tokens, extended thinking has a disproportionate impact on cost. On Opus 4.5 at $25/MTok output, a workload generating an average of 4,000 additional thinking tokens per request across 100,000 daily requests incurs $10,000 per day — roughly $300,000 per month — in thinking token costs alone.
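The arithmetic behind that figure:

```python
# Sketch: marginal cost of extended thinking, which bills as output tokens.
OPUS_OUTPUT = 25.00  # $/MTok

def thinking_cost_per_day(requests_per_day: int, thinking_tokens: int) -> float:
    """Daily dollar cost of the extra thinking tokens alone."""
    return requests_per_day * thinking_tokens * OPUS_OUTPUT / 1_000_000

daily = thinking_cost_per_day(100_000, 4_000)
print(daily, daily * 30)  # 10000.0 per day, 300000.0 over a 30-day month
```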

When to Enable Extended Thinking

Extended thinking improves output quality on tasks that genuinely benefit from step-by-step reasoning: complex mathematical problems, multi-constraint optimisation, nuanced legal or regulatory analysis, multi-step code architecture, and ambiguous classification tasks. It does not meaningfully improve simple extraction, classification, formatting, or template-based generation.

Anthropic recommends starting with the minimum thinking budget (1,024 tokens) and increasing incrementally. In practice, most tasks that benefit from thinking achieve optimal results at 2,000–4,000 thinking tokens; budgets beyond 8,000 tokens rarely produce proportional quality improvements and simply inflate costs.

The cost-optimised approach: enable extended thinking selectively per request based on task complexity, not globally across all API calls. Route complex requests to Opus with thinking enabled; route simple requests to Haiku without thinking. This tiered architecture can reduce thinking-related costs by 70–80% compared to blanket enablement.

Rate Limits: The Invisible Capacity Constraint

Anthropic’s API enforces rate limits that govern how many requests and tokens you can consume per minute. Rate limits are structured across usage tiers, and they directly affect your application’s throughput, architecture, and — indirectly — your costs.

Usage Tier Structure

Anthropic uses a tiered rate limit system based on your cumulative API spend:

Tier 1 (Free / Initial): Low request and token limits, suitable for development and testing only. Typically 50 requests per minute, 40,000 input tokens per minute, and 8,000 output tokens per minute for Sonnet.

Tier 2 ($40+ cumulative spend): Modest increases. Suitable for low-volume production or staging environments.

Tier 3 ($200+ cumulative spend): Production-grade limits for small to medium workloads. Approximately 2,000 requests per minute for Sonnet.

Tier 4 ($400+ cumulative spend): Higher limits for scaling applications. Approximately 4,000 requests per minute for Sonnet.

Custom / Enterprise: Negotiated limits for high-volume customers. Organisations spending $10,000+ monthly should engage Anthropic’s enterprise sales team to negotiate custom rate limits that match their throughput requirements.

Rate Limits and Cost Architecture

Rate limits create an indirect cost pressure: if your application hits rate limits during peak traffic, it must either queue requests (increasing latency), retry with exponential backoff (wasting compute resources), or route overflow to a different model or provider (adding architectural complexity). Each of these responses has an operational cost.

The cheapest solution to rate limit pressure is often not to request higher limits but to reduce token consumption per request. Shorter prompts, more efficient context management, prompt caching (which reduces the token count that counts against rate limits for cache reads), and model routing (sending simple requests to Haiku, which has separate and often higher rate limits than Opus) all reduce rate limit pressure while simultaneously reducing per-request costs.

For enterprises with spiky traffic patterns, rate limits also argue for batch processing. Moving 30–50% of token consumption to the Batch API eliminates those tokens from real-time rate limit calculations entirely, freeing synchronous capacity for latency-sensitive requests.

Client Result

A leading US bank saved $2.5M through independent GPT pricing benchmarking and contract renegotiation.

Read the case study →

Enterprise Discounts: What Is Negotiable and What Is Not

Anthropic offers enterprise pricing arrangements on a case-by-case basis. Here is what our experience across enterprise AI negotiations reveals about what is and is not commercially flexible.

What Is Negotiable

Volume Discounts: Organisations with predictable, high-volume API consumption (typically $10,000+/month) can negotiate committed-use agreements. These guarantee a minimum monthly spend in exchange for per-token discounts of 15–30% below published rates. The discount depth depends on commitment size, contract length, and growth trajectory. A $50,000/month commitment over 12 months will yield deeper discounts than a $10,000/month commitment over 3 months.

Custom Rate Limits: Enterprise customers can negotiate rate limits that exceed the standard tier structure. This is particularly relevant for applications with high concurrency requirements (customer-facing chatbots, real-time coding assistants) or burst traffic patterns (batch-like workloads that need to complete within hours rather than 24 hours).

Payment Terms: Standard API billing is monthly on credit card. Enterprise customers can negotiate invoicing, net-30 or net-60 payment terms, and alternative payment methods. This is a low-cost concession for Anthropic that simplifies procurement for large organisations.

SLA Commitments: Enterprise agreements can include uptime SLAs, response time guarantees, and dedicated support channels that are not available on standard API tiers. These are particularly important for production applications where Claude API availability directly affects customer experience or business operations.

Need Expert AI Contract Negotiation Support?

Redress Compliance provides independent GenAI licensing advisory services — fixed-fee, no vendor affiliations. Our specialists help enterprises negotiate Anthropic, OpenAI, Google, and Microsoft AI contracts with competitive pricing and protective terms.

Explore Advisory Services →

Data Handling Terms: Enterprise customers can negotiate specific data retention policies, US-only inference (available at 1.1× standard pricing), and contractual commitments regarding training data exclusion. Standard API terms already exclude customer data from training by default, but enterprise agreements can formalise this with additional contractual protections.

What Is Not Negotiable

Model architecture access: Anthropic does not offer custom model fine-tuning or weights access through standard enterprise agreements. Unlike OpenAI, which offers fine-tuning for GPT models (see our Claude vs ChatGPT Enterprise licensing comparison), Anthropic’s current enterprise model is API access to standard models only.

Per-token pricing structure: The input/output pricing asymmetry is not negotiable. Discounts apply as percentage reductions to published rates, not as structural changes to the pricing model. Output tokens will always cost more than input tokens.

Batch API window: The 24-hour processing window for batch requests is a platform-level constraint, not a commercial term. Enterprise customers cannot negotiate guaranteed batch completion times shorter than 24 hours.

The Real-World Cost Optimisation Playbook

Based on our advisory work across enterprise AI deployments, here is the sequenced approach that consistently delivers the largest cost reductions on Anthropic API spend.

Step 1: Implement Model Routing (Savings: 40–60%). Build a routing layer that classifies incoming requests by complexity and directs them to the cheapest capable model. In a well-tuned routing system, 40–60% of requests go to Haiku, 30–40% to Sonnet, and 10–20% to Opus. At the Haiku-heavy end of that mix (60/30/10), the average blended cost drops from $3/$15 per MTok (all-Sonnet) to approximately $2.00/$10.00. This single architectural decision typically delivers the largest absolute cost reduction.
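A minimal illustration of the idea. The classification logic here is a stand-in (production routers often use Haiku itself, or request metadata, as the classifier), and the blend is computed at a 60/30/10 Haiku/Sonnet/Opus mix:

```python
# Sketch: a complexity-based router plus the blended rate it produces.
RATES = {"haiku": (1.00, 5.00), "sonnet": (3.00, 15.00), "opus": (5.00, 25.00)}

def route(task: dict) -> str:
    """Pick the cheapest model expected to handle the task (illustrative rules)."""
    if task.get("kind") in {"classify", "extract", "route"}:
        return "haiku"
    if task.get("reasoning_depth", 0) >= 3 or task.get("high_stakes"):
        return "opus"
    return "sonnet"

# Blended per-MTok rates at a 60/30/10 Haiku/Sonnet/Opus mix:
mix = {"haiku": 0.60, "sonnet": 0.30, "opus": 0.10}
blended_in = sum(share * RATES[m][0] for m, share in mix.items())
blended_out = sum(share * RATES[m][1] for m, share in mix.items())
print(round(blended_in, 2), round(blended_out, 2))  # 2.0 10.0, vs 3/15 all-Sonnet
```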

Step 2: Enable Prompt Caching (Savings: 20–40% on input tokens). Restructure prompts to front-load static content and implement 5-minute caching for all applications with more than two requests per five-minute window sharing common context. For high-frequency applications (chatbots, coding assistants), the input token savings from caching typically reach 70–80%.

Step 3: Shift Eligible Workloads to Batch (Savings: 50% on shifted volume). Identify all workloads that can tolerate 24-hour completion and migrate them to the Batch API. Target 30–50% of total token volume. The 50% discount on batch processing compounds with caching discounts for even deeper savings.

Step 4: Optimise Output Token Consumption (Savings: 15–30%). Audit your most token-hungry endpoints. Implement structured output formats (JSON schemas rather than prose), constrained generation with max_tokens limits, stop sequences that terminate generation early, and response templates that minimise verbose explanations. Because output tokens cost five times more than input tokens, every 100 tokens saved on output is equivalent to saving 500 input tokens.
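For example, a request template combining those levers (max_tokens, stop_sequences, and a JSON-only system prompt). The model id and the specific limits are illustrative, not recommended defaults:

```python
# Sketch: constraining output spend on a single extraction call.
# max_tokens and stop_sequences are standard Messages API parameters;
# the values and model id here are illustrative.
def extraction_request(text: str) -> dict:
    return {
        "model": "claude-haiku-4-5",    # placeholder id
        "max_tokens": 300,              # hard cap on billable output
        "stop_sequences": ["</json>"],  # stop as soon as the payload closes
        "system": "Reply with only a JSON object wrapped in <json></json> tags. No prose.",
        "messages": [{"role": "user", "content": text}],
    }
```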

Step 5: Right-Size Extended Thinking (Savings: 10–20% for thinking-heavy workloads). If your application uses extended thinking, audit the thinking budget per endpoint. Reduce budgets on tasks that show no quality improvement beyond 2,000–4,000 thinking tokens. Disable thinking entirely on simple endpoints. The savings are proportional to the thinking token volume eliminated.

Step 6: Negotiate Volume Commitments (Savings: 15–30% on top of optimised rates). Once you have optimised your architecture and have stable, predictable monthly consumption, approach Anthropic’s enterprise sales team with a committed-use proposal. Your optimised consumption baseline becomes the commitment floor, and the volume discount applies on top of the architectural savings you have already achieved.

Executed in sequence, these six steps typically reduce total Anthropic API spend by 65–85% compared to a naïve implementation calling Sonnet for every request at standard rates without caching, batching, or routing.

Cost Comparison: Anthropic vs Competitors at Scale

For enterprises evaluating Anthropic alongside competing API providers, the following comparison uses a standardised workload of 1 billion tokens per month (700M input, 300M output) at each provider’s mid-tier model:

Claude Sonnet 4.5: Standard: $6,600/month. With caching + batch optimisation: $2,800–$3,500/month. With enterprise discount: $2,000–$2,800/month.

OpenAI GPT-4o: Standard: $4,750/month ($2.50 input / $10.00 output). With batch: $2,375/month. Enterprise pricing varies.

Google Gemini 2.5 Pro: Standard: $3,875/month ($1.25 input / $10.00 output). Committed-use pricing through Google Cloud available.

DeepSeek V3: Standard: $519/month ($0.27 input / $1.10 output). No enterprise discount structure; self-hosted option available.
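The standard-rate figures above follow directly from the per-MTok rates quoted:

```python
# Sketch: the standardised 1B-token workload (700M in, 300M out) at each
# provider's published per-MTok rates, as quoted in this comparison.
WORKLOAD = (700, 300)  # MTok input, MTok output per month
RATES = {
    "claude-sonnet-4.5": (3.00, 15.00),
    "gpt-4o":            (2.50, 10.00),
    "gemini-2.5-pro":    (1.25, 10.00),
    "deepseek-v3":       (0.27, 1.10),
}

def monthly_cost(model: str) -> float:
    in_mtok, out_mtok = WORKLOAD
    in_rate, out_rate = RATES[model]
    return in_mtok * in_rate + out_mtok * out_rate

for model in RATES:
    print(f"{model}: ${monthly_cost(model):,.0f}/month")
# claude-sonnet-4.5: $6,600 · gpt-4o: $4,750 · gemini-2.5-pro: $3,875 · deepseek-v3: $519
```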

At standard rates, Claude Sonnet is approximately 40% more expensive than GPT-4o and 70% more expensive than Gemini Pro. However, Claude’s caching and batch mechanisms are more aggressive than competitors’ equivalents, narrowing the gap significantly for optimised workloads. When enterprise discounts are factored in, the effective cost difference between Claude and GPT-4o is typically 10–20% — a gap that many enterprises consider acceptable given Claude’s advantages in coding, document analysis, and safety.

The strategic takeaway: vendor selection should not be driven by list-price comparisons alone. The optimised, negotiated cost of each platform — accounting for caching, batching, routing, and volume discounts — is the number that matters for enterprise budgeting. An apparently cheaper provider with weaker optimisation mechanisms can end up costing more at scale than a more expensive provider with better cost-reduction tooling.

📊 Free Assessment Tool

Want to see what Anthropic's API actually costs at your volume? Our free token pricing calculator models costs across tiers, compares providers, and reveals hidden spend — takes under 3 minutes.

Take the Free Assessment →

Monitoring and Governance: Preventing API Cost Overruns

Enterprise API deployments require cost monitoring and governance frameworks that prevent unchecked spending. Anthropic provides usage tracking through the Claude Console, but enterprise-grade cost management typically requires additional tooling.

Budget Alerts: Configure alerts at 50%, 75%, and 90% of your monthly API budget. Anthropic’s Console supports basic usage tracking, but enterprise teams should integrate API spend data into their existing FinOps or cloud cost management platforms (CloudHealth, Kubecost, Finout, or custom dashboards) for unified visibility.

Per-Team and Per-Application Cost Allocation: Use API key segmentation to attribute costs to specific teams, applications, or environments. Assign separate API keys to each application or business unit, then track spending per key. This prevents a single runaway workload from consuming the entire organisation’s API budget undetected.

Token Budget Enforcement: Implement application-level max_tokens limits on every API call. Set output token limits as tightly as each use case allows. A coding assistant that generates 500-line functions does not need a 128,000-token output budget — capping at 4,000–8,000 tokens prevents runaway generation that inflates costs without improving utility.
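A sketch of such a guard. The endpoint names and caps are illustrative policy choices, not Anthropic defaults:

```python
# Sketch: an application-level guard that clamps max_tokens per endpoint
# before any request reaches the API. Caps here are illustrative policy.
ENDPOINT_CAPS = {
    "coding-assistant": 8_000,
    "classification":   64,
    "summarisation":    1_024,
}

def enforce_cap(endpoint: str, requested_max_tokens: int) -> int:
    """Clamp max_tokens to the endpoint's budget; unknown endpoints get a safe floor."""
    cap = ENDPOINT_CAPS.get(endpoint, 512)
    return min(requested_max_tokens, cap)

print(enforce_cap("classification", 4_000))  # 64: a runaway request gets clamped
```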

Anomaly Detection: Monitor for sudden spikes in token consumption per request, which may indicate prompt injection attacks, infinite loop bugs in agent workflows, or misconfigured retry logic. A 10× increase in average tokens per request on a production endpoint warrants immediate investigation.

Monthly Cost Reviews: Conduct monthly reviews of API spend by model, endpoint, and team. Identify the top 10 most expensive endpoints and evaluate whether each is using the cheapest capable model, caching effectively, and operating within expected token consumption bounds. This ongoing optimisation discipline typically yields 5–10% incremental savings per quarter as usage patterns evolve and new optimisation opportunities emerge.

Web Search and Tool Use: The Additional Cost Layers

Beyond standard text generation, Anthropic’s API supports built-in tools that carry their own pricing. Understanding these costs is essential for applications that extend beyond pure text-in, text-out workloads.

Web Search Tool: The web search tool has two cost components: a per-call charge and search content tokens. Tool calls are billed per 1,000 calls according to the tool version and model type. Search content tokens — the text returned from web results that gets injected into the context — are billed at the chosen model’s standard input token rate. For applications that perform frequent web searches (research agents, fact-checking pipelines, news summarisation), search content tokens can represent 30–60% of total input token consumption because each search result injects substantial context into the prompt.

Code Execution Tool: Code execution within the API allows Claude to write and run code during a conversation. The execution infrastructure is billed separately from token costs. For enterprise applications that rely on Claude for data analysis, calculation, or programmatic tasks, code execution charges add a cost layer that should be modelled explicitly rather than absorbed into general API budgets.

Computer Use and Tool Use: When Claude is configured to use external tools (function calling, MCP servers, computer use), each tool interaction generates additional tokens — both for the tool call request and the tool result response. In agentic architectures where Claude makes 5–15 tool calls per user request, these tool-related tokens can represent 50–70% of total token consumption for the request. Monitoring tool call frequency and optimising tool descriptions (shorter descriptions reduce input tokens on every call) are overlooked cost levers in agent-heavy deployments.

Frequently Asked Questions

How are tokens counted for billing purposes?

Tokens are subword units processed by the model. As a rough estimate, 1 token equals approximately 4 characters or 0.75 words in English. The exact count varies by language and content type — code, structured data, and non-English text tokenise differently from English prose. Anthropic bills based on actual token count as measured by the model’s tokeniser, not character or word count. For precise cost forecasting, use Anthropic’s token counting endpoint or the open-source tokeniser library to measure actual token consumption for your specific content.
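For quick pre-flight estimates, the 4-characters-per-token heuristic is often enough; for billing-accurate numbers, prefer the SDK's token counting call (client.messages.count_tokens in the Python SDK) as noted above:

```python
# Sketch: rough pre-flight token estimate using the ~4 chars/token heuristic
# for English prose. Heuristic only; actual billing uses the model tokeniser.
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text."""
    return max(1, round(len(text) / 4))

prompt = "Summarise the attached contract in three bullet points."
print(estimate_tokens(prompt))  # 14, for this 55-character prompt
```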

Can I combine batch processing with prompt caching?

Yes. Batch API and prompt caching discounts stack. A batched request with cached context pays batch rates (50% off) on uncached tokens and cache-read rates (90% off input) on cached tokens. This combination produces the deepest possible discounts on Anthropic’s platform — effective rates approaching 95% off standard pricing for the cached portion of batch workloads.

What happens when I hit a rate limit?

The API returns a 429 (Too Many Requests) HTTP status code with a Retry-After header indicating when to retry. Your application should implement exponential backoff with jitter to handle rate limit responses gracefully. Persistent rate limit issues indicate that your usage exceeds your current tier and that you should either optimise token consumption, implement request queuing, or contact Anthropic to negotiate higher limits.
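A sketch of that retry pattern. Here send_request stands in for your actual API call, and a production client should also honour the Retry-After header rather than relying on the computed delay alone:

```python
import random
import time

# Sketch: exponential backoff with full jitter for 429 responses.
# send_request is a placeholder returning a dict with a "status" key.
def call_with_backoff(send_request, max_retries: int = 5, base_delay: float = 1.0):
    for attempt in range(max_retries):
        response = send_request()
        if response.get("status") != 429:
            return response
        # Full jitter: sleep a random amount up to the exponential ceiling.
        delay = random.uniform(0, base_delay * 2 ** attempt)
        time.sleep(delay)
    raise RuntimeError("rate limited after retries")
```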

Does Anthropic offer fine-tuning for enterprise customers?

As of early 2026, Anthropic does not offer custom model fine-tuning through its standard API or enterprise agreements. This is a significant difference from OpenAI, which offers fine-tuning for GPT models. Anthropic’s approach emphasises prompt engineering, few-shot learning, and system prompts as the primary mechanisms for task-specific customisation. For organisations that require deeply customised model behaviour, this limitation should be evaluated carefully during vendor selection.

Is US-only inference available and what does it cost?

Yes. Anthropic offers US-only inference for organisations with data residency requirements, priced at 1.1× standard API rates for both input and output tokens. This 10% premium ensures that all inference processing occurs within US data centres. Enterprise customers with regulatory constraints (ITAR, FedRAMP, certain HIPAA configurations) should evaluate whether US-only inference meets their specific compliance requirements.

How do I get volume discounts?

Contact Anthropic’s enterprise sales team at [email protected] or through the Claude Console. Volume discounts are negotiated on a case-by-case basis and typically require a committed monthly spend of $10,000 or more. Larger commitments ($50,000+/month) over longer terms (12+ months) yield deeper discounts. Come to the negotiation with at least three months of usage data showing stable consumption patterns — this demonstrates commitment credibility and simplifies Anthropic’s capacity planning, both of which support deeper discounts.

Can I use Claude API through AWS or Google Cloud?

Yes. Claude is available through Amazon Bedrock and Google Cloud Vertex AI. Pricing through these channels is generally aligned with Anthropic’s direct API rates, but cloud marketplace consumption may be eligible for existing committed cloud spend (Enterprise Discount Programs, Savings Plans, or CUDs). For organisations with significant unspent cloud commitments, marketplace access can deliver better effective pricing than direct Anthropic procurement.

GenAI Licensing Hub — This guide is part of our GenAI Licensing Knowledge Hub: 80+ expert guides covering AI token pricing, contract risks, data privacy, and enterprise negotiation strategies across OpenAI, Anthropic, Google, AWS, and Microsoft.