Anthropic has reset Claude pricing more than once since launch. This report tracks the moves, from the early Claude releases to the 2026 Opus, Sonnet, and Haiku ladder, across input and output rates, model tiers, and context windows, and shows where the cheaper sticker is not the cheaper bill.
About this report
This report tracks Anthropic Claude pricing as a directional benchmark, not a live rate card. It draws on three inputs.
We report bands and directions, not precise per token figures, because the public list changes more often than any printed table can keep up with. Always confirm the live rate on the Anthropic pricing page before signing.
Claude pricing in 2026 is a per million token rate card split across a three tier model family. Opus sits at the premium end, Sonnet in the middle, and Haiku at the floor. Output tokens are charged at roughly five times the input rate at every tier.
The structure looks tidy on paper. In practice it has reset on every major model release since 2023, with rate moves, tier renames, and the arrival of long context and prompt caching as separately priced features. The published rate card is a snapshot, not a contract floor.
Enterprise customers add another layer. Volume commitments, prompt caching, and reserved capacity each shift the effective rate. The headline list price is the opening position. The realized rate after a structured commitment can sit well below it.
Claude launched in early 2023 with a single model and a public per million rate. The Claude 2 release later that year expanded the context window to 100K and adjusted the rate card. The Claude 3 release in 2024 introduced the three tier family. The Claude 3.5 era through 2025 widened the gap between tiers.
By 2026 the family had settled into the Opus, Sonnet, Haiku ladder with a 200K standard window, a 1M enterprise option, and prompt caching as a configurable discount. That is the current state. The path here matters because the next move will probably look like one of the previous moves.
The 2026 Anthropic pricing page lists Opus, Sonnet, and Haiku as the three model tiers, each with a per million input rate and a per million output rate. Long context calls are priced separately on the top tier. Prompt caching, when used, discounts repeat input.
Two implications follow. First, the unit of cost is the token, not the call, and the output token does most of the damage on agent and long form workloads. Second, the rate card alone is incomplete. A buyer who reads only the input and output cells misses both the long context premium, which raises the bill, and the prompt caching discount, which lowers it.
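The token arithmetic can be sketched in a few lines. This is a minimal illustration, not a billing tool: the rates are placeholder units rather than published prices, the token counts and the function name are hypothetical, and the only figure taken from this report is the roughly five to one output to input ratio.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost of one call, given per million token rates for input and output."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


# Placeholder rate units (assumption): input = 1.0 per million tokens,
# output = 5x input, matching the ratio described in the report.
INPUT_RATE = 1.0
OUTPUT_RATE = 5.0 * INPUT_RATE

# Hypothetical retrieval style call: large prompt, short answer.
retrieval = call_cost(20_000, 500, INPUT_RATE, OUTPUT_RATE)

# Hypothetical agent style call: smaller prompt, long output.
agent = call_cost(4_000, 8_000, INPUT_RATE, OUTPUT_RATE)

# Share of the agent call's cost that comes from output tokens alone.
output_share_agent = (8_000 * OUTPUT_RATE / 1_000_000) / agent

print(f"retrieval call: {retrieval:.4f} units")
print(f"agent call:     {agent:.4f} units, output share {output_share_agent:.0%}")
```

On these assumed numbers the agent call uses fewer total tokens than the retrieval call yet costs more, because most of its tokens are billed at the output rate.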
A typical enterprise Claude bill in 2026 splits roughly along these lines. The mid tier dominates by both call volume and spend share. The top tier is a small share of calls and a meaningful share of spend. The cheap tier is a large share of calls and a small share of spend.
Read your own bill against this shape. A heavy skew toward one tier usually points to a routing policy that has not been written down, not to a genuine workload mix that calls for that skew.
Most enterprise Claude spend flows through one of three channels. Direct Anthropic API is the cleanest path. AWS Bedrock is the second. Google Vertex AI is the third. Each has its own commercial model and its own discounting hooks.
The direct Anthropic channel publishes the headline rate. Bedrock and Vertex apply their own enterprise discount programs on top, often without showing the underlying Claude rate change. The same Claude model can land at a different effective cost in each channel.
Buyers running large estates routinely split usage across two channels. The split is not about resilience. It is about pricing leverage, because each channel reprices on its own cadence and against its own commitments.
The single largest driver on most enterprise estates is the output token count, not the input. Agent loops, long form drafting, and multi step planning all produce a large output share. Knowing the output share before the negotiation lets the buyer model a realistic ceiling.
The second driver is context size on long retrieval workloads. Once a workload routinely crosses 200K, the long context premium becomes the next line item to model. The third driver is tier mix, which is downstream of the routing policy.
A per token model gives the buyer two levers a per seat model does not. The first lever is routing. Sending smaller tasks to smaller, cheaper models shifts the bill without renegotiating anything. The second lever is prompt design. A shorter prompt that produces the same answer is a real cost cut.
Both levers sit inside engineering, not procurement. The buyers who control Anthropic cost best treat the rate card as a constraint and operate inside it with disciplined prompts and tier routing, instead of waiting for the next renewal.
The headline answer is that the top tier output rate has trended down in steps since the first Claude releases, while the model family widened into three tiers. The cheapest tier rate fell faster in percentage terms but matters less to the bill on hard workloads.
The chart below tracks the top tier output rate as a relative index against the original Claude release, with each major release reset shown as a step. The line stepped down through 2023 and 2024, then flattened as the product split into Opus, Sonnet, and Haiku.
Claude launched into a market where OpenAI set the rate expectation. Anthropic priced close enough to OpenAI to be a real alternative on the top tier and slightly cheaper on the cheap variant. The pricing posture in the first six months was not a price war. It was a pure positioning move.
That posture set the pattern that has held. Anthropic does not undercut on headline rate at the top tier. It competes on quality, on context window, and on enterprise commercial terms. The buyer side read is that Claude pricing is a quality bet, not a discount bet.
The Claude 2 release was the first time Anthropic led the market on context window. The 100K window was a meaningful product differentiator, and the rate stayed flat across it. That move set the template that the family would later expand twice more, to 200K and to 1M.
For buyers the practical effect is that long context is part of the Claude story, not an add on. A buyer who needs long context as a core capability will find the Anthropic family well shaped for it. A buyer who does not need long context is paying for a capability that does not matter to the workload.
At the early 2023 launch the output rate at the top tier was already several times the input rate. The first model refresh later in 2023 cut the top tier rate and introduced an instant model at a lower price.
That early move set the pattern. Anthropic has reset rates at every major model release, sometimes by renaming the tier, sometimes by lowering the price on the existing tier, sometimes by both. The Anthropic news feed is the cleanest dated record of those changes.
The 2024 Claude 3 release reorganized the family into three tiers. Opus took the top spot. Sonnet replaced the prior mid tier. Haiku arrived as the cheap fast tier. Each tier got its own per million input and output rate, and the cost ratio across the ladder widened.
That release was the first time the published ladder looked like a deliberate product, not a single model with an instant variant. The Claude product family page anchors the current naming.
Through 2025 Anthropic kept the three tier shape and pushed quality at each tier, often without changing the headline rate. The result is that the same Opus rate now buys a noticeably better model than it did at launch, while Sonnet has closed much of the quality gap to the prior Opus.
For buyers this matters more than the rate card. A flat rate with rising quality is a real price drop per useful output, even when the printed number does not move.
Between the 2024 Claude 3 release and the 2025 refresh, the mid tier moved more in quality than in rate. The Sonnet line absorbed workloads that had needed Opus six months earlier. Buyers who repointed their routing policies at the new Sonnet typically cut cost by a meaningful share without losing output quality on standard tasks.
The lesson for forecasting is simple. A Claude tier label is not a fixed point. The same name at a different release date can be a different product on quality, even when the rate is unchanged. Lock the routing to a quality target, not to a tier name.
For audit and budgeting purposes, three classes of dated event matter. Rate card changes are the obvious one. Tier renames are the second. Feature launches that carry a separate rate, like prompt caching and the 1M context tier, are the third. Track all three on a single dated log.
Buyers who keep this log usually catch the next change in time to act. Buyers who do not keep it first see the change on the renewal quote, which is too late.
The three tier ladder has settled. Opus is the premium. Sonnet is the default. Haiku is the floor. Within that ladder the cost ratio from floor to top runs roughly 10x to 15x depending on whether you weight input or output more heavily.
The exact published rates change. The ratio between tiers is what stays roughly stable, and the ratio is what should drive your routing strategy.
The cheaper tier wins only when it resolves the task in one pass. Once the cheaper tier triggers a retry, a chain of clarifying calls, or a longer prompt to get the same answer, the saving evaporates. The only fair comparison is cost per useful output on a representative task set.
Buyers who instrument this comparison routinely find Sonnet is the best default on standard work and Opus is the best default on harder reasoning. Haiku is the right tool on cheap to verify, cheap to retry tasks, and a poor choice everywhere else.
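The cost per useful output comparison can be sketched as follows. The model is an assumption, not the report's method: each retry is treated as an independent attempt, so the expected number of attempts is one over the first pass success rate. The relative costs come from the output column of the tier table, and the success rates are hypothetical values chosen only to illustrate the crossover.

```python
def cost_per_useful_output(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend until the first successful attempt, assuming independent retries."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Relative cost per attempt (Haiku = 1x, Sonnet = 5x on output).
COST = {"Haiku": 1.0, "Sonnet": 5.0}

# Hypothetical first pass success rates per task class (assumption).
SUCCESS = {
    "easy": {"Haiku": 0.95, "Sonnet": 0.99},
    "hard": {"Haiku": 0.15, "Sonnet": 0.85},
}

results = {}
for task, rates in SUCCESS.items():
    results[task] = {
        tier: cost_per_useful_output(COST[tier], p) for tier, p in rates.items()
    }
    best = min(results[task], key=results[task].get)
    rounded = {tier: round(cost, 2) for tier, cost in results[task].items()}
    print(f"{task}: {rounded} -> cheapest per useful output: {best}")
```

Under these assumptions the cheaper tier wins whenever its success rate, relative to the dearer tier's, exceeds the ratio of the two per attempt costs. Real retries also tend to lengthen the prompt, which this sketch ignores.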
The table below sets out the relative multiples against the Haiku floor. Read it as a routing tool, not a billing forecast. The output column dominates on agent and long form work. The input column dominates on retrieval heavy tasks.
Claude tier rate card structure (2026), expressed as relative multiples of the Haiku floor.
| Tier | Relative input rate | Relative output rate | Typical use case |
|---|---|---|---|
| Haiku | 1x | 1x | Cheap fast tasks, classification, summary, routing |
| Sonnet | 3x | 5x | Mid weight reasoning, drafting, default agent |
| Opus | 15x | 15x | Hard reasoning, long context, multi step planning |
Picking a tier is a routing decision, not a procurement one. The savings from running the right tier on a workload exceed the savings from negotiating a few percent off the rate card, on every estate we benchmark.
Teams that route by task type, with a clear policy for which queries go to Opus and which go to Haiku, pay less per useful output. Teams that default to a single mid tier pay more because they overpay on the easy tasks and underdeliver on the hard ones.
At launch the ladder was narrower. The mid tier sat closer to the top, and the cheap tier was a smaller saving. The Claude 3 release widened the ladder. The cheap tier got cheaper. The top tier held its rate.
The practical effect is that the saving from routing easy work to Haiku is bigger now than it was in 2023, and the cost of getting the routing wrong is larger.
A routing policy is a written rule that decides which tier handles which class of task. The rule should be simple enough that an engineer can implement it in code and a finance lead can audit it in a meeting. Start with three buckets and refine from there.
Measure the cost per useful output on each bucket. Move the bucket boundaries as the data lands. The policy improves through measurement, not through a single up front decision.
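A three bucket policy of this kind can be written down as a small lookup table. This is a sketch under stated assumptions: the task type labels and the fallback to the mid tier are illustrative, borrowed from the typical use cases in the tier table, and the names `ROUTING_POLICY` and `route` are hypothetical.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


# Written policy: which class of task goes to which tier.
# Task labels are illustrative assumptions, not a fixed taxonomy.
ROUTING_POLICY = {
    # Bucket 1: cheap to verify, cheap to retry.
    "classification": Tier.HAIKU,
    "summary": Tier.HAIKU,
    # Bucket 2: standard work.
    "drafting": Tier.SONNET,
    "default_agent": Tier.SONNET,
    # Bucket 3: hard reasoning.
    "multi_step_planning": Tier.OPUS,
    "hard_reasoning": Tier.OPUS,
}

# Unclassified work falls back to the mid tier (assumption).
DEFAULT_TIER = Tier.SONNET


def route(task_type: str) -> Tier:
    """Return the tier the written policy assigns to a task type."""
    return ROUTING_POLICY.get(task_type, DEFAULT_TIER)


for task in ("classification", "multi_step_planning", "something_new"):
    print(f"{task:<20} -> {route(task).value}")
```

Keeping the policy as plain data rather than scattered conditionals is a design choice: it lets an engineer change a bucket boundary in one place and lets a finance lead read the whole rule on one screen.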
For budgeting, model expected spend at the bucket level rather than the call level. A reasonable starting mix on a typical enterprise estate is 10 percent Opus, 60 percent Sonnet, 30 percent Haiku, by call volume. Weighted by cost per call, the same mix shifts toward Opus on the bill.
The mix is the single most useful input for a Claude budget. Get it wrong on the high side and the forecast looks scary. Get it wrong on the low side and the renewal conversation arrives without preparation.
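The shift from call share to spend share can be worked through with the starting mix above. This is a rough sketch: it assumes every call uses the same number of tokens and that output dominates cost, so each tier's cost per call is taken as its relative output multiple from the tier table. Real estates will differ on both counts.

```python
# Starting mix from the report, by share of calls.
call_mix = {"Opus": 0.10, "Sonnet": 0.60, "Haiku": 0.30}

# Relative output multiples from the tier table (Haiku = 1x).
# Assumption: equal token counts per call, output dominated cost.
relative_cost = {"Opus": 15.0, "Sonnet": 5.0, "Haiku": 1.0}

# Weight each tier's call share by its relative cost, then normalize.
weighted = {tier: call_mix[tier] * relative_cost[tier] for tier in call_mix}
total = sum(weighted.values())
spend_share = {tier: value / total for tier, value in weighted.items()}

for tier, share in spend_share.items():
    print(f"{tier:<7} calls {call_mix[tier]:>4.0%}   spend {share:>6.1%}")
```

On these assumptions Opus moves from a tenth of calls to roughly three tenths of spend, while Haiku falls from three tenths of calls to a single digit share of the bill.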
The Claude context window has expanded by an order of magnitude since 2023. With the Claude 2 release the window was 100K tokens. The current default is 200K across the family. The premium tier supports up to 1M tokens for enterprise customers, billed as a long context call at a higher rate.
The size of the window is not the only thing that changed. So did the way Anthropic charges for using it. Long context calls are priced as a premium, prompt caching is priced as a discount, and the same prompt can land in a different rate band depending on which the buyer uses.
The context window is a structural feature, not a tweak. Each expansion reflects model architecture and serving cost improvements that took months of engineering work. Anthropic moves the window when the underlying numbers support it, not on a marketing calendar. That is why the moves are infrequent but durable.
For buyers this means the published window is a stable planning input. The number does not move every quarter. When it does move, it is usually in a meaningful step up rather than a small adjustment. Treat window planning as a multi year decision.
The original 100K window was the largest in the market at the time of launch. It made Claude the first model that could ingest a long document in one call. The rate was flat across the window. There was no separate long context premium.
For buyers this was a simple billing model. One rate, one window. Most workloads fit inside it.
The 200K window, standard across all three tiers from the 2024 Claude 3 release, doubled the earlier 100K ceiling. The rate stayed flat across the window. This made the cost of long context calls predictable, with the same per token price whether the prompt was 1K or 199K tokens.
The 200K window remains the practical default in 2026. It covers most enterprise retrieval, agent, and long form workloads without triggering a separate billing path.
The 1M context tier arrived as an enterprise feature, not a default. Calls above 200K route into a separate billing band at a higher per token rate. Prompt caching helps offset the cost on repeat queries.
This is the first time Anthropic has explicitly split context size from base rate. Buyers running 1M context workloads see a step change in per call cost the first time the window is crossed. Modeling that step into the consumption forecast is now a required part of planning.
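The step change can be modeled with a simple threshold function. The sketch below is illustrative only: the 2x premium multiplier is a placeholder, not a published figure, and it assumes the whole input of a call is billed at the premium rate once the prompt exceeds 200K. Check the live terms before relying on either assumption.

```python
STANDARD_WINDOW = 200_000  # tokens; the standard window described in the report


def input_cost(tokens: int, base_rate_per_m: float,
               premium_multiplier: float = 2.0) -> float:
    """Input cost of one call, with a premium band above the standard window.

    Assumptions: the premium applies to every input token in the call,
    and the default multiplier is a hypothetical placeholder.
    """
    over_window = tokens > STANDARD_WINDOW
    rate = base_rate_per_m * (premium_multiplier if over_window else 1.0)
    return tokens * rate / 1_000_000


# Base rate in placeholder units per million input tokens.
BASE_RATE = 1.0

at_limit = input_cost(200_000, BASE_RATE)
just_over = input_cost(200_001, BASE_RATE)

print(f"200,000 tokens: {at_limit:.6f} units")
print(f"200,001 tokens: {just_over:.6f} units")
```

Under this model one extra token roughly doubles the input cost of the call, which is why the share of traffic that crosses the threshold belongs in the consumption forecast as its own line.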
Prompt caching lets the buyer mark a stable input prefix as cached. Subsequent calls that share the prefix are billed at a fraction of the input rate. On retrieval augmented agents and long system prompts, the saving can be a wide band when the cache hit rate is high.
The catch is configuration discipline. A cache miss reverts the call to the full input rate. Workloads that look cacheable on paper but route through a constantly shifting system prompt produce a cache hit rate that disappoints. Test the hit rate against a real traffic sample before pricing the saving into the forecast.
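The sensitivity to hit rate can be sketched as a blended rate. This is a simplified model under stated assumptions: the cached fraction of 0.1 is a placeholder rather than a published figure, the whole input is treated as cacheable, and any charge for writing to the cache is ignored.

```python
def effective_input_rate(base_rate: float, hit_rate: float,
                         cached_fraction: float) -> float:
    """Blended input rate: hits pay a fraction of the base rate, misses pay it in full."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return base_rate * (hit_rate * cached_fraction + (1.0 - hit_rate))


BASE_RATE = 1.0        # placeholder units per million input tokens
CACHED_FRACTION = 0.1  # hypothetical discount on cached input (assumption)

blended = {}
for hit_rate in (0.9, 0.6, 0.3):
    blended[hit_rate] = effective_input_rate(BASE_RATE, hit_rate, CACHED_FRACTION)
    saving = 1.0 - blended[hit_rate] / BASE_RATE
    print(f"hit rate {hit_rate:.0%}: effective rate {blended[hit_rate]:.2f}, saving {saving:.0%}")
```

The saving scales linearly with the hit rate in this model, so a forecast built on an assumed 90 percent hit rate overstates the discount by a wide margin if real traffic delivers 30 percent.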
Most enterprise workloads do not need 1M context. Document grounding fits inside 200K when the retrieval layer is doing its job. Long context is most often a sign of weak retrieval, not a sign that a bigger window is needed.
Buyers who invest in retrieval discipline often find they can cut the average prompt size in half without losing answer quality. That is a direct rate card saving with no negotiation involved. The cheapest way to pay less is to send fewer tokens.
The 1M context tier is unlikely to be the last step. Anthropic and its competitors have signaled longer windows in research papers and selected previews. By late 2027 a 2M context tier on a flagship model is a credible possibility, with a corresponding premium rate.
That trajectory matters for buyers who design retrieval strategies around the current ceiling. A retrieval design that assumes 200K may need to be revisited when the practical ceiling doubles. Plan retrieval as a tunable, not as a fixed point against the current window.
Anthropic, OpenAI, and Google all publish per million token rates with tiered model families. The three vendors are close enough on headline rate that comparing the published numbers is a poor guide to total cost.
The differences that matter for the bill sit elsewhere. They are in the enterprise discount program, the volume commitment terms, the prompt caching mechanic, and the way long context calls are billed. The Anthropic enterprise page is the cleanest reference for the Anthropic side of that picture.
Anthropic, OpenAI, and Google relative pricing posture in 2026, directional bands, not exact rates.
| Vendor | Top tier per million | Mid tier per million | Cheap tier per million | Long context |
|---|---|---|---|---|
| Anthropic Claude (Opus, Sonnet, Haiku) | Highest of the three | Mid market | Competitive | Premium tier supports 1M with separate rate |
| OpenAI GPT family | Premium | Mid market | Competitive | 1M on selected models, separate rate |
| Google Gemini | Aggressive on enterprise discount | Cheapest at mid | Cheapest at floor | 1M to 2M depending on model and channel |
Comparing only the published per million rate across Anthropic, OpenAI, and Google produces a ranking that does not survive contact with the bill. Each vendor publishes different bundles, different cache mechanics, and different long context economics. The comparison that matters is total cost on a representative workload, not the rate card alone.
Run the comparison on a small sample of your real traffic. Use the same prompts on each vendor. Measure tokens in, tokens out, latency, and answer quality. The ranking that comes out of that test is usually different from the ranking the published rates suggest.
On direct API the published rate is what most buyers pay. The route to a meaningful discount runs through volume commitments, reserved capacity, or a Bedrock or Vertex enterprise discount program. The mix of channels matters more than the printed rate.
The largest single saving we see is from prompt caching on workloads with stable input prefixes. Teams running retrieval augmented agents, long system prompts, or repeated document grounding can cut effective input cost by a wide band when caching is configured correctly.
Bedrock and Vertex add channel level discounts on top of the underlying Claude rate. The discount is rarely visible as a Claude rate change. It shows up as an enterprise commitment in the cloud bill, against which Claude consumption draws down.
Buyers with existing AWS or Google Cloud commitments can route Claude through the cloud channel and have the spend count against the parent commitment. This is the single biggest commercial reason to split usage across channels.
On AWS Bedrock the Claude model rate is the same as the direct rate at the published level. The discount lives in the enterprise discount program, the private offer, or the existing AWS commitment. Buyers with an active EDP find that Claude consumption can be made to count, which moves the negotiation to AWS rather than Anthropic.
This is a quietly large lever. An enterprise with a tight AWS commitment can route Claude through Bedrock and gain pricing leverage that direct API spend does not offer. The buyer side move is to model the two paths side by side before signing either commitment.
Google Vertex AI offers a similar dynamic. Claude is available on Vertex, with its own enterprise commercial model running through Google Cloud. Buyers running Vertex against an existing Google Cloud commitment can use it to soften the Anthropic conversation, or to split traffic across channels for pricing leverage.
The trade off is operational. Running two channels means twice the operational surface and twice the rate volatility, even when both channels are billed against an existing commitment. Most buyers settle on a primary channel and use the second as leverage, not as half the production load.
The standard view is that picking the cheapest Claude tier saves money. We disagree as a default. In the deployments we benchmark, a cheaper tier asked to handle a complex task often costs more per useful output than the more expensive tier doing the same work cleanly, because retries, longer context, and lower first pass quality compound. The buyer side move is to benchmark cost per outcome across tiers, not per token, and route each query to the tier that resolves it in one pass rather than defaulting to the lowest sticker.
Source: Redress Compliance advisory engagement file, 2024 to 2025.
Anthropic sets the rate card. The tier you route to sets the bill. The buyer who knows both, and benchmarks cost per outcome, pays the floor.
The pattern of the past three years is the best guide to the next two. Expect a new model release roughly twice a year, each one with a quiet rate reset on the cheap and mid tiers, quality moves at the top, and a widening of the enterprise discount program.
The pricing direction is set by the competitive read against OpenAI and Google. Anthropic does not need to be the cheapest. It needs to be defensible on quality and predictable on rate. Buyers should plan accordingly.
A budget that assumes the current rate card holds is a budget that breaks at the next release. Build the forecast as a range, not a number. Use the historical reset cadence as the input. Plan for at least one mid year tier rename and one feature launch with a separate rate.
The reliable structure is a base case at the current published rates, a downside case assuming a 10 to 20 percent rise on output, and an upside case assuming a quality move that lets the buyer drop one tier on a share of the workload. The three together cover most of the realistic 2027 paths.
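The three cases can be laid out as a small calculation. The sketch below is illustrative: the spend figures are placeholder units, the function name is hypothetical, and the upside case assumes that 20 percent of spend can move to a tier costing one fifth as much, which is an assumption rather than a benchmark result. Only the 10 to 20 percent output rise comes from the text.

```python
def forecast_range(input_spend: float, output_spend: float,
                   output_rise: tuple = (0.10, 0.20),
                   downshift_share: float = 0.20,
                   lower_tier_cost_ratio: float = 0.20) -> dict:
    """Return base, downside, and upside spend under the stated assumptions."""
    base = input_spend + output_spend
    # Downside: output rates rise by 10 to 20 percent, input unchanged.
    downside_low = input_spend + output_spend * (1.0 + output_rise[0])
    downside_high = input_spend + output_spend * (1.0 + output_rise[1])
    # Upside: a share of spend drops one tier at a fraction of the cost.
    upside = base * (1.0 - downshift_share * (1.0 - lower_tier_cost_ratio))
    return {
        "base": base,
        "downside_low": downside_low,
        "downside_high": downside_high,
        "upside": upside,
    }


# Placeholder annual spend in arbitrary units: 30 on input, 70 on output.
cases = forecast_range(input_spend=30.0, output_spend=70.0)
for name, value in cases.items():
    print(f"{name:<14} {value:7.1f}")
```

Presenting the result as a range from the upside figure to the high downside figure, rather than as the base number alone, is the point of the exercise.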
A volume commitment is worth signing when the baseline is well measured and the downside is bounded. The commitment locks the rate against a known floor of usage. The risk is paying for unused capacity if actual usage falls short of the forecast or if a quality move on the cheaper tier shifts the workload mix.
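The unused capacity risk can be made concrete with a break even calculation. This sketch assumes a simple commitment structure that may not match any real contract: the committed volume is paid in full at a discounted rate whether used or not, and usage above it is billed at the list rate. All figures are placeholder units.

```python
def committed_bill(usage: float, committed_units: float,
                   committed_rate: float, list_rate: float) -> float:
    """Bill under a commitment: committed volume paid in full, overage at list rate."""
    overage = max(0.0, usage - committed_units)
    return committed_units * committed_rate + overage * list_rate


def pay_as_you_go_bill(usage: float, list_rate: float) -> float:
    """Bill with no commitment, everything at list rate."""
    return usage * list_rate


# Hypothetical terms: 100 units committed at a 20 percent discount to list.
COMMITTED_UNITS = 100.0
COMMITTED_RATE = 0.8
LIST_RATE = 1.0

# Usage level below which the commitment costs more than paying list.
break_even = COMMITTED_UNITS * COMMITTED_RATE / LIST_RATE

comparison = {}
for usage in (60.0, 80.0, 120.0):
    comparison[usage] = (
        committed_bill(usage, COMMITTED_UNITS, COMMITTED_RATE, LIST_RATE),
        pay_as_you_go_bill(usage, LIST_RATE),
    )
    committed, payg = comparison[usage]
    print(f"usage {usage:5.0f}: committed {committed:6.1f}, pay as you go {payg:6.1f}")

print(f"break even usage: {break_even:.0f} units")
```

On these terms the commitment only pays off once usage clears 80 percent of the committed volume, which is why a well measured baseline comes before the signature.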
The buyer side response is to negotiate term length, swap rights between tiers, and a clean exit if the model family changes shape. A flat commitment without swap rights is a one way bet on the current product. The cleaner version is a commitment that travels with the family.
The top tier rate has held flat since 2024 and is likely to keep holding. Quality at the top tier will move instead. The mid and cheap tier rates are more likely to drift down, in step with similar moves at the other frontier vendors.
The bigger change will be in how enterprise commitments are structured. Expect a more developed ladder of volume commitment bands, a more formal reserved capacity option, and clearer published terms on prompt caching.
The presence of Claude on Bedrock and Vertex is not just a distribution story. It is a quiet competitive constraint on Anthropic direct pricing. Each channel can deepen enterprise discounts independently, and the direct channel has to stay in range to keep its share.
The 2027 outlook should assume the discount gap between direct and channel grows rather than narrows. Buyers who only quote direct Anthropic without a channel comparison are leaving leverage unused. Bring both quotes to the table.
Both Anthropic and its competitors have moved toward reasoning style models that spend more compute per query for harder problems. These models are priced higher than the base tier of the same family. The reasoning premium is the new variable for forecasters.
Buyers should assume that a portion of 2027 spend on hard tasks will route through a reasoning variant of the top tier, at a per token rate above the published Opus rate. The forecast band on hard workloads should sit above the current rate.
The current Anthropic enterprise program is bespoke. Each commitment is negotiated against the buyer baseline. As the product matures, expect a more standard set of bands with public discounts at each, similar to the cloud reserved instance pattern.
That is good for predictability and bad for buyers who could previously negotiate terms better than the standard band. The window to lock in bespoke terms is shrinking.
The buyer side response to a vendor that resets pricing on every major model release is a renewal calendar, a routing strategy, and a benchmark, not a one off negotiation. Use the steps below as the working sequence for the next twelve months.
The work below is not a one off project. It is a twelve month operating rhythm that compounds. The first quarter is measurement. The second is policy. The third is negotiation. The fourth is review. Each quarter feeds the next.
Treat the first round as the slow round. The savings show up in months three to six as the routing policy beds in and the caching configuration matures. The negotiation in month nine lands on a baseline the vendor cannot dispute, because the buyer has the data.
Anthropic Claude is not a single product with a single rate. It is a three tier family with a rate card that has reset on every major model release. The published numbers are a snapshot. The structure underneath is what stays stable, and the structure is what should drive the buyer side response.
The buyer who treats this as a structural conversation, not a price conversation, is the buyer who controls the bill. Measure first. Set the routing policy. Use prompt caching where it fits. Bring a benchmarked target to the negotiation. The bill follows the discipline, not the other way round.
Anthropic prices Claude on a per million token basis, split between input and output, across a three tier family. Opus is the premium. Sonnet is the default mid tier. Haiku is the cheap floor. Output is roughly five times input at each tier. The rate card resets with every major model release.
Claude pricing has reset at every major model release. The 2024 Claude 3 release reorganized the family into Opus, Sonnet, and Haiku. The 2025 Claude 3.5 era held headline rates and pushed quality. Long context and prompt caching arrived as separately priced features. The structure has settled.
Claude runs a three tier ladder. Opus sits at the top, Sonnet in the middle, Haiku at the floor. The relative cost ratio from Haiku to Opus is roughly 10x to 15x depending on whether you weight input or output. Sonnet sits 3x to 5x above Haiku. Tier choice is a routing decision, not a procurement one.
The standard context window is 200K tokens across the family in 2026, priced at the base tier rate. Long context calls above 200K route into a separate premium tier with a higher per token rate. Prompt caching can offset the cost on workloads with stable input prefixes by discounting repeat input.
All three vendors publish per million token rates with tiered model families and are close on headline rate. The differences that matter for the bill sit in enterprise discount programs, volume commitments, prompt caching mechanics, and how long context calls are billed. Comparing only the published rates is a poor guide to total cost.
Opus can be the cheaper choice on hard tasks that Sonnet cannot resolve in one pass. The cost of retries, longer prompts, and lower first pass quality on Sonnet often exceeds the cost of routing the same task to Opus once. On tasks Sonnet can handle cleanly, Opus is an overpay. Benchmark cost per useful output, not per token.
Prompt caching lets the buyer mark a stable input prefix as cached. Subsequent calls that share the cached prefix are billed at a fraction of the input rate. On retrieval augmented agents and long system prompts the saving can be a wide band. Configuration discipline matters because a cache miss reverts to the full input rate.
Expect the top tier rate to hold flat with quality moving instead. The mid and cheap tier rates are likely to drift down in step with the market. Reasoning variants of the top tier will price above Opus on hard tasks. Volume commitments will move from bespoke to banded. The window to lock bespoke enterprise terms is shrinking.
Build a baseline of current usage by tier and channel before opening a conversation. Benchmark cost per useful output across tiers and use the result to size the commitment. Negotiate term length, capped rate increases, swap rights between tiers, and clean exit terms. Decide channel mix between direct Anthropic, Bedrock, and Vertex against existing cloud commitments before signing.
The choice between direct Anthropic, Bedrock, and Vertex depends on the existing cloud commitment shape and the workload mix. Direct Anthropic offers the cleanest path to the headline rate and the prompt caching discount. Bedrock and Vertex apply channel level enterprise discounts on top of the underlying Claude rate. Many enterprise buyers split usage across two channels to keep pricing leverage on both sides.