Contents
- Why You Are Asking the Wrong Question
- The Seven Dimensions That Actually Matter
- Dimension 1: Commercial Architecture
- Dimension 2: Total Cost of Ownership
- Dimension 3: Contractual Flexibility
- Dimension 4: Ecosystem Alignment
- Dimension 5: Data Governance and Compliance
- Dimension 6: Multi-Vendor Optionality
- Dimension 7: Vendor Maturity and Durability
- Applying the Framework: Three Scenarios
- Running the Selection Process
- FAQ
Enterprise AI vendor selection is being driven by the wrong data. Benchmark scores (MMLU, HumanEval, GSM8K) measure model performance on academic tasks, not on your organisation’s specific workloads. Models that win benchmarks one month lose them the next. A vendor’s flagship model that leads in March may be surpassed by a competitor’s update in April. Meanwhile, the contract you signed in March governs your spend, your data, and your flexibility for the next 12–36 months — long after the benchmark leaderboard has reshuffled three times over. The most durable competitive advantage in enterprise AI is not model quality (which is fleeting) but commercial structure (which is contractual and persistent).
1. Why You Are Asking the Wrong Question
The standard enterprise AI evaluation process runs roughly like this: a technical team evaluates model quality across several providers, scores them on accuracy, latency, and capability, and recommends the provider with the highest technical score. Procurement then negotiates the contract for that pre-selected vendor, with limited leverage because the decision has already been made on technical grounds.
This process has three fundamental flaws in 2026.
Model quality has converged. Claude Sonnet 4.5, GPT-4o, and Gemini 2.5 Pro are all extraordinarily capable. For 90% of enterprise use cases — drafting documents, summarising reports, answering questions, classifying data, generating code — any of them will do the job well. The differences between them are real but marginal for production workloads. Selecting a vendor solely on the basis of model quality is like choosing a car manufacturer based on which engine produces 2% more horsepower when you will never exceed the speed limit.
Commercial structures have diverged. While models converge, the commercial models behind them have gone in radically different directions. Anthropic offers a simple two-channel model (subscriptions plus API). OpenAI layers credit-based consumption on top of per-seat subscriptions. Google fragments AI across five distinct billing streams. AWS wraps AI into a cloud committed-spend vehicle with its own discount mechanics. Each of these structures creates different cost dynamics, different lock-in mechanisms, and different negotiation leverage points. Two vendors offering functionally equivalent AI capabilities can produce total cost differences of 30–50% depending on how their commercial models interact with your organisation’s deployment pattern.
Switching costs are real but overestimated. Most enterprise AI workloads can be migrated between providers with moderate engineering effort (weeks, not months). This means your selection is not irrevocable — but your contract is. A three-year agreement with unfavourable terms traps you far more effectively than any technical integration. The framework below evaluates providers on the dimensions that determine whether you are well-served or trapped.
2. The Seven Dimensions That Actually Matter
This framework evaluates enterprise AI providers across seven dimensions, weighted by their impact on the procurement outcome rather than the technical evaluation:
Dimension 1: Commercial Architecture — How is the AI priced, packaged, and billed? What cost model governs your spend?
Dimension 2: Total Cost of Ownership — What will you actually pay across all cost layers over 12–36 months?
Dimension 3: Contractual Flexibility — Can you scale up, scale down, switch models, and exit without penalty?
Dimension 4: Ecosystem Alignment — Does the AI provider integrate naturally with your existing productivity, cloud, and security infrastructure?
Dimension 5: Data Governance and Compliance — Does the vendor’s data handling meet your regulatory and policy requirements?
Dimension 6: Multi-Vendor Optionality — Does this vendor help or hinder your ability to use competing AI providers in parallel?
Dimension 7: Vendor Maturity and Durability — Will this vendor still be a viable enterprise partner in three years?
Model quality is not a separate dimension because it is a qualifying criterion, not a differentiator. Any provider whose models do not meet your quality threshold for the intended use cases should be eliminated before the framework is applied. Once qualified, model quality differences between providers are insufficient to override the commercial and contractual factors that determine the success of the deployment.
3. Dimension 1: Commercial Architecture
The single most important question in AI vendor selection is not “how good is the model?” but “how does the pricing work?” Commercial architecture determines whether your costs are predictable, whether you can optimise spend, and whether you have leverage at renewal.
Anthropic (Claude)
Two-channel model: per-seat subscriptions (Team, Enterprise) plus pay-per-token API. The simplest commercial structure in the market. Subscription pricing is flat per-seat with dynamic usage limits (no visible credits or consumption metering). API pricing is straightforward per-token with batch discounts and prompt caching. The downside: “dynamic usage limits” on subscriptions are opaque — you do not know exactly what you are buying per seat.
OpenAI (ChatGPT)
Per-seat subscriptions plus a credit-based consumption layer. Business and Enterprise plans include per-seat limits plus shared workspace credit pools for advanced features. Enterprise contracts bundle credits, API access, and integration fees. The credit system introduces variable cost exposure but provides more transparency than Anthropic’s opaque limits. The complexity: the credit system requires active management, budget tracking, and usage governance.
Google (Gemini)
Five distinct billing streams: Workspace-embedded AI (mandatory, no opt-out), Gemini Enterprise platform (separate per-seat), Gemini API via Vertex AI (per-token), Code Assist (separate per-seat), and consumer plans. The most fragmented commercial architecture. The opportunity: granular control over which channels you activate and for which users. The risk: overlapping capabilities across channels, difficulty tracking total AI spend, and the forced Workspace AI surcharge.
AWS (Bedrock/SageMaker)
Consumption-based: pay-per-token on Bedrock, pay-per-instance-hour on SageMaker, both integrated into your broader AWS Private Pricing Agreement (PPA/EDP). No per-seat subscriptions for AI. The advantage: AI costs are variable and scale precisely with usage. The risk: AI spend is volatile and hard to forecast, which complicates committed-spend structures. No subscription component means no cost ceiling per user.
Microsoft (Azure OpenAI/Copilot)
Per-seat Copilot subscriptions bundled with Microsoft 365, plus consumption-based Azure OpenAI Service for API access, both integrated into the Microsoft Enterprise Agreement and MACC. The advantage for Microsoft-native organisations: consolidated billing, deep Office integration, familiar contracting vehicle. The risk: Copilot pricing adds $30/user/month on top of existing M365 costs, and Azure OpenAI consumption can be difficult to cap.
For each provider, ask: “If our AI adoption doubles in the next 12 months, how does our cost change?” The answer reveals the commercial architecture’s behaviour at scale. Flat per-seat models cap cost but may throttle usage. Per-token models scale linearly. Credit-based models create variable exposure. Layered models multiply costs across channels.
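The doubling question can be sketched numerically. The functions and rates below are illustrative placeholders, not actual vendor pricing; they simply show how each architecture behaves when usage doubles.

```python
# Hypothetical cost-at-scale sketch: how each commercial architecture
# responds when adoption doubles. All rates are illustrative
# placeholders, not actual vendor pricing.

def flat_per_seat(seats, rate_per_seat_month):
    """Per-seat subscription: annual cost tracks seats, not usage."""
    return seats * rate_per_seat_month * 12

def per_token(monthly_tokens_m, rate_per_million):
    """Consumption pricing: annual cost scales linearly with usage."""
    return monthly_tokens_m * rate_per_million * 12

def seats_plus_credits(seats, seat_rate, monthly_credits):
    """Layered model: fixed seat base plus variable credit spend."""
    return seats * seat_rate * 12 + monthly_credits * 12

scenarios = [
    # (label, baseline annual cost, annual cost after usage doubles)
    ("flat per-seat (seats fixed, usage doubles)",
     flat_per_seat(500, 30), flat_per_seat(500, 30)),      # unchanged; limits may throttle
    ("per-token",
     per_token(200, 10), per_token(400, 10)),              # exactly 2x
    ("seats + credits (credit spend doubles)",
     seats_plus_credits(500, 25, 5_000),
     seats_plus_credits(500, 25, 10_000)),                 # between 1x and 2x
]
for label, base, doubled in scenarios:
    print(f"{label}: ${base:,} -> ${doubled:,} ({doubled / base:.2f}x)")
```

The sketch makes the trade-off concrete: the flat per-seat model absorbs doubled usage at zero marginal cost (but may throttle it), per-token doubles exactly, and the layered model lands in between because only its credit component is variable.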
4. Dimension 2: Total Cost of Ownership
Total cost of ownership (TCO) for enterprise AI is never the sticker price. For every provider, the actual cost includes multiple layers that most evaluations undercount.
Direct licensing costs: Per-seat fees, per-token consumption, provisioned throughput, and platform fees. This is what appears in the vendor’s pricing table.
Cloud infrastructure costs: Compute, storage, networking, and data transfer required to support AI workloads. For Bedrock/SageMaker, this is the dominant cost layer. For subscription-based providers, infrastructure costs are lower but still exist (e.g., integration middleware, data pipelines, SSO/identity management).
Integration and implementation costs: SSO configuration, API integration development, data connector setup, custom agent development, security review, and user training. These first-year costs can exceed the first year of licensing for complex enterprise deployments.
Governance and management costs: Usage monitoring, cost allocation, policy enforcement, model evaluation, and vendor management. These are ongoing operational costs that scale with the number of users and use cases.
Shelfware and waste costs: Unused licences, underutilised seats, idle provisioned capacity, and over-provisioned editions. Industry data consistently shows 30–50% of enterprise AI licences are underutilised within the first year. A 500-seat deployment where 200 seats are underused represents $60,000–$180,000/year in waste depending on the provider and tier.
Build a three-year TCO model for each shortlisted vendor that includes all five layers. The vendor with the lowest sticker price rarely has the lowest TCO. Providers with higher per-seat rates but inclusive features (no add-ons, no variable consumption charges, no mandatory cloud infrastructure) can be cheaper at scale than providers with lower sticker prices but layered cost structures.
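The five-layer model above can be sketched in a few lines. All figures below are hypothetical placeholders; the point is the structure, not the numbers — a vendor with a higher sticker price but thin additional layers can beat a vendor with a lower sticker price and heavy layers.

```python
# Sketch of the three-year, five-layer TCO model described above.
# All cost figures are illustrative placeholders.

def three_year_tco(licensing, infrastructure, integration_year1,
                   governance, shelfware_rate):
    """Sum the five cost layers over three years.

    licensing, infrastructure, governance: annual run-rate costs
    integration_year1: one-off first-year implementation cost
    shelfware_rate: fraction of licensing wasted on unused seats
    """
    total = 0.0
    for year in range(1, 4):
        total += licensing
        total += infrastructure
        total += governance
        total += licensing * shelfware_rate   # shelfware and waste layer
        if year == 1:
            total += integration_year1
    return total

# Vendor A: low sticker price, heavy infrastructure/integration layers
vendor_a = three_year_tco(licensing=150_000, infrastructure=90_000,
                          integration_year1=120_000, governance=40_000,
                          shelfware_rate=0.30)
# Vendor B: higher sticker price, inclusive features, thin extra layers
vendor_b = three_year_tco(licensing=220_000, infrastructure=10_000,
                          integration_year1=40_000, governance=25_000,
                          shelfware_rate=0.30)
print(f"Vendor A: ${vendor_a:,.0f}  Vendor B: ${vendor_b:,.0f}")
```

With these placeholder inputs, Vendor B's three-year TCO comes in lower despite a sticker price nearly 50% higher — the pattern the paragraph above describes.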
5. Dimension 3: Contractual Flexibility
In a market where models, capabilities, and competitive dynamics shift every quarter, contractual flexibility is a strategic asset. Evaluate each vendor on five flexibility dimensions:
Term length and exit provisions: Can you exit or reduce scope mid-contract? What are the penalties? A one-year term with auto-renewal is far more flexible than a three-year lock-in, even if the three-year price is lower. Negotiate termination-for-convenience clauses with 90–180 day notice periods.
Seat scaling: Can you add seats at the negotiated rate during the term? More critically, can you remove seats (true-down)? Most vendors allow easy scale-up but resist scale-down. Negotiate the right to reduce seats by 10–15% annually without penalty.
Edition and model mobility: Can you move users between tiers or editions (e.g., from Enterprise Standard to Enterprise Premium, or vice versa) without renegotiating the entire contract? Can you switch between models (e.g., from Claude Sonnet to Claude Haiku) without pricing changes? Lock-in can occur at the edition level, not just the vendor level.
Pricing predictability: Is your price locked for the contract term, or can the vendor change pricing mid-term? Ensure your agreement includes a price-hold clause that fixes rates for the full term and a cap on annual price increases at renewal (typically 3–5%).
Data portability: Can you export all data, configurations, agent logic, and customisations at termination? Without contractual data portability guarantees, your exit costs are unbounded. Evaluate what format data is exported in and the timeline for export completion.
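The value of the renewal-increase cap in the pricing-predictability point above compounds over successive renewals. A minimal sketch, with hypothetical rates (a 5% negotiated cap vs. an illustrative 15% uncapped increase):

```python
# Why a renewal price cap matters: compounding a capped 5% annual
# increase vs. a hypothetical uncapped 15% increase. Rates illustrative.

def renewal_price(base, annual_increase, renewals):
    """Per-seat price after the given number of annual renewals."""
    return base * (1 + annual_increase) ** renewals

base = 50.0  # $/user/month at signing (placeholder)
capped = renewal_price(base, 0.05, 3)
uncapped = renewal_price(base, 0.15, 3)
print(f"After 3 renewals: capped ${capped:.2f}, uncapped ${uncapped:.2f}")
```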
6. Dimension 4: Ecosystem Alignment
Enterprise AI does not exist in isolation. It integrates with your productivity suite, your cloud infrastructure, your identity management, and your security tools. Ecosystem alignment reduces integration costs, simplifies governance, and improves adoption.
Google Workspace organisations have a natural path to Gemini. AI is embedded in the tools employees already use, at no incremental licensing cost beyond the existing Workspace subscription. Adding Gemini Enterprise for power users extends the ecosystem rather than introducing a new one. Google Vertex AI provides API access within the same cloud environment. The ecosystem alignment is strongest when your entire stack is Google-native.
Microsoft 365 organisations have a natural path to OpenAI via Copilot and Azure OpenAI Service. AI integrates directly into Outlook, Teams, Word, Excel, and PowerPoint. Azure OpenAI consumption counts towards existing MACC commitments. If your identity management, endpoint security, and compliance tools are all Microsoft, adding a non-Microsoft AI provider creates integration friction that Microsoft does not.
AWS-first organisations have a natural path to Bedrock for API workloads. Bedrock integrates with existing IAM, VPC security, CloudWatch monitoring, and PPA/EDP billing. For subscription-style AI (interactive assistant), AWS does not have a direct equivalent — most AWS-first enterprises use Claude or ChatGPT for the interactive channel and Bedrock for the API channel.
Multi-cloud or cloud-agnostic organisations have the most flexibility and the most complexity. Anthropic’s Claude is available via direct API, AWS Bedrock, Google Vertex AI, and Microsoft Foundry, making it the most cloud-portable of the major AI providers. This portability is a significant advantage for organisations that want to avoid cloud-specific AI lock-in.
Ecosystem alignment is a genuine advantage, but it can also be a rationalisation for avoiding competitive evaluation. “We are a Microsoft shop so we should use Copilot” is a starting hypothesis, not a conclusion. The vendor that aligns with your ecosystem may not offer the best commercial terms. Always evaluate at least one provider outside your ecosystem to maintain competitive leverage and to validate that the ecosystem premium you are paying is justified.
7. Dimension 5: Data Governance and Compliance
For regulated industries and data-sensitive organisations, data governance is not a checkbox — it is a gating criterion that can eliminate vendors before the commercial evaluation begins.
Data training exclusion: All major enterprise AI providers now commit to not using customer data for model training. However, the granularity varies. Verify that the exclusion covers all tiers, all models, and all use cases — some vendors have exceptions for lower-tier plans, free-tier API access, or certain evaluation features.
Data residency: Where is your data processed and stored? Anthropic offers US-only inference at a 10% pricing premium. Google offers data regions as part of Workspace Enterprise and Gemini Enterprise Standard/Plus. Azure OpenAI processes data within the Azure region you select. Bedrock processes data within the AWS region you deploy to. For EU organisations, verify that inference (not just storage) can be constrained to EU regions.
Compliance certifications: SOC 2 Type II is table stakes. Beyond that, evaluate for ISO 27001/27017/27018/27701, HIPAA configurability (with BAA), FedRAMP authorisation, and any industry-specific requirements (PCI-DSS, HITRUST). OpenAI holds the broadest set of ISO certifications. Google offers FedRAMP High on Gemini Enterprise Standard and Plus. Anthropic provides HIPAA-ready enterprise plans with BAA.
Encryption and access controls: Evaluate encryption at rest (AES-256 is standard) and in transit (TLS 1.2+). For advanced requirements, evaluate support for customer-managed encryption keys (CMEK), VPC Service Controls (Google), PrivateLink (AWS), or Private Endpoints (Azure) to prevent data from traversing the public internet.
Audit and transparency: Enterprise plans should provide audit logs, access transparency, and compliance APIs. Evaluate the granularity of audit data (does it log individual prompts and responses, or only session metadata?) and the retention period for audit logs.
8. Dimension 6: Multi-Vendor Optionality
The smartest procurement strategy in the current AI market is to avoid single-vendor dependency. Models improve unpredictably, pricing shifts frequently, and the competitive landscape reshuffles quarterly. Maintaining the ability to use multiple AI providers — or to switch providers — is a strategic asset with concrete commercial value.
Evaluate portability: How difficult is it to migrate workloads from this provider to a competitor? API-based workloads (Bedrock, Vertex AI, direct API) are relatively portable because the API patterns are similar across providers. Subscription-based workloads (custom agents, knowledge bases, fine-tuned models) are less portable because they rely on provider-specific configurations. Evaluate the cost and timeline of migrating your three most critical AI workloads to an alternative provider.
Evaluate complementarity: Some providers work well in combination. The most common multi-vendor pattern in 2026 is a subscription provider for interactive AI (Claude Enterprise or ChatGPT Enterprise for knowledge workers) plus a cloud-native provider for API workloads (Bedrock or Vertex AI for production applications). This pattern gives you competitive leverage with both providers and avoids dependency on either.
Evaluate contract terms that enable optionality: Avoid exclusivity clauses that prevent you from using competing AI services. Ensure your data is portable and that agent configurations can be exported. Negotiate the right to maintain pilot deployments on competing platforms without triggering contractual penalties.
The enterprise that can credibly tell any AI vendor “we have a production alternative ready to scale” will consistently negotiate 15–30% better terms than one that is perceived as locked in.
9. Dimension 7: Vendor Maturity and Durability
Enterprise AI is a multi-year commitment. Evaluate whether each vendor will still be a viable enterprise partner when your contract is up for renewal.
Financial stability: OpenAI (valued at $300B+, backed by Microsoft), Anthropic (valued at $60B+, backed by Amazon and Google), Google (parent Alphabet, $2T+ market cap), and AWS/Microsoft (trillion-dollar parent companies) are all financially durable in the medium term. Smaller or niche AI providers may offer compelling capabilities but carry higher risk of acquisition, pivot, or discontinuation.
Enterprise go-to-market maturity: Evaluate the vendor’s enterprise sales, support, and account management capabilities. Does the vendor have dedicated enterprise account teams? Is enterprise support available with contractual SLAs? Can you escalate issues to named contacts? OpenAI and Google have built enterprise sales organisations over the past two years. Anthropic is earlier in this journey but growing rapidly. AWS leverages its existing enterprise sales infrastructure.
Product roadmap predictability: AI vendors iterate rapidly, which is both an advantage (you get improved capabilities) and a risk (model deprecation can break production workloads). Evaluate each vendor’s track record for backward compatibility, deprecation notice periods, and migration support. Negotiate contractual model deprecation protections (6–12 months minimum notice) regardless of vendor.
Ecosystem investment: Vendors that are investing heavily in enterprise features (governance, compliance, admin controls, third-party integrations) are signalling long-term commitment to the enterprise market. Vendors that focus primarily on consumer features or developer tooling may deprioritise enterprise needs over time.
10. Applying the Framework: Three Scenarios
Scenario A: 500-Person Consulting Firm, Google Workspace
A professional services firm running entirely on Google Workspace needs AI for email drafting, document summarisation, meeting notes, client research, and proposal generation. They need 50 power users with advanced AI capabilities and 450 users with basic embedded AI.
Highest-scoring provider: Google (Gemini). The 450 standard users already have embedded Gemini in Workspace at no incremental cost. The 50 power users get Gemini Enterprise Business at $21/user/month. Total incremental AI cost: approximately $12,600/year plus existing Workspace spend. Ecosystem alignment is perfect. The risk: Gemini’s interactive AI quality still lags Claude and ChatGPT for complex writing and analysis tasks. Mitigation: evaluate Claude Enterprise for the 50 power users at $40–$60/user — the higher cost may be justified by superior output quality for high-stakes client deliverables.
Scenario B: 2,000-Person Financial Services Firm, Microsoft 365
A regulated financial institution on Microsoft 365 needs AI for compliance document analysis, client communications, internal research, and code generation. HIPAA and SOC 2 are required. Data residency in the US is mandatory.
Highest-scoring provider: Split deployment. Microsoft Copilot for the 1,500 users who need AI embedded in Office applications (familiar interface, consolidated M365 billing, existing Azure security infrastructure). Claude Enterprise for the 200 analysts and compliance specialists who need superior document analysis and reasoning (Claude’s quality advantage on complex analytical tasks is well-documented in this segment). Bedrock for the 50 developers building AI-powered compliance tools (consumption-based pricing, multi-model flexibility, existing AWS infrastructure). Total estimated cost: Copilot $540,000/year + Claude Enterprise $120,000/year + Bedrock $80,000/year = $740,000/year. The multi-vendor approach provides competitive leverage, best-in-class capabilities per user segment, and no single-vendor dependency.
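The arithmetic behind the Scenario B estimate can be checked directly. The Claude rate below uses $50/user/month, the midpoint of the $40–$60 range quoted earlier; Bedrock is taken as a flat annual consumption estimate, as in the scenario.

```python
# Cost breakdown for the Scenario B split deployment.
# $30/user/month Copilot; $50/user/month Claude Enterprise (midpoint of
# the $40-$60 range above); Bedrock as a flat annual consumption figure.

copilot = 1_500 * 30 * 12   # 1,500 Office users on Copilot
claude  = 200 * 50 * 12     # 200 analysts on Claude Enterprise
bedrock = 80_000            # developer consumption estimate, $/year

total = copilot + claude + bedrock
print(f"Copilot ${copilot:,}  Claude ${claude:,}  Bedrock ${bedrock:,}")
print(f"Total ${total:,}/year")
```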
Scenario C: 300-Person Technology Company, AWS-Native
An AWS-native SaaS company needs AI for product features (customer-facing chatbot, document processing), internal productivity (engineering, sales, marketing), and development tooling (code completion, code review).
Highest-scoring provider: AWS Bedrock + Anthropic Claude. Bedrock for all production API workloads (integrated with existing AWS infrastructure, PPA/EDP discount applies, multi-model access for routing and fallback). Claude Team for 100 internal knowledge workers at $20–$25/user/month. Gemini Code Assist or Claude Code for 80 engineers. Total estimated cost: Bedrock $120,000–$200,000/year (consumption) + Claude Team $30,000/year + code assist $18,000/year = $168,000–$248,000/year. The Bedrock/Claude combination provides both API-grade infrastructure and interactive-quality AI without ecosystem conflict.
11. Running the Selection Process
Step 1: Qualify on Model Quality (2 Weeks)
Run a lightweight technical evaluation to confirm that each shortlisted provider’s models meet your quality threshold for the intended use cases. This is a pass/fail gate, not a scoring exercise. Any provider that passes proceeds to the commercial evaluation. Do not score or rank providers on technical grounds at this stage.
Step 2: Score on the Seven Dimensions (3–4 Weeks)
Evaluate each qualified provider across all seven dimensions. Assign weights based on your organisation’s priorities (cost-sensitive organisations weight Dimensions 1–3 more heavily; regulated organisations weight Dimension 5 more heavily; multi-cloud organisations weight Dimension 6 more heavily). Score each dimension on a 1–5 scale with documented evidence for each score.
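The scorecard above reduces to a weighted sum. A minimal sketch, with illustrative weights and scores — in practice, set the weights to your organisation's priorities (they should sum to 1.0) and back every 1–5 score with documented evidence:

```python
# Sketch of the weighted seven-dimension scorecard. Weights and scores
# are illustrative placeholders, not recommendations.

DIMENSIONS = [
    "Commercial Architecture", "Total Cost of Ownership",
    "Contractual Flexibility", "Ecosystem Alignment",
    "Data Governance & Compliance", "Multi-Vendor Optionality",
    "Vendor Maturity & Durability",
]

def weighted_score(scores, weights):
    """Weighted sum of 1-5 dimension scores; weights must total 1.0."""
    assert len(scores) == len(weights) == len(DIMENSIONS)
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1.0"
    return sum(s * w for s, w in zip(scores, weights))

# Example: a cost-sensitive organisation weighting Dimensions 1-3 heavily
weights = [0.20, 0.20, 0.20, 0.10, 0.10, 0.10, 0.10]
vendor_a = weighted_score([4, 3, 4, 5, 4, 3, 5], weights)
vendor_b = weighted_score([3, 4, 3, 3, 5, 4, 4], weights)
print(f"Vendor A: {vendor_a:.2f}  Vendor B: {vendor_b:.2f}")
```

A regulated organisation would shift weight toward Dimension 5 instead; the same scores can then rank the vendors differently, which is exactly why the weights must be fixed and documented before scoring begins.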
Step 3: Build Parallel TCO Models (2 Weeks)
Build a detailed three-year TCO model for the top two or three providers. Include all five cost layers described in Dimension 2. Use realistic adoption curves (not vendor projections). Model both a base case and a downside scenario where adoption is 40% below projection. The provider with the lowest downside-case TCO is often a better choice than the provider with the lowest base-case TCO.
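The downside-case comparison can be sketched by splitting each vendor's spend into committed (fixed) and adoption-linked (variable) components. The figures are hypothetical; the structural point is that two vendors with identical base-case TCO can diverge sharply when adoption falls 40% short.

```python
# Base-case vs. downside-case TCO sketch. Committed per-seat spend is
# fixed regardless of adoption; consumption spend scales with it.
# All figures are illustrative placeholders.

def tco(fixed_annual, variable_annual, adoption_factor, years=3):
    """Three-year TCO: committed spend plus adoption-scaled spend."""
    return years * (fixed_annual + variable_annual * adoption_factor)

vendors = [("Vendor X (mostly committed)",   280_000,  40_000),
           ("Vendor Y (mostly consumption)",  60_000, 260_000)]
for label, fixed, variable in vendors:
    base     = tco(fixed, variable, 1.0)
    downside = tco(fixed, variable, 0.6)   # adoption 40% below projection
    print(f"{label}: base ${base:,.0f}, downside ${downside:,.0f}")
```

With these inputs both vendors cost the same in the base case, but Vendor Y's downside-case TCO is roughly 30% lower — illustrating why the downside scenario, not the base case, often decides the selection.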
Step 4: Negotiate in Parallel (4–6 Weeks)
Issue parallel RFPs or negotiation requests to at least two providers simultaneously. Do not reveal your preferred vendor. The existence of a credible alternative is the single most powerful negotiation lever in enterprise AI procurement. Evaluate not only the pricing offered but the contractual flexibility, the willingness to accommodate true-down provisions, and the quality of the commercial response.
Step 5: Select and Sign (2 Weeks)
Select the provider with the highest weighted score across the seven dimensions, validated by the lowest risk-adjusted TCO, and supported by the best negotiated contract terms. After signing, maintain the runner-up relationship at a pilot or evaluation scale to preserve competitive leverage for the renewal cycle.
12. FAQ
Should model quality be a factor in vendor selection?
Model quality is a qualifying criterion, not a differentiating factor. Any provider whose models do not meet your quality threshold should be eliminated. Among qualified providers, the marginal differences in model quality are far less impactful on your enterprise outcome than the commercial, contractual, and ecosystem factors described in this framework. Model quality changes quarterly; contract terms last years.
How many vendors should we evaluate?
Three to four for the technical qualification, narrowing to two or three for the commercial evaluation and parallel negotiation. Evaluating fewer than two eliminates competitive leverage. Evaluating more than four creates evaluation fatigue without improving decision quality.
Should we use a single vendor or multiple vendors?
Most enterprises in 2026 benefit from a multi-vendor approach: one provider for interactive AI (subscriptions for knowledge workers), one for API workloads (cloud-native consumption for production applications), and optionally one for specialised tooling (code assist, data analysis). This pattern delivers best-in-class capabilities per use case, competitive leverage with every vendor, and no single point of commercial dependency.
How do we maintain competitive leverage after selection?
Keep the runner-up active at a pilot or evaluation scale. Maintain at least one production workload on an alternative provider. Begin renewal planning 6–12 months before contract expiration with a fresh competitive evaluation. The credible threat of switching is the most effective pricing lever at renewal.
What is the biggest mistake enterprises make in AI vendor selection?
Allowing the technical evaluation to predetermine the vendor before procurement has evaluated the commercial terms. When engineering selects a vendor on technical grounds alone and hands procurement a single-vendor mandate, the organisation negotiates from its weakest possible position. Run technical and commercial evaluations in parallel with equal weight.
How long should the selection process take?
Thirteen to sixteen weeks from requirements definition through contract signature. Faster processes sacrifice thoroughness. Longer processes risk decision paralysis and vendor fatigue. The timeline above (2 weeks qualification + 3–4 weeks scoring + 2 weeks TCO modelling + 4–6 weeks negotiation + 2 weeks selection and signature) balances rigour with urgency.
Where can we get independent help with this process?
Redress Compliance provides independent advisory on enterprise AI vendor selection, covering commercial evaluation, TCO modelling, contract negotiation, and ongoing vendor management across Anthropic, OpenAI, Google, AWS, and Microsoft. We apply this framework with our clients to ensure that vendor selection decisions are driven by commercial reality, not benchmark scores. Learn more about our independent GenAI advisory and negotiation services.