
Compute CPI Methodology

Version 2.0 • February 2026

What It Measures

Compute CPI tracks the changing cost of a representative "basket" of AI work, expressed as an index (base=100) and an inflation rate (MoM/YoY).

Just as the Consumer Price Index measures the cost of household goods over time, Compute CPI measures the cost of common AI workloads. This lets organizations understand AI cost trends, forecast budgets, and benchmark procurement decisions.

The Basket

The index is built on six workload categories, each representing a common pattern of AI usage:

Category Input Tokens Output Tokens Weight
Chat / Drafting 2,000 500 20%
Summarization 10,000 500 25%
Classification 500 50 20%
Coding 3,000 1,000 15%
Judgment / Reasoning 5,000 2,000 10%
Long-Context Synthesis 50,000 1,000 10%

These weights reflect typical institutional usage patterns. The Civic CPI (coming Q2 2026) will use different weights optimized for public sector workloads.

Calculation

For each workload category i, we calculate cost as:

Cost(i) = (T_in / 1M × Price_in) + (T_out / 1M × Price_out)
Where T_in = input tokens, T_out = output tokens, and Price_in / Price_out = price per 1M input / output tokens

The weighted basket cost is:

Basket Cost = Σ(weight_i × Cost(i))

The index value is calculated relative to the baseline period:

Compute CPI(t) = 100 × BasketCost(t) / BasketCost(base)
Where base = February 2026
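The three formulas above can be sketched end to end. The basket definitions come from the table in "The Basket"; the prices passed in are hypothetical per-million-token rates, not live data from the index.

```python
# Sketch of the Compute CPI calculation using the published basket.
# category: (input_tokens, output_tokens, weight)
BASKET = {
    "chat":           (2_000,    500, 0.20),
    "summarization":  (10_000,   500, 0.25),
    "classification": (500,       50, 0.20),
    "coding":         (3_000,  1_000, 0.15),
    "judgment":       (5_000,  2_000, 0.10),
    "long_context":   (50_000, 1_000, 0.10),
}

def category_cost(t_in, t_out, price_in, price_out):
    """Cost(i) = (T_in / 1M × Price_in) + (T_out / 1M × Price_out)."""
    return t_in / 1e6 * price_in + t_out / 1e6 * price_out

def basket_cost(prices):
    """Weighted sum over categories; prices maps category -> ($/1M in, $/1M out)."""
    return sum(
        w * category_cost(t_in, t_out, *prices[cat])
        for cat, (t_in, t_out, w) in BASKET.items()
    )

def compute_cpi(prices_now, prices_base):
    """Index value relative to the baseline period (base = 100)."""
    return 100 * basket_cost(prices_now) / basket_cost(prices_base)
```

With identical prices in both periods the index is exactly 100; if every price halves, the index reads 50, i.e. 50% deflation against the base period.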

Model Selection

Each workload category uses prices from representative models in the appropriate tier:

Tier Used For Representative Models
Budget Classification GPT-4o-mini, Gemini Flash, Claude Haiku
General Chat, Summarization GPT-4o, Gemini Pro, Claude Sonnet
Frontier Coding GPT-4o, Claude Sonnet, Gemini Pro
Reasoning Judgment o1, o3-mini, DeepSeek-R1
Long-Context Synthesis Gemini Pro (2M), Claude (200K)

Costs are averaged across models in each tier to reduce sensitivity to any single provider's pricing decisions.

Subindices

Ticker Name What It Tracks
$JUDGE Judgment CPI Cost of reasoning-intensive workloads
$LCTX LongContext CPI Cost of high-context workloads
$BULK Budget Tier Cost of cheapest throughput models
$FRONT Frontier Tier Cost of best capability models

Index Series

The index series allows comparison against multiple base periods, providing different perspectives on AI cost trends:

Series Ticker Base Period Use Case
Since Launch $CPI-L February 2025 Long-term trend analysis
Year-over-Year $CPI-Y 365 days ago Annual comparison
Quarter-to-Date $CPI-Q Start of current quarter Quarterly budgeting
Month-to-Date $CPI-M Start of current month Monthly tracking
Week-over-Week $CPI-W 7 days ago Short-term changes

All series use the same basket and methodology—only the comparison period changes. A value of 100 means costs are unchanged from the base period; values below 100 indicate deflation.

Methodology Variants

Different organizations have different workload mixes. Methodology variants apply alternative weightings to the same basket categories:

Variant Ticker Focus
General Purpose $CPI-GEN Balanced workload mix (default weights)
Frontier Heavy $CPI-FRO Emphasis on coding (35%) and reasoning (25%)
Budget Optimized $CPI-BUD Cost-conscious: classification (30%), chat (30%)
Reasoning Focus $CPI-REA Heavy reasoning (45%), long context (15%)
Enterprise Mix $CPI-ENT Summarization (30%), classification (25%)

Compare your organization's workload mix to these variants to find the most relevant inflation measure for your use case.
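Comparing your own mix against the variants amounts to recomputing the same basket under a different weight vector. A minimal sketch, assuming per-category costs have already been computed (the dollar figures here are hypothetical, and only the default general-purpose weights are taken from the basket table):

```python
def weighted_cpi(costs_now, costs_base, weights):
    """Index under a custom weight vector; weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    now = sum(weights[c] * costs_now[c] for c in weights)
    base = sum(weights[c] * costs_base[c] for c in weights)
    return 100 * now / base

# Default weights from the basket table ($CPI-GEN).
GENERAL = {
    "chat": 0.20, "summarization": 0.25, "classification": 0.20,
    "coding": 0.15, "judgment": 0.10, "long_context": 0.10,
}
```

Swapping in your organization's own weights (e.g. heavier coding, lighter chat) yields a custom inflation measure directly comparable to the published variants.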

Trend Analysis

Beyond point-in-time measurements, we provide trend analysis to help forecast future costs:

  • Direction: Deflating, stable, or inflating based on recent velocity
  • Velocity: Rate of change in CPI points per month
  • Acceleration: Change in velocity (is deflation speeding up or slowing?)
  • 30-day Forecast: Projected CPI value if current trend continues
  • Confidence: High/medium/low based on data availability

Velocity = (ΔCPI / Δdays) × 30
Normalized to a monthly rate

Direction thresholds: velocity < −1 = deflating, velocity > +1 = inflating; values in between = stable.
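The velocity, direction, and forecast figures above combine into a few lines. The CPI values here are illustrative, not live readings:

```python
def velocity(cpi_then, cpi_now, days):
    """Rate of change, normalized to CPI points per 30-day month."""
    return (cpi_now - cpi_then) / days * 30

def direction(v):
    """Direction thresholds from the methodology: +/-1 point per month."""
    if v < -1:
        return "deflating"
    if v > 1:
        return "inflating"
    return "stable"

def forecast_30d(cpi_now, v):
    """Projected CPI value if the current trend continues for 30 days."""
    return cpi_now + v
```

For example, a drop from 100 to 94 over 60 days is a velocity of −3 points/month, reads as "deflating", and projects a 30-day forecast of 91.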

Yield Curve

The yield curve shows annualized deflation (or inflation) rates at different time horizons—similar to bond yield curves but measuring AI compute cost changes:

Horizon Calculation
1W, 1M, 3M, 6M, 1Y ((current - historical) / historical) × (365 / days) × 100

Interpretation: Negative rates indicate deflation (costs falling), positive rates indicate inflation. A "normal" curve shows steeper deflation at shorter horizons that moderates over longer periods.

Note: The yield curve requires sufficient historical data. Points may be unavailable during the initial benchmarking period.
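The annualization formula in the table can be sketched as follows, skipping horizons where history is missing (as during the initial benchmarking period). The CPI values are hypothetical:

```python
def annualized_rate(current, historical, days):
    """((current - historical) / historical) × (365 / days) × 100, in % per year."""
    return (current - historical) / historical * (365 / days) * 100

# Horizon labels -> lookback in days (day counts are an assumption).
HORIZONS = {"1W": 7, "1M": 30, "3M": 91, "6M": 182, "1Y": 365}

def yield_curve(current, history):
    """history maps horizon label -> CPI value that many days ago.
    Horizons without data are simply omitted from the curve."""
    return {
        label: annualized_rate(current, history[label], days)
        for label, days in HORIZONS.items()
        if label in history
    }
```

A CPI of 95 today against 100 a year ago gives a 1Y point of −5% (annualized deflation); the same 5-point drop over one month would annualize to roughly −60%, which is why short horizons can look much steeper.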

Spreads

Spreads measure the premium paid for specific capabilities:

Ticker Name Calculation
$COG-P Cognition Premium $FRONT − $BULK
$JDG-P Judgment Premium $JUDGE − $FRONT
$CTX-P Context Premium $LCTX − $FRONT

Exchange Rates

Cognitive Exchange Rates show the relative cost between model tiers, expressed as token equivalents. This makes opportunity cost instantly visible—like forex cross-rates for AI compute.

Base Currency: $UTIL (Utility Token)

The base is Gemini Flash, representing cheap utility compute. All other tiers are expressed as multiples of this base cost.

Calculation:

Rate = TierBlendedCost / BaseBlendedCost
BlendedCost = (InputCost × 0.75) + (OutputCost × 0.25)

A rate of "1 $FRONT = 64 $UTIL" means one frontier token costs as much as 64 utility tokens. This helps teams understand the opportunity cost of using expensive models for tasks that could run on cheaper ones.

Limitations: Exchange rates measure cost only, not capability. A task that requires frontier reasoning cannot simply be run on 64× more utility tokens.
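The exchange-rate calculation above is a ratio of blended costs. A minimal sketch, with hypothetical per-million-token prices (the 75/25 input/output blend is from the formula above):

```python
def blended_cost(price_in, price_out):
    """Blend per-1M input/output prices at the stated 75/25 ratio."""
    return price_in * 0.75 + price_out * 0.25

def exchange_rate(tier_prices, base_prices):
    """How many base-tier ($UTIL) tokens one tier token costs.
    Each argument is a (price_in, price_out) pair in $/1M tokens."""
    return blended_cost(*tier_prices) / blended_cost(*base_prices)
```

With illustrative prices, a frontier tier at ($64, $64) per 1M tokens against a utility base at ($1, $1) yields a rate of 64, read as "1 $FRONT = 64 $UTIL".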

Build Cost Index (Persona Baskets)

Different teams have different workload mixes. The Build Cost Index tracks inflation for three representative build patterns, each with its own basket weightings.

Persona Ticker Description
Startup Builder $START Building AI-first products: 50% coding, 30% RAG context, 20% routing
Agentic Team $AGENT Running autonomous agents: 70% reasoning, 20% tool use, 10% final output
Throughput $THRU High-volume processing: 80% extraction, 20% classification

Each persona sees different inflation depending on which model tiers they rely on most heavily. A team building agents (heavy reasoning) will see different cost pressure than a team doing high-volume extraction (mostly budget tier).

Benchmarking Period: For the first 30 days after launch, persona MoM changes show "Benchmarking" as we accumulate historical data.

Data Sources

Source Data Purpose
OpenRouter API Live prices Primary source for current spot rates
LiteLLM Database Comprehensive pricing 2000+ models for coverage depth
pydantic/genai-prices Historical prices Backfill for MoM/YoY calculations
simonw/llm-prices Historical archive Cross-reference and validation

All data sources are publicly accessible. The methodology is fully auditable: anyone can verify our calculations using the same inputs.

Historical Methodology

Baseline: February 2026 = 100. This baseline is immutable once set.

Historical Reconstruction: To enable MoM and YoY calculations from day one, we backfilled historical data using archived prices from pydantic/genai-prices and simonw/llm-prices.

Model Substitution: When exact historical models aren't available, we use the closest equivalent from the same provider and tier. For example, historical data may use gemini-1.5-pro where current data uses gemini-2.5-pro.

Reconstructed Flag: Historical snapshots are marked with reconstructed: true to distinguish them from live calculations.

Update Schedule

  • Daily: Index values updated at 06:00 UTC
  • Quarterly: Full reports with analysis and commentary
  • As needed: Methodology revisions (versioned and documented)

Independence

Occupant does not accept referral fees, sponsored rankings, or payments from model providers. The index is funded independently and maintained as a public resource.

Our incentive is accuracy and utility, not revenue from recommendations.

Limitations

  • Basket assumptions: The workload definitions and weights reflect our best estimate of typical usage. Your organization's actual usage may differ.
  • Model selection: We track major commercial models. Open-source and self-hosted options are not included in the current methodology.
  • Quality normalization: We group models by tier but do not adjust for quality differences within tiers.
  • Latency and throughput: The index measures cost only, not performance characteristics like speed or availability.

Future Development

  • Civic CPI: Weights optimized for public sector workloads (intake, eligibility, appeals, compliance)
  • Quality-adjusted indices: Incorporating capability scores into cost calculations
  • Regional indices: Tracking price differences across deployment regions
  • API access: Programmatic access for researchers and governance teams

Contact

Questions about methodology? Suggestions for improvement? Interested in collaboration?

research@occupant.ee