
Compute CPI Methodology

Version 2.0 • February 2026

What It Measures

Compute CPI tracks the changing cost of a representative "basket" of AI work, expressed as an index (base=100) and an inflation rate (MoM/YoY).

Just as the Consumer Price Index measures the cost of household goods over time, Compute CPI measures the cost of common AI workloads. This lets organizations understand AI cost trends, forecast budgets, and benchmark procurement decisions.

The Basket

The index is built on six workload categories, each representing a common pattern of AI usage:

Category Input Tokens Output Tokens Weight
Chat / Drafting 2,000 500 20%
Summarization 10,000 500 25%
Classification 500 50 20%
Coding 3,000 1,000 15%
Judgment / Reasoning 5,000 2,000 10%
Long-Context Synthesis 50,000 1,000 10%

These weights reflect typical institutional usage patterns. The Civic CPI (coming Q2 2026) will use different weights optimized for public sector workloads.

Calculation

For each workload category i, we calculate cost as:

Cost(i) = (T_in / 1M × Price_in) + (T_out / 1M × Price_out)
Where T_in = input tokens, T_out = output tokens, and Price_in / Price_out = price per 1M input / output tokens

The weighted basket cost is:

Basket Cost = Σ(weight_i × Cost(i))

The index value is calculated relative to the baseline period:

Compute CPI(t) = 100 × BasketCost(t) / BasketCost(base)
Where base = February 2026
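The three formulas above can be sketched end to end. The basket definitions come from the table in "The Basket"; the prices passed in are hypothetical per-million-token rates, not live data from the index.

```python
# Sketch of the Compute CPI calculation using the published basket.
# category: (input_tokens, output_tokens, weight)
BASKET = {
    "chat":           (2_000,    500, 0.20),
    "summarization":  (10_000,   500, 0.25),
    "classification": (500,       50, 0.20),
    "coding":         (3_000,  1_000, 0.15),
    "judgment":       (5_000,  2_000, 0.10),
    "long_context":   (50_000, 1_000, 0.10),
}

def category_cost(t_in, t_out, price_in, price_out):
    """Cost(i) = (T_in / 1M × Price_in) + (T_out / 1M × Price_out)."""
    return t_in / 1e6 * price_in + t_out / 1e6 * price_out

def basket_cost(prices):
    """Weighted sum over categories; prices maps category -> ($/1M in, $/1M out)."""
    return sum(
        w * category_cost(t_in, t_out, *prices[cat])
        for cat, (t_in, t_out, w) in BASKET.items()
    )

def compute_cpi(prices_now, prices_base):
    """Index value relative to the baseline period (base = 100)."""
    return 100 * basket_cost(prices_now) / basket_cost(prices_base)
```

With identical prices in both periods the index is exactly 100; if every price halves, the index reads 50, i.e. 50% deflation against the base period.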

Model Selection

Each workload category uses prices from representative models in the appropriate tier:

Tier Used For Representative Models
Budget Classification GPT-4o-mini, Gemini Flash, Claude Haiku
General Chat, Summarization GPT-4o, Gemini Pro, Claude Sonnet
Frontier Coding GPT-4o, Claude Sonnet, Gemini Pro
Reasoning Judgment o1, o3-mini, DeepSeek-R1
Long-Context Synthesis Gemini Pro (2M), Claude (200K)

Costs are averaged across models in each tier to reduce sensitivity to any single provider's pricing decisions.

Subindices

Ticker Name What It Tracks
$JUDGE Judgment CPI Cost of reasoning-intensive workloads
$LCTX LongContext CPI Cost of high-context workloads
$BULK Budget Tier Cost of cheapest throughput models
$FRONT Frontier Tier Cost of best capability models

Index Series

The index series allows comparison against multiple base periods, providing different perspectives on AI cost trends:

Series Ticker Base Period Use Case
Since Launch $CPI-L February 2025 Long-term trend analysis
Year-over-Year $CPI-Y 365 days ago Annual comparison
Quarter-to-Date $CPI-Q Start of current quarter Quarterly budgeting
Month-to-Date $CPI-M Start of current month Monthly tracking
Week-over-Week $CPI-W 7 days ago Short-term changes

All series use the same basket and methodology—only the comparison period changes. A value of 100 means costs are unchanged from the base period; values below 100 indicate deflation.

Methodology Variants

Different organizations have different workload mixes. Methodology variants apply alternative weightings to the same basket categories:

Variant Ticker Focus
General Purpose $CPI-GEN Balanced workload mix (default weights)
Frontier Heavy $CPI-FRO Emphasis on coding (35%) and reasoning (25%)
Budget Optimized $CPI-BUD Cost-conscious: classification (30%), chat (30%)
Reasoning Focus $CPI-REA Heavy reasoning (45%), long context (15%)
Enterprise Mix $CPI-ENT Summarization (30%), classification (25%)

Compare your organization's workload mix to these variants to find the most relevant inflation measure for your use case.
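Comparing your own mix against the variants amounts to recomputing the same basket under a different weight vector. A minimal sketch, assuming per-category costs have already been computed (the dollar figures here are hypothetical, and only the default general-purpose weights are taken from the basket table):

```python
def weighted_cpi(costs_now, costs_base, weights):
    """Index under a custom weight vector; weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    now = sum(weights[c] * costs_now[c] for c in weights)
    base = sum(weights[c] * costs_base[c] for c in weights)
    return 100 * now / base

# Default weights from the basket table ($CPI-GEN).
GENERAL = {
    "chat": 0.20, "summarization": 0.25, "classification": 0.20,
    "coding": 0.15, "judgment": 0.10, "long_context": 0.10,
}
```

Swapping in your organization's own weights (e.g. heavier coding, lighter chat) yields a custom inflation measure directly comparable to the published variants.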

Trend Analysis

Beyond point-in-time measurements, we provide trend analysis to help forecast future costs:

  • Direction: Deflating, stable, or inflating based on recent velocity
  • Velocity: Rate of change in CPI points per month
  • Acceleration: Change in velocity (is deflation speeding up or slowing?)
  • 30-day Forecast: Projected CPI value if current trend continues
  • Confidence: High/medium/low based on data availability

Velocity = (ΔCPI / Δdays) × 30
Normalized to a monthly rate

Direction thresholds: velocity < −1 = deflating, velocity > +1 = inflating; values in between = stable.
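The velocity, direction, and forecast figures above combine into a few lines. The CPI values here are illustrative, not live readings:

```python
def velocity(cpi_then, cpi_now, days):
    """Rate of change, normalized to CPI points per 30-day month."""
    return (cpi_now - cpi_then) / days * 30

def direction(v):
    """Direction thresholds from the methodology: +/-1 point per month."""
    if v < -1:
        return "deflating"
    if v > 1:
        return "inflating"
    return "stable"

def forecast_30d(cpi_now, v):
    """Projected CPI value if the current trend continues for 30 days."""
    return cpi_now + v
```

For example, a drop from 100 to 94 over 60 days is a velocity of −3 points/month, reads as "deflating", and projects a 30-day forecast of 91.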

Yield Curve

The yield curve shows annualized deflation (or inflation) rates at different time horizons—similar to bond yield curves but measuring AI compute cost changes:

Horizon Calculation
1W, 1M, 3M, 6M, 1Y ((current - historical) / historical) × (365 / days) × 100

Interpretation: Negative rates indicate deflation (costs falling), positive rates indicate inflation. A "normal" curve shows steeper deflation at shorter horizons that moderates over longer periods.

Note: The yield curve requires sufficient historical data. Points may be unavailable during the initial benchmarking period.
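The annualization formula in the table can be sketched as follows, skipping horizons where history is missing (as during the initial benchmarking period). The CPI values are hypothetical:

```python
def annualized_rate(current, historical, days):
    """((current - historical) / historical) × (365 / days) × 100, in % per year."""
    return (current - historical) / historical * (365 / days) * 100

# Horizon labels -> lookback in days (day counts are an assumption).
HORIZONS = {"1W": 7, "1M": 30, "3M": 91, "6M": 182, "1Y": 365}

def yield_curve(current, history):
    """history maps horizon label -> CPI value that many days ago.
    Horizons without data are simply omitted from the curve."""
    return {
        label: annualized_rate(current, history[label], days)
        for label, days in HORIZONS.items()
        if label in history
    }
```

A CPI of 95 today against 100 a year ago gives a 1Y point of −5% (annualized deflation); the same 5-point drop over one month would annualize to roughly −60%, which is why short horizons can look much steeper.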

Spreads

Spreads measure the premium paid for specific capabilities:

Ticker Name Calculation
$COG-P Cognition Premium $FRONT − $BULK
$JDG-P Judgment Premium $JUDGE − $FRONT
$CTX-P Context Premium $LCTX − $FRONT

Exchange Rates

Cognitive Exchange Rates show the relative cost between model tiers, expressed as token equivalents. This makes opportunity cost instantly visible—like forex cross-rates for AI compute.

Base Currency: $UTIL (Utility Token)

The base is Gemini Flash, representing cheap utility compute. All other tiers are expressed as multiples of this base cost.

Calculation:

Rate = TierBlendedCost / BaseBlendedCost
BlendedCost = (InputCost × 0.75) + (OutputCost × 0.25)

A rate of "1 $FRONT = 64 $UTIL" means one frontier token costs as much as 64 utility tokens. This helps teams understand the opportunity cost of using expensive models for tasks that could run on cheaper ones.

Limitations: Exchange rates measure cost only, not capability. A task that requires frontier reasoning cannot simply be run on 64× more utility tokens.
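The exchange-rate calculation above is a ratio of blended costs. A minimal sketch, with hypothetical per-million-token prices (the 75/25 input/output blend is from the formula above):

```python
def blended_cost(price_in, price_out):
    """Blend per-1M input/output prices at the stated 75/25 ratio."""
    return price_in * 0.75 + price_out * 0.25

def exchange_rate(tier_prices, base_prices):
    """How many base-tier ($UTIL) tokens one tier token costs.
    Each argument is a (price_in, price_out) pair in $/1M tokens."""
    return blended_cost(*tier_prices) / blended_cost(*base_prices)
```

With illustrative prices, a frontier tier at ($64, $64) per 1M tokens against a utility base at ($1, $1) yields a rate of 64, read as "1 $FRONT = 64 $UTIL".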

Build Cost Index (Persona Baskets)

Different teams have different workload mixes. The Build Cost Index tracks inflation for three representative build patterns, each with its own basket weightings.

Persona Ticker Description
Startup Builder $START Building AI-first products: 50% coding, 30% RAG context, 20% routing
Agentic Team $AGENT Running autonomous agents: 70% reasoning, 20% tool use, 10% final output
Throughput $THRU High-volume processing: 80% extraction, 20% classification

Each persona sees different inflation depending on which model tiers they rely on most heavily. A team building agents (heavy reasoning) will see different cost pressure than a team doing high-volume extraction (mostly budget tier).

Benchmarking Period: For the first 30 days after launch, persona MoM changes show "Benchmarking" as we accumulate historical data.

Data Sources

Source Data Purpose
OpenRouter API Live prices Primary source for current spot rates
LiteLLM Database Comprehensive pricing 2000+ models for coverage depth
pydantic/genai-prices Historical prices Backfill for MoM/YoY calculations
simonw/llm-prices Historical archive Cross-reference and validation

All data sources are publicly accessible. The methodology is fully auditable: anyone can verify our calculations using the same inputs.

Historical Methodology

Baseline: February 2026 = 100. This baseline is immutable once set.

Historical Reconstruction: To enable MoM and YoY calculations from day one, we backfilled historical data using archived prices from pydantic/genai-prices and simonw/llm-prices.

Model Substitution: When exact historical models aren't available, we use the closest equivalent from the same provider and tier. For example, historical data may use gemini-1.5-pro where current data uses gemini-2.5-pro.

Reconstructed Flag: Historical snapshots are marked with reconstructed: true to distinguish them from live calculations.

Update Schedule

  • Daily: Index values updated at 06:00 UTC
  • Quarterly: Full reports with analysis and commentary
  • As needed: Methodology revisions (versioned and documented)

Independence

Occupant does not accept referral fees, sponsored rankings, or payments from model providers. The index is funded independently and maintained as a public resource.

Our incentive is accuracy and utility, not revenue from recommendations.

Limitations

  • Basket assumptions: The workload definitions and weights reflect our best estimate of typical usage. Your organization's actual usage may differ.
  • Model selection: We track major commercial models. Open-source and self-hosted options are not included in the current methodology.
  • Quality normalization: We group models by tier but do not adjust for quality differences within tiers.
  • Latency and throughput: The index measures cost only, not performance characteristics like speed or availability.

Future Development

  • Civic CPI: Weights optimized for public sector workloads (intake, eligibility, appeals, compliance)
  • Quality-adjusted indices: Incorporating capability scores into cost calculations
  • Regional indices: Tracking price differences across deployment regions
  • API access: Programmatic access for researchers and governance teams

Contact

Questions about methodology? Suggestions for improvement? Interested in collaboration?

research@occupant.ee