Insights · 5 min read

Unit Economics for AI Products: A Cost Framework Beyond Tokens

Most AI startups track costs incompletely. Tokens are not your unit—traces are. Learn the complete cost model for AI products and how to calculate per-customer margins that reflect reality.


Blaise Albuquerque

Founder, Bear Lumen

#unit-economics #ai-margins #cost-tracking #pricing-strategy

Summary: If you're calculating AI costs as "tokens × price," you're seeing 20-40% of your actual spend. The real unit of accounting is the trace—a complete workflow that includes orchestration overhead, tool calls, retrieval costs, and failure paths.


The Token Fallacy

Most AI startups calculate costs as: Monthly Cost = Total Tokens × Price per Token.

Simple. Clear. But incomplete.

This model worked when AI products were single-call wrappers around GPT-3. But modern AI products aren't single calls—they're workflows.

Your "simple" customer support agent actually:

  • Embeds the user query (embedding model cost)
  • Retrieves context from a vector database (infrastructure cost)
  • Calls a planner model to decide what to do (hidden tokens)
  • Executes a tool call to your CRM (external API cost)
  • Generates a response (visible tokens)
  • Runs a safety check (guardrail model cost)

That's 6+ cost centers in a single interaction. Token math captures maybe two of them.
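The gap is easy to see when you sum the cost centers explicitly. A minimal sketch, using illustrative per-step costs (all numbers are assumptions, not real provider pricing):

```python
# Hypothetical per-step costs (USD) for the support-agent example above.
# Every figure here is illustrative, not actual provider pricing.
STEP_COSTS = {
    "query_embedding": 0.00002,    # embedding model call
    "vector_retrieval": 0.0004,    # amortized vector DB query
    "planner_call": 0.003,         # hidden planning/routing tokens
    "crm_tool_call": 0.002,        # external API fee
    "response_generation": 0.004,  # the visible output tokens
    "safety_check": 0.0008,        # guardrail model call
}

def interaction_cost(step_costs: dict[str, float]) -> float:
    """Total cost of one interaction = sum over all cost centers."""
    return sum(step_costs.values())

total = interaction_cost(STEP_COSTS)
visible_only = STEP_COSTS["response_generation"]
```

With these assumed numbers, naive token math (the generation step alone) captures well under half of the actual spend.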


The Real Unit: The Trace

The unit of accounting is the trace, not the LLM call.

A trace is a complete workflow execution—from user request to final response—containing multiple spans (individual operations). Each span has its own cost profile.

A typical agent trace includes: retrieval (embedding + vector DB), reranking, tool calls, LLM generation (often multiple steps), guardrail checks, and quality verification. This is how observability platforms like Langfuse model AI applications—because it's how costs actually accumulate.
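A trace-first data model can be as small as two types. This is a sketch of the structure described above, not any particular platform's schema; the span kinds and costs are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One operation inside a trace, with its own cost profile."""
    kind: str        # e.g. "retrieval", "rerank", "tool_call", "generation"
    cost_usd: float

@dataclass
class Trace:
    """A complete workflow execution, from user request to final response."""
    trace_id: str
    spans: list[Span] = field(default_factory=list)

    @property
    def cost_usd(self) -> float:
        # The trace cost is the sum of its spans' costs.
        return sum(s.cost_usd for s in self.spans)

t = Trace("t-001", [
    Span("retrieval", 0.0005),
    Span("rerank", 0.0003),
    Span("tool_call", 0.002),
    Span("generation", 0.004),
    Span("guardrail", 0.0008),
])
```

Costing at the trace level rather than the call level is what makes the per-feature numbers below possible.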

When you think in traces, uncomfortable truths emerge:

  • Your "cheap" Haiku-powered feature costs $0.15/trace because of the retrieval pipeline around it
  • Your agent workflow costs $2.40/trace, not the $0.08 your token math suggested
  • 40% of your spend is on steps users never see

What Lives Inside a Trace

LLM Costs (What You're Already Tracking)

| Component | Notes |
| --- | --- |
| Input/output tokens | The visible prompt and response |
| Hidden tokens | Planning, routing, self-critique (billed but invisible) |
| Cached tokens | 50-90% discount when prompts share stable prefixes |

The hidden tokens trap: Multi-step agents generate 5-20x more internal tokens than visible output.
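Caching and hidden tokens both fit into a simple per-call cost function. A sketch, assuming a flat cache discount and illustrative per-token prices:

```python
def llm_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
             in_price: float, out_price: float,
             cache_discount: float = 0.5) -> float:
    """Cost of one LLM call; cached input tokens get a flat discount.

    Prices and the 50% discount are illustrative assumptions.
    """
    uncached = input_tokens - cached_tokens
    return (uncached * in_price
            + cached_tokens * in_price * (1 - cache_discount)
            + output_tokens * out_price)

# 10k input tokens (8k sharing a stable, cacheable prefix), 500 output.
with_cache = llm_cost(10_000, 8_000, 500, in_price=3e-6, out_price=15e-6)
no_cache = llm_cost(10_000, 0, 500, in_price=3e-6, out_price=15e-6)
```

Note that `input_tokens` should include the hidden planning and routing tokens, not just the visible prompt.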

Tool Costs (The Forgotten Line Item)

| Tool Type | Cost Model |
| --- | --- |
| Web search (Serper, Tavily) | $0.001-0.01 per query |
| CRM/ticketing APIs | Per-call or monthly + overage |
| Database queries | Per-read, varies by provider |

If your agent calls three tools per trace, that's three line items your token math ignores.

Retrieval Costs (RAG Isn't Free)

| Component | Cost Type |
| --- | --- |
| Vector database | Monthly ($70-500+ for Pinecone) |
| Query embedding | Per-search |
| Context stuffing | Retrieved chunks become input tokens |

One e-commerce company saw costs jump from $5K/month in prototyping to $50K/month in staging—unoptimized RAG queries were fetching 10x more context than needed.
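Context stuffing is the multiplier to watch: every retrieved chunk is billed again as prompt input. A sketch with assumed chunk sizes and token prices:

```python
def retrieval_input_cost(n_chunks: int, tokens_per_chunk: int,
                         in_price_per_token: float) -> float:
    """Input-token cost added by stuffing retrieved chunks into the prompt."""
    return n_chunks * tokens_per_chunk * in_price_per_token

# Illustrative numbers: 400-token chunks at $3 per million input tokens.
lean = retrieval_input_cost(2, 400, 3e-6)      # a tight top-k
bloated = retrieval_input_cost(20, 400, 3e-6)  # fetching 10x more context
```

Fetching 10x more chunks than needed multiplies this line item by 10x per trace, which is exactly the failure mode in the staging-bill story above.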

Reliability & Failure Costs

Production AI adds model calls for quality assurance: offline evals, online judges, guardrails, and incident reprocessing. A workflow that cost $0.05/trace in development often costs $0.15/trace in production after adding these layers.

And failures still cost money: tokens consumed before failure, retries from rate limits, fallback chains. If your retry rate is 5%, you're paying 5% more than happy-path math suggests.
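The retry inflation is easy to model. A sketch under a simplifying assumption (at most one retry per trace, each retry re-incurring some fraction of the full cost):

```python
def expected_cost(happy_path_cost: float, retry_rate: float,
                  retry_cost_fraction: float = 1.0) -> float:
    """Expected cost per trace when a fraction of traces retry once.

    retry_cost_fraction: share of the full cost a retry re-incurs
    (1.0 = complete re-run). Single-retry assumption for simplicity.
    """
    return happy_path_cost * (1 + retry_rate * retry_cost_fraction)

# A $0.10 happy-path trace with a 5% retry rate costs $0.105 in expectation.
cost = expected_cost(0.10, retry_rate=0.05)
```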


The Complete Formula

Cost per Feature Invocation = Σ(step_cost) + Σ(tool_cost) + retrieval_cost + reliability_cost + failure_cost
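The formula translates directly to code. A sketch with illustrative component values:

```python
def cost_per_invocation(step_costs: list[float], tool_costs: list[float],
                        retrieval_cost: float, reliability_cost: float,
                        failure_cost: float) -> float:
    """Direct translation of the formula above: sum every cost component."""
    return (sum(step_costs) + sum(tool_costs)
            + retrieval_cost + reliability_cost + failure_cost)

# Hypothetical numbers for one agent feature:
c = cost_per_invocation(
    step_costs=[0.003, 0.004],   # planner + generation LLM calls
    tool_costs=[0.002, 0.001],   # two external tool calls
    retrieval_cost=0.0024,       # embedding + vector DB + context stuffing
    reliability_cost=0.0008,     # guardrail / judge calls
    failure_cost=0.0005,         # amortized retries and failed traces
)
```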

Then calculate margins at the feature level, not the product level:

| Feature | Avg Trace Cost | Revenue/Use | Margin |
| --- | --- | --- | --- |
| Quick chat | $0.02 | $0.05 | 60% |
| Doc analysis | $0.18 | $0.25 | 28% |
| Agent workflow | $1.85 | $2.00 | 7.5% |
| Research task | $4.20 | $3.00 | -40% |

That last row is margin compression from feature-level cost variance. A single negative-margin feature, heavily used by power users, can offset profits from everything else—invisible when you only track aggregate token spend.
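The table's margin column is just one line of arithmetic, reproduced here as a sketch:

```python
def margin_pct(trace_cost: float, revenue_per_use: float) -> float:
    """Gross margin per use, as a percentage of revenue."""
    return round((revenue_per_use - trace_cost) / revenue_per_use * 100, 1)

margins = [
    margin_pct(0.02, 0.05),  # quick chat
    margin_pct(0.18, 0.25),  # doc analysis
    margin_pct(1.85, 2.00),  # agent workflow
    margin_pct(4.20, 3.00),  # research task: negative
]
```

Running it over every feature's trace costs is what surfaces the negative-margin rows before they show up on the aggregate bill.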

We've written about this pattern in Usage Variance in AI Products.



Attribution Makes It Actionable

Even if you track all of these costs, they're only actionable if you can attribute them. Finance can't tie raw token spend to business units. Product can't see which features are losing money.

The fix is trace-native attribution: tagging every trace with customer ID, feature name, model version, and environment. This lets you answer questions like "Why did costs spike Tuesday?" or "Which customers are unprofitable?" without engineering help.

If you're using OpenTelemetry (a standard for application tracing), these tags flow naturally through your existing infrastructure.
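In practice, attribution is just a consistent attribute set applied to every root span. A sketch, where the `app.*` key names are an illustrative convention of ours, not an official OpenTelemetry semantic convention:

```python
def attribution_attributes(customer_id: str, feature: str,
                           model_version: str, environment: str) -> dict[str, str]:
    """Attribute set tagged onto every trace for cost attribution.

    Key names are illustrative; pick one convention and apply it everywhere.
    """
    return {
        "app.customer_id": customer_id,
        "app.feature": feature,
        "app.model_version": model_version,
        "deployment.environment": environment,
    }

# With an OpenTelemetry tracer, these would be set on the root span,
# e.g. span.set_attributes(attribution_attributes(...)).
attrs = attribution_attributes("cust-42", "doc_analysis",
                               "gpt-4o-2024-08-06", "prod")
```

Once every trace carries these four tags, "which customers are unprofitable?" becomes a group-by over trace costs rather than an engineering project.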


The Bottom Line

Token-based cost thinking reflects an earlier era of wrapper apps. Modern AI products are workflows—multi-step, tool-heavy, reliability-wrapped—and they need workflow-level cost accounting.

Cost-per-trace visibility at the feature level is essential for accurate margin calculations.

We built Bear Lumen because we lived this problem: per-customer margin visibility, feature-level cost attribution, and pricing scenario modeling. If any of this resonated, request early access. We're onboarding our founding cohort now.
