Margin erosion doesn't announce itself. It accumulates.
When your first customers start using the AI product, it feels like validation. The usage metrics come in—requests, tokens (the units AI providers charge by), completions—and the product is working. Revenue starts flowing.
Early losses are acceptable. You're investing in quality. You want customers to love this new capability, so you route to the best models, tune prompts carefully, and make sure the experience is good enough to keep them coming back.
More customers onboard. Revenue grows. The AI feature becomes a selling point.
Then the costs start climbing. A few spikes appear in the provider invoices. The heavy users—the ones you celebrated for high engagement—are consuming far more than the pricing model assumed. One customer paying $20 a month has quietly accumulated $2,000 in API costs. Another "power user" on a $99 plan has run up $15,000 in provider fees over three months.
By the time the pattern is obvious, margin erosion is already baked in. 84% of AI companies report it. Only 15% can forecast infrastructure costs within ±10%. Accurate cost tracking is hard—especially when providers release new models and pricing every few months and cost calculation is a black box.
The billing infrastructure usually works fine. Stripe, Metronome, Orb—these systems track usage, calculate charges, and collect revenue. What's missing is the other half: cost visibility. Billing tells you what you charged Customer A. It doesn't tell you what Customer A is costing you to serve.
To know if you're profitable, you need both sides of the equation:
| Layer | Question Answered | Key Players | Output |
|---|---|---|---|
| Billing Infrastructure | "What should we charge?" | Stripe, Orb, Chargebee, Zuora | Invoices, payment collection |
| Cost Visibility | "Is this customer profitable at that price?" | CloudZero, Vantage, Kubecost, Finout, Bear Lumen | Profitability dashboards, margin alerts |
## Billing Infrastructure: The Revenue Side
Billing infrastructure meters usage, calculates charges, generates invoices, and processes payments—the "quote-to-cash" pipeline. This side of the stack has evolved rapidly, translating the high-variance costs of AI into something finance teams and customers can understand.
Key Players:
- Stripe: The default payment infrastructure for startups. Handles subscriptions, invoicing, and payment collection. Most AI companies start here. It now offers usage-based billing for AI and other high-volume usage models, and recently announced Agentic Commerce to help companies build AI-first products. It's a solid billing platform, but it doesn't provide margin intelligence.
- Metronome: Stripe's acquisition of Metronome created an integrated platform for usage-based billing. Metronome's metering—already used by OpenAI, Anthropic, and NVIDIA—handles per-token charges, tiered rates, and hybrid subscriptions. Metronome markets "margin analysis" capabilities, but this is a bring-your-own-COGS model: they export granular revenue data to your warehouse, and you build the cost ingestion pipelines yourself.
- Orb: Built for high-volume usage data, Orb tracks any usage metric—token counts, model-specific usage, API call volumes, compute—and combines it with complex pricing logic. Particularly strong for AI-native platforms with millions of billable events.
- Chargebee, Zuora: Enterprise billing platforms that have added usage-based capabilities to their traditional subscription management. These platforms are more mature and have a longer history with enterprise customers, but they are less specialized for AI and high-volume usage models.
This is solved infrastructure. 85% of high-growth firms have adopted hybrid pricing models, and billing platforms have responded with robust tooling.
## Cost Visibility: The Profitability Side
Cost visibility attributes AI infrastructure costs to customers, enabling margin calculation and profitability analysis.
Key Players:
- CloudZero: Cost intelligence platform that tracks costs by business dimensions—cost per customer, cost per feature, cost per AI token. Requires tagging infrastructure but provides deep unit economics.
- Vantage: Multi-cloud visibility with native integrations across AWS, Azure, GCP, Kubernetes, and AI providers including OpenAI, Databricks, and Snowflake. Lower setup friction than CloudZero.
- Kubecost: Open-source Kubernetes cost monitoring, often the entry point for teams starting with FinOps. Version 3.0 added GPU optimization for AI workloads.
- Finout: "Cost observability" platform with a unified view across cloud and SaaS costs, designed for 100% cost allocation in messy multi-cloud environments.
- Bear Lumen: Per-customer cost attribution specifically for AI API costs, with margin analysis and multi-provider consolidation across OpenAI, Anthropic, and Google.
## Two Types of Cost Visibility
The cost visibility market serves two distinct problems:
Infrastructure cost governance answers: "What does our AI infrastructure cost?" These tools excel at internal cost allocation—chargebacks between departments, infrastructure budget forecasting, total spend tracking. They show your total OpenAI spend by team, feature, or time period. (Kubecost, CloudZero, Vantage, Finout, Mavvrik)
AI reseller margin intelligence answers: "Are our customers profitable when we resell AI?" This is a different problem. If you're charging customers for AI features powered by OpenAI, Anthropic, or other providers, you need to reconcile API costs against customer revenue. Customer X on Feature Y might be underwater because you're routing to GPT-4 when Sonnet would work at 10% the cost.
The distinction matters: if you're an enterprise running local LLMs, you need infrastructure cost governance. If you're an AI SaaS company reselling API-based AI to customers, you need margin intelligence at the customer level.
| Tool | What It Monitors | Who It's For | Question Answered |
|---|---|---|---|
| Kubecost | Kubernetes clusters, GPUs | DevOps running self-hosted infra | "What do our K8s workloads cost?" |
| CloudZero | Cloud spend by business dimension | Engineering/FinOps at cloud-heavy orgs | "What does each team/feature cost us?" |
| Vantage | Multi-cloud + some AI providers | Multi-cloud teams wanting quick setup | "What's our total cloud spend?" |
| Finout | Cloud + SaaS spend | Finance/IT at enterprises | "How do we allocate costs across units?" |
| Mavvrik | GPU, cloud, SaaS, on-prem AI infra | CFOs at enterprises adopting AI | "Where is our AI infrastructure money going?" |
| Bear Lumen | AI API costs tied to customer revenue | Founders at AI SaaS companies | "Which customers are profitable at our pricing?" |
This becomes concrete when you look at how AI costs actually behave.
## The Cost Tracking Accuracy Problem
Here's what makes AI cost visibility harder than traditional cloud costs: token counts ≠ cost accuracy. What gets billed often differs dramatically from what gets reported.
According to FinOps Foundation, advertised per-token prices are misleading. The real costs hide in the details:
| Hidden Cost Factor | What Happens |
|---|---|
| Output vs. input | Output tokens cost 3-5x more than input tokens |
| Reasoning tokens | Models like o1 consume hidden reasoning tokens—one developer expected ~$5, got billed $20. Mathematical proofs can use 87,000 tokens that never appear in API responses |
| Vision multipliers | A user reported 490 tokens in API metadata but was billed for 8,539—an 18x multiplier for image analysis |
| Long-context pricing | Claude and Gemini charge 2x for requests over 200K tokens—all tokens, not just the excess |
| Caching variance | Anthropic charges 1.25-2x for cache writes but 0.1x for reads. Same token count might cost $1 or $20 depending on execution path |
| Tool use overhead | Anthropic adds 346 tokens per request when tools are registered, 735 for computer use |
| Provider variance | Identical prompts produce 10-20% token count variance across providers due to different tokenizers |
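The output-premium and caching rows above are easiest to see with arithmetic. The sketch below uses hypothetical per-million-token rates, loosely shaped like published list prices; real rates change every few months, so treat these numbers as placeholders.

```python
# Hypothetical per-million-token rates for illustration only --
# always pull current pricing from the provider.
RATES = {
    "input": 3.00,        # $/M input tokens
    "output": 15.00,      # $/M output tokens (note the ~5x premium)
    "cache_write": 3.75,  # ~1.25x the input rate
    "cache_read": 0.30,   # ~0.1x the input rate
}

def request_cost(tokens_by_kind: dict) -> float:
    """Cost of one request, given token counts split by billing kind."""
    return sum(RATES[kind] * n / 1_000_000 for kind, n in tokens_by_kind.items())

# Same 100K prompt tokens, two execution paths:
cold = request_cost({"cache_write": 100_000, "output": 1_000})  # first hit
warm = request_cost({"cache_read": 100_000, "output": 1_000})   # cache hit
print(f"cold: ${cold:.3f}  warm: ${warm:.3f}")
```

The identical request costs roughly 9x more on the cold path—which is exactly why per-request cost can't be inferred from token counts alone.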
Even sophisticated teams get tracking wrong. A documented Microsoft Semantic Kernel bug caused the SDK to report only the final LLM call's tokens in agent workflows—all intermediate tokens were omitted. Organizations using agent patterns could be paying 2-10x more than their tracking tools show.
How teams typically track AI costs:
| Method | How It Works | Accuracy | Tradeoff |
|---|---|---|---|
| Request Counting | Charge per API call regardless of complexity | Low | Treats "hello" and a 10,000-token response equally |
| Token Estimation | Estimate tokens using approximation | Moderate | Requires database infrastructure; varies by model tokenizer |
| Actual Token Counts | Use provider-supplied input/output token data | High | Requires schema with timestamp tracking; best for FinOps integration |
Most teams start with request counting because it's simple. But when production AI usage crosses billions of tokens monthly, the accuracy gap creates real margin visibility problems.
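A minimal sketch of the third method: trust the provider's own reported usage rather than estimating client-side, so tokenizer differences between models stop mattering. The field names and prices below are illustrative—real APIs name the usage fields differently across providers (e.g. `prompt_tokens`/`completion_tokens` on some).

```python
def cost_from_usage(usage: dict, price_in: float, price_out: float) -> float:
    """Cost from provider-reported token counts (prices are $/M tokens)."""
    return (usage["input_tokens"] * price_in
            + usage["output_tokens"] * price_out) / 1_000_000

# Shape mimics the usage block most chat APIs attach to responses.
usage = {"input_tokens": 1_200, "output_tokens": 850}
print(round(cost_from_usage(usage, price_in=3.00, price_out=15.00), 6))
```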
## Why Cost Visibility Is Hard to Build Yourself
Smart teams haven't solved this because it's genuinely difficult infrastructure to build and maintain:
1. Provider pricing is complex and changes frequently. Each provider has different pricing structures—input vs output tokens, vision multipliers, caching discounts, reasoning token overhead. Keeping track of current rates across OpenAI, Anthropic, and Google requires ongoing maintenance as models and pricing change.
2. Costs don't match what you track. Your SDK reports 490 tokens. The provider bills you for 8,539. Hidden reasoning tokens, vision multipliers, caching logic, and tool overhead create gaps between tracked usage and actual costs. Reconciling these requires deep knowledge of each provider's pricing quirks.
3. Attribution requires request-level data. Knowing your total OpenAI spend is easy. Knowing that Customer A's usage of Feature B cost $47.32 yesterday requires capturing metadata on every request, joining it with provider billing data, and handling edge cases (retries, streaming, batched requests).
4. Multi-provider routing complicates everything. If you route some requests to GPT-4 and others to Claude, you need to track which customer hit which model, at what price, with what token count (using that provider's tokenizer). The same prompt costs different amounts depending on the path.
5. It's not a one-time build. Providers change pricing, add models, modify APIs. Anthropic's caching pricing is different from OpenAI's. New models have different cost structures. The infrastructure requires ongoing maintenance.
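Points 3 and 4 above can be sketched in a few lines: tag every call with customer, feature, and model, then aggregate. All model names, rates, and customer IDs below are made up for illustration; a real pipeline also needs durable storage, retries, streaming, and batch handling.

```python
from collections import defaultdict

# Hypothetical ($/M input, $/M output) rates per model -- placeholders only.
PRICING = {
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
}

request_log = []  # in production: a durable event store, not a list

def record(customer_id, feature, model, tokens_in, tokens_out):
    """Tag one AI call with the metadata per-customer attribution needs."""
    p_in, p_out = PRICING[model]
    cost = (tokens_in * p_in + tokens_out * p_out) / 1_000_000
    request_log.append({"customer": customer_id, "feature": feature,
                        "model": model, "cost": cost})

record("cust_a", "summarize", "gpt-4o", 50_000, 4_000)
record("cust_a", "summarize", "claude-sonnet", 50_000, 4_000)  # multi-provider routing
record("cust_b", "chat", "gpt-4o", 2_000, 500)

cost_by_customer = defaultdict(float)
for r in request_log:
    cost_by_customer[r["customer"]] += r["cost"]
print(dict(cost_by_customer))
```

The hard part is not this aggregation—it's keeping `PRICING` correct across providers, caching tiers, and hidden token classes as they change.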
This is why most teams either don't have cost visibility or maintain brittle spreadsheet-based reconciliation that runs monthly (too slow to catch margin problems early).
## What Billing Infrastructure Doesn't Solve
Billing infrastructure handles revenue collection. What it doesn't handle:
| Problem | What's Missing | Concrete Example |
|---|---|---|
| Per-customer cost attribution | Billing knows what you charged Customer A. Not what Customer A cost. | Customer A and Customer B both pay $100/month. One costs $30 to serve, the other costs $120. Billing can't distinguish them. |
| Margin analysis by dimension | Which customers are profitable? Which features drive costs vs. revenue? | Your image analysis feature might have -35% margins while text features run at 60%—billing can't show this. |
| Proactive margin alerts | Billing alerts on payment failures. Cost visibility alerts on margin problems. | A customer's usage pattern shifted to GPT-4 from GPT-3.5—costs tripled but billing only shows the charge. |
| Multi-provider consolidation | Blended cost-to-serve across OpenAI, Anthropic, Google, open-source models. | Identical prompts cost 10-20% different across providers due to tokenizer variance. |
Stripe + Metronome, Orb, and other billing platforms handle the top half of the equation. They don't claim to handle the bottom half.
## Metronome vs. Bear Lumen: What's Actually Built-In?
| Capability | Metronome | Bear Lumen |
|---|---|---|
| Customer usage tracking | ✅ Built-in | ✅ Built-in |
| Revenue/invoicing | ✅ Built-in | ❌ (complements your billing system) |
| Provider pricing data | ❌ DIY maintenance | ✅ Built-in |
| Cost-to-customer attribution | ❌ DIY in warehouse | ✅ Built-in |
| Margin calculation | ❌ DIY in BI tools | ✅ Built-in |
| Multi-provider reconciliation | ❌ Not addressed | ✅ Built-in |
| Real-time margin alerts | ❌ DIY | ✅ Built-in |
Metronome excels at revenue infrastructure. Bear Lumen excels at cost visibility. They're complementary—the value is not having to build the cost calculation, attribution, and margin analysis pipelines yourself.
## How Bear Lumen Works
Bear Lumen connects AI provider costs to customer revenue. Here's what that means concretely:
1. SDK captures request metadata. A lightweight SDK wraps your AI calls and captures: customer ID, feature/endpoint, model used, token counts, and timestamp. This happens at the request level—every API call is tagged.
2. Cost calculation using provider pricing. Bear Lumen maintains current pricing for OpenAI, Anthropic, and Google models. When a request comes in, the system calculates the actual cost based on model, token counts, and pricing factors (input vs output tokens, caching, vision multipliers).
3. Cost-to-customer attribution. The system attributes calculated costs to customers. Customer A made 847 requests yesterday using GPT-4 and Claude, totaling $143.27 in provider costs. Customer B made 2,341 requests totaling $52.18. Same plan, 3x cost difference.
4. Margin calculation. Revenue data (from Stripe or your billing system) combines with attributed costs. The dashboard shows contribution margin by customer, by feature, by model—updated continuously, not monthly.
5. Alerts on margin problems. When a customer's margin drops below threshold, or a feature's cost-to-serve spikes, you know immediately—not when the monthly invoice reconciliation happens.
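The alerting step can be illustrated generically—this is a toy sketch of the idea, not Bear Lumen's actual SDK, and the customer names, revenue, and threshold are invented for the example.

```python
ALERT_THRESHOLD = 0.20  # flag customers whose contribution margin drops below 20%

def margin_alerts(revenue_by_customer: dict, cost_by_customer: dict):
    """Yield (customer, margin) pairs that breach the threshold."""
    for cust, revenue in revenue_by_customer.items():
        cost = cost_by_customer.get(cust, 0.0)
        margin = (revenue - cost) / revenue
        if margin < ALERT_THRESHOLD:
            yield cust, round(margin, 3)

# Two customers on the same plan; only one trips the alert.
revenue = {"acme": 2400.0, "beta": 2400.0}
cost = {"acme": 890.0, "beta": 2150.0}
print(list(margin_alerts(revenue, cost)))
```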
Setup: SDK integration typically takes 1-2 hours. Most teams see margin data within a day of connecting.
## Why Can't Billing Systems Just Pass Through Provider Costs?
Consider a simple example: Customer uses your AI feature for image analysis.
- Usage events reported: 490 tokens
- Revenue collected: $50
- Actual provider bill: $85 (8,539 tokens billed — 18x multiplier for images)
- Margin: -$35 (unprofitable)
The 18x vision token discrepancy is documented in OpenAI community forums. Billing systems see the $50 revenue. Cost visibility tracks the $85 cost and explains why it varied from expectations.
## The Profitability View
Where billing provides invoices, cost visibility provides this:
| Customer | Plan | Revenue | Cost | Margin | Margin % |
|---|---|---|---|---|---|
| Acme Corp | Pro | $2,400 | $890 | $1,510 | 63% |
| Beta Inc | Pro | $2,400 | $2,150 | $250 | 10% |
| Gamma LLC | Starter | $400 | $620 | -$220 | -55% |
| Delta Co | Scale | $8,000 | $3,200 | $4,800 | 60% |
Beta Inc and Gamma LLC pay on time. Billing systems show healthy revenue. But one customer has 10% margins and one is actively losing money.
This view requires joining revenue data with cost attribution. Neither billing infrastructure nor cost visibility alone provides it.
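Once cost attribution exists, the join itself is simple. A minimal sketch using two rows from the table above (the dict-based revenue/cost inputs are an assumption; real systems would pull from a billing API and a cost store):

```python
def margin_rows(revenue: dict, cost: dict) -> list:
    """Join revenue with attributed cost into (customer, rev, cost, margin, pct) rows."""
    rows = []
    for cust, rev in revenue.items():
        c = cost.get(cust, 0.0)
        rows.append((cust, rev, c, rev - c, round(100 * (rev - c) / rev)))
    return rows

revenue = {"Gamma LLC": 400.0, "Delta Co": 8000.0}
cost = {"Gamma LLC": 620.0, "Delta Co": 3200.0}
for row in margin_rows(revenue, cost):
    print(row)
```

Neither input alone reveals that Gamma LLC is underwater; only the join does.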
## Choosing Your Stack
For billing infrastructure, the market has consolidated around a few strong options:
- Stripe: Default choice for most startups. With Metronome now integrated, adds usage-based metering at scale.
- Orb: Better fit for AI-native platforms with complex usage metrics and high event volumes.
- Chargebee/Zuora: Enterprise-focused with broader subscription management beyond pure usage-based.
For cost visibility, the right choice depends on your infrastructure:
- CloudZero: Best for engineering-led organizations that need business-dimension cost tracking (cost per customer, per feature). Requires tagging discipline.
- Vantage: Best for multi-cloud teams wanting immediate visibility without extensive setup. Native AI provider integrations.
- Kubecost: Best for Kubernetes-heavy workloads, especially with GPU optimization needs.
- Bear Lumen: Best for teams specifically needing per-customer AI API cost attribution alongside margin analysis.
The FinOps Foundation recommends a centralized hub-and-spoke model: route AI workloads through monitored proxies with API keys tied to specific use cases. This enables accurate cost tracking regardless of which visibility tool you choose.
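The hub-and-spoke idea reduces to routing every AI call through a proxy that stamps it with a use-case-specific key. A toy sketch—use-case names, key values, and the payload shape are all hypothetical:

```python
# Each use case gets its own upstream API key, so the provider invoice
# can be split by use case without guesswork. Key values are fake.
USE_CASE_KEYS = {
    "support-bot": "sk-usecase-support",
    "doc-summaries": "sk-usecase-docs",
}

audit_log = []

def route(use_case: str, payload: dict) -> dict:
    """Stamp a request with its use case's key and record it for auditing."""
    key = USE_CASE_KEYS[use_case]  # unknown use cases fail loudly (KeyError)
    audit_log.append({"use_case": use_case, "model": payload.get("model")})
    # A real proxy would forward `payload` upstream with `key` here.
    return {"api_key": key, **payload}

route("support-bot", {"model": "gpt-4o", "prompt": "hi"})
print(audit_log)
```

Because every request passes through one choke point, any downstream visibility tool inherits clean, per-use-case attribution.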
Using Stripe, Metronome, or Orb? Bear Lumen adds the cost visibility layer—cost calculation, customer attribution, and margin analysis—without building it yourself. Request early access to see your margins, not just your revenue.
Related: Multi-Provider AI Billing | Forecastability in AI Billing | The Power User Problem