You see the API pricing: GPT-4o at $2.50 per million input tokens and $10 per million output tokens (a token is roughly ¾ of a word). You calculate your monthly OpenAI bill at $10,000 and call that your AI cost.
Your actual spend is likely $16,000-23,000 higher.
The AI Cost Iceberg
| Cost Category | % of Total | Monthly $ (if API = $10K) |
|---|---|---|
| API Costs (OpenAI/Anthropic) | 30-40% | $10,000 (visible) |
| Infrastructure (AWS/GCP/Azure) | 40-50% | $12,000-15,000 |
| Monitoring & Observability | 5-10% | $1,500-3,000 |
| Caching & Storage | 3-5% | $900-1,500 |
| Failed Requests & Retries | 2-3% | $600-900 |
| Development & Testing | 3-5% | $900-1,500 |
| TOTAL | 100% | $25,900-32,900 |
A $10K/month API bill becomes $26K-33K in total cost. This pattern explains why Cursor's AWS bill doubled from $6.2M to $12.6M/month even as they optimized API usage.
What's in the Hidden 70%
Infrastructure (40-50%): Compute for application servers and background workers. Databases for usage tracking and vector search. Storage for conversation history. Networking, load balancers, and container orchestration.
Monitoring (5-10%): LLM observability tools (LangSmith, Helicone), application performance (Datadog, Sentry), and product analytics. At 10,000 users, monitoring typically runs $2,000-3,500/month.
Caching & Storage (3-5%): Prompt caching has write costs: Anthropic charges 1.25x the input price to write to the cache and 0.1x to read from it, so a cached prefix pays for itself on its first reuse, but prefixes that are written and never reused (or that expire before reuse) cost more than no caching at all. Vector databases for semantic search add $200-1,000/month at scale.
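The break-even point can be checked directly from Anthropic's published multipliers (1.25x input price per cache write, 0.1x per cache read); a sketch, assuming the prefix stays warm in cache between uses:

```python
# Prompt-cache break-even: compare N uses of a prompt prefix with and
# without caching. Multipliers follow Anthropic's published cache pricing:
# writes cost 1.25x the base input price, cache hits cost 0.1x.
WRITE_MULT, READ_MULT = 1.25, 0.10

def cached_cost(n_uses: int) -> float:
    """Relative cost of n_uses of a prefix: one cache write, then hits."""
    return WRITE_MULT + READ_MULT * (n_uses - 1)

def uncached_cost(n_uses: int) -> float:
    return 1.0 * n_uses

# Find the first use count at which caching is cheaper.
break_even = next(n for n in range(1, 100) if cached_cost(n) < uncached_cost(n))
print(break_even)  # 2 -- caching wins on the first cache hit
```

In practice cache entries expire (Anthropic's default TTL is short), so infrequently reused prefixes pay the 1.25x write repeatedly, which is where real-world cache spend exceeds the naive model.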
Failed Requests (2-3%): An 8% error rate where every failure is retried 3 times wastes up to 24% of API spend; on a $10K bill that's as much as $2,400/month in the worst case. Even well-behaved systems leak 2-3% of budget ($600-900/month) that most companies don't track.
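The worst-case retry math above, assuming every failed request exhausts its retry budget:

```python
# Worst-case retry waste: if error_rate of requests fail and each failure
# is retried max_retries times, the extra calls add this fraction of spend
# on top of the base bill.
def retry_waste(error_rate: float, max_retries: int) -> float:
    return error_rate * max_retries

waste = retry_waste(error_rate=0.08, max_retries=3)
print(f"{waste:.0%} of API spend")      # 24% of API spend
print(f"${10_000 * waste:,.0f}/month")  # $2,400/month on a $10K bill
```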
Development (3-5%): Local testing, CI/CD integration tests, staging environments, and prompt iteration. Roughly $900-1,500/month in the $10K scenario above; larger teams can spend several thousand.
Cursor's Cost Discovery
Cursor reached $500M ARR and discovered their AWS costs were 79% of their Anthropic costs.
| Month | AWS Bill | Anthropic (est.) | AWS as % of API |
|---|---|---|---|
| May 2025 | $6.2M | ~$8M | 77% |
| June 2025 | $12.6M | ~$16M | 79% |
Why so high? Massive conversation history storage (200K token context windows), real-time collaboration infrastructure, code indexing, and distributed caching.
Their assumed economics: 64% gross margin based on API costs alone.
Their actual economics: 36% gross margin including infrastructure.
The 28-point margin difference led to four repricing cycles in 12 months, usage limits, a $200/month Ultra tier, and June 2025 pricing adjustments.
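Cursor's two margin figures are internally consistent, which the arithmetic below checks. The revenue here is back-solved from the 64% API-only margin for illustration; it is not a disclosed number:

```python
# Back out Cursor's implied monthly revenue from the 64% API-only margin,
# then recompute gross margin with the AWS bill included (June 2025 figures).
api_cost = 16.0          # $M/month, estimated Anthropic spend
aws_cost = 12.6          # $M/month, reported AWS bill
api_only_margin = 0.64   # margin computed from API costs alone

revenue = api_cost / (1 - api_only_margin)            # ~$44.4M/month implied
full_margin = (revenue - api_cost - aws_cost) / revenue

print(f"{full_margin:.0%}")  # ~36%, a 28-point drop from the API-only view
```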
Real-World TCO: AI Chatbot Example
Product: Customer support chatbot with 10,000 users and 500K conversations/month on GPT-4o.
| Category | Cost | % of Total |
|---|---|---|
| API Costs | $5,000 | 54% |
| Infrastructure | $1,850 | 20% |
| Monitoring | $1,100 | 12% |
| Storage | $600 | 6% |
| Failed Requests | $700 | 8% |
| TOTAL | $9,250 | 100% |
True cost per customer: $0.93/month
At a $15/month price point, actual gross margin is 93.8%. If you only tracked API costs ($0.50/user), you'd calculate 96.7% margin, a nearly 3-point error. (At this smaller scale API is still the majority of spend; as Cursor's numbers show, infrastructure's share grows as you scale.)
Why ~3 points matters at scale:
- At $1M ARR: ~$30K/year difference
- At $10M ARR: ~$300K/year difference
- At $100M ARR: ~$3M/year difference
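The per-customer math follows directly from the line items; recomputing it from first principles:

```python
# Recompute the chatbot's unit economics from the cost line items.
costs = {"api": 5_000, "infrastructure": 1_850, "monitoring": 1_100,
         "storage": 600, "failed_requests": 700}
users, price = 10_000, 15.0   # 10K users at $15/month

total = sum(costs.values())                  # $9,250/month all-in
per_user = total / users                     # ~$0.93/user/month
true_margin = (price - per_user) / price     # ~93.8%
api_only_margin = (price - costs["api"] / users) / price  # ~96.7%

print(f"{(api_only_margin - true_margin) * 100:.1f}-point margin error")
# 2.8-point margin error
```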
Key Takeaways
- API costs are only 30-40% of total spend; infrastructure, monitoring, and overhead make up the remaining 60-70%
- Cursor's AWS bill ran at 77-79% of its Anthropic costs; at scale, infrastructure can rival API spend
- Failed requests and retries waste 2-3% of budget, and most companies don't track it
- Margin errors compound: an error of a few points becomes millions per year at $100M ARR
Complete cost visibility enables accurate margin calculations.
Bear Lumen tracks API costs, infrastructure allocation, per-customer margins, and failed request overhead automatically.
Join our waitlist for early access.
Related Reading
- Usage Variance in AI Products — Per-customer cost distribution
- GitHub Copilot Unit Economics — AI margin case study
- AI API Costs 2025 — Model pricing comparison