Skip to main content
Back to Blog
technical5 min read

The True Cost of Running AI APIs: Complete 2025 Guide

Compare GPT-4, Claude, and Gemini pricing with real margin calculations. See $/token costs and profitability analysis for AI SaaS.

BLT

Bear Lumen Team

AI Billing Experts

#ai-costs#pricing#gpt-4#claude#gemini#unit-economics

A single power user generating 1M tokens per month can compress your margin from 97% to 45%—even on GPT-4 Turbo. This guide shows how to calculate whether your pricing model can handle that variance.


Quick Comparison Table

ModelInput CostOutput CostContext WindowBest For
GPT-4 Turbo$10/1M tokens$30/1M tokens128KGeneral purpose, reasoning
GPT-4o$2.50/1M tokens$10/1M tokens128KCost-optimized GPT-4
Claude 3.5 Sonnet$3/1M tokens$15/1M tokens200KLong documents, code
Gemini 1.5 Pro$1.25/1M tokens$5/1M tokens1MCost-sensitive, large context
Claude 3 Haiku$0.25/1M tokens$1.25/1M tokens200KHigh-volume, simple tasks

Prices as of January 2025. A token is roughly ¾ of a word.


What is Cost-to-Serve?

Cost-to-serve is the total infrastructure cost required to deliver your product to one customer, including API costs, compute, storage, and overhead.

Formula:

Cost-to-Serve = (AI API Costs + Infrastructure + Overhead) / Active Users

For AI-powered SaaS products, AI API costs typically represent 60-80% of total cost-to-serve.


Real-World Example: AI Writing Assistant

Let's calculate the economics for a hypothetical AI writing assistant:

Usage Assumptions

  • Average user: 50,000 tokens/month (input + output combined)
  • Token split: 70% input, 30% output
  • Price: $29/month subscription

Cost Breakdown by Model

ModelInput CostOutput CostTotal AI CostMarginMargin %
GPT-4 Turbo$0.35$0.45$0.80$28.2097.2%
GPT-4o$0.09$0.15$0.24$28.7699.2%
Claude 3.5 Sonnet$0.11$0.23$0.34$28.6698.8%
Claude 3 Haiku$0.009$0.019$0.03$28.9799.9%

At average usage, every model yields healthy margins. The problem emerges with variance.


High-Usage Customers: Where Margins Compress

Not all users are average. Consider a power user generating 1,000,000 tokens/month (20x average) while paying the same $29/month:

GPT-4 Turbo Costs for Power User:

ComponentCalculationCost
Input700,000 × $10/1M$7.00
Output300,000 × $30/1M$9.00
Total$16.00

Margin: $29 - $16 = $13 (44.8%)

Still profitable, but margin compressed from 97% to 45%. Bear Lumen shows this variance by customer, so you see it before margins compress across your base.

For detailed analysis of usage variance affecting GitHub Copilot, Cursor, and ChatGPT Pro, see: Usage Variance in AI Products.

With Usage-Based Tiers:

If you implement overage pricing:

  • First 50k tokens: Included in $29 base
  • Next 950k tokens: $0.015/1k tokens extra

Power user revenue: $29 + ($0.015 × 950) = $43.25

Margin: $43.25 - $16 = $27.25 (63%)

Better alignment between value delivered and cost incurred.


GitHub Copilot: A Real-World Example

GitHub Copilot operates at negative margins on most users:

  • Price: $10/month (or $19 for business)
  • Average cost: $20-$30/month per user
  • Result: Negative margins on average users

Why? Unlimited usage model combined with token-intensive code generation. Users generate millions of tokens monthly, and pricing hasn't adjusted since launch.

Pattern: Usage-based pricing aligns revenue with cost-to-serve across customer segments.

Related: From Flat-Rate to Usage-Based Pricing Migration


When to Switch Models

Use CaseRecommended ModelWhy
Quality-critical, premium customersGPT-4 TurboBest reasoning capability
High-volume, GPT-4 quality neededGPT-4o75% cost reduction vs Turbo
Large documents, code analysisClaude Sonnet200K context window
Simple tasks, classification, routingHaiku or GeminiLowest cost per token

The right model depends on the task. Many teams run multiple models, routing requests based on complexity.


Tracking Cost-to-Serve

Effective cost tracking requires three components:

  1. Instrument API calls — Record tokens consumed per request with user/customer attribution
  2. Aggregate by customer — Sum costs monthly, calculate cost per active user
  3. Compare to revenue — Identify which customers are margin-positive and which are compressing margins

Most teams discover 10-20% of customers generate 60-80% of API costs. Without per-customer visibility, this variance remains invisible until aggregate margins decline.


External Resources


Want to see cost-to-serve by customer? Join our early access program for margin visibility at the customer level.

Share this article