AI Token & Pricing Comparator

Estimate input/output token cost across OpenAI, Anthropic, Google, xAI, Mistral, and more — including prompt-cache savings.

The AITOT Token & Pricing Comparator lets you compare per-token cost across 31 leading LLMs in 2026, including OpenAI GPT-5, Claude Opus 4.7, Gemini 2.5 Pro, Llama 4 70B, DeepSeek V3, Mistral Large 2, and Amazon Nova. Enter your average input and output tokens per request, click calculate, and see per-request and monthly cost side by side.

Output tokens dominate most bills: on most models listed here they cost roughly 3–8× as much as input tokens. The Together-hosted Llama 4 models, priced identically for input and output, are the exception. The comparator sorts by total cost, not headline rate, so you see real-world impact. When your system prompt is stable, the prompt-caching toggle cuts input cost by 60–90% on Anthropic and 50% on OpenAI.

All pricing is sourced from official provider documentation and refreshed on the first of every month. Real bills land within 5–15% of these estimates — variance comes from caching, batching, region surcharges, and rate-limit headroom. No login required; results compute client-side.

Cheapest: Amazon · Nova Lite at $14.40 per month

31 models

| Provider | Model | Input / 1M | Output / 1M | Per request | Per month |
|---|---|---|---|---|---|
| Amazon | Nova Lite | $0.06 | $0.24 | $0.0001 | $14.40 |
| OpenAI | GPT-5 nano | $0.05 | $0.40 | $0.0002 | $20.00 |
| Google | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.0002 | $24.00 |
| Cohere | Command R | $0.15 | $0.60 | $0.0004 | $36.00 |
| Mistral | Mistral Small 3 | $0.20 | $0.60 | $0.0004 | $40.00 |
| DeepSeek | DeepSeek V3 | $0.27 | $1.10 | $0.0007 | $65.60 |
| OpenAI | GPT-5.4 nano | $0.20 | $1.25 | $0.0007 | $66.00 |
| Google | Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $0.0008 | $80.00 |
| OpenAI | GPT-5 mini | $0.25 | $2.00 | $0.001 | $100.00 |
| Meta (Together) | Llama 4 70B | $0.88 | $0.88 | $0.0011 | $105.60 |
| Google | Gemini 2.5 Flash | $0.30 | $2.50 | $0.0012 | $124.00 |
| DeepSeek | DeepSeek R1 | $0.55 | $2.19 | $0.0013 | $131.60 |
| xAI | Grok 4 mini | $0.60 | $2.40 | $0.0014 | $144.00 |
| Amazon | Nova Pro | $0.80 | $3.20 | $0.0019 | $192.00 |
| OpenAI | GPT-5.4 mini | $0.75 | $4.50 | $0.0024 | $240.00 |
| Anthropic | Claude Haiku 4.5 | $1.00 | $5.00 | $0.0028 | $280.00 |
| Mistral | Mistral Large 2 | $2.00 | $6.00 | $0.004 | $400.00 |
| Meta (Together) | Llama 4 405B | $3.50 | $3.50 | $0.0042 | $420.00 |
| OpenAI | o3 | $2.00 | $8.00 | $0.0048 | $480.00 |
| Google | Gemini 3.5 Flash | $1.50 | $9.00 | $0.0048 | $480.00 |
| OpenAI | GPT-5 | $1.25 | $10.00 | $0.005 | $500.00 |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | $0.005 | $500.00 |
| Cohere | Command R+ | $2.50 | $10.00 | $0.006 | $600.00 |
| Google | Gemini 3.1 Pro | $2.00 | $12.00 | $0.0064 | $640.00 |
| OpenAI | GPT-5.4 | $2.50 | $15.00 | $0.008 | $800.00 |
| Google | Gemini 2.5 Pro (long ctx >200K) | $2.50 | $15.00 | $0.008 | $800.00 |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | $0.0084 | $840.00 |
| Anthropic | Claude Opus 4.8 | $5.00 | $25.00 | $0.014 | $1,400.00 |
| xAI | Grok 4 | $5.00 | $25.00 | $0.014 | $1,400.00 |
| OpenAI | GPT-5.5 | $5.00 | $30.00 | $0.016 | $1,600.00 |
| OpenAI | GPT-5.5 Pro | $30.00 | $180.00 | $0.096 | $9,600.00 |

Estimates only. Real bills can vary 5–15% depending on caching, batching, and region.
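
The per-month figures in the table are consistent with a workload of 800 input tokens and 400 output tokens per request at 100,000 requests per month. That workload is inferred from the numbers, not stated by the calculator, so treat it as an assumption. Under it, the arithmetic can be sketched in Python as follows (the function and constant names are illustrative, not the calculator's own code):

```python
# Assumed default workload, inferred from the table (not stated by the page).
INPUT_TOKENS = 800
OUTPUT_TOKENS = 400
REQUESTS_PER_MONTH = 100_000


def request_cost(input_rate: float, output_rate: float,
                 input_tokens: int = INPUT_TOKENS,
                 output_tokens: int = OUTPUT_TOKENS) -> float:
    """Cost of one request in USD. Rates are USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def monthly_cost(input_rate: float, output_rate: float,
                 requests: int = REQUESTS_PER_MONTH,
                 input_tokens: int = INPUT_TOKENS,
                 output_tokens: int = OUTPUT_TOKENS) -> float:
    """Per-request cost scaled to a monthly request volume."""
    return request_cost(input_rate, output_rate, input_tokens, output_tokens) * requests


# Rates copied from the pricing table.
print(f"Nova Lite:         ${monthly_cost(0.06, 0.24):,.2f}")
print(f"GPT-5:             ${monthly_cost(1.25, 10.00):,.2f}")
print(f"Claude Sonnet 4.6: ${monthly_cost(3.00, 15.00):,.2f}")
```

With the assumed workload, these three calls reproduce the $14.40, $500.00, and $840.00 monthly figures listed for those models.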

What this calculator does

31 models in one table

GPT-5, Claude Opus 4.7, Gemini 2.5 Pro, Llama 4, DeepSeek V3, Mistral, Amazon Nova, and Cohere Command, all comparable side by side.

Prompt cache modeling

Set the cache hit rate anywhere from 0 to 100% to see effective input rates. Cached input is billed at 10% of the normal rate on Anthropic, 50% on OpenAI, and 25% on Google.
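
As a rough illustration of how a cache hit rate might translate into an effective input price, the sketch below blends the full rate and the cached rate by hit rate. It reads the 10%, 50%, and 25% figures as the fraction of the normal input rate billed on a cache hit, and it ignores any cache-write surcharge a provider may add. Both are simplifying assumptions, and the names are hypothetical.

```python
# Fraction of the normal input rate billed on a cache hit (figures from the text above).
CACHED_RATE_FRACTION = {"Anthropic": 0.10, "OpenAI": 0.50, "Google": 0.25}


def effective_input_rate(base_rate: float, hit_rate: float, provider: str) -> float:
    """Blend the full and cached input rates (USD per 1M tokens) by cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    cached_fraction = CACHED_RATE_FRACTION[provider]
    # Misses pay the full rate; hits pay the discounted fraction of it.
    return base_rate * ((1.0 - hit_rate) + hit_rate * cached_fraction)


# Example: a $3.00 / 1M input rate on Anthropic at a 70% hit rate.
rate = effective_input_rate(3.00, 0.70, "Anthropic")
print(f"${rate:.2f} per 1M input tokens")
```

At a 0% hit rate the function returns the list price unchanged, and at 100% it returns the fully cached rate, which matches the two ends of the slider.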

Per-request + per-month

Calculator surfaces both per-request cost (UX-level) and monthly total (billing-level) for every model.

Workload presets

Chat, RAG, agent, summarization, and code-gen presets prefill realistic token splits — no guessing.

Input:output ratio

Most chat workloads run about 4:1 input:output, code-gen about 3:1, and summarization about 10:1. The slider lets you tune to your real ratio.

Export + share

Save scenarios to localStorage, export CSV, share via permalink with your team.

Quick comparison

Token pricing across the top LLMs (per 1M tokens)

| Model | Input | Output | Blended 50:50 |
|---|---|---|---|
| Amazon Nova Lite | $0.06 | $0.24 | $0.15 |
| DeepSeek V3 | $0.27 | $1.10 | $0.69 |
| Gemini 2.5 Flash | $0.30 | $2.50 | $1.40 |
| GPT-5 mini | $0.25 | $2.00 | $1.13 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $3.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $9.00 |
| OpenAI GPT-5 | $1.25 | $10.00 | $5.63 |
| Claude Opus 4.7 | $15.00 | $75.00 | $45.00 |

Output dominates most workloads. Use the calculator for your real input:output ratio.
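
The blended column is a simple average of the input and output rates. For a different token mix, a weighted average gives a more representative figure. The Python sketch below is one way to compute it, with hypothetical function names, using the Claude Sonnet 4.6 rates from the table:

```python
def blended_rate(input_rate: float, output_rate: float, input_share: float = 0.5) -> float:
    """Weighted average price per 1M tokens; input_share is the input fraction of all tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_rate * input_share + output_rate * (1.0 - input_share)


def share_from_ratio(input_parts: float, output_parts: float) -> float:
    """Convert an input:output ratio such as 4:1 into an input share (0.8)."""
    return input_parts / (input_parts + output_parts)


# Claude Sonnet 4.6: $3.00 input, $15.00 output per 1M tokens.
even = blended_rate(3.00, 15.00)                             # 50:50 mix
chat = blended_rate(3.00, 15.00, share_from_ratio(4, 1))     # 4:1 input:output
print(f"50:50 blend: ${even:.2f}, 4:1 blend: ${chat:.2f}")
```

The 50:50 case matches the $9.00 in the table. At 4:1 input:output the blend drops to $5.40, which shows how much the token mix moves the effective price.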

How to use this calculator

Estimate input and output token cost for your workload across 31 models in under 60 seconds.

  1. Pick a workload preset

    Choose chat, RAG, agent, summarization, or code-gen. The preset prefills realistic input/output token splits.

  2. Set requests per month

    Enter your expected monthly request volume. The calculator scales per-request cost to monthly total.

  3. Toggle prompt caching

    If your system prompt is stable, set cache hit rate to 50–80% to see effective input rates.

  4. Compare and pick

    Sort the result table by monthly cost. Pick the cheapest model that meets your quality bar.
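
The final step, sorting by monthly cost and picking the cheapest acceptable model, can be sketched in Python as below. The rates come from the pricing table. The workload numbers, the `acceptable` shortlist, and all names are hypothetical stand-ins for your own inputs and quality bar.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    provider: str
    name: str
    input_rate: float   # USD per 1M input tokens
    output_rate: float  # USD per 1M output tokens


# A few rows copied from the pricing table.
MODELS = [
    Model("Anthropic", "Claude Sonnet 4.6", 3.00, 15.00),
    Model("Amazon", "Nova Lite", 0.06, 0.24),
    Model("OpenAI", "GPT-5", 1.25, 10.00),
    Model("DeepSeek", "DeepSeek V3", 0.27, 1.10),
]


def monthly_cost(m: Model, input_tokens: int, output_tokens: int, requests: int) -> float:
    """Monthly cost in USD for a fixed per-request token count."""
    per_request = (input_tokens * m.input_rate + output_tokens * m.output_rate) / 1_000_000
    return per_request * requests


def rank(models: list, input_tokens: int, output_tokens: int, requests: int,
         acceptable: Optional[set] = None) -> list:
    """Return models sorted by monthly cost, optionally limited to a quality shortlist."""
    shortlist = [m for m in models if acceptable is None or m.name in acceptable]
    return sorted(shortlist, key=lambda m: monthly_cost(m, input_tokens, output_tokens, requests))


# Assumed workload: 800 input and 400 output tokens per request, 100,000 requests per month.
for m in rank(MODELS, 800, 400, 100_000, acceptable={"GPT-5", "Claude Sonnet 4.6"}):
    print(f"{m.provider} {m.name}: ${monthly_cost(m, 800, 400, 100_000):,.2f}")
```

Keeping the quality bar as an explicit shortlist separates the judgment call (which models are good enough) from the arithmetic (which of those is cheapest).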

Why use this calculator

  • Free forever — no login, no credit card
  • 31 models refreshed monthly from official docs
  • Runs client-side — your inputs stay private
  • Workload presets, not generic averages
  • Models prompt-cache discounts
  • Permalinks for team sharing

Frequently Asked Questions

How do I compare LLM token prices across providers in 2026?
Enter your average input and output tokens per request and your monthly request count into the comparator. It computes the per-request and per-month cost across 31 models, including OpenAI GPT-5, Claude Opus 4.7, Gemini 2.5 Pro, Llama 4, Mistral, DeepSeek, and Amazon Nova. Sort by cheapest output rate since output dominates most production bills.
Which LLM has the cheapest output tokens in 2026?
Amazon Nova Lite at $0.24 per million output tokens is the cheapest production-grade option. GPT-5 nano and Gemini 2.5 Flash-Lite follow at $0.40, then Command R and Mistral Small 3 at $0.60. Avoid Claude Opus 4.7 at $75/M output unless you specifically need its reasoning quality.
How much do prompt cache savings reduce my LLM bill?
For RAG workloads with stable system prompts, prompt caching cuts input cost by 60–90% on Anthropic, 50% on OpenAI, and 75% on Google. Real-world steady-state cache hit rates are 50–70%. Toggle the "% input cached" slider to see your blended price.
Why does output cost more than input on every model?
Output generation is sequential — each token requires a full forward pass through the model. Input tokens process in parallel. Output is also memory-bandwidth-bound on large models. Most providers price output 3–5× higher to reflect actual GPU time.
Does this calculator include the Batch API discount?
No — the calculator shows real-time API pricing. For non-realtime workloads (overnight summarization, content moderation backfills), OpenAI and Anthropic both offer 50% off via Batch API. Subtract 50% from the displayed cost if your traffic can wait 24h.
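
To apply that adjustment by hand, a minimal Python sketch is shown here. It assumes the 50% discount applies uniformly to the share of traffic you can send through a batch endpoint. The function name and the `batch_share` parameter are illustrative and not part of the calculator.

```python
def batch_adjusted_cost(monthly_cost: float, batch_share: float = 1.0,
                        discount: float = 0.50) -> float:
    """Reduce a real-time monthly cost by the batch discount on the batchable share of traffic."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    return monthly_cost * (1.0 - batch_share * discount)


# Example: an $840.00 real-time estimate.
print(f"All traffic batched:  ${batch_adjusted_cost(840.00):,.2f}")
print(f"Half traffic batched: ${batch_adjusted_cost(840.00, batch_share=0.5):,.2f}")
```

Splitting by `batch_share` is useful when only part of the workload, such as overnight backfills, can tolerate the delay.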
Which model gives the best quality per dollar in 2026?
Claude Sonnet 4.6 ($3 input, $15 output) and GPT-5 mini ($0.25 input, $2.00 output) lead price-performance benchmarks. For coding, Claude Sonnet 4.6 wins on SWE-bench. For general chat, Gemini 2.5 Flash is the cheap-but-capable default at $0.30/$2.50.