Question 1

What does a typical RAG application cost per month in 2026?

Accepted Answer

For 1M documents, 10k queries/day, with reranker on: about $40 vector DB + $30 reranker + $90 LLM generation = $160/month total. Add $15 one-time embedding pass for the corpus. Without reranker, drop to $130/month. The calculator builds this stack-by-stack.

Question 2

How do I split RAG cost between embedding, vector DB, and generation?

Accepted Answer

For a typical knowledge-base RAG: embedding 5% one-time, vector DB 25% recurring, generation 60% recurring, reranker 10% if used. Generation dominates at high query volume; vector DB dominates at large corpus + low query. The calculator shows the split for your specific scale.

Question 3

Should I use a reranker in my RAG pipeline?

Accepted Answer

Yes if precision matters more than 200ms latency. Cohere Rerank 3 at $1/1k searches typically improves answer quality 15–30% by re-scoring 50 retrieved chunks down to top-5. For chat UX, the latency tax is worth it. For batch RAG (overnight reports), always rerank.

Question 4

How many chunks should I retrieve per RAG query?

Accepted Answer

Retrieve 20–50 chunks, rerank down to 5–10, pass to the LLM. Retrieving fewer than 10 chunks risks missing the answer; passing more than 10 to the LLM bloats input cost and dilutes attention. The calculator multiplies retrieved-chunks × tokens-per-chunk into generation cost.

Question 5

Does prompt caching help RAG cost a lot?

Accepted Answer

Massively. If your system prompt + few-shot examples are stable (typical 4–8k tokens), prompt cache hits cut Anthropic input cost 90%, OpenAI 50%, Google 75%. Real-world steady-state RAG cache hit rate is 70–85%. Toggle the slider to see your bill drop.

Question 6

When is RAG cheaper than fine-tuning?

Accepted Answer

Below 10M monthly tokens or when knowledge changes weekly, RAG wins. Above 50M tokens with stable knowledge that fits the prompt, fine-tuning a smaller model often beats RAG total cost by 2–5×. Most production apps stay on RAG due to operational simplicity.

Component	Provider	Monthly
Embed (one-time amortized)	OpenAI 3-small	$5
Vector DB (10M chunks)	Pinecone Serverless	$40
Reranker (300k queries)	Cohere Rerank 3	$30
Generation (Sonnet 4.6)	Anthropic	$90
Generation w/ 70% cache hit	Anthropic	$28
Total with cache + rerank		$103 / mo

RAG Total Cost Calculator

Monthly cost breakdown

What this calculator does

Full RAG stack

Per-component breakdown

Reranker toggle

Prompt cache modeling

Per-query cost

Chunk strategy modeling

Quick comparison

How to use this calculator

Why use this calculator

Frequently Asked Questions