Calculator
RAG Total Cost Calculator
All-in-one RAG bill — embedding pass + vector DB + reranker + LLM generation. Plug in document count and query volume to see the full monthly stack.
Pricing data refreshed:
The AITOT RAG Total Cost calculator estimates monthly cost for a full RAG application stack — embedding (one-time + recurring), vector DB storage + queries, optional reranker, and LLM generation. Inputs: corpus size, chunks per doc, queries per day, chunks retrieved per query, generation tokens.
A typical knowledge-base RAG with 1M docs, 10k queries/day, and reranker on costs about $160/month: $40 vector DB + $30 reranker + $90 LLM generation. Generation dominates at high query volume; vector DB dominates at large corpus + low query. The calculator shows the split for your specific scale.
Toggle prompt caching to cut generation cost 50–90% — for stable system prompts (typical 4–8k tokens), real-world steady-state cache hit rate is 70–85%. Reranker on Cohere Rerank 3 at $1/1k searches improves answer quality 15–30% by re-scoring 50 retrieved chunks down to top-5.
Total monthly
$913
One-time embed cost
$6
Per query
$0.0061
Year 1 total
$10,956
Monthly cost breakdown
RAG bill = embedding query + vector DB storage/retrieval + reranker (optional) + LLM generation. Generation typically dominates above 50k queries/day. At MVP scale, vector DB minimums dominate.
What this calculator does
Full RAG stack
Embedding + vector DB + reranker + generation all in one bill.
Per-component breakdown
See exactly which line item is the biggest contributor at your scale.
Reranker toggle
Cohere Rerank 3 modeling. Adds $0.001/query but improves answer quality 15–30%.
Prompt cache modeling
Stable system prompts get 70–85% cache hits — toggle to see real cost.
Per-query cost
Surfaces $ per RAG query — critical for unit economics and pricing the product.
Chunk strategy modeling
Toggle chunks per doc and chunks retrieved per query to optimize cost.
Quick comparison
Monthly RAG cost at 1M docs, 10k queries/day (typical knowledge-base app)
| Component | Provider | Monthly |
|---|---|---|
| Embed (one-time amortized) | OpenAI 3-small | $5 |
| Vector DB (10M chunks) | Pinecone Serverless | $40 |
| Reranker (300k queries) | Cohere Rerank 3 | $30 |
| Generation (Sonnet 4.6) | Anthropic | $90 |
| Generation w/ 70% cache hit | Anthropic | $28 |
| Total with cache + rerank | $103 / mo |
Without prompt caching, generation alone is $90+. Cache is the single biggest lever.
How to use this calculator
Calculate full RAG stack monthly cost — embed + vector DB + reranker + generation.
- 1
Enter corpus + chunks
Documents × chunks per doc. Typical: 1 doc = 5–20 chunks at 500 tokens each.
- 2
Set query volume
Queries per day. Most production apps cache 30–50% of queries before reaching the LLM.
- 3
Toggle reranker
Cohere Rerank 3 adds $0.001/query but improves quality 15–30%. Usually worth it.
- 4
Set prompt cache hit rate
Stable system prompts hit 70–85%. Cuts generation cost 50–90% on Anthropic.
Why use this calculator
- ✓Full stack — not just LLM piece
- ✓Reranker toggle
- ✓Prompt cache modeling
- ✓Per-query unit economics
- ✓9 vector DB + 22 LLM providers
- ✓No login required