LLM Fine-tuning Cost Calculator

Compute fine-tuning cost — training tokens × per-million rate, plus the per-token inference uplift on the resulting custom model.

The AITOT LLM Fine-tuning Cost calculator estimates training cost + inference uplift for fine-tuned models on OpenAI (GPT-4o, GPT-4o-mini, o3), Anthropic Claude (invite-only), Google Vertex (Gemini tuning), and Together AI (LoRA fine-tuning for Llama 4, Qwen, Mistral).

Training cost = training tokens × epochs × per-million training rate. OpenAI GPT-4o-mini: $3/M training tokens. Together Llama 4 70B LoRA: $1.20/M. Most production fine-tunes run $50–$500 in one-time training. After that, inference on the fine-tuned model costs 1.5–3× the base model's per-token rate for as long as the model stays in service.
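
The training formula above can be sketched in a few lines of Python. This is an illustration, not the calculator's actual code: the function name `training_cost_usd` is made up for this example, the two rates are the ones quoted in the text, and the 5M-token, 3-epoch scenario mirrors the quick comparison further down the page.

```python
def training_cost_usd(training_tokens: int, epochs: int, rate_per_million_usd: float) -> float:
    """One-time training cost: tokens x epochs x per-million training rate."""
    billed_tokens = training_tokens * epochs  # every epoch is a full billed pass
    return billed_tokens / 1_000_000 * rate_per_million_usd


# Rates quoted in the text (USD per million training tokens).
gpt_4o_mini = training_cost_usd(5_000_000, epochs=3, rate_per_million_usd=3.00)
together_lora = training_cost_usd(5_000_000, epochs=3, rate_per_million_usd=1.20)

print(f"OpenAI GPT-4o-mini:        ${gpt_4o_mini:.2f}")    # $45.00
print(f"Together Llama 4 70B LoRA: ${together_lora:.2f}")  # $18.00
```

Both results match the training-cost column of the quick comparison for the same scenario.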

Adjust epochs (default 3) and inference volume to model the year-1 total. Below 10M monthly tokens, fine-tuning rarely beats well-crafted prompts. Above 100M monthly tokens with a stable task definition, a fine-tuned smaller model beats a larger model with prompts by 3–10× on total cost.

Year 1 total · cheapest: Fireworks · Llama 4 8B, $248

Provider | Base model | Training cost | Monthly inference | Year 1 total | Notes
Fireworks | Llama 4 8B | $8 | $20 | $248 | ≤16B LoRA SFT tier
Cohere | Command R | $30 | $48 | $606 | -
OpenAI | GPT-4o mini | $45 | $48 | $621 | Stale: OpenAI moved to per-hour training 2026-05; verify pending
Mistral | Mistral Small 3 | $45 | $58 | $741 | $2/mo hosting per deployed adapter
Fireworks | Llama 4 70B | $45 | $90 | $1,125 | 16-80B LoRA SFT tier
Together | Llama 3.3 70B | $75 | $88 | $1,131 | Legacy v3 line; verify pending 2026-05-18; no longer top-listed on Together pricing
OpenAI | GPT-5 mini | $60 | $96 | $1,212 | Stale: OpenAI moved to per-hour training 2026-05; verify pending
Together | Llama 4 Maverick (LoRA SFT) | $120 | $120 | $1,560 | $16 minimum charge; Maverick = ~70B-class
OpenAI | o3-mini | $75 | $136 | $1,707 | Stale: OpenAI moved to per-hour training 2026-05; verify pending
Together | Llama 4 Maverick (LoRA DPO) | $300 | $120 | $1,740 | -
AWS Bedrock | Claude Haiku 4.5 (custom) | $120 | $303 | $3,756 | Provisioned throughput required
Mistral | Mistral Large 2 | $135 | $564 | $6,903 | -
OpenAI | GPT-4o | $375 | $600 | $7,575 | Stale: OpenAI moved to per-hour training 2026-05; verify pending

Training cost = tokens × epochs × per-million rate. Inference uses the fine-tuned model's uplifted per-token rate, which is always higher than the base model's rate. Year-1 total = one-time training + 12 months of inference.
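
As a rough check of the year-1 formula, the sketch below recomputes the totals for a few rows of the table above and picks the cheapest. The `Row` class and its field names are invented for this example; the dollar figures are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    provider: str
    model: str
    training_usd: float           # one-time training cost
    monthly_inference_usd: float  # recurring inference cost per month

    def year1_total(self) -> float:
        # Year-1 total = one-time training + 12 months of inference.
        return self.training_usd + 12 * self.monthly_inference_usd


# A subset of the table rows (values copied from the table).
rows = [
    Row("Fireworks", "Llama 4 8B", 8, 20),
    Row("Cohere", "Command R", 30, 48),
    Row("OpenAI", "GPT-4o mini", 45, 48),
    Row("OpenAI", "GPT-4o", 375, 600),
]

cheapest = min(rows, key=Row.year1_total)
for row in sorted(rows, key=Row.year1_total):
    print(f"{row.provider} {row.model}: ${row.year1_total():,.0f}")
print(f"Cheapest: {cheapest.provider} {cheapest.model}")
```

For these four rows the totals come out to $248, $606, $621, and $7,575, matching the Year 1 total column.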

What this calculator does

Multi-provider

OpenAI fine-tuning, Together LoRA, Vertex tuning, plus self-host estimates.

Training + inference split

One-time training cost separated from monthly inference uplift.

Epoch slider

Default 3 epochs. The calculator multiplies training tokens × epochs to get the billed total.

Inference uplift modeling

Fine-tuned models cost 1.5–3× the base model's per-token rate. The calculator captures the year-1 impact.

Year-1 total

One-time training + 12 months inference = single budget number.

LoRA vs full fine-tuning

LoRA training on Together is several times cheaper than OpenAI fine-tuning: in the quick comparison, $18 versus $45 for GPT-4o-mini and $375 for GPT-4o.

Quick comparison

Fine-tuning cost on 5M training tokens, 50M inference / month, 3 epochs

Provider | Training Cost | Inference Uplift | Year-1 Total
Together Llama 4 70B (LoRA) | $18 | +$50/mo | $618
OpenAI GPT-4o-mini | $45 | +$120/mo | $1,485
Google Gemini 2.5 Flash tune | $75 | +$150/mo | $1,875
OpenAI GPT-4o | $375 | +$1,200/mo | $14,775
OpenAI o3 | $2,250 | +$3,500/mo | $44,250

Year-1 = training + 12 × monthly inference uplift. Inference uplift is cost above the base model.

How to use this calculator

Calculate training + inference uplift cost for fine-tuned LLMs.

  1. Enter training tokens

    Total tokens in your training dataset. For example, 100 examples × 500 tokens = 50k tokens.

  2. Set epochs

    Default 3. More than 4 epochs typically overfits. The calculator bills training tokens × epochs.

  3. Estimate monthly inference

    How many tokens will the fine-tuned model serve per month? This drives the uplift cost.

  4. Compare providers

    LoRA on open-weight models (Fireworks, Together) is cheapest; fine-tuning OpenAI's larger models is the most expensive. The calculator shows year-1 totals.
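
The steps above can be sketched end to end. This is a simplified illustration under stated assumptions, not the calculator's implementation: `Scenario` and `estimate` are hypothetical names, the rates are the GPT-4o-mini figures quoted on this page ($3/M training, $0.30/M fine-tuned input), and all inference tokens are billed at the input rate, which understates real cost because output tokens are usually priced higher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    examples: int
    tokens_per_example: int
    epochs: int
    monthly_inference_tokens: int
    training_rate_per_m: float   # USD per million training tokens
    finetuned_rate_per_m: float  # USD per million inference tokens (assumed: input rate only)


def estimate(s: Scenario) -> dict[str, float]:
    training_tokens = s.examples * s.tokens_per_example   # step 1
    billed_tokens = training_tokens * s.epochs            # step 2
    training_usd = billed_tokens / 1_000_000 * s.training_rate_per_m
    monthly_usd = s.monthly_inference_tokens / 1_000_000 * s.finetuned_rate_per_m  # step 3
    return {
        "training_tokens": training_tokens,
        "training_usd": training_usd,
        "monthly_inference_usd": monthly_usd,
        "year1_usd": training_usd + 12 * monthly_usd,
    }


scenario = Scenario(
    examples=100,
    tokens_per_example=500,
    epochs=3,
    monthly_inference_tokens=50_000_000,
    training_rate_per_m=3.00,
    finetuned_rate_per_m=0.30,
)
result = estimate(scenario)
print(f"Training tokens: {result['training_tokens']:,}")
print(f"Training cost:   ${result['training_usd']:.2f}")
print(f"Monthly:         ${result['monthly_inference_usd']:.2f}")
print(f"Year 1:          ${result['year1_usd']:.2f}")
```

For step 4, the same `estimate` call would be repeated with each provider's rates and the year-1 figures compared.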

Why use this calculator

  • 5 providers refreshed monthly
  • Training + inference split
  • LoRA vs full FT comparison
  • Year-1 budget number
  • Epoch + token modeling
  • No login required

Frequently Asked Questions

How much does it cost to fine-tune an LLM in 2026?
Training cost = training tokens (in millions) × epochs × per-million training rate. OpenAI GPT-4o-mini fine-tuning: $3/M training tokens. Anthropic Claude Haiku fine-tuning (limited access): $5/M. Together AI Llama 4 70B LoRA: $1.20/M. Most production fine-tunes run $50–$500 in training cost.
What is the inference uplift for fine-tuned models?
Fine-tuned models cost 1.5–3× more per token than the base model at inference. For example, OpenAI GPT-4o-mini costs $0.15/M input tokens as a base model and $0.30/M fine-tuned, a 2× uplift. Plan for this: fine-tuning a high-volume workload only saves money if you also switch to a smaller model class.
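
A minimal sketch of the uplift arithmetic, using the GPT-4o-mini input rates from the answer above. The function names are hypothetical, and the 50M tokens/month volume is an assumed example (the same volume as the quick comparison); only input tokens are counted here.

```python
def uplift_multiple(base_rate_per_m: float, finetuned_rate_per_m: float) -> float:
    """How many times more expensive the fine-tuned model is per token."""
    return finetuned_rate_per_m / base_rate_per_m


def monthly_uplift_usd(monthly_tokens: int, base_rate_per_m: float,
                       finetuned_rate_per_m: float) -> float:
    """Extra monthly spend above what the base model would cost."""
    return monthly_tokens / 1_000_000 * (finetuned_rate_per_m - base_rate_per_m)


BASE_RATE = 0.15       # USD per million input tokens, GPT-4o-mini base
FINETUNED_RATE = 0.30  # USD per million input tokens, GPT-4o-mini fine-tuned

multiple = uplift_multiple(BASE_RATE, FINETUNED_RATE)
extra = monthly_uplift_usd(50_000_000, BASE_RATE, FINETUNED_RATE)
print(f"Uplift: {multiple:.1f}x, extra ${extra:.2f}/month at 50M input tokens")
```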
When does fine-tuning save money vs prompt engineering?
Break-even is around 10M monthly tokens. Below that, fine-tuning rarely beats well-crafted few-shot prompts. Above 100M monthly tokens with a stable task definition, a fine-tuned smaller model often beats a larger model with prompts by 3–10× on total cost.
How many epochs should I fine-tune for?
Default to 3 epochs for instruction-style data and 1–2 epochs for completion data. More than 4 epochs typically overfits. The calculator multiplies training tokens × epochs to get the billed training token count, so each extra epoch adds a full pass over the dataset to the bill.
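
To see how the epoch count scales the bill, here is a small sweep. It assumes a 5M-token dataset and the $3/M GPT-4o-mini training rate quoted elsewhere on this page; the helper name is made up for the example.

```python
def billed_training_cost(training_tokens: int, epochs: int, rate_per_m: float) -> float:
    """Training cost when every epoch is billed as a full pass over the dataset."""
    return training_tokens * epochs / 1_000_000 * rate_per_m


# Cost at 1 through 5 epochs for an assumed 5M-token dataset at $3/M.
costs = {e: billed_training_cost(5_000_000, e, 3.00) for e in range(1, 6)}
per_epoch = costs[2] - costs[1]  # marginal cost of one additional epoch

for epochs, usd in costs.items():
    print(f"{epochs} epoch(s): ${usd:.2f}")
print(f"Each additional epoch adds ${per_epoch:.2f}")
```

Cost grows linearly: going from the default 3 epochs to 5 raises the training bill from $45 to $75 in this scenario.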
Can I fine-tune Claude or only OpenAI models?
OpenAI: GPT-4o, GPT-4o-mini, and o3 fine-tuning are GA. Anthropic Claude fine-tuning is invite-only in 2026. Google Vertex offers Gemini tuning. Together AI offers LoRA fine-tuning for all major open-weight models. Self-hosted Axolotl + Modal is the cheapest path for open weights.
How much training data do I need to fine-tune effectively?
50–500 high-quality examples for style/format adaptation. 1,000–10,000 for domain knowledge. Above 10,000 examples, gains plateau. Quality beats quantity: 100 hand-curated examples often outperform 5,000 noisy ones. Token count drives billing, not quality.