Question 1

How much does it cost to fine-tune an LLM in 2026?

Accepted Answer

Training cost = training tokens billed (in millions, counting every epoch) × the per-million-token training rate. Quoted rates: OpenAI GPT-4o-mini fine-tuning, $3/M training tokens; Anthropic Claude Haiku fine-tuning (limited access), $5/M; Together AI Llama 4 70B LoRA, $1.20/M. Most production fine-tunes run $50–$500 in training cost.
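As a rough sketch of this arithmetic, the snippet below multiplies a billed token count by the per-million rates quoted in this answer. The function name `training_cost` and the 10M-token figure are illustrative assumptions, not the calculator's internals.

```python
# Per-million-token training rates quoted in the answer above (USD).
RATES_PER_MILLION = {
    "OpenAI GPT-4o-mini": 3.00,
    "Anthropic Claude Haiku (limited access)": 5.00,
    "Together AI Llama 4 70B LoRA": 1.20,
}


def training_cost(billed_tokens: int, rate_per_million: float) -> float:
    """Return training cost in USD for a given number of billed tokens."""
    return billed_tokens / 1_000_000 * rate_per_million


if __name__ == "__main__":
    billed_tokens = 10_000_000  # hypothetical billed token count
    for name, rate in RATES_PER_MILLION.items():
        print(f"{name}: ${training_cost(billed_tokens, rate):.2f}")
```

Swap in your own billed token count and the current rate from the provider's pricing page; the rates here are only the ones listed above.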

Question 2

What is the inference uplift for fine-tuned models?

Accepted Answer

Fine-tuned models cost 1.5–3× as much per token as the base model at inference. For example, OpenAI GPT-4o-mini costs $0.15/M input tokens as a base model and $0.30/M fine-tuned, a 2× uplift. Plan for this: fine-tuning a high-volume workload only saves money if you also switch to a smaller model class.
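A minimal sketch of the uplift, using the GPT-4o-mini input rates quoted above. The monthly volume of 100M input tokens and the helper name `monthly_input_cost` are assumptions chosen for illustration; output-token pricing is left out.

```python
BASE_RATE = 0.15        # USD per million input tokens, base model (from the answer)
FINE_TUNED_RATE = 0.30  # USD per million input tokens, fine-tuned (from the answer)


def monthly_input_cost(tokens_per_month: int, rate_per_million: float) -> float:
    """Return monthly input-token spend in USD."""
    return tokens_per_month / 1_000_000 * rate_per_million


tokens = 100_000_000  # hypothetical monthly input volume
base = monthly_input_cost(tokens, BASE_RATE)
tuned = monthly_input_cost(tokens, FINE_TUNED_RATE)
uplift = FINE_TUNED_RATE / BASE_RATE

print(f"base: ${base:.2f}/mo, fine-tuned: ${tuned:.2f}/mo, uplift: {uplift:.1f}x")
```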

Question 3

When does fine-tuning save money vs prompt engineering?

Accepted Answer

Break-even is around 10M tokens per month. Below that, fine-tuning rarely beats well-crafted few-shot prompts. Above 100M tokens per month with a stable task definition, a fine-tuned smaller model often beats a larger prompted model by 3–10× on total cost.
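One way to check a specific workload is a simple payback calculation: how many months of inference savings it takes to recover the one-time training cost. The sketch below is an assumption about how to frame that comparison, not the calculator's method, and every dollar figure in it is hypothetical.

```python
from typing import Optional


def months_to_break_even(
    training_cost: float,
    prompted_monthly: float,
    fine_tuned_monthly: float,
) -> Optional[float]:
    """Months until monthly savings cover the training cost.

    Returns None when the fine-tuned setup is not cheaper per month,
    in which case the training cost is never recovered.
    """
    savings = prompted_monthly - fine_tuned_monthly
    if savings <= 0:
        return None
    return training_cost / savings


# Hypothetical figures: $300 training, $600/mo prompted, $120/mo fine-tuned.
print(months_to_break_even(300.0, 600.0, 120.0))
# Hypothetical low-volume case where fine-tuned inference costs more.
print(months_to_break_even(300.0, 40.0, 60.0))
```

A `None` result corresponds to the low-volume case described above, where few-shot prompting on the base model stays cheaper.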

Question 4

How many epochs should I fine-tune for?

Accepted Answer

Default to 3 epochs for instruction-style data and 1–2 epochs for completion data. More than 4 epochs typically overfits. The calculator multiplies training tokens × epochs to get the billed training token count, so each added epoch raises training cost proportionally.
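The epoch multiplication can be sketched as follows. The 5M-token dataset is a hypothetical size, and the $3/M rate is the GPT-4o-mini figure quoted under Question 1; the function name is illustrative.

```python
def billed_training_tokens(training_tokens: int, epochs: int) -> int:
    """Billed tokens = tokens in the dataset x number of epochs."""
    return training_tokens * epochs


DATASET_TOKENS = 5_000_000  # hypothetical dataset size
RATE_PER_MILLION = 3.00     # USD, rate quoted under Question 1

for epochs in (1, 2, 3, 4):
    billed = billed_training_tokens(DATASET_TOKENS, epochs)
    cost = billed / 1_000_000 * RATE_PER_MILLION
    print(f"{epochs} epoch(s): {billed:,} billed tokens, ${cost:.2f}")
```

Because the relationship is linear, going from 3 to 4 epochs in this sketch adds one third to the training bill.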

Question 5

Can I fine-tune Claude or only OpenAI models?

Accepted Answer

OpenAI: GPT-4o, GPT-4o-mini, and o3 fine-tuning are GA. Anthropic Claude fine-tuning is invite-only in 2026. Google Vertex offers Gemini tuning. Together AI offers LoRA fine-tuning for all major open-weight models. Self-hosted Axolotl + Modal is the cheapest path for open weights.

Question 6

How much training data do I need to fine-tune effectively?

Accepted Answer

50–500 high-quality examples for style/format adaptation. 1,000–10,000 for domain knowledge. Above 10,000 examples, gains plateau. Quality beats quantity — 100 hand-curated examples often outperform 5,000 noisy ones. Token count matters for billing, not for quality.

Provider	Base model	Training cost	Monthly inference	Year 1 total	Notes
Fireworks	Llama 4 8B	$8	$20	$248	≤16B LoRA SFT tier
Cohere	Command R	$30	$48	$606
OpenAI	GPT-4o mini	$45	$48	$621	Stale: OpenAI moved to per-hour training 2026-05; verify pending
Mistral	Mistral Small 3	$45	$58	$741	$2/mo hosting per deployed adapter
Fireworks	Llama 4 70B	$45	$90	$1,125	16–80B LoRA SFT tier
Together	Llama 3.3 70B	$75	$88	$1,131	Legacy v3 line, no longer top-listed on Together pricing; verify pending 2026-05-18
OpenAI	GPT-5 mini	$60	$96	$1,212	Stale: OpenAI moved to per-hour training 2026-05; verify pending
Together	Llama 4 Maverick (LoRA SFT)	$120	$120	$1,560	$16 minimum charge; Maverick = ~70B-class
OpenAI	o3-mini	$75	$136	$1,707	Stale: OpenAI moved to per-hour training 2026-05; verify pending
Together	Llama 4 Maverick (LoRA DPO)	$300	$120	$1,740
AWS Bedrock	Claude Haiku 4.5 (custom)	$120	$303	$3,756	Provisioned throughput required
Mistral	Mistral Large 2	$135	$564	$6,903
OpenAI	GPT-4o	$375	$600	$7,575	Stale: OpenAI moved to per-hour training 2026-05; verify pending

Provider	Training Cost	Inference Uplift	Year-1 Total
Together Llama 4 70B (LoRA)	$18	+$50/mo	$618
OpenAI GPT-4o-mini	$45	+$120/mo	$1,485
Google Gemini 2.5 Flash tune	$75	+$150/mo	$1,875
OpenAI GPT-4o	$375	+$1,200/mo	$14,775
OpenAI o3	$2,250	+$3,500/mo	$44,250
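The Year-1 Total column is consistent with training cost plus twelve months of the monthly figure. The page does not state that formula, so treat it as inferred from the rows; the sketch below recomputes the totals for the table directly above under that assumption.

```python
def year_one_total(training_cost: int, monthly_cost: int) -> int:
    """One-time training cost plus 12 months of the monthly figure (USD)."""
    return training_cost + 12 * monthly_cost


# (provider, training cost, monthly inference uplift, listed Year-1 total)
ROWS = [
    ("Together Llama 4 70B (LoRA)", 18, 50, 618),
    ("OpenAI GPT-4o-mini", 45, 120, 1_485),
    ("Google Gemini 2.5 Flash tune", 75, 150, 1_875),
    ("OpenAI GPT-4o", 375, 1_200, 14_775),
    ("OpenAI o3", 2_250, 3_500, 44_250),
]

for name, training, monthly, listed in ROWS:
    computed = year_one_total(training, monthly)
    status = "matches" if computed == listed else "differs from"
    print(f"{name}: ${computed:,} ({status} listed ${listed:,})")
```

In every row the monthly charge dominates the first-year total, which is why the inference uplift matters more than the training price when comparing providers.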

LLM Fine-tuning Cost Calculator

What this calculator does

Multi-provider

Training + inference split

Epoch slider

Inference uplift modeling

Year-1 total

LoRA vs full fine-tuning
