Fine-tuning

Fine-tune a model — only when you actually need to.

Here is the honest truth most courses won’t tell you: the vast majority of builders never need to train or fine-tune a model. Good prompting plus retrieval (RAG) solves almost every real problem. Fine-tuning is a narrow tool for a narrow job — a specific style or task, backed by a real dataset. This page shows you the difference, when fine-tuning is genuinely worth it, and how it works — so you don’t burn time and Naira on training you didn’t need.

Read this first

Do RAG first. Seriously.

If you only remember one thing from this page: before you even think about fine-tuning, exhaust prompting and retrieval. Most people who think they need a custom model actually need to (1) write a sharper prompt and (2) feed the model their own documents at question time with RAG. Both are cheaper, faster, easier to change, and you can do them today on a basic laptop. Fine-tuning is the last lever, not the first.

Learn prompting & RAG

Four different things people confuse

“Prompting,” “RAG,” “fine-tuning,” and “training” get used interchangeably, but they are four different things. Climb this ladder from cheapest to most expensive, and stop as soon as something works.

Prompting

Cheapest · instant · try this first

You tell a model what to do in plain words — a clear instruction, a few examples, the format you want back. No training, no dataset, no GPU. Modern models are strong enough that good prompting solves most problems on its own. If you have not exhausted prompting, you are not ready to fine-tune.
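As a rough illustration of "a clear instruction, a few examples, the format you want back", here is a minimal sketch of a few-shot prompt builder. The function name, the labels, and the sample messages are made up for this example; the resulting string is what you would send to whichever model you use.

```python
def build_few_shot_prompt(instruction, examples, new_input):
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    # End on an open "Output:" so the model continues in the demonstrated format.
    lines += [f"Input: {new_input}", "Output:"]
    return "\n".join(lines)


# Hypothetical task and examples, for illustration only.
prompt = build_few_shot_prompt(
    "Classify each customer message as COMPLAINT, QUESTION, or PRAISE. "
    "Reply with the label only.",
    [
        ("My order arrived broken.", "COMPLAINT"),
        ("Do you deliver to Abuja?", "QUESTION"),
    ],
    "The rider was so polite, thank you!",
)
print(prompt)
```

If a prompt like this already gives consistent answers, there is nothing left for fine-tuning to fix.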

Retrieval (RAG)

Cheap · gives the model your facts

RAG means Retrieval-Augmented Generation: you store your own documents, search them at question time, and paste the most relevant chunks into the prompt. This is how you make a model answer from your handbook, your prices, your policies — without changing the model at all. It is the right fix for “the model doesn’t know my data.”
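The store, search, paste loop can be sketched in a few lines. This is a toy version under stated assumptions: it ranks documents by shared words instead of the embedding search a real system would typically use, and the documents, function names, and prompt wording are all invented for illustration.

```python
import re

# Hypothetical documents; in a real app these would be chunks of your own files.
DOCUMENTS = [
    "Delivery within Lagos takes 2 working days.",
    "Refunds are processed within 7 days of a return.",
    "Our office is open Monday to Friday, 9am to 5pm.",
]


def words(text):
    """Lowercase a text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = words(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & words(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Paste the most relevant chunks into the prompt, ahead of the question."""
    context = "\n".join(retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt("How long do refunds take?", DOCUMENTS)
print(prompt)
```

The model itself is untouched: updating an answer means editing a document, not retraining anything.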

Fine-tuning

More effort · changes the model’s behaviour

You take an existing model and continue training it on a focused dataset of your own examples — usually a few hundred to a few thousand input/output pairs. It nudges style, tone, and format, and can teach a narrow, repeatable task. It does not reliably teach the model new facts; that is what RAG is for.

Pretraining

Almost certainly not you

Building a model from scratch on trillions of words. This costs millions of dollars and needs huge clusters of GPUs. Effectively no individual builder or small startup does this — you build on top of models that big labs already pretrained. We mention it only so the words don’t confuse you.

When fine-tuning is actually worth it

All of these should be true — not just one. If they are not, stay on prompting and RAG.

Prompting and RAG have genuinely been tried and still fall short — not just “I read about fine-tuning and it sounds cool.”

You need a very specific, consistent style, tone, or output format that you cannot reliably get from instructions alone.

The task is narrow and repeatable — classify this, rewrite into that exact format, respond in this house voice every time.

You have a real dataset: clean, labelled input/output examples that look like the job you want done. No dataset, no fine-tune.

A smaller, cheaper model fine-tuned on your task would beat paying for a big model on every single call — and you have the volume to make that pay off.
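The last condition is simple arithmetic, sketched below. Every number here is a placeholder, not a real price: substitute your own one-time cost (dataset work plus training) and your own per-call costs.

```python
import math


def break_even_calls(one_time_cost, big_cost_per_call, small_cost_per_call):
    """Number of calls after which the fine-tuned small model has paid for itself.

    Returns None when the small model is not cheaper per call, because then
    the one-time cost is never recovered.
    """
    saving_per_call = big_cost_per_call - small_cost_per_call
    if saving_per_call <= 0:
        return None
    return math.ceil(one_time_cost / saving_per_call)


# Placeholder figures in Naira, chosen only to make the arithmetic easy to follow.
calls = break_even_calls(
    one_time_cost=150_000,
    big_cost_per_call=12.0,
    small_cost_per_call=2.0,
)
print(calls)  # 15000
```

At a few hundred calls a month, a break-even point like this is years away; at tens of thousands a month, it arrives quickly. That is what "the volume to make that pay off" means in practice.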

When not to bother

If any of these is your reason, fine-tuning is the wrong tool. Here is what to do instead.

You want the model to “know” your documents, prices, or policies — use RAG, not fine-tuning.

You have only a handful of examples — a few dozen rows will not move a model; you will waste time and money.

You haven’t written a serious prompt yet — fix the prompt first; it is free.

You want it to be “smarter” in general — fine-tuning narrows a model to a task, it does not raise its overall intelligence.

The underlying need keeps changing — you would be retraining constantly; a prompt or RAG index is far easier to update.

The most common mistake: trying to fine-tune facts into a model. Models learn behaviour from fine-tuning, not reliable knowledge. If you want a model to answer from your prices, handbook, or policies, that is a RAG job — keep the model as-is and feed it the right text at question time.

If you do need it: the rough path

Two real routes — a hosted fine-tuning API (no GPU of your own) or local LoRA-style fine-tuning on an open model. The steps are the same; the heavy work is the dataset, not the training.

1. Build the dataset first

Collect real input/output pairs that look exactly like the job. Quality and consistency beat quantity — a few hundred clean, well-formatted examples beat thousands of messy ones. Most of the work of fine-tuning is here, not in the training.

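Most hosted APIs want JSONL, one example per line. JSONL has no comment syntax, so the file holds only records like these (check your provider's docs for its exact schema):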
{"messages":[{"role":"user","content":"Rewrite this for our house voice: ..."},{"role":"assistant","content":"..."}]}
{"messages":[{"role":"user","content":"Rewrite this for our house voice: ..."},{"role":"assistant","content":"..."}]}

2. Hold some examples back

Split off a slice of your data the model never trains on. After fine-tuning, you test against that held-out slice to see if it actually improved — instead of fooling yourself. If you cannot measure better, you cannot claim better.
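One way to make that split, as a minimal sketch: shuffle with a fixed seed so the split is repeatable, then cut off a fraction. The 20% figure and the function name are assumptions for this example, not a rule.

```python
import random


def split_examples(examples, holdout_fraction=0.2, seed=42):
    """Return (train, held_out) with no example appearing in both."""
    shuffled = list(examples)
    # A fixed seed gives the same split on every run, so results stay comparable.
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Stand-in data: 100 numbered examples.
examples = [f"example-{i}" for i in range(100)]
train, held_out = split_examples(examples)
print(len(train), len(held_out))  # 80 20
```

Save the held-out slice to its own file and never pass it to the training job; otherwise the later comparison measures memorisation instead of improvement.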

3. Pick the route: hosted API or local LoRA

With a hosted fine-tuning API, you upload your dataset, the provider runs the training on its own GPUs, and you get a private model to call, with no GPU of your own needed. Or, with an open model, you fine-tune locally using a LoRA / QLoRA approach: LoRA trains only a small set of extra weights, and QLoRA also loads the frozen base model in 4-bit precision, so a small model can fit on a single consumer GPU.

# Local LoRA path uses the open-source ecosystem, roughly:
pip install transformers peft trl bitsandbytes datasets
# then a small training script loads a base model + your JSONL,
# trains a LoRA adapter, and saves it next to the base weights.
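To see why LoRA trains so few weights, count them. For one weight matrix of shape d_out x d_in, full fine-tuning updates every entry, while LoRA with rank r trains two thin matrices of shape d_out x r and r x d_in. The layer size and rank below are illustrative choices, not taken from any specific model.

```python
def full_params(d_out, d_in):
    """Weights updated when a d_out x d_in matrix is fully fine-tuned."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Weights trained by LoRA: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes: a 4096 x 4096 layer with a rank-8 adapter.
full = full_params(4096, 4096)
lora = lora_params(4096, 4096, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

The base weights stay frozen, which is why the saved adapter is small and sits next to the base model instead of replacing it.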

4. Train, then evaluate honestly

Run the job, then compare the fine-tuned model against your held-out examples — and against a plain prompt baseline. If a good prompt already matches it, you did not need to fine-tune. Keep the result only if it clearly wins.
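The comparison can be sketched as a tiny harness. This assumes exact-match scoring, which suits classification or strict-format tasks; style or tone tasks usually need a different measure, such as human review. The two "models" here are plain stub functions standing in for real model calls, and the 0.05 margin is an arbitrary placeholder.

```python
def exact_match_rate(model, held_out):
    """Fraction of held-out prompts where the model's output equals the expected text."""
    hits = sum(
        1 for prompt, expected in held_out
        if model(prompt).strip() == expected.strip()
    )
    return hits / len(held_out)


def fine_tune_wins(fine_tuned, baseline, held_out, min_gain=0.05):
    """Keep the fine-tune only if it beats the prompt baseline by a clear margin."""
    gain = exact_match_rate(fine_tuned, held_out) - exact_match_rate(baseline, held_out)
    return gain >= min_gain


# Hypothetical held-out slice and stub models, for illustration only.
HELD_OUT = [
    ("refund late", "COMPLAINT"),
    ("opening hours?", "QUESTION"),
    ("great service", "PRAISE"),
    ("item damaged", "COMPLAINT"),
]


def baseline_model(prompt):
    """Stand-in for the base model called with a plain prompt."""
    return "QUESTION" if prompt.endswith("?") else "COMPLAINT"


def fine_tuned_model(prompt):
    """Stand-in for the fine-tuned model."""
    return "PRAISE" if "great" in prompt else baseline_model(prompt)


print(exact_match_rate(baseline_model, HELD_OUT))    # 0.75
print(exact_match_rate(fine_tuned_model, HELD_OUT))  # 1.0
print(fine_tune_wins(fine_tuned_model, baseline_model, HELD_OUT))  # True
```

Four examples are far too few for a real decision; the point is only that both models face the same held-out data and the same scoring rule.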

5. Ship it behind your app

Call your fine-tuned model the same way you call any model — from your backend, with the key kept server-side. Watch quality in the real world, and be ready to re-do the dataset as the task drifts.
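A minimal sketch of "key kept server-side": the backend reads the key from an environment variable and attaches it to the outgoing request, so it never appears in browser or mobile code. The URL, model name, environment variable name, payload shape, and Bearer-style header are all assumptions for this example; your provider's documentation defines the real ones.

```python
import json
import os
import urllib.request

API_URL = "https://api.example.com/v1/chat"  # hypothetical endpoint
MODEL_NAME = "my-fine-tuned-model"           # hypothetical model id


def build_request(user_text):
    """Build the server-side request; the key comes from the environment, not the client."""
    api_key = os.environ["MODEL_API_KEY"]  # raises KeyError if the key is not configured
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Demo value only; in production the variable is set in the server's environment.
    os.environ.setdefault("MODEL_API_KEY", "demo-key")
    request = build_request("Rewrite this for our house voice: ...")
    print(request.get_method(), request.full_url)
```

Your app's frontend calls your backend, and only the backend builds and sends this request.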

You almost never need to buy a GPU for this. A hosted fine-tuning API runs the training for you, and a LoRA fine-tune of an open model fits on a single rented cloud GPU for a few hours. Renting beats buying for nearly everyone — see what an hour of GPU time actually costs in Naira.

See Naira-priced cloud compute

Bottom line

Reach for the simplest tool that works.

Prompt first. Add RAG when the model needs your facts. Run a model locally if you want privacy or to work offline. Fine-tune only when you have a narrow task, a real dataset, and you have proven prompting and RAG fall short. That order saves you money, time, and a lot of frustration — and it is how serious builders actually work.
