1. Build the dataset first
Collect real input/output pairs that look exactly like the job. Quality and consistency beat quantity — a few hundred clean, well-formatted examples beat thousands of messy ones. Most of the work of fine-tuning is here, not in the training.
// Most hosted APIs want JSONL — one example per line:
{"messages":[{"role":"user","content":"Rewrite this for our house voice: ..."},{"role":"assistant","content":"..."}]}
{"messages":[{"role":"user","content":"Rewrite this for our house voice: ..."},{"role":"assistant","content":"..."}]}
2. Hold some examples back
Split off a slice of your data that the model never trains on. After fine-tuning, test against that held-out slice to see whether the model actually improved on examples it has not seen, instead of fooling yourself with examples it has memorized. If you cannot measure better, you cannot claim better.
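A minimal sketch of that split, assuming the dataset is JSONL in the chat format shown in step 1. The function name `split_jsonl`, the 10% held-out fraction, and the fixed seed are illustrative choices, not requirements of any particular API.

```python
import json
import random

def split_jsonl(lines, held_out_fraction=0.1, seed=0):
    """Parse JSONL lines, then split them into (train, held_out) lists."""
    examples = [json.loads(line) for line in lines if line.strip()]
    # Shuffle a copy with a fixed seed so the split is reproducible.
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    # Hold back at least one example, even for tiny datasets.
    n_held_out = max(1, int(len(shuffled) * held_out_fraction))
    return shuffled[n_held_out:], shuffled[:n_held_out]

# Tiny in-memory stand-in for a real dataset file (made-up examples).
lines = [
    json.dumps({"messages": [
        {"role": "user", "content": f"Rewrite this for our house voice: draft {i}"},
        {"role": "assistant", "content": f"rewritten draft {i}"},
    ]})
    for i in range(20)
]
train, held_out = split_jsonl(lines)
print(len(train), len(held_out))
```

With a real file, the lines would come from something like `open("dataset.jsonl")` (a hypothetical filename). Only the `train` portion goes to the training job; `held_out` stays untouched until evaluation.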
3. Pick the route: hosted API or local LoRA
With a hosted fine-tuning API, you upload your dataset, the provider runs the training on its GPUs, and you get a private model to call, with no GPU of your own needed. Or, with an open model, you fine-tune locally using LoRA or QLoRA. Both freeze the base model and train only a small set of extra weights; QLoRA also loads the base model in 4-bit precision, which is what lets smaller models fit on a single consumer GPU.
# Local LoRA path uses the open-source ecosystem, roughly:
pip install transformers peft trl bitsandbytes datasets
# then a small training script loads a base model + your JSONL,
# trains a LoRA adapter, and saves it next to the base weights.
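To make "a small set of extra weights" concrete: for one frozen weight matrix of shape d x k, LoRA at rank r trains two small matrices of shapes d x r and r x k, which is r * (d + k) parameters instead of d * k. The arithmetic below is a sketch; the 4096 x 4096 layer size and rank 8 are assumed for illustration and do not describe any specific model.

```python
def lora_trainable_params(d, k, r):
    """Trainable parameters LoRA adds for one d x k weight matrix at rank r."""
    # One d x r matrix plus one r x k matrix.
    return r * (d + k)

d, k, r = 4096, 4096, 8  # illustrative sizes, not from a specific model
full = d * k                             # parameters in the frozen matrix
lora = lora_trainable_params(d, k, r)    # parameters LoRA actually trains
print(full, lora, f"{lora / full:.2%}")  # prints: 16777216 65536 0.39%
```

Under these assumed sizes the adapter trains well under 1% of the layer's weights, which is why the optimizer state and gradients stay small enough for modest hardware.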
4. Train, then evaluate honestly
Run the job, then score the fine-tuned model on your held-out examples, and score a plain-prompt baseline (the base model with a well-written prompt) on the same examples. If a good prompt already matches the fine-tuned model, you did not need to fine-tune. Keep the result only if it clearly wins.
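The comparison can be sketched as follows. Everything here is a stand-in: `exact_match` is a placeholder metric (a rewriting task would more likely need a rubric or human review), the two `lambda` models are stubs for real model calls, and the 0.05 margin is an arbitrary threshold for "clearly wins".

```python
def exact_match(prediction, reference):
    """Placeholder metric: 1.0 if the texts match after trimming, else 0.0."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0

def mean_score(generate, held_out, metric=exact_match):
    """Average the metric over held-out chat examples for one model."""
    scores = []
    for example in held_out:
        prompt = example["messages"][0]["content"]      # user turn
        reference = example["messages"][-1]["content"]  # expected reply
        scores.append(metric(generate(prompt), reference))
    return sum(scores) / len(scores)

def keep_fine_tune(fine_tuned_score, baseline_score, margin=0.05):
    """Keep the fine-tune only if it beats the baseline by the margin."""
    return fine_tuned_score - baseline_score >= margin

# Made-up held-out examples and stub models, for illustration only.
held_out = [
    {"messages": [{"role": "user", "content": "a"},
                  {"role": "assistant", "content": "A"}]},
    {"messages": [{"role": "user", "content": "b"},
                  {"role": "assistant", "content": "B"}]},
]
baseline = lambda prompt: "A"               # stub: prompt-only base model
fine_tuned = lambda prompt: prompt.upper()  # stub: fine-tuned model

baseline_score = mean_score(baseline, held_out)
fine_tuned_score = mean_score(fine_tuned, held_out)
print(baseline_score, fine_tuned_score,
      keep_fine_tune(fine_tuned_score, baseline_score))
```

Scoring both models with the same function on the same held-out examples is the point: any difference then comes from the model, not from the test.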
5. Ship it behind your app
Call your fine-tuned model the same way you call any model — from your backend, with the key kept server-side. Watch quality in the real world, and be ready to re-do the dataset as the task drifts.
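A rough sketch of the server-side call, using only the Python standard library. The endpoint URL, the `MODEL_API_KEY` environment variable, the model ID, and the request payload shape are all hypothetical; substitute whatever your provider documents.

```python
import json
import os
import urllib.request

API_URL = "https://api.example.com/v1/chat"  # hypothetical endpoint

def build_request(prompt, model_id, api_key=None):
    """Build (but do not send) an HTTP request for the fine-tuned model."""
    # The key comes from the server's environment, never from client code.
    api_key = api_key or os.environ["MODEL_API_KEY"]
    body = json.dumps({
        "model": model_id,  # assumed payload shape, not a real provider's
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_request(
    "Rewrite this for our house voice: ...",
    "my-fine-tuned-model",  # hypothetical model ID
    api_key="test-key",     # passed explicitly here only for the demo
)
print(request.get_method(), request.full_url)
```

Sending it would be `urllib.request.urlopen(request)` from your backend. Because the browser or mobile client only ever talks to your backend, the key never leaves the server.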