Show HN: Fine-tuned Llama 3.2 3B to match 70B models for local transcripts
I wanted to process raw transcripts locally instead of sending them through OpenRouter. Llama 3.2 3B with just a prompt was decent but incomplete, so I tried supervised fine-tuning (SFT): teaching it to clean/analyze dictation and emit structured JSON (title, tags, entities, dates, actions).
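Roughly the output shape (field names as above; the values here are made up for illustration, the exact schema lives in the repo):

    # illustrative only: field names from the post, values invented
    example_output = {
        "title": "Call plumber about kitchen leak",
        "tags": ["home", "maintenance"],
        "entities": ["kitchen sink", "plumber"],
        "dates": ["2025-01-14"],
        "actions": ["call plumber", "get a quote"],
    }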
Data: 13 real memos → Kimi K2 gold JSON → ~40k synthetic examples plus the gold set, with keys canonicalized to one schema. Synthetic generation ran through Chutes.ai (5k req/day).
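By "canonicalized" I mean every training target uses the same key names in the same order, so the model never sees two spellings of the same field. A sketch (the alias table is hypothetical; the real pipeline is in the repo):

    import json

    CANONICAL_KEYS = ["title", "tags", "entities", "dates", "actions"]
    ALIASES = {"tag": "tags", "entity": "entities", "date": "dates",
               "action_items": "actions", "todos": "actions"}  # hypothetical aliases

    def canonicalize(raw: str) -> str:
        obj = json.loads(raw)
        renamed = {ALIASES.get(k.strip().lower(), k.strip().lower()): v
                   for k, v in obj.items()}
        # fixed key order so every target string has identical structure
        ordered = {k: renamed.get(k, "" if k == "title" else [])
                   for k in CANONICAL_KEYS}
        return json.dumps(ordered, ensure_ascii=False)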
Training: RTX 4090 24GB, ~4h with Unsloth; LoRA (r=128, α=128, dropout=0.05), max seq len 2048, batch size 16, lr 5e-5, cosine schedule. The same run took ~8h on a 2070 Super 8GB.
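The core of the training script looks something like this (Unsloth + trl, classic notebook-style API; the base checkpoint id, dataset path, and epoch count are assumptions, not from the post):

    from unsloth import FastLanguageModel
    from datasets import load_dataset
    from trl import SFTTrainer
    from transformers import TrainingArguments

    model, tokenizer = FastLanguageModel.from_pretrained(
        "unsloth/Llama-3.2-3B-Instruct",   # assumed base checkpoint
        max_seq_length=2048,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=128, lora_alpha=128, lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )
    dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder path
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",   # chat-templated prompt + JSON target per row
        max_seq_length=2048,
        args=TrainingArguments(
            per_device_train_batch_size=16,
            learning_rate=5e-5,
            lr_scheduler_type="cosine",
            num_train_epochs=1,      # assumption; the post doesn't say
            output_dir="outputs",
        ),
    )
    trainer.train()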
Inference: merged the LoRA into the base weights, exported to GGUF at Q4_K_M via llama.cpp; runs in LM Studio.
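One way to do the merge + quantize, via Unsloth's wrapper around llama.cpp's convert/quantize scripts (directory name is a placeholder):

    # merge LoRA into the base weights and emit a Q4_K_M .gguf
    model.save_pretrained_gguf("notes-3b-gguf", tokenizer,
                               quantization_method="q4_k_m")

The resulting .gguf loads directly in LM Studio.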
Evals (100 samples, GLM 4.5 FP8 as judge): overall 5.35 (base 3B) → 8.55 (fine-tuned); completeness 4.12 → 7.62; factual accuracy 5.24 → 8.57.
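The judging setup, sketched (the rubric wording, model id, and endpoint below are placeholders; only "GLM 4.5 FP8, 100 samples, overall/completeness/factual" reflects the actual eval):

    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # any OpenAI-compatible server

    RUBRIC = ('Score this JSON extraction of the transcript from 1-10 on overall '
              'quality, completeness, and factual accuracy. Reply with JSON only: '
              '{"overall": n, "completeness": n, "factual": n}')

    def judge(transcript: str, model_output: str) -> dict:
        resp = client.chat.completions.create(
            model="glm-4.5",  # placeholder model id
            temperature=0,
            messages=[{"role": "user", "content":
                       f"{RUBRIC}\n\nTRANSCRIPT:\n{transcript}\n\nOUTPUT:\n{model_output}"}],
        )
        return json.loads(resp.choices[0].message.content)  # assumes the judge returns bare JSON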
Head-to-head on 10 samples: fine-tuned 3B ~8.40 vs Hermes-70B 8.18, Mistral-Small-24B 7.90, Gemma-3-12B 7.76, Qwen3-14B 7.62. The teacher, Kimi K2, scored ~8.82.
Why it works: task specialization plus JSON canonicalization reduce variance; the model learns the exact structure and fields to emit.
Lessons: train on completions only (masking sketch below); synthetic data is fine for narrow tasks; Llama is straightforward to fine-tune. Dataset pipeline + training script + evals: https://github.com/bilawalriaz/local-notes-transcribe-llm
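"Completions only" means the prompt tokens are masked out of the loss, so the model learns to produce the JSON rather than re-predict the transcript. With Unsloth that's one helper applied to the trainer from the training sketch above (the header strings are Llama 3 chat-template markers):

    from unsloth.chat_templates import train_on_responses_only

    trainer = train_on_responses_only(
        trainer,  # the SFTTrainer built earlier
        instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
        response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
    )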