Show HN: Fine-tuned Llama 3.2 3B to match 70B models for local transcripts

4 points by phantompeace | 9/1/2025, 6:34:43 PM | bilawal.net | 0 comments
I wrote a small local tool to transcribe audio notes (Whisper/Parakeet). Code: https://github.com/bilawalriaz/lazy-notes
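
The transcription step itself is a few lines; a minimal sketch using the openai-whisper package (the actual tool may use faster-whisper or Parakeet instead, and "memo.m4a" is a placeholder):

  import whisper  # pip install openai-whisper

  model = whisper.load_model("base")      # small multilingual model, runs on CPU or GPU
  result = model.transcribe("memo.m4a")   # any local audio note
  print(result["text"])                   # raw transcript fed to the LLM step below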

I wanted to process raw transcripts locally without OpenRouter. Prompting stock Llama 3.2 3B gave decent but incomplete results, so I tried SFT: I fine-tuned Llama 3.2 3B to clean/analyze dictation and emit structured JSON (title, tags, entities, dates, actions).
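
Roughly the output shape the model is trained to emit; the field names are the ones listed above, but the values and exact nesting here are illustrative:

  {
    "title": "Call plumber about kitchen leak",
    "tags": ["home", "repairs"],
    "entities": ["plumber"],
    "dates": ["2025-09-02"],
    "actions": ["book plumber visit", "check under the sink"]
  }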

Data: 13 real memos → Kimi K2 gold JSON → ~40k synthetic examples + the gold set, with keys canonicalized. Synthetic generation went through Chutes.ai (5k req/day).
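
"Keys canonicalized" just means mapping the teacher's varied field names onto one fixed schema before training; a minimal sketch (the alias table is illustrative, not the real one):

  CANONICAL = {"topic": "title", "keywords": "tags", "people": "entities",
               "todos": "actions", "tasks": "actions"}  # hypothetical aliases

  def canonicalize(record: dict) -> dict:
      # Rename stray keys so every training example uses the same schema.
      return {CANONICAL.get(k, k): v for k, v in record.items()}

  print(canonicalize({"topic": "Dentist", "todos": ["book appointment"]}))
  # {'title': 'Dentist', 'actions': ['book appointment']}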

Training: RTX 4090 24GB, ~4h with Unsloth; LoRA (r=128, α=128, dropout=0.05), max seq 2048, batch size 16, lr 5e-5, cosine schedule. On a 2070 Super 8GB the same run took ~8h.
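
The config maps onto Unsloth roughly like this (a sketch with the stated hyperparameters; the base checkpoint name, the toy dataset, and the exact SFTTrainer kwargs, which vary by TRL version, are assumptions):

  from unsloth import FastLanguageModel
  from trl import SFTTrainer
  from transformers import TrainingArguments
  from datasets import Dataset

  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name="unsloth/Llama-3.2-3B-Instruct",  # assumed base checkpoint
      max_seq_length=2048, load_in_4bit=True)
  model = FastLanguageModel.get_peft_model(
      model, r=128, lora_alpha=128, lora_dropout=0.05,
      target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"])

  # Toy one-row dataset; the real pipeline builds ~40k transcript -> JSON pairs.
  dataset = Dataset.from_list([{"text": "transcript... -> {\"title\": \"...\"}"}])

  trainer = SFTTrainer(
      model=model, tokenizer=tokenizer,
      train_dataset=dataset, dataset_text_field="text",
      args=TrainingArguments(per_device_train_batch_size=16, learning_rate=5e-5,
                             lr_scheduler_type="cosine", num_train_epochs=1,
                             output_dir="outputs"))
  trainer.train()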

Inference: merged the LoRA, exported to GGUF, quantized to Q4_K_M with llama.cpp; runs in LM Studio.
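
With Unsloth, merging the adapter and producing the quantized GGUF is one call that hands off to llama.cpp's converter (continuing from the training sketch above; the output directory name is arbitrary):

  # Merges the LoRA into the base weights, then writes a Q4_K_M GGUF via llama.cpp.
  model.save_pretrained_gguf("lazy-notes-3b-gguf", tokenizer,
                             quantization_method="q4_k_m")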

Evals (100 samples, scored by GLM 4.5 FP8 as judge): overall 5.35 (base 3B) → 8.55 (fine-tuned); completeness 4.12 → 7.62; factual accuracy 5.24 → 8.57.
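
The judging loop is plain LLM-as-judge against any OpenAI-compatible endpoint serving GLM 4.5; a sketch, where the base URL, model id, and prompt wording are assumptions rather than the repo's exact eval code:

  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")  # assumed local server

  def judge(transcript: str, gold_json: str, candidate_json: str) -> float:
      prompt = (f"Transcript:\n{transcript}\n\nReference JSON:\n{gold_json}\n\n"
                f"Candidate JSON:\n{candidate_json}\n\n"
                "Score the candidate 1-10 for completeness and factual accuracy. "
                "Reply with only the number.")
      resp = client.chat.completions.create(model="glm-4.5", temperature=0,
                                            messages=[{"role": "user", "content": prompt}])
      return float(resp.choices[0].message.content.strip())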

Head-to-head (10 samples): fine-tuned 3B ~8.40 vs Hermes-70B 8.18, Mistral-Small-24B 7.90, Gemma-3-12B 7.76, Qwen3-14B 7.62. Teacher Kimi K2 scores ~8.82.

Why it works: task specialization plus JSON key canonicalization reduce output variance; the model learns the exact structure and fields.

Lessons: train on completions only (sketch below); synthetic data is fine for narrow tasks; Llama is straightforward to fine-tune. Dataset pipeline + training script + evals: https://github.com/bilawalriaz/local-notes-transcribe-llm
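
"Train on completions only" means the loss is computed over the assistant's JSON, not the prompt; with Unsloth that is one helper applied to the trainer (the marker strings below assume the Llama 3 chat template):

  from unsloth.chat_templates import train_on_responses_only

  # Masks prompt tokens so gradients only flow through the model's response.
  trainer = train_on_responses_only(
      trainer,
      instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
      response_part="<|start_header_id|>assistant<|end_header_id|>\n\n")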
