Show HN: Wrote a small tool that turns PDFs and docs into fine-tuning datasets

1 FineTuner42 1 8/14/2025, 11:34:48 AM github.com ↗
I previously posted a terminal tool that could generate fine-tuning datasets from real-world data using deep research. One of the most common requests was: “Can it work with local resources instead of only going online?”

Over the weekend, I built a separate version that does exactly that:

Point it to a local file (PDF, DOCX, JPG, TXT)

Describe the dataset you want

It extracts text → finds relevant parts via semantic search → applies your instructions through a generated schema → outputs a clean dataset.
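
A rough sketch of what those steps might look like, for anyone curious. This is not the tool's actual code: the post doesn't describe the implementation, so every function name here is hypothetical. It assumes the text has already been extracted from the file, and it swaps real embedding-based semantic search for a plain word-count cosine similarity so it runs with only the standard library.

```python
import json
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word/number tokens; a crude stand-in for an embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_chunks(text, description, k=2):
    # Split on blank lines and rank chunks by similarity to the description.
    chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
    query = Counter(tokenize(description))
    ranked = sorted(
        chunks, key=lambda c: cosine(Counter(tokenize(c)), query), reverse=True
    )
    return ranked[:k]


def to_records(chunks, schema):
    # Hypothetical schema: output field name -> function of the chunk text.
    return [{field: fn(chunk) for field, fn in schema.items()} for chunk in chunks]


def build_dataset(text, description, schema, k=2):
    # Emit one JSON object per line (JSONL), a common fine-tuning format.
    records = to_records(top_chunks(text, description, k), schema)
    return "\n".join(json.dumps(r) for r in records)


if __name__ == "__main__":
    doc = (
        "Refunds are issued within 14 days of a return.\n\n"
        "Our office is closed on public holidays.\n\n"
        "To request a refund, email support with your order number."
    )
    schema = {
        "prompt": lambda c: "Explain this policy detail to a customer.",
        "completion": lambda c: c,
    }
    print(build_dataset(doc, "refund return days", schema))
```

In a real version, the schema would presumably be generated from your description rather than written by hand, and the ranking step would use embeddings instead of word overlap.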

Comments (1)

roscas · 22d ago
Can it run 100% offline with my Ollama models?

If yes, amazing, I might use it.

If not, thanks, but I won't use it, because it makes no sense to send your PDF/DOC to an online service where it will be used to feed their AI models.