I built a tool to convert files into structured data for LLMs and automation

Hey HN,

I recently built File Decomposer, a tool that takes PDF, DOCX, EPUB, HTML, and similar files and converts them into structured Markdown or JSON.

The main goal is to help devs, AI engineers, and indie hackers who deal with unstructured documents and want clean, usable data to:

- feed into LLMs (for RAG/chatbots)
- create searchable knowledge bases
- automate workflows
- or just stop wasting time copying/pasting from PDFs

It handles:

- Large files (multi-hundred-page PDFs, technical docs, books)
- Structure preservation (headings, lists, sections)
- JSON formatting that’s easy to parse or feed into your pipelines
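To make the "easy to feed into your pipelines" point concrete, here is a rough sketch of how heading-aware JSON could be turned into chunks for RAG. The schema (`title`, `sections`, `heading`, `level`, `content`) and the `to_chunks` helper are made up for illustration and are not the tool's documented output format.

```python
import json

# Hypothetical output shape, for illustration only; the real schema may differ.
sample = json.loads("""
{
  "title": "User Guide",
  "sections": [
    {"heading": "Install", "level": 1,
     "content": ["Run the installer.", "Restart the app."]},
    {"heading": "Usage", "level": 1,
     "content": ["Open a file."]}
  ]
}
""")


def to_chunks(doc):
    """Turn each section into one chunk that keeps its heading as context."""
    chunks = []
    for section in doc["sections"]:
        # Join the section's paragraphs into a single text blob.
        text = " ".join(section["content"])
        chunks.append({
            "source": doc["title"],
            "heading": section["heading"],
            "text": text,
        })
    return chunks


chunks = to_chunks(sample)
for chunk in chunks:
    print(f'{chunk["heading"]}: {chunk["text"]}')
```

Keeping the heading next to each chunk is the main reason structure preservation matters here: a retriever can show or filter on the section a passage came from instead of working with an undifferentiated wall of text.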

Built this after wasting too much time cleaning up documents to get usable data into my AI projects. Happy to answer questions, and very open to feedback or edge cases you'd like supported.

You can try it here: https://filedecomposer.com/

Would love to hear how others are tackling this problem, or if there are ways I can make this tool more useful for your workflows.
