I recently built File Decomposer, a tool that takes files such as PDF, DOCX, EPUB, and HTML and converts them into structured Markdown or JSON.
The main goal is to help devs, AI engineers, and indie hackers who deal with unstructured documents and want clean, usable data to:
feed into LLMs (for RAG/chatbots)
create searchable knowledge bases
automate workflows
or just stop wasting time copying/pasting from PDFs
It handles:
Large files (multi-hundred-page PDFs, technical docs, books)
Structure preservation (headings, lists, sections)
JSON formatting that’s easy to parse or feed into your pipelines
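For anyone wondering what "easy to feed into your pipelines" could look like in practice, here is a minimal sketch of turning structured JSON into chunks for a RAG index. The JSON shape (`title`, `sections`, `heading`, `level`, `content`) and the `sections_to_chunks` helper are assumptions made up for illustration, not the tool's actual output format, so adjust the field names to whatever your export contains.

```python
import json

# Hypothetical output shape: the real JSON schema is not documented in this
# post, so every field name below is an assumption for illustration only.
SAMPLE = """
{
  "title": "Example Manual",
  "sections": [
    {"heading": "Introduction", "level": 1, "content": "What this manual covers."},
    {"heading": "Setup", "level": 2, "content": "Install the package.\\n\\nThen run the setup script."}
  ]
}
"""


def sections_to_chunks(doc, max_chars=500):
    """Group each section's paragraphs into chunks of roughly max_chars."""
    chunks = []
    for section in doc.get("sections", []):
        heading = section.get("heading", "")
        buffer = ""
        for para in section.get("content", "").split("\n\n"):
            para = para.strip()
            if not para:
                continue
            # Flush the buffer when adding this paragraph would exceed the limit.
            if buffer and len(buffer) + len(para) + 2 > max_chars:
                chunks.append({"heading": heading, "text": buffer})
                buffer = para
            else:
                buffer = f"{buffer}\n\n{para}" if buffer else para
        if buffer:
            chunks.append({"heading": heading, "text": buffer})
    return chunks


doc = json.loads(SAMPLE)
chunks = sections_to_chunks(doc, max_chars=40)
for chunk in chunks:
    print(chunk["heading"], "->", repr(chunk["text"]))
```

Keeping the heading attached to every chunk means the section context survives when the text is embedded or indexed, which is the main reason structure preservation matters for retrieval.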
Built this after wasting too much time cleaning up documents to get usable data into my AI projects. Happy to answer questions, and very open to feedback or edge cases you'd like supported.
You can try it here: https://filedecomposer.com/
Would love to hear how others are tackling this problem, or if there are ways I can make this tool more useful for your workflows.