Show HN: ClearDoc – Extract fields from any document using OCR and LLMs
I recently launched a prototype of *ClearDoc*, an AI-powered tool to extract structured data from unstructured documents like invoices, bills of lading, certificates, etc.
It uses *OCR (PaddleOCR)* and *LLMs* to detect and align key fields, even in complex documents with tables or nested fields, and in documents written in different languages.
It doesn't require templates and can be *self-hosted* (demo runs on my own GPU).
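To make "align key fields" concrete, here is a minimal sketch of one way an extracted value could be mapped back to a region on the page. It is an illustration only, not ClearDoc's actual code: `OcrLine`, `align_field`, the `min_score` threshold, and the sample data are all hypothetical, and the OCR output is hard-coded rather than produced by PaddleOCR. It assumes the OCR step yields text lines with bounding boxes and the LLM step yields field values as strings, then picks the OCR line most similar to each value.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class OcrLine:
    """One line of OCR output (hypothetical shape, not PaddleOCR's format)."""
    text: str
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def align_field(value: str, ocr_lines: List[OcrLine],
                min_score: float = 0.5) -> Optional[OcrLine]:
    """Return the OCR line most similar to an extracted value, or None."""
    best, best_score = None, 0.0
    for line in ocr_lines:
        # Case-insensitive similarity ratio in [0, 1].
        score = SequenceMatcher(None, value.lower(), line.text.lower()).ratio()
        if score > best_score:
            best, best_score = line, score
    # Reject weak matches so unrelated values are not pinned to the page.
    return best if best_score >= min_score else None


# Hard-coded sample data standing in for real OCR and LLM output.
lines = [
    OcrLine("ACME Corp", (40, 30, 220, 60)),
    OcrLine("Invoice No: INV-2024-001", (40, 90, 380, 120)),
    OcrLine("Total: 1,250.00 USD", (40, 400, 320, 430)),
]
fields = {"invoice_number": "INV-2024-001", "total": "1,250.00"}

# Map each field name to the bounding box of its best-matching line.
aligned = {}
for name, value in fields.items():
    match = align_field(value, lines)
    aligned[name] = match.box if match else None
```

Here `aligned` maps each field name to a box that a viewer could highlight. A real pipeline would probably need token-level matching and multi-line values; this sketch only handles the single-line case.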
Live demo (no sign-up): http://cleardoc.v5ent.com/
Demo video: https://www.youtube.com/watch?v=u83T6iewfNs
Right now:
- Fields are auto-aligned visually on the document
- Works with PDFs, images, and scans
- No custom field design/editing in the demo yet
Would love feedback on:
- Which use cases matter most to you?
- What would make this valuable enough to adopt?
Thanks!