Show HN: Undatas.io – A pay-on-accept document parsing API
Our journey started from a place of deep frustration with RAG (Retrieval-Augmented Generation). I was helping companies build internal knowledge bases on their own data, and the promise was huge. But in practice, the results were often mediocre. Important information was frequently missed during retrieval, and we kept hitting dead ends.
After endless debugging, we realized the problem wasn't the LLM; it was classic "garbage in, garbage out." We traced the retrieval failures back to the very first step: document parsing.
Whether we used open-source libraries or expensive paid APIs, the story was the same. Precision was lost. Key phrases, critical numbers, and entire table rows would just vanish during the parsing process. We spent countless hours manually comparing the original PDFs to the parsed output to find what went wrong. It was a soul-crushing, time-consuming nightmare.
The biggest pain points were:
1. Complex Tables: Most tools collapsed on the tables found in real-world documents. Borderless tables, cells merged across rows and columns, and tables containing handwritten notes were consistently mangled.
2. Lack of a Feedback Loop: When the parser got something wrong, there was no easy way to manually annotate and correct it. You were stuck with the bad output.
I got so frustrated that I decided to build the tool I wished I had: a parsing engine that is obsessed with precision and makes the entire data extraction process transparent. That’s what undatas.io is. And today, we're launching our API.
We built our API around a simple principle: you only pay for results you actually accept.
To solve the transparency problem, every piece of extracted data in the JSON response includes its bounding-box coordinates (bbox). This lets you build your own "glass box" validator that maps the data directly back to the source document, making the data prep stage for RAG completely transparent.
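As an illustration of the kind of "glass box" check this enables, here is a minimal Python sketch. It assumes a hypothetical element shape (text, page number, and a bbox as (x0, y0, x1, y1) in PDF points with a top-left origin). The actual undatas.io JSON schema may differ, so treat the field names and coordinate conventions as placeholders and check the docs before adapting it.

```python
from dataclasses import dataclass


# Hypothetical element shape: the real API response schema may differ.
@dataclass
class Element:
    text: str
    page: int
    bbox: tuple  # (x0, y0, x1, y1), assumed in PDF points, top-left origin


def check_element(el, page_sizes):
    """Return the problems found for one element (an empty list means it looks sane)."""
    if el.page not in page_sizes:
        return [f"unknown page {el.page}"]
    width, height = page_sizes[el.page]
    x0, y0, x1, y1 = el.bbox
    problems = []
    if x0 >= x1 or y0 >= y1:
        problems.append("degenerate bbox")
    if x0 < 0 or y0 < 0 or x1 > width or y1 > height:
        problems.append("bbox outside page")
    if not el.text.strip():
        problems.append("empty text")
    return problems


def validate(elements, page_sizes):
    """Map each element's index to its problems, keeping only elements that need review."""
    report = {}
    for i, el in enumerate(elements):
        problems = check_element(el, page_sizes)
        if problems:
            report[i] = problems
    return report


def to_pixels(bbox, page_size, image_size):
    """Scale a bbox from page units to pixel coordinates on a rendered page image,
    so it can be drawn over the original for side-by-side review.
    Assumes the page and the image share the same (top-left) origin."""
    sx = image_size[0] / page_size[0]
    sy = image_size[1] / page_size[1]
    x0, y0, x1, y1 = bbox
    return (round(x0 * sx), round(y0 * sy), round(x1 * sx), round(y1 * sy))


if __name__ == "__main__":
    pages = {1: (612.0, 792.0)}  # one US Letter page, in PDF points
    elements = [
        Element("Total revenue", 1, (72.0, 100.0, 200.0, 114.0)),
        Element("", 1, (72.0, 120.0, 72.0, 134.0)),
        Element("Q4", 1, (580.0, 700.0, 640.0, 714.0)),
    ]
    print(validate(elements, pages))
    # {1: ['degenerate bbox', 'empty text'], 2: ['bbox outside page']}
    print(to_pixels(elements[0].bbox, pages[1], (1224, 1584)))
    # (144, 200, 400, 228)
```

Flagged elements can then be highlighted on the rendered page with the scaled boxes, which turns the "compare the PDF to the output by hand" step into a targeted review of only the suspicious spans.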
Our goal is to build the best and most trustworthy parsing tool for developers. We're just getting started and would be grateful for your feedback.
You can check out the docs and try it out here: https://doc.undatas.io/
I’ll be here all day to answer any questions. Let me know what you think.