Any words of advice on trying to find a job? (buttondown.com)
InstaClock Product Updates – May 25, 2025 (instaclock.app)
Show HN: Getting full-text scientific content into LLMs+Agents is stupidly hard
We hit this while building agentic workflows and RAG backends. What we needed wasn’t “search”; it was a way to retrieve real, structured full text with enough metadata to plug straight into a reasoning system. So we built a system that does that: multimodal inputs (text, math, figures), clean citations, reference chaining, and filters that actually work (by date, by source, etc.).
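To make "structured full text with enough metadata" concrete, here is a minimal sketch of what a retrievable chunk record, a date/source filter, and one hop of reference chaining might look like. It assumes a hypothetical `Chunk` schema (`doc_id`, `source`, `published`, `section`, `cites`) and hypothetical helpers `filter_chunks` and `follow_references`; none of this is the poster's actual schema or API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Chunk:
    """One retrievable unit of full text plus metadata.

    Hypothetical schema for illustration only, not the poster's data model.
    """
    doc_id: str
    source: str      # e.g. a publisher or preprint server name
    published: date
    section: str     # e.g. "Methods", "Results"
    text: str
    cites: list[str] = field(default_factory=list)  # doc_ids this chunk references


def filter_chunks(chunks, sources=None, start=None, end=None):
    """Keep chunks whose source is in `sources` and whose date lies in [start, end].

    Any filter left as None is not applied.
    """
    result = []
    for c in chunks:
        if sources is not None and c.source not in sources:
            continue
        if start is not None and c.published < start:
            continue
        if end is not None and c.published > end:
            continue
        result.append(c)
    return result


def follow_references(chunk, index):
    """One hop of reference chaining: return the chunks of every document this chunk cites.

    `index` maps doc_id -> list of that document's chunks; cited ids missing
    from the index are skipped.
    """
    return [c for doc_id in chunk.cites for c in index.get(doc_id, [])]


corpus = [
    Chunk("a", "arxiv", date(2023, 5, 1), "Results", "text of a", cites=["b"]),
    Chunk("b", "pubmed", date(2021, 2, 3), "Methods", "text of b"),
]
recent = filter_chunks(corpus, sources={"arxiv"}, start=date(2022, 1, 1))
cited = follow_references(corpus[0], {"b": [corpus[1]]})
```

Keeping the metadata on every chunk, rather than only on the parent document, is what lets a downstream agent filter and cite at chunk granularity without a second lookup.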
The hard part wasn’t retrieval but preprocessing at scale: figuring out how to analyse, chunk, and structure tens of millions of documents without taking months or breaking the bank. That’s before you get to licensed content, where formats vary wildly, or to building retrieval systems that hold up at this scale.
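As a rough illustration of the "chunk and structure" step, the sketch below splits each section of a parsed document into overlapping word windows and tags every chunk with its section heading and starting word offset. It assumes the document has already been parsed into `(heading, body)` pairs by some upstream PDF/XML parser; the function names and the word-based window sizes are hypothetical choices, not the poster's pipeline (real systems often split on tokens or sentences instead).

```python
def chunk_section(section, text, max_words=200, overlap=40):
    """Split one section's text into overlapping word windows tagged with the section name."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append({"section": section, "start_word": start, "text": " ".join(window)})
        # Stop once this window reaches the end, so the tail is not emitted twice.
        if start + max_words >= len(words):
            break
    return chunks


def chunk_document(sections, max_words=200, overlap=40):
    """Chunk a document given as (heading, body) pairs (assumed output of an upstream parser)."""
    out = []
    for heading, body in sections:
        out.extend(chunk_section(heading, body, max_words, overlap))
    return out


doc = [
    ("Abstract", "w0 w1 w2"),
    ("Methods", " ".join(f"w{i}" for i in range(10))),
]
chunks = chunk_document(doc, max_words=4, overlap=1)
```

The overlap keeps a sentence that straddles a window boundary retrievable from either side; carrying the section heading along is a cheap way to keep some document structure attached to each chunk.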
It’s still a work in progress, with more updates on the way, but it’s miles better than duct-taping together PDFs, AI search engines, etc. and hoping to find the context you need.