Show HN: Getting full-text scientific content into LLMs+Agents is stupidly hard

3 points by zk108 | 2 comments | 5/27/2025, 8:37:18 PM | valyu.network
Most APIs don’t return actual content. You get metadata, maybe an abstract, maybe a snippet... never the thing itself. And if you want proper sources like arXiv, PubMed, or major publishers? Good luck. You’re stuck scraping tens of millions of PDFs or Semantic Scholar and building your own ingestion pipeline.
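
A quick illustration of that gap (a minimal sketch, not part of our stack): the public arXiv API hands back an Atom feed whose summary field is the abstract; the body of the paper only exists inside the PDF.

  # Query the public arXiv API and show that only metadata + abstract come back.
  # Illustrative only; error handling omitted.
  import requests
  import xml.etree.ElementTree as ET

  resp = requests.get(
      "http://export.arxiv.org/api/query",
      params={"search_query": "all:retrieval augmented generation", "max_results": 1},
      timeout=30,
  )
  ns = {"atom": "http://www.w3.org/2005/Atom"}
  entry = ET.fromstring(resp.text).find("atom:entry", ns)
  print(entry.find("atom:title", ns).text.strip())
  print(entry.find("atom:summary", ns).text.strip())  # the abstract; the full text lives only in the PDF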

We hit this building agentic workflows and RAG backends. What we needed wasn’t “search”; it was a way to retrieve real, structured full text with enough metadata to plug straight into a reasoning system. So we built a system that does that: multimodal inputs (text, math, figures), clean citations, reference chaining, and filters that work (by date, by source, etc.).
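
Roughly the shape of call we wanted to be able to make (a hypothetical sketch: the endpoint and field names below are placeholders for illustration, not our actual API):

  # Hypothetical full-text retrieval call; endpoint and schema are illustrative only.
  import requests

  resp = requests.post(
      "https://api.example.com/v1/fulltext-search",   # placeholder endpoint
      json={
          "query": "CRISPR off-target effects in primary T cells",
          "sources": ["arxiv", "pubmed"],              # source filter
          "published_after": "2023-01-01",             # date filter
          "max_results": 5,
      },
      timeout=30,
  )
  for doc in resp.json()["results"]:
      print(doc["title"], doc["doi"])
      print(doc["full_text"][:500])    # actual content, not just an abstract
      print(doc["references"])         # structured references for chaining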

The hard part wasn’t retrieval but preprocessing at scale: figuring out how to analyse, chunk, and structure tens of millions of docs without taking months or breaking the bank. Not to mention dealing with licensed content, where formats vary wildly, or building retrieval systems that hold up at this scale.
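
For a sense of what that preprocessing involves, here is a stripped-down sketch of structure-aware chunking (illustrative only; the real pipeline also has to cope with math, figures, and the licensed formats mentioned above):

  # Split parsed document sections into overlapping chunks while keeping metadata.
  # Assumes each section is already a dict like {"heading": ..., "text": ..., "doi": ...}.
  def chunk_sections(sections, max_chars=2000, overlap=200):
      chunks = []
      for sec in sections:
          text = sec["text"]
          start = 0
          while start < len(text):
              end = min(start + max_chars, len(text))
              chunks.append({
                  "heading": sec["heading"],   # carried along so retrieval can cite the section
                  "doi": sec["doi"],
                  "text": text[start:end],
              })
              if end == len(text):
                  break
              start = end - overlap            # overlap preserves context across chunk boundaries
      return chunks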

Still a work in progress, with more updates on the way. But it’s miles better than duct-taping together PDFs, AI search engines, etc., and hoping the relevant context turns up.

Comments (2)

yorkeccak · 20h ago
Aligns very well with what Anthropic researchers said on a recent podcast: even if AI progress stalls, current AI models are already capable of automating all white-collar jobs; the only missing pieces are better access to information and the infra/workflows around the models themselves.