Show HN: Fighting Medical LLM Hallucinations with a Grounded RAG System

Hi HN,

We've been frustrated with how confidently LLMs hallucinate, a dangerous flaw in high-stakes domains like health and medicine. The standard "I am not an expert" disclaimer feels insufficient, since we all ignore those statements anyway.

Our approach is a RAG/agentic system built to solve this. It runs on roughly 40 million scientific papers, but it goes beyond simple retrieval: a multi-agent workflow decomposes queries, cross-references claims against multiple sources, and synthesizes answers so that every key statement is cited directly from the literature. Beyond the literature, the agents have tools to access the internet, databases, and social platforms, with dedicated review agents that verify citations and reduce hallucinations.
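
Roughly, the core loop looks like this. This is a toy sketch, not our actual interfaces: llm and search_papers stand in for the model client and the paper index, and the retry logic is simplified.

    # Toy sketch of the decompose -> retrieve -> synthesize -> review loop.
    from dataclasses import dataclass

    @dataclass
    class Passage:
        paper_id: str
        text: str

    def llm(prompt: str) -> str:
        raise NotImplementedError  # hypothetical chat-model call

    def search_papers(query: str, k: int = 5) -> list[Passage]:
        raise NotImplementedError  # hypothetical retrieval over the paper index

    def answer(question: str, max_revisions: int = 2) -> str:
        # 1. Decompose the question into independent sub-questions.
        subs = llm("Split into sub-questions, one per line:\n" + question).splitlines()
        # 2. Retrieve evidence for each sub-question.
        evidence = [p for s in subs for p in search_papers(s)]
        context = "\n".join(f"[{p.paper_id}] {p.text}" for p in evidence)
        draft, flags = "", ""
        for _ in range(max_revisions + 1):
            # 3. Synthesize an answer that cites a [paper_id] after every claim.
            draft = llm("Answer using ONLY these passages, citing [paper_id] "
                        "after every claim."
                        + ("\nFix these issues first: " + flags if flags else "")
                        + "\nPassages:\n" + context + "\nQuestion: " + question)
            # 4. Review agent: flag uncited or unsupported claims.
            flags = llm("List any claim in this answer not supported by the "
                        "passages, or reply NONE.\nAnswer:\n" + draft
                        + "\nPassages:\n" + context)
            if flags.strip().upper().startswith("NONE"):
                return draft
        return draft  # never passed review: surface with a warning instead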

This is just the start. Our long-term goal is building health superintelligence by integrating multiscale data, from the genomic and cellular level all the way up to clinical studies in humans. To get there, we're exploring SFT, RL, and self-improvement techniques like GEPA, aiming for models that can evolve their own scientific reasoning, and we want to help set new standards for accuracy and hallucination mitigation. We plan to rigorously benchmark our work and share the data publicly.
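
To give a flavor of the GEPA-style direction, here is a heavily simplified sketch that evolves a system prompt against a citation-accuracy metric. Everything here (llm, run_task, the task format, the metric) is an illustrative placeholder, and real GEPA maintains a Pareto set of candidates across tasks rather than a single best prompt.

    # Toy GEPA-flavored loop: reflect on one rollout, mutate the prompt,
    # keep the mutant only if it scores better overall.
    import random

    def llm(prompt: str) -> str:
        raise NotImplementedError  # hypothetical model call

    def run_task(system_prompt: str, task: dict) -> str:
        raise NotImplementedError  # hypothetical rollout of one QA task

    def citation_score(output: str, gold: set[str]) -> float:
        # Fraction of the expected sources actually cited (toy metric).
        cited = {t.strip("[].,") for t in output.split() if t.startswith("[")}
        return len(cited & gold) / max(len(gold), 1)

    def evaluate(system_prompt: str, tasks: list[dict]) -> float:
        # Each task is assumed to carry a "gold" set of expected paper ids.
        return sum(citation_score(run_task(system_prompt, t), t["gold"])
                   for t in tasks) / len(tasks)

    def evolve(seed: str, tasks: list[dict], generations: int = 10) -> str:
        best, best_score = seed, evaluate(seed, tasks)
        for _ in range(generations):
            task = random.choice(tasks)
            trace = run_task(best, task)  # inspect one rollout for failures
            mutant = llm("Rewrite this system prompt so the failure below "
                         "won't recur, staying general.\nPrompt:\n" + best
                         + "\nRollout:\n" + trace)
            score = evaluate(mutant, tasks)
            if score > best_score:
                best, best_score = mutant, score
        return best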

We'd love specific feedback on:

Our RAG/agentic architecture—what failure modes are we missing?

On building superintelligence—beyond SFT/RL/GEPA, what other techniques should we be exploring for a model to truly understand multiscale biology/health/medicine?

Evaluation—what are the best benchmarks for medical/health AI trustworthiness today?

The site itself—any thoughts on the UI/UX, quality of the responses, or other features?

You can see the current system here: https://www.my-openhealth.com/
