Show HN: KARMA – An evaluation framework for Medical AI systems

2 k2so 0 8/11/2025, 3:44:11 PM karma.eka.care ↗

KARMA-OpenMedEvalKit is an expandable toolkit for assessing AI models in medical applications, featuring multiple healthcare-focused datasets with particular emphasis on the Indian healthcare environment.

KARMA can evaluate text, image, and audio-based medical AI models using 21+ healthcare datasets. We support popular models out of the box (Qwen, MedGemma, IndicConformer, OpenAI, Anthropic models via AWS Bedrock, and practically any HuggingFace model). KARMA also handles medical-specific evaluation needs, such as ASR models that need language-aware post-processing and LLM-as-a-judge scoring for rubric-based evaluations. KARMA caches model outputs so you can iterate on metrics without re-running expensive inference.
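To illustrate the caching idea, here is a minimal sketch that keys cached outputs on the datapoint and the model configuration. It is not KARMA's actual API: `cache_key`, `cached_predict`, `fake_model`, and the in-memory dictionary are all hypothetical names chosen for this example.

```python
import hashlib
import json

# Hypothetical in-memory cache; a real toolkit would likely persist this to disk.
_cache = {}

def cache_key(datapoint, model_config):
    """Build a stable key from the datapoint and the model configuration."""
    payload = json.dumps(
        {"datapoint": datapoint, "config": model_config}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_predict(model_fn, datapoint, model_config):
    """Run inference only when this (datapoint, config) pair has not been seen."""
    key = cache_key(datapoint, model_config)
    if key not in _cache:
        _cache[key] = model_fn(datapoint, model_config)
    return _cache[key]

# Stand-in for an expensive model call; records each real invocation.
calls = []

def fake_model(datapoint, model_config):
    calls.append(datapoint["id"])
    return datapoint["text"].upper()

config = {"name": "demo-model", "temperature": 0.0}
point = {"id": 1, "text": "bp 120/80"}

first = cached_predict(fake_model, point, config)    # runs the model
second = cached_predict(fake_model, point, config)   # served from the cache
# A changed configuration produces a new key, so the model runs again.
third = cached_predict(fake_model, point, {**config, "temperature": 0.7})
```

With outputs stored this way, a metric can be changed and recomputed over the cached predictions without calling the model again.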

Medical AI evaluation is currently fragmented – researchers often build custom evaluation scripts for each project. KARMA provides standardized metrics and a registry system where you can easily plug in your own models and datasets.

KARMA has an extensible registry system with decorators for easy model and dataset integration. It supports custom metrics with dataset-specific post-processing. Model outputs are cached based on the datapoint and the model configuration to speed up evaluation iterations.
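A decorator-based registry of this kind can be sketched roughly as follows. This only illustrates the general pattern; `MODEL_REGISTRY`, `register_model`, `build_model`, and `EchoModel` are hypothetical names and are not taken from KARMA's codebase.

```python
# Hypothetical registry mapping a string name to a model class.
MODEL_REGISTRY = {}

def register_model(name):
    """Class decorator that records the class under the given name."""
    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise ValueError(f"model {name!r} is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator

@register_model("echo-model")
class EchoModel:
    """Toy model that returns its input unchanged."""
    def predict(self, text):
        return text

def build_model(name):
    """Look up a registered class by name and instantiate it."""
    try:
        return MODEL_REGISTRY[name]()
    except KeyError:
        raise KeyError(f"unknown model {name!r}") from None

model = build_model("echo-model")
```

The appeal of this design is that adding a model requires only a decorated class; the evaluation loop can look models up by name without being edited.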

The Indian healthcare focus came from our work building AI systems for India. Most medical AI benchmarks are heavily skewed toward Western contexts, missing important regional variations in medical terminology, disease prevalence, and clinical practices.

To help address this, we are also releasing 4 datasets: Medical ASR Evaluation Dataset, Medical Records Parsing Evaluation Dataset, Structured Clinical Note Generation Dataset, and Eka Medical Summarisation Dataset. Find the collection here - https://huggingface.co/collections/ekacare/ekacare-medical-p...

Along with our datasets, we are also releasing 2 models from our Parrotlet series publicly under the MIT license: Parrotlet-a-en-5b, a purpose-built model for automatic speech recognition of English in medical contexts, and Parrotlet-v-lite-4b, a purpose-built model for medical report understanding. Link - https://huggingface.co/collections/ekacare/ekacare-public-he...

We've been using KARMA internally and thought the community might find it useful. Happy to answer questions about the architecture or specific use cases!

GitHub: https://github.com/eka-care/KARMA-OpenMedEvalKit

Docs: https://karma.eka.care

Release blog: https://info.eka.care/services/introducing-karma-openmedeval...
