Show HN: KARMA – An evaluation framework for Medical AI systems
KARMA can evaluate text, image, and audio-based medical AI models using 21+ healthcare datasets. We support popular models out of the box (Qwen, MedGemma, IndicConformer, OpenAI, Anthropic models via AWS Bedrock, and practically any HuggingFace model). KARMA also handles medical-specific evaluation needs, such as ASR models that need language-aware post-processing and rubric-based evaluations that use an LLM as a judge. KARMA caches model outputs so you can iterate on metrics without re-running expensive inference.
Medical AI evaluation is currently fragmented – researchers often build custom evaluation scripts for each project. KARMA provides standardized metrics and a registry system where you can easily plug in your own models and datasets.
KARMA has an extensible registry system with decorators for easy model/dataset integration. It supports custom metrics with dataset-specific post-processing. Model outputs are cached, keyed on the datapoint and the model configuration, to speed up evaluation iterations.
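To give a rough feel for the registry and caching ideas above, here is a minimal Python sketch. It is not KARMA's actual API: `register_model`, `MODEL_REGISTRY`, `cache_key`, `CachedRunner`, and the toy `echo-upper` model are all hypothetical names made up for illustration, and the cache is a plain in-memory dict. See the docs for the real interfaces.

```python
import hashlib
import json
from typing import Callable, Dict

# Hypothetical registry: maps a string name to a model factory.
# Illustrative only; KARMA's real registry may look quite different.
MODEL_REGISTRY: Dict[str, Callable[..., Callable[[dict], str]]] = {}


def register_model(name: str):
    """Decorator that records a model factory under a string name."""
    def decorator(factory):
        if name in MODEL_REGISTRY:
            raise ValueError(f"model {name!r} is already registered")
        MODEL_REGISTRY[name] = factory
        return factory
    return decorator


def cache_key(datapoint: dict, model_config: dict) -> str:
    """Build a deterministic key from the datapoint and the model configuration."""
    payload = json.dumps(
        {"datapoint": datapoint, "config": model_config}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedRunner:
    """Runs a registered model and reuses outputs for repeated (datapoint, config) pairs."""

    def __init__(self, model_name: str, **config):
        self.config = {"model": model_name, **config}
        self.model = MODEL_REGISTRY[model_name](**config)
        self.cache: Dict[str, str] = {}
        self.inference_calls = 0  # counts real model invocations only

    def run(self, datapoint: dict) -> str:
        key = cache_key(datapoint, self.config)
        if key not in self.cache:
            self.inference_calls += 1
            self.cache[key] = self.model(datapoint)
        return self.cache[key]


# Toy stand-in for an expensive model (assumption: real models would
# wrap an LLM or ASR system rather than upper-casing text).
@register_model("echo-upper")
def make_echo_upper(suffix: str = ""):
    def model(datapoint: dict) -> str:
        return datapoint["text"].upper() + suffix
    return model
```

With this sketch, calling `CachedRunner("echo-upper", suffix="!").run({"text": "fever"})` twice invokes the model once. Changing either the datapoint or the configuration produces a new key, which is what lets metrics be recomputed without repeating inference.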
The Indian healthcare focus comes from our work building AI systems for India. Most medical AI benchmarks are heavily skewed toward Western contexts, missing important regional variations in medical terminology, disease prevalence, and clinical practices.
To help address this, we are also releasing 4 datasets: Medical ASR Evaluation Dataset, Medical Records Parsing Evaluation Dataset, Structured Clinical Note Generation Dataset, and Eka Medical Summarisation Dataset. Find the collection here: https://huggingface.co/collections/ekacare/ekacare-medical-p...
Along with our datasets, we are also releasing 2 models from our Parrotlet series publicly under the MIT license. Parrotlet-a-en-5b is a purpose-built model for English automatic speech recognition in medical contexts, and Parrotlet-v-lite-4b is a purpose-built model for medical report understanding. Link: https://huggingface.co/collections/ekacare/ekacare-public-he...
We've been using KARMA internally and thought the community might find it useful. Happy to answer questions about the architecture or specific use cases!
GitHub: https://github.com/eka-care/KARMA-OpenMedEvalKit
Docs: https://karma.eka.care
Release blog: https://info.eka.care/services/introducing-karma-openmedeval...