Show HN: A Comprehensive AI Data Quality Evaluation Tool
What's included:

- 50+ evaluation metrics across text, image, and multimodal data
- Academic citations for every metric (RedPajama, CLIP, NIMA, etc.)
- Rule-based and LLM-based evaluation approaches
- Practical usage examples and API documentation

Key categories:

- Text Quality: Completeness, Fluency, Relevance, Effectiveness
- Image Quality: Clarity, Similarity, Validity
- Security: Political sensitivity, prohibited content, harmful information
- Classification: Topic categorization, content classification

This is particularly useful for:

- Data scientists working on model training
- Researchers needing standardized evaluation frameworks
- Anyone dealing with large-scale data quality assessment

The documentation includes detailed academic references and practical implementation examples. All open source and ready to use.
Metrics Link: https://github.com/MigoXLab/dingo/blob/dev/docs/metrics.md