Show HN: TheorIA – An Open Curated Physics Dataset (Equations, Explanations, JSON)
Why? Physics is rich with beautiful, formal results — but most of them are trapped in PDFs, LaTeX, or lecture notes. That makes it hard to:
- train symbolic/physics-aware ML models,
- build derivation-checking tools,
- or even just teach physics interactively.
TheorIA fills that gap. Each entry includes:
- A result name (e.g., Lorentz transformations)
- Clean equations (AsciiMath)
- A straightforward step-by-step derivation with reasoning
- Symbol definitions and assumptions
- Programmatic validation using SymPy
- References, arXiv-style domain tags, and contributor metadata
Everything is in open, self-contained JSON files. No scraping, no PDFs, just clear structured data for physics learners, teachers, and ML devs.
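To give a concrete feel for it, here is a minimal sketch of what an entry and its SymPy check might look like. The field names (`result_name`, `equations`, `assumptions`, `domain`) and the domain tag value are assumptions made up for illustration, not the dataset's actual schema; check the repo for the real format. The sketch does not parse the AsciiMath strings. It rebuilds the same equations by hand in SymPy and checks that the Lorentz transformations leave the spacetime interval unchanged.

```python
import json
import sympy as sp

# Hypothetical entry: field names and values are illustrative only,
# not the actual TheorIA schema.
entry_json = """
{
  "result_name": "Lorentz transformations",
  "equations": ["x' = gamma*(x - v*t)", "t' = gamma*(t - v*x/c^2)"],
  "assumptions": ["inertial frames in standard configuration"],
  "domain": "physics.class-ph"
}
"""
entry = json.loads(entry_json)

# Rebuild the equations symbolically (written by hand here, not parsed
# from the AsciiMath strings above).
x, t = sp.symbols("x t", real=True)
v, c = sp.symbols("v c", positive=True)
gamma = 1 / sp.sqrt(1 - v**2 / c**2)
x_prime = gamma * (x - v * t)
t_prime = gamma * (t - v * x / c**2)

# Validation: the interval (c t)^2 - x^2 should be the same in both frames,
# so the difference should simplify to zero.
interval_diff = sp.simplify((c * t_prime) ** 2 - x_prime**2 - ((c * t) ** 2 - x**2))
is_invariant = interval_diff == 0

print(entry["result_name"], "interval invariant:", is_invariant)
```

A check like this only confirms one property of the result, so it complements the written derivation rather than replacing it.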
Contributors Wanted: We’re tiny right now and trying to grow. If you’re into physics or symbolic ML:
- Add an entry (any result you love)
- Review others' derivations
- Build tools on top of the dataset
GitHub: https://github.com/theoria-dataset/theoria-dataset/
Licensed under CC-BY 4.0, and we welcome educators, students, ML people, or just anyone who thinks physics deserves better data.