I submitted questions to HLE, and I tend to agree that the review process was far from perfect. For example, some of my questions were simply misunderstood, and another was claimed to have the wrong answer when it didn't.
I think the situation is better for math and physics, where answers are more straightforward to verify; it's probably even worse in the humanities. I also believe that releasing the models' answers would help verify the questions, but that has never been done (possibly for fear of even more train-on-test?)
Edit: to clarify, when I contacted the orgs they helped me with these problems, but I suspect that wouldn't have happened if the problem had gone in the opposite direction.
falcor84 · 1d ago
It's funny, but I suppose it's fully in line with the replication crisis - that's probably close to the fraction of published "science" that is indeed wrong. Can we use this opportunity to connect each claim more directly with the evidence for it, and help resolve the crisis?