A fundamental problem that we’re still far away from solving is not necessarily that LLMs/LRMs cannot reason the way we do (which I guess should be clear by now), but that they might not have to. They generate slop so fast that, if one can benefit a little from each output, i.e. if you can find a little bit of use hidden beneath the mountain of meaningless text they create, then this might still be more valuable than preemptively taking the time to create something more meaningful to begin with. I can’t say for sure what the reward system behind LLM use in general is, but given how much money people are willing to spend on models even in their current deeply flawed state, I’d say it’s clear that the time savings outweigh the mistakes and shallowness.
Take the comment paper, for example. Since Claude Opus is the first author, I’m assuming that the human author took a backseat and let the AI build the reasoning and most of the writing. Unsurprisingly, it is full of errors and contradictions, to the point where it looks like the human author didn’t bother much to check what was being published. One might say that the human author, in trying to build some reputation by showing that their model could answer a scientific criticism, actually did the opposite: they provided more evidence that their model cannot reason deeply, and maybe hurt their reputation even more.
But the real question is, did they really? How much backlash will they actually get for submitting this to arXiv without checking? Would that backlash keep them from submitting 10 more papers next week with Claude as the first author? If one weighs the amount of slop you can put out (with a slight benefit from each piece) against the bad reputation it earns you, I cannot say that “human thinking” is actually worth it anymore.
iLoveOncall · 1m ago
Mediocre people produce mediocre work. Using AI might make those mediocre people produce even worse work, but I don't think it'll affect competent people who have standards regardless of the available tooling.
If anything the outcome will be good: mediocre people will produce even worse work and will weed themselves out.