Don Knuth on ChatGPT (07 April 2023)

62 b-man 13 8/9/2025, 5:13:14 PM cs.stanford.edu

Comments (13)

jlarocco · 5m ago
It's sad that we've made the internet so disorganized and crammed with advertising and crap that we now need tools to find actual information and summarize it for us.
hodgehog11 · 5m ago
2023 was a crazy and exciting year for AI research. LLMs have come a long way, but clearly still have a long way to go. They should do much better on most of these questions.

The discussion at the end also reminded me of how a lot of us took Gary Marcus' prose more seriously at the time, before many of his short-term predictions started failing spectacularly.

ayhanfuat · 56m ago
Previous discussion: Don Knuth plays with ChatGPT - May 20, 2023, 626 comments, 927 points https://news.ycombinator.com/item?id=36012360
krackers · 50m ago
I'll never get over the fact that the grad student didn't even bother to use GPT-4, so this was using GPT-3.5 or something.
bigyabai · 35m ago
It's not the end of the world. Both are equally "impressive" at basic Q/A skills, and GPT-4's prose is noticeably more sterile.

Even if GPT-3.5 were noticeably worse on any of these questions, it's honestly more interesting for someone's first experience to be with the exaggerated shortcomings of AI. The slightly screwy answers are still emblematic of what you see today, so it all ended well enough, I think. It would've been a terribly boring exchange if Knuth's reply had just been "looks great, thanks for asking ChatGPT" with no challenging commentary.

vbezhenar · 1h ago
For question 3, ChatGPT 5 Pro gave a better answer:

> It isn’t “wrong.” Wolfram defines Binomial[n,m] at negative integers by a symmetric limiting rule that enforces Binomial[n,m] = Binomial[n,n−m]. With n = −1, m = −1 this forces Binomial[−1,−1] = Binomial[−1,0] = 1. The gamma-formula has poles at nonpositive integers, so values there depend on which limit you adopt. Wolfram chooses the symmetry-preserving limit; it breaks Pascal’s identity at a few points but keeps symmetry. If you want the convention that preserves Pascal’s rule and makes all cases with both arguments negative zero, use PascalBinomial[−1,−1] = 0. Wolfram added this explicitly to support that alternative definition.
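
A small Python sketch may make the two conventions in the quoted answer concrete. It is an illustration only: `symmetric_binomial`, `pascal_binomial`, and `pascal_holds` are hypothetical helper names, restricted to integer arguments, and they mirror the rules as described in the quote rather than Wolfram's actual implementation.

```python
def binomial_nonneg_k(n: int, k: int) -> int:
    """Generalized binomial n(n-1)...(n-k+1)/k! for integer n and k >= 0."""
    result = 1
    for i in range(k):
        # After this step result equals C(n, i + 1), so the division is exact.
        result = result * (n - i) // (i + 1)
    return result


def pascal_binomial(n: int, k: int) -> int:
    """Convention that keeps Pascal's rule: zero whenever k is negative."""
    return 0 if k < 0 else binomial_nonneg_k(n, k)


def symmetric_binomial(n: int, k: int) -> int:
    """Convention that keeps C(n, k) == C(n, n - k) (assumed from the quote)."""
    if k >= 0:
        return binomial_nonneg_k(n, k)
    if n < 0 and k <= n:
        return binomial_nonneg_k(n, n - k)  # n - k >= 0 in this branch
    return 0


def pascal_holds(binom, n: int, k: int) -> bool:
    """Check C(n, k) == C(n-1, k-1) + C(n-1, k) at a single point."""
    return binom(n, k) == binom(n - 1, k - 1) + binom(n - 1, k)
```

Under these assumptions `pascal_holds(symmetric_binomial, 0, 0)` is false, because C(0,0) = 1 while C(−1,−1) + C(−1,0) = 1 + 1, whereas `pascal_holds(pascal_binomial, 0, 0)` is true.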

Of course this particular question might have been in the training set.

Honestly 2.5 years feel like infinity when it comes to AI development. I'm using ChatGPT very regularly, and while it's far from perfect, recently it gave obviously wrong answers very rarely. Can't say anything about ChatGPT 5, I feel like in my conversations with AI, I've reached my limit, so I'd hardly notice AI getting smarter, because it's already smart enough for my questions.

jlarocco · 4m ago
> recently it gave obviously wrong answers very rarely

Are you concerned it may be giving you subtly wrong answers that you're not noticing? If you have to double-check everything, is it really saving time?

seanhunter · 58m ago
On Wolfram specifically, GPT-5 is a huge step up from GPT-4. One of the first things I asked it was to write me a Mathematica program to test the basic properties (injectivity, surjectivity, bijectivity) of various functions. The notebook it produced was

1) 100% correct

2) Really useful (i.e., it includes various things I didn’t ask for but that are really great, like a little manipulator to walk through the function at various points and visualize what the mapping is doing)

3) Built in a general way so I can easily change the mapping to explore different types of functions and how they work.

It seems very clear (both from what they said in the launch demos etc. and from my experience of trying it out) that performance on coding tasks has been an area of massive focus, and the results show it.
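
For readers unfamiliar with the three properties mentioned above, they can be checked directly for a function on finite sets. The following is a minimal Python sketch, not the Mathematica notebook described in the comment; the function names are hypothetical, and it assumes the domain has no repeated elements and that the function maps into the given codomain.

```python
def is_injective(f, domain) -> bool:
    """No two distinct inputs share an output."""
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)


def is_surjective(f, domain, codomain) -> bool:
    """Every element of the codomain is hit by some input."""
    return set(codomain) <= {f(x) for x in domain}


def is_bijective(f, domain, codomain) -> bool:
    """Both injective and surjective."""
    return is_injective(f, domain) and is_surjective(f, domain, codomain)


def square(x):
    return x * x


def successor(x):
    return x + 1
```

For example, `square` on `range(-2, 3)` with codomain `[0, 1, 4]` is surjective but not injective, while `successor` on `range(3)` with codomain `[1, 2, 3]` is bijective.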

godelski · 16m ago

> gave *obviously wrong* answers very rarely.

I don't think this is a reason to trust it; actually, it's a reason I don't trust it.

There's a big difference between "obviously wrong" and "wrong". It is not objective but entirely depends on the reader/user.

The problem is it optimizes deception alongside accuracy. It's a useful tool but good design says we should want to make errors loud and apparent. That's because we want tools to complement us, to make us better. But if errors are subtle, nuanced, or just difficult to notice then there is actually a lot of danger to the tool (true for any tool).

I'm reminded of the Murray Gell-Mann Amnesia effect: you read something in the newspaper on a topic you're an expert in and lambaste it for its inaccuracies, but then turn the page to something you have no domain knowledge of and trust it.

I bring up MGA because we don't often ask GPT about things we have deep knowledge of, yet doing so is a good way to learn how much we should trust it. Pretend to know nothing about a topic you are an expert in. Are its answers good enough? If not, then be careful when asking questions you can't verify.

Or, I guess... just ask it to solve "5.9 = x + 5.11"
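
For reference, the arithmetic in that equation can be checked exactly with Python's `decimal` module. This is a minimal sketch; the variable names are illustrative.

```python
from decimal import Decimal

# Solve 5.9 = x + 5.11 for x using exact decimal arithmetic.
lhs = Decimal("5.9")
offset = Decimal("5.11")
x = lhs - offset

print(x)  # 0.79
```

Using `Decimal` rather than binary floats keeps the result exact, so substituting `x` back into the equation reproduces the left-hand side.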

tra3 · 54m ago
Right, I’m still trying to wrap my mind around how GPTs work.

If we keep retraining them on the currently available datasets, then the questions that stumped ChatGPT 3 are in the training set for ChatGPT 5.

I don’t have the background to understand the functional changes between ChatGPT 3 and 5. It can’t be just the training data, can it?

TZubiri · 7m ago
I was reading yesterday about a Buddhist concept (albeit quite popular in the West) called Beginner's Mind. I think this post represents it perfectly.

We are presented with a first reaction to ChatGPT. We must never forget how incredible this technology is, and not become accustomed to it.

Donald Knuth approached several of the questions from a position of no prior knowledge, asking questions as basic as "12. Write a sentence that contains only 5-letter words.", and was amazed not only by correct answers, but also by incorrect answers that still showed the question had been parsed effectively and with semantic understanding.

wslh · 2h ago
It would be great to have an update from Knuth. There is no other Knuth.
rvba · 48m ago
What is with those reposts?

Someone could at least run the same questions on the latest model and show the new answers.

Farming karma, Reddit style.

gjvc · 10m ago
[delayed]