Learnings from two years of using AI tools for software engineering (newsletter.pragmaticengineer.com)

It's sad that we've made the internet so disorganized and crammed with advertising and crap that we now need tools to find actual information and summarize it for us.

hodgehog11 · 5m ago

2023 was a crazy and exciting year for AI research. LLMs have come a long way, but clearly still have a long way to go. They should do much better on most of these questions.

The discussion at the end also reminded me of how a lot of us took Gary Marcus' prose more seriously at the time before many of his short-term predictions started failing spectacularly.

ayhanfuat · 56m ago

Previous discussion: Don Knuth plays with ChatGPT - May 20, 2023, 626 comments, 927 points https://news.ycombinator.com/item?id=36012360

krackers · 50m ago

I'll never get over the fact that the grad student didn't even bother to use GPT-4, so this was using GPT-3.5 or something.

bigyabai · 35m ago

It's not the end of the world. Both are equally "impressive" at basic Q/A skills, and GPT-4 is noticeably more sterile when writing prose.

Even if GPT-3.5 was noticeably worse on any of these questions, it's honestly more interesting for someone's first experience to be with the exaggerated shortcomings of AI. The slightly screwy answers are still typical of what you see today, so it all ended well enough, I think. It would've been a terribly boring exchange if Knuth's reply was just "looks great, thanks for asking ChatGPT" with no challenging commentary.

vbezhenar · 1h ago

For question 3, ChatGPT 5 Pro gave a better answer:

> It isn’t “wrong.” Wolfram defines Binomial[n,m] at negative integers by a symmetric limiting rule that enforces Binomial[n,m] = Binomial[n,n−m]. With n = −1, m = −1 this forces Binomial[−1,−1] = Binomial[−1,0] = 1. The gamma-formula has poles at nonpositive integers, so values there depend on which limit you adopt. Wolfram chooses the symmetry-preserving limit; it breaks Pascal’s identity at a few points but keeps symmetry. If you want the convention that preserves Pascal’s rule and makes all cases with both arguments negative zero, use PascalBinomial[−1,−1] = 0. Wolfram added this explicitly to support that alternative definition.
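
The tension between the two conventions mentioned in the quoted answer can be checked with a little arithmetic. The sketch below is only an illustration, not Wolfram's implementation: `generalized_binomial` is a hypothetical helper using the falling-factorial definition, which is valid for any integer n and k >= 0. Pascal's rule at n = 0, k = 0 reads C(0,0) = C(-1,-1) + C(-1,0), which forces C(-1,-1) = 0, while the symmetry C(n,k) = C(n,n-k) forces C(-1,-1) = C(-1,0) = 1.

```python
from fractions import Fraction
from math import factorial


def generalized_binomial(n: int, k: int) -> Fraction:
    """Hypothetical helper: C(n, k) = n(n-1)...(n-k+1) / k! for integer n, k >= 0."""
    if k < 0:
        raise ValueError("this definition only covers k >= 0")
    numerator = 1
    for i in range(k):
        numerator *= n - i
    return Fraction(numerator, factorial(k))


# Value of C(-1, -1) required by Pascal's rule: C(0,0) = C(-1,-1) + C(-1,0).
pascal_value = generalized_binomial(0, 0) - generalized_binomial(-1, 0)

# Value of C(-1, -1) required by symmetry: C(-1,-1) = C(-1, -1 - (-1)) = C(-1, 0).
symmetry_value = generalized_binomial(-1, -1 - (-1))

print(pascal_value, symmetry_value)  # 0 1
```

The two requirements cannot both hold at (-1, -1), so any definition has to give up one of them there.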

Of course this particular question might have been in the training set.

Honestly, 2.5 years feels like an eternity when it comes to AI development. I'm using ChatGPT very regularly, and while it's far from perfect, recently it gave obviously wrong answers very rarely. I can't say anything about ChatGPT 5: I feel like I've reached my limit in my conversations with AI, so I'd hardly notice AI getting smarter, because it's already smart enough for my questions.

jlarocco · 4m ago

> recently it gave obviously wrong answers very rarely

Are you concerned it may be giving you subtly wrong answers that you're not noticing? If you have to double-check everything, is it really saving time?

seanhunter · 58m ago

On Wolfram specifically, GPT-5 is a huge step up from GPT-4. One of the first things I asked it was to write me a Mathematica program to test the basic properties (injectivity, surjectivity, bijectivity) of various functions. The notebook it produced was:

1) 100% correct

2) Really useful (i.e., it includes various things I didn’t ask for but that are really great, like a little manipulator to walk through the function at various points and visualize what the mapping is doing)

3) Built in a general way so I can easily change the mapping to explore different types of functions and how they work.
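
For readers unfamiliar with the properties being tested, here is a minimal Python sketch of the same idea, restricted to finite domains and codomains. It is not the Mathematica notebook described above; the function names `is_injective`, `is_surjective`, and `is_bijective` are hypothetical, chosen for illustration.

```python
from typing import Callable, Hashable, Iterable


def is_injective(f: Callable, domain: Iterable[Hashable]) -> bool:
    """True if no two distinct domain points share an image."""
    points = set(domain)
    return len({f(x) for x in points}) == len(points)


def is_surjective(f: Callable, domain: Iterable[Hashable],
                  codomain: Iterable[Hashable]) -> bool:
    """True if every codomain element is the image of some domain point."""
    return set(codomain) <= {f(x) for x in domain}


def is_bijective(f: Callable, domain: Iterable[Hashable],
                 codomain: Iterable[Hashable]) -> bool:
    """True if f is both injective and surjective."""
    points = list(domain)
    return is_injective(f, points) and is_surjective(f, points, codomain)


def square(x: int) -> int:
    return x * x


def shift_mod_7(x: int) -> int:
    return (x + 1) % 7


print(is_injective(square, range(-3, 4)))                # False
print(is_surjective(square, range(-3, 4), {0, 1, 4, 9}))  # True
print(is_bijective(shift_mod_7, range(7), range(7)))      # True
```

Swapping in a different function or domain is enough to explore other mappings, which mirrors the "built in a general way" point.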

It seems very clear (both from what they said in the launch demos etc. and from my experience of trying it out) that performance on coding tasks has been an area of massive focus, and the results are pretty clear to me.

godelski · 16m ago

  > gave *obviously wrong* answers very rarely.

I don't think this is a reason to trust it; actually, it's a reason I don't trust it.

There's a big difference between "obviously wrong" and "wrong". Whether an error is obvious is not objective; it depends entirely on the reader/user.

The problem is it optimizes deception alongside accuracy. It's a useful tool but good design says we should want to make errors loud and apparent. That's because we want tools to complement us, to make us better. But if errors are subtle, nuanced, or just difficult to notice then there is actually a lot of danger to the tool (true for any tool).

I'm reminded of the Murray Gell-Mann Amnesia effect: you read something in the newspaper on a topic you're an expert in and lambast it for its inaccuracies, but then turn the page to something you don't have domain knowledge in and trust it.

The reason I bring up MGA is because we don't often ask GPT things we know about or have deep knowledge in. But this is a good way to learn about how much we should trust it. Pretend to know nothing about a topic you are an expert in. Are its answers good enough? If not, then be careful when asking questions you can't verify.

Or, I guess... just ask it to solve "5.9 = x + 5.11"
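
For reference, subtracting 5.11 from both sides gives x = 5.9 - 5.11 = 0.79. A minimal check, using Python's `decimal` module to avoid binary floating-point rounding (the variable names are illustrative):

```python
from decimal import Decimal

# Solve 5.9 = x + 5.11 by subtracting 5.11 from both sides.
x = Decimal("5.9") - Decimal("5.11")
print(x)  # 0.79

# Substitute back into the original equation to confirm the solution.
assert x + Decimal("5.11") == Decimal("5.9")
```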

tra3 · 54m ago

Right, I’m still trying to wrap my mind around how GPTs work.

If we keep retraining them on the currently available datasets, then the questions that stumped ChatGPT 3 are in the training set for ChatGPT 5.

I don’t have the background to understand the functional changes between ChatGPT 3 and 5. It can’t be just the training data, can it?

TZubiri · 7m ago

I was reading yesterday about a Buddhist concept (albeit quite popular in the West) called Beginner's Mind. I think this post represents it perfectly.

We are presented with a first reaction to ChatGPT. We must never forget how incredible this technology is, and not become accustomed to it.

Donald Knuth approached several of the questions from an absence of knowledge, asking questions as basic as "12. Write a sentence that contains only 5-letter words.", and being amazed not only by correct answers, but also by incorrect answers parsed effectively and with semantic understanding.

wslh · 2h ago

It would be great to have an update from Knuth. There is no other Knuth.

rvba · 48m ago

What is with those reposts?

Someone could at least run the same questions on the latest model and show the new answers.

Farming karma, Reddit style.


Don Knuth on ChatGPT (07 April 2023)

Comments (13)