To someone who has been enthusiastically using LLMs since GPT-3, the question gives off a strong vibe of not being a good question for an LLM. Is anyone still surprised by that? Doesn’t everyone quickly develop such intuition?

d4rkn0d3z · 2h ago

I'm not sure intuition is required. Please bear with me.

If I ask a factual question of an AI it will issue some output. In order for me to check that output, which I am apparently bound to do in all cases, I must check reliable sources, perhaps several. But that is precisely the work I wanted to avoid by using AI. Ergo, the AI has increased my workload because I had the extra useless step of asking the AI. Obviously, I could have simply checked several reliable sources in the first place. I see this as the razor at work.

It ought to be clear now that the use of AI for factual questions entails that it be trustworthy; when you ask an AI a factual question, the work you are hoping to avoid is equal to the work of checking the AI output. Hence, no time can ever be saved by asking factual questions of an untrustworthy AI.

QED

P.S. This argument, and its extensions, occurred to me and my advisors 25 years ago. It caused me to conclude that building anything other than a near-perfect AI is pointless, except as a research project to discover the path to a near-perfect AI. "Near-perfect" should be interpreted as something like "as reliable as the brakes on your car" in terms of MTBF.

6510 · 2h ago

With patents there is this funny situation where you need to know exactly how to do something in order to find the document.

I forget who came up with the idea, but we could create a database with functions for every use case, so that nobody ever has to write something that has already been written. The catch is that finding the one you are looking for (by conventional search) would take more time than writing it from scratch.

AI just provides new angles to attack from. It could save time or take more time, bit of a gamble. Examine your cards before placing the bet.

d4rkn0d3z · 2h ago

Sounds practical. However, a new means of attack that requires me to verify afterward whether the correct target was attacked, and whether the claimed victories are real, takes me back to the argument I gave above.

politelemon · 2h ago

I don't think they do. We know that they are imprecise and based on probability. The vast majority of users outside our online circles treat them as authoritative sources. The average user is not, and should not have to be, aware of that aspect of them.

baq · 3h ago

You’re asking a lossily compressed database with an imprecise and ambiguous query language interface about hard facts, so you get a plausible reconstructed answer.

Work with the tool to get the best results instead. You wouldn’t do a CSI-style zoom-and-enhance on a JPEG either.

lxgr · 3h ago

That's not what popular chat interfaces to LLMs have been for quite a while now.

They can and do make extensive use of web search, and since they're pretty good at summarizing structured and unstructured text, this actually works quite well in my experience.

baq · 2h ago

That’s exactly my point - the screenshots in TFA don’t show any tool usage by bots.

a2128 · 2h ago

ChatGPT and Gemini almost certainly did because they both cite links as sources, and when I ask the same question as a free user on ChatGPT the search tool usage is only shown before the response is generated.

lxgr · 3h ago

So, when was it released? Did one of them get it right? Or are all readers of this article on LLM (non-)capabilities expected to be familiar with Cisco's product lines?

oezi · 1h ago

Searching Google for it doesn't seem to turn up easy-to-verify results.

On Amazon it has been available since Sep 2018:

https://www.amazon.de/-/en/C1101-4P-Integrated-Services-Ethe...

But is it the right model? Does the release date actually matter to anyone?

mehulashah · 3h ago

So, what’s the right answer and how do you know? The only way to know is to go to some primary source.

ares623 · 3h ago

Have you tried enabling deep thinking/research? (/s)

jqpabc123 · 3h ago

LLMs don't provide answers.

They provide information --- some of which is random in nature and only casually reflective of any truth or reality.

And as this example illustrates, they are far from being trustworthy. Their main achievement is to consistently produce functionally acceptable grammar.

lxgr · 3h ago

LLMs don't provide correct answers to all questions, but claiming that they don't provide answers at all seems absurd.

d4rkn0d3z · 1h ago

Not really absurd. Even broken clocks get the time right twice a day. If you happen to read the clock at one of those times, you may conclude that the clock is working better than it is.

Is an answer that is correct by chance the same as one that is correct by reason?

JimDabell · 3h ago

> LLMs don't provide answers.

If I ask an LLM “What is the capital of France?” and it answers “Paris.”, then it has provided an answer by any reasonable definition of the term.

This anti-AI weirdness where people play word games to deny what AI is clearly doing has to stop.

jqpabc123 · 41s ago

I just asked an LLM "What is the capital of Eswatini".

It answered "Mbabane".

There was no mention of the fact that there are actually 2 capitals --- Mbabane (the administrative capital) and Lobamba, which serves as the royal and legislative capital.

The point being --- any "answer" from an LLM is questionable. And an unreliable answer is really not an answer at all.

Asking three LLMs a simple question

Comments (18)