An example of the prompt engineering phenomenon: my wife and I were recently discussing a financial decision. I'd offered my arguments in favor of one choice and she was mostly persuaded but decided to check in with ChatGPT to help reassure herself that I was right. She asked the financial question in layman's terms and got the opposite answer to the one I had given.
She showed me the result and I immediately saw the logical flaws and pointed them out to her. She pressed the model on it and it of course apologized and corrected itself. Out of curiosity I tried the prompt again, this time using financial jargon that I was familiar with and my wife was not. The intended meaning of the words was the same, the only difference is that my prompt sounded like it came from someone who knew finance. The result was that the model got it right and gave an explanation for the reasoning in exacting detail.
It was an interesting result to me because it shows that experts in a field are not only more likely to recognize when a model is giving incorrect answers but they're also more likely to get correct answers because they are able to tap into a set of weights that are populated by text that knew what it was talking about. Lay people trying to use an LLM to understand an unfamiliar field are vulnerable to accidentally tapping into the "amateur" weights and ending up with an answer learned from random Reddit threads or SEO marketing blog posts, whereas experts can use jargon correctly in order to tap into answers learned from other experts.
whstl · 2h ago
I very often also get better programming results than less experienced engineers, even though I'm not remotely doing any kind of "prompt engineering".
Also how you ask matters a lot. Sometimes it just wants to make you happy with whatever answer, if you go along without skepticism it will definitely make garbage.
Fun story: at a previous job a Product Manager made someone work a full week on a QR-Code standard that doesn't exist, except in ChatGPT's mind. It produced test cases and examples, but since nobody had a way to test it in-house, the mistake went unnoticed.
When it was sent to a bank in Sweden to test, the customer was just "wait this feature doesn't exist in Sweden" and a heated discussion ensued until the PM admitted using ChatGPT to create the requirements.
colinmorelli · 4h ago
A related thing happened when I sent my dog's recent bloodwork to an LLM, including dates, tests, and values. The model suggested that a rise in her kidney values (all still within normal range) was likely evidence of chronic kidney disease in its early stage. Naturally this caused some concern for my wife.
But, I work in healthcare and have enough knowledge of health to know that CKD almost certainly could not advance fast enough to be the cause of the kidney value changes in the labs that were only 6 weeks apart. I asked the LLM if that's the best explanation for these values given they're only 6 weeks apart, and it adjusted its answer to say CKD is likely not the explanation as progression would happen typically over 6+ months to a year at this stage, and more likely explanations were nephrotoxins (recent NSAID use), temporary dehydration, or recent infection.
We then spoke to our vet who confirmed that CKD would be unlikely to explain a shift in values like this between two tests that were just 6 weeks apart.
That would almost certainly throw off someone with less knowledge about this, however. If the tests were 4-6 months apart, CKD could explain the change. It's not an implausible explanation, but it skipped over a critical piece of information (the time between tests) before originally coming to that answer.
osigurdson · 1h ago
The internet, and now LLMs, have always been bad at diagnosing medical problems. I think it comes from the data source. For instance, few articles would be linked to / popular if a given set of symptoms were just associated with not getting enough sleep. No, the articles that stand out are the ones where the symptoms are associated with some rare / horrible condition. This is the LLM training data, which is often missing the entire middle part of the bell curve.
colinmorelli · 1h ago
For what it's worth, this statement is actually not entirely correct anymore. Top-end models today are on par with the diagnostic capabilities of physicians on average (across many specialties) and, in some cases, can outperform them when RAG'd in with vetted clinical guidelines (like NIH data, UpToDate, etc).
However, they do have particular types of failure modes that they're more prone to, and this is one of them. So they're imperfect.
osigurdson · 1h ago
This is ChatGPT's self assessment. Perhaps you mean a specialized agent with RAG + evals however.
ChatGPT is not reliable for medical diagnosis.
While it can summarize symptoms, explain conditions, or clarify test results using public medical knowledge, it:
• Is not a doctor and lacks clinical judgment
• May miss serious red flags or hallucinate diagnoses
• Doesn’t have access to your medical history, labs, or physical exams
• Can’t ask follow-up questions like a real doctor would
colinmorelli · 58m ago
Sorry, I should have clarified, but no this is not ChatGPT's self assessment.
I am suggesting that today's best in class models (Gemini 2.5 Pro and o3, for example), when given the same context that a physician has access to (labs, prior notes, medication history, diagnosis history, etc), and given an appropriate eval loop, can achieve similar diagnostic accuracy.
I am not suggesting that patients turn to ChatGPT for medical diagnosis, or that these tools are made available to patients to self diagnose, or that physicians can or should be replaced by an LLM.
But there absolutely is a role for an LLM to play in diagnostic workflows to support physicians and care teams.
c22 · 2h ago
This same phenomenon is true for classic search engines as well. Whenever I am becoming informed on a new topic my first searches are always very naive and targeted just at discovering the relevant jargon that will let me make better searches. It turns out that many disciplines contain analogous concepts, just with different words being used to describe them. Understanding the domain specific language used is more than half the battle.
skydhash · 1h ago
If I’m dealing with an unfamiliar domain, my next step is always Wikipedia or an introductory book. Just to collect a set of keywords and references to narrow my future searches. I don’t think I’ve ever asked Google a question-shaped query.
nothercastle · 2h ago
Yeah, I remember the trick to getting Google to provide good results being to find some key industry or area terms for what you were looking for. This doesn't work anymore because Google search has gotten so bad.
caust1c · 4h ago
This anecdote corroborates my theory that it will still be critical to become an expert in your field. Everyone is treating AI like it's a zero-sum game with regards to jobs being "lost" to AI, but the reality is that the best results will come from experts in the field who have the vocabulary and knowledge to get the best answers.
My fear is that people treat AI like an oracle when they should be treating it just like any other human being.
redeye100 · 1h ago
This is just bad design. Or a faulty tool. Why should the job market shift to accommodate this gap in the functioning of LLMs? This is a bug that needs to be fixed.
I have a personal gripe about this: bringing an unfinished tool to market and then prophesying about its usefulness, and telling us we'd all better get ready for it. This seems very hand-wavey and is looking more and more like vaporware.
It's like trying to quickly build a house on an unfinished foundation. Why are we rushing to build? Can't we get the foundational things right first?
xnorswap · 3h ago
People treat certain humans, or humans in certain roles, as oracles too.
lblume · 3h ago
What percentage of people are actually experts at their jobs though?
jimbokun · 23m ago
So LLMs need an ELI5 mode and detect when to use it. This would use the non-technical terminology but pull concepts from the more jargony model.
This may or may not be easily possible by tweaking current training techniques. But it shows the many edge cases still needed to be addressed by AI models.
aleph_minus_one · 3h ago
For the sake of discussion I want to play devil's advocate concerning your point:
> It was an interesting result to me because it shows that experts in a field are not only more likely to recognize when a model is giving incorrect answers but they're also more likely to get correct answers because they are able to tap into a set of weights that are populated by text that knew what it was talking about. Lay people trying to use an LLM to understand an unfamiliar field are vulnerable to accidentally tapping into the "amateur" weights and ending up with an answer learned from random Reddit threads or SEO marketing blog posts, whereas experts can use jargon correctly in order to tap into answers learned from other experts.
Couldn't it be the case that people who are knowledgeable in the topic (in this case recognizable to the AI by their choice of wording) need different advice than people who know less about the topic?
To give one specific example from finance: if you know a lot about finance, getting a deep analysis of the best way to trade some exotic options is likely sound advice. On the other hand, for people who are not deeply into finance the best advice is likely rather "don't do it!".
lolinder · 3h ago
In some cases, sure, but not here—neither option had more risk associated with it than the other, it was just an optimization problem. The first answer that the model gave to my wife was just wrong about the math, with no room for subjectivity.
dogleash · 56m ago
> Couldn't it be the case [...] need different advise than people who know less about the topic?
> for people who are not deeply into finance the best advice is likely rather "don't do it!".
Oh boy, more nanny software. This future blows.
aleph_minus_one · 17m ago
> Oh boy, more nanny software. This future blows.
I think this topic is a little bit more complicated: this is rather a balancing of the model between
1. "giving the best possible advice to the respective person given their circumstances" vs
2. "giving the most precise answer to the query to the user"
(if you ask me, the best decision would be to give the user a choice here, but that would be overtaxing for many users)
- Freedom-loving people will hate it if they don't get 2
- On the other hand, many people would like to actually get the advice that is most helpful to them (i.e. 1), and not the one that may answer their question exactly, but is likely a bad idea for them
dogleash · 36s ago
Of course, everything can always be more complicated. For example:
1. The AI will never know the user well enough to predict what will be best for them. It will resort to treating everybody like children. In fact, many of the crude ways LLMs currently steer and censor are infantilizing.
2. The user's own benefit vs. "for your own good" as defined by a product vendor is a scam that vendors have perpetrated for ages. Even the extremely unsubtle version of it has a bunch of stooges cheering them on. Things will not improve in the users' favor when it's harder to notice, and easier to pretend they're not doing it.
3. A bunch of Californians using the next wave of tech to spread cultural imperialism is better than China, I guess. But why are those my options?
cjohnson318 · 4h ago
I had a similar experience. I did some back of the envelope math and my wife suggested I run it through ChatGPT. After actually doing the math, I felt a lot better about my understanding of the problem and I just... don't trust an LLM to understand algebra. Yeah, they're awesome most of the time, but I don't want to trust it with something important, be wrong, and then have to explain to someone that I trusted the opinion of a couple of big matrices, over my own knowledge and experience, on a high school word problem.
aaronbaugher · 3h ago
I've noticed Grok struggles with dates and relative time, especially when referring to things it "remembers" from earlier conversations. Even a phrase like "last night" will be wrongly interpreted sometimes. So although I've had it research numbers and create estimates for me, I wouldn't just assume the numbers are right without checking everything.
lblume · 3h ago
Since LLMs are basically linear algebra all the way down, this is vaguely reminiscent of how human brains also have a very hard time understanding neural circuitry despite literally being made from it.
amelius · 4h ago
Maybe try a 2-step approach: first ask the LLM to translate your question into expert-language, then ask that question :)
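Something like this minimal two-step sketch, assuming the OpenAI Python client (the model name, prompt wording, and example question are just placeholders, not anything from this thread):

    from openai import OpenAI

    client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment
    MODEL = "gpt-4o"    # placeholder model name

    def ask_two_step(layman_question: str) -> str:
        # Step 1: have the model restate the question in expert terminology.
        rewrite = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system",
                 "content": "Rewrite the user's question using precise domain "
                            "jargon, without changing its meaning."},
                {"role": "user", "content": layman_question},
            ],
        ).choices[0].message.content

        # Step 2: ask the rewritten, expert-sounding question in a fresh conversation.
        answer = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": rewrite}],
        ).choices[0].message.content
        return answer

    print(ask_two_step("Is it better to pay off my loan early or invest the money?"))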
aleph_minus_one · 56m ago
The expert's answer when asked in expert language is "it's complicated". :-)
disambiguation · 4h ago
This is my experience with prompting as well, but I struggle to describe it adequately. Something like the "direction" of the prompt, if it's too open ended you're likely to get mixed results, but if you give it a kind of "running start" it performs much better.
chriskanan · 4h ago
What I do is always set the context by giving it my "background" and some papers as reading material, such that, as a first step, I've conditioned the model for whatever topic will be discussed.
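Roughly, a minimal sketch of that conditioning step, assuming the OpenAI Python client (the file names, prompts, and final question are made-up placeholders):

    from openai import OpenAI
    from pathlib import Path

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Hypothetical background blurb and paper abstracts used as conditioning context.
    background = Path("my_background.txt").read_text()
    papers = [Path(p).read_text() for p in ["paper1_abstract.txt", "paper2_abstract.txt"]]

    messages = [
        {"role": "system",
         "content": "You are assisting a researcher. Use the background and the "
                    "attached reading material as the frame of reference for all answers."},
        {"role": "user",
         "content": "My background:\n" + background
                    + "\n\nReading material:\n" + "\n---\n".join(papers)},
        {"role": "assistant",
         "content": "Understood. I'll keep this background and reading material in mind."},
        # Only now does the actual question get asked, with the model already conditioned.
        {"role": "user", "content": "Given the above, which open problems seem most tractable?"},
    ]

    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(reply.choices[0].message.content)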
xnorswap · 4h ago
Using LLM's in any unfamiliar context is dangerous. When asking them anything, always follow up with: "Are you sure?".
Far too often it'll cheerily apologise and correct its own answer.
aaronbaugher · 3h ago
I asked Grok for advice on a personal decision I was making. It suggested I do A. I said I was leaning toward not-A, and explained why. Then it said I should do not-A. I asked why, if not-A was the best choice, it hadn't said that in the first place. Was it unable to think of not-A before I mentioned it? It said no, it had considered not-A from the start, but thought A was the best choice for me until I gave it more context.
That reminded me how important it is to give it the full parameters and context of my question, including things you could assume another human being would just get. It also has a sort of puppy-dog's eagerness to please that I've had to tell it not to let get in the way of objective analysis. Sometimes the "It's awesome that you asked about that" stuff verges on a Hitchhiker's Guide joke. Maybe that's what they were going for.
daveguy · 1h ago
It should have told you it has no concept of A or not-A, doesn't perform anything close to thinking, and was just picking the most likely words to follow the prompt and context window both times. The people who program it weren't "going for" anything as they couldn't "go for" any specific response. And the model has no concept of self, jokes, or even perception by an outside entity. But it will pick phrases that mimic its training set and those might happen to be about any one of those topics.
jon-wood · 3h ago
Often it'll even apologise and correct its own answer despite the original answer being correct, because you just primed its model to believe it was wrong.
jononor · 3h ago
Not necessarily correct the answer! Sometimes it just changes it, insists that it is correct - but it might still be wrong (perhaps now in a new way)...
zehaeva · 2h ago
I wonder how you're supposed to tell when it does give you the correct answer, and you ask "Are you sure?" and it cheerily apologizes and gives you a new incorrect answer.
zahlman · 1h ago
... But does it even actually try to evaluate the veracity of what it just output? Or is it merely modelling the idea that being asked "Are you sure?" is a reason for self-doubt?
xnorswap · 1h ago
Do LLMs "evaluate the veracity" of anything? That's not really how they work.
klabb3 · 3h ago
First, these tips and tricks that ”works for me” aren’t universal, it may or may not work, there’s literally no way to tell other than to run large scale empirical experiments, otherwise it’s just mythbuilding.
Secondly, it can be even worse. I’ve been ”gaslighted” when pressing on answers I knew were incorrect (in this case, cryptography). It comes up with extremely plausible-sounding arguments, specifically addressing my counterpoints, and even chains of reasoning, yet still isn’t correct. You’d have to be a domain expert to tell it’s wrong, at which point it makes no sense to use LLMs in the first place.
xnorswap · 3h ago
Oh for sure, asking "Are you sure?" isn't a trick to improve accuracy, it's a trick to show people the danger of asking it in the first place.
It just leaves you with two contradictory statements, much like the man with two watches who never knows the correct time.
thunky · 4h ago
I'd like to see the before and after questions because it seems possible that the layman's version was less exact and therefore interpreted differently, even if the intention was the same. Which, can happen with humans too.
lolinder · 4h ago
Given the topic I'm unfortunately not comfortable sharing the details in a public space like this, but the answer that it gave was not just a misinterpretation of the question, it was actually entirely wrong on the merits of its own interpretation.
And even if it were a misinterpretation the result is still largely the same: if you don't know how to ask good questions you won't get good answers, which makes it dangerous to rely on the tools for things that you're not already an expert in. This is in contrast to all the people who claim to be using them for learning about important concepts (including lots of people who claim to be using them as financial advisors!).
loveparade · 3h ago
If you don't know how to ask a human doctor a good question you can't expect to get a good answer either.
The difference is that a human doctor probably has a lot of context about you and the situation you're in, so that they probably guess what your intention behind the question is, and adjust their answer appropriately. When you talk to an LLM, it has none of that context. So the comparison isn't really fair.
Has your mom ever asked you a computer question? Half of the time the question makes no sense and explaining to her why would take hours, and then she still wouldn't get it. So the best you can do is guess what she wants based on the context you have.
deadbabe · 4h ago
Doesn’t matter. The LLM’s job should be to deliver results in the way that is intended, regardless of the skill of the prompter. If it can’t do that then it’s really no better than a Google search.
osigurdson · 3h ago
That is a pretty high bar. Humans aren't any better than a Google search by this criterion.
deadbabe · 3h ago
An expert human can give you the answer you need with the same layman prompt, without errors.
osigurdson · 1h ago
You would first have to find the expert however, which might not be trivial. Anyway, I think there is value in the space between a basic google search and a human expert. If you don't think so that is fine.
conception · 4h ago
Can you do a quick experiment and ask ChatGPT to craft a prompt to answer a "layman's question" as a financial expert? Having ChatGPT craft its own prompts is usually pretty successful for me.
lolinder · 4h ago
That probably would work most of the time, but that's also an example of the phenomenon that TFA is talking about: you can't safely just use these tools without becoming an expert at least in the tool. The way they're currently being sold as totally accessible to everyone is dangerous.
diggan · 4h ago
> without becoming an expert at least in the tool
Yeah, we're basically repeating the "search engine/query" problem but slightly differently. Using a search engine the right way has always been a skill you needed to learn, and the ones who didn't always got poor results, and many times took those results at face value. Then Google started giving "answers", so if your query is shit, the "answer" most likely is too.
Point is, I don't think this phenomenon is new, it's just way less subtle today with LLMs, at least for people who have expertise in the subjects.
solarwindy · 3h ago
> On two occasions I have been asked, ’Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
osigurdson · 3h ago
Understanding the terminology can help a lot. The key is to start the LLM conversation like this: ask it what terminology is used in a particular problem domain, and then frame your actual question based on that.
lostmsu · 2h ago
A couple more alternative explanations:
1. Random: the sample size is too small to claim otherwise.
2. The phrasing in both cases had a leaning to it, and as a "yes man" the LLM gave a correspondingly biased response.
bdangubic · 4h ago
very interesting - could you share your wife's prompt and yours to provide more concrete context?
lolinder · 4h ago
I considered it, but given the personal nature of the topic I want to keep the details off the public internet. I realize that makes my observation less specific than it could be, but I think the result should be relatively unsurprising to anyone familiar with the tools—this is just a natural consequence of the way the stats fall out.
I love reading, I enjoy long-form articles, but I really wish technical bloggers especially would practice distilling their point into shorter posts. I notice it a lot with (older) Scott Alexander articles, this implicit assumption that your writing is informative/entertaining enough that you can stretch a simple idea to many pages.
I want to reiterate that I don't want dull, minimal writing. I don't subscribe to the "reduce your wordcount until it can't be reduced any further" style of writing advice. I just think that many people have very similar ideas about AI (and have written very similar things), and if you have something to say that you haven't seen expressed before, it is worthwhile (imo) to express it without preamble.
stego-tech · 4h ago
As someone with this very style (my own blog posts often rise into the 5k word range) and also from a technical background, I can at least explain my motivations for length: absolute domination of the argument.
In professional settings, brevity is often mistaken for inexperience or a weak position. As the thinking goes, a competent engineer should be able to defend every position they take like a PhD candidate defending their dissertation. At the same time, however, excess verbosity is viewed as distinctly “cold” and “engineer” in tone, and frowned upon by non-technical folks in my experience; they wanted an answer, not an explainer.
The problem is that each of us has the data points of what succeeds in convincing others: the longer argument, every single time. Thus we use it in our own writing because we want to convince the imagined reader (as well as ourselves) that our position is correct, or at the very least, sound. In doing so we write lengthy posts, while often doing research to validate our positions with charts, screenshots, Wikipedia articles, news sources, etc. It’s as much about convincing ourselves as it is other readers, hence why we go for longer posts based on real world experiences.
One plot twist particular to me: my lengthy posts are also about quelling my brain, in a very real sense. My brain is the reader, and if I do not get everything out of my head about that topic and onto “paper”, it will continue to dwell and gnaw on the missed points in perpetuity. Thus, 5k-word posts about things like the inefficiency of hate in Capital or a Systems Analysis of American Hegemony, just so I can have peace and quiet in my own head by getting it completely out of said head.
collinmcnulty · 4h ago
As someone who similarly writes to think, I found a lot of insight from this video [0] from the University of Chicago. Long story appropriately short, he recommends writing something twice: once for yourself and once for the reader.
I don't recall if this is covered in the video, but here are two pitfalls I have noticed from my own attempts:
1) If I am considering possible objections to my position, I have to be very clear which points I am raising only for the sake of argument, and which are the ones I am actually advocating for, or else it will appear confused or self-contradictory.
A related issue is to preempt possible objections to the point where the reader might lose track of the main issue.
2) After making several passes to hone my position, it can seem so obvious to me that what I write for the reader is too terse for anyone who is approaching the issue for the first time.
[0]: https://www.youtube.com/watch?v=vtIzMaLkCaM
stego-tech · 2h ago
That approach has helped me immensely in my communications, but less so for blog posts. I think it’s because I’ve fully internalized writing in my downtime as writing for myself first, and I just like longer, in-depth reads as a personal preference.
542354234235 · 2h ago
Also, the internet in particular has a tendency to go out of its way to interpret things in the least charitable way possible. If there is a way to take anything you said negatively, or hyper literally, or any other way to misinterpret your intention, it will. So you tend to assume a bad-faith reading and preemptively explain/respond to possible nitpicks.
keiferski · 5h ago
Ironically this is one of the best use cases I’ve found for AI tools at the moment.
Summarize and critique this argument in a series of bullet points.
More seriously though, I think there is a lack of rigorous thinking about AI specifically and technology in general. And hence you get a lot of these rambling thought-style posts which are no doubt by intelligent people with something compelling to say, but without any fundamental method for analyzing those thoughts.
Which is why I really recommend taking a course in symbolic logic or analytic philosophy, if you are able to. You’ll quickly learn how to communicate your ideas in a straightforward, no nonsense manner.
SilverSlash · 2h ago
> Which is why I really recommend taking a course in symbolic logic or analytic philosophy, if you are able to. You’ll quickly learn how to communicate your ideas in a straightforward, no nonsense manner.
Do you have any free online course recommendations?
keiferski · 2h ago
I haven’t taken any online courses unfortunately (took them in person in college) but for symbolic logic I recommend the book by Klenk. I used that in my course and found it to be a good intro.
There are a bunch of lectures on YouTube about analytic philosophy though, and from a quick look they seem solid.
edent · 3h ago
The problem is, unless you expand every point, some jerk on HN will nit-pick your "logical fallacies".
I find myself writing longer and more defensively because lots of people don't understand nuance or subtext. Forget hyperbole or humour - lots of technical readers lack the ability to understand them.
Finally, editing is hard work. Revising and refining a document often takes several times longer than writing the first draft.
divan · 4h ago
I wish it was an easy skill to learn, but it's not.
disambiguation · 3h ago
I mean it's their blog, the writing is just as much for them as it is for you.
decimalenough · 5h ago
I'm generally quite skeptical of AI, but this overstates its case. Two things stand out:
> what LLMs do is string together words in a statistically highly probable manner.
This is not incorrect, but it's no longer a sufficient mental model for reasoning models. For example, while researching new monitors today, I told Gemini to compare $NEW_MODEL_1 with $NEW_MODEL_2. Its training data did not contain information about either model, but it was capable of searching the Internet to find information about both and provide me with a factual (and, yes, I checked, accurate) comparison of the differences in the specs of the models as well as a summary of sentiment for reliability etc for the two brands.
> Currently available software may very well make human drivers both more comfortable and safe, but the hype has promised completely autonomous cars reliably zipping about in rush hour traffic.
And this is already not hype, it's reality anywhere Waymo operates.
lolinder · 4h ago
To have a good mental model for modern AI agents you have to understand both the LLM and the other stuff that's built up around it. OP is correct about the behavior of LLMs, and that is valuable information to keep in mind. Then you layer on top of that an understanding that some implementations of agents will sometimes automatically feed search results into context, if you ask them to or are paying for an advanced tier or whatever the extra qualifications are for your particular tool.
If you skip this two-part understanding then you run the risk of missing when the agent decided not to do a search for some reason and is therefore entirely dependent on statistical probability in the training data. I've personally seen people without this mental model take an LLM at its word when it was wrong because they'd gotten used to it looking things up for them.
jvanderbot · 4h ago
Defending TFA a little ... the rest of the article builds up context around the word "Completely", so that the single example of "Highway zipping" is not just what is being discussed.
"Completely" here should be expanded to include all the unique and unforseen circumstances a driver might encounter, such as a policeman directing traffic manually or any other "soft" situation that is not well represented in training.
Not to mention the somewhat extreme amount of apriori and continuous mapping that goes into operating a fleet of AVs. That is hardly to be considered "Completely autonomous".
This isn't just pedantry, the disconnect between a technical person's deep understanding and a common user's everyday experience is pretty much what the article hinges on. Try taking a Waymo from SF to NYC. This seems like something a "Completely autonomous" car should be able to do given a layperson's understanding of "Completely", without the experts' long list of caveats.
zahlman · 1h ago
> For example, while researching new monitors today, I told Gemini...
You told an agent, not just an LLM.
> And this is already not hype, it's reality anywhere Waymo operates.
> For example, while researching new monitors today, I told Gemini to compare $NEW_MODEL_1 with $NEW_MODEL_2.
But this feature was a staple of most online shops that sell monitors and a bunch of "review" sites. You don't need a highly complex system to compare 2 monitors, you need a spreadsheet.
sceptic123 · 3h ago
Waymo only works because it's geofenced — that's a massive barrier to "completely autonomous" (or level 5 automation)
agumonkey · 3h ago
> what LLMs do is string together words in a statistically highly probable manner.
I guess the article fails to admit that when you have billions of connected points in a vector space, "stringing together" is not simply "stringing together". I'm not a fanboy but somehow GPT/attention based logic is capable of parsing input and data then remodeling it in depths that are surprising.
belter · 5h ago
Waymo operates on highly controlled and mapped environments. Can they handle Rome or Mumbai?
whynotminot · 4h ago
Why is that the standard? I, a human, can’t handle driving in Mumbai.
And lol at anyone who thinks any urban driving environment is “highly controlled”.
jogjayr · 2h ago
I, a human who learned to drive in Mumbai, can't handle driving in Mumbai anymore.
Reubachi · 3m ago
some things in "Car-culture" that surprised me on my trips to india;
1. People do not as a matter of their daily complaints complain about bad traffic, bad drivers, dents, door dings etc.
2. There are less accidents per capita than US
3. Insurance is required, body shops work better than in US.
4. Electrification of tuk-tuk fleet is....impressive.
Waymo/Autonomos driving would drastically slow down most of transporation infrastructure in most of the world. I don't think waymo should spend billions figuirng out how to drive better than Indians.
rad_gruchalski · 4h ago
> Why is that the standard? I, a human, can’t handle driving in Mumbai.
Aren’t you confusing “navigating” vs “driving”?
decimalenough · 4h ago
In Indian traffic, navigating is the least of your worries.
petesergeant · 2h ago
The Western mind simply cannot fathom what some of our Eastern brothers and sisters have managed to achieve on the roads.
I remember asking my father, the first time I went to visit him in Kathmandu, what his address was, and he patiently explained to me that the street he lived on simply had no name or unique identifier. Or driving for the first time in Vietnam and being inducted into a traffic system where your only responsibility is the cone of things you can see in front of you. Or the terror of realizing that your second taxi driver of the day in Bangkok is literally on speed.
All this to say: no, I assume he means he can’t handle driving in Mumbai.
jogjayr · 2h ago
Waymo works where it works and it's useful where it works. Can a Mumbai autorickshaw handle an American freeway? Does that make it a pointless vehicle?
smus · 4h ago
Have you been to San Francisco?
riehwvfbk · 3h ago
Compared to driving in the developing world though, SF traffic is very structured and very tame.
My favorite data point here is Cairo: the sound of traffic there is horns blaring and metal-on-metal. Driving in Cairo is a contact sport. And it doesn't seem to matter how nice a car is: a fancy Mercedes will have as many body dents as a rust bucket Lada.
colinmorelli · 5h ago
This feels a lot like "moving the goalposts." First, it was complete science fiction to have technology in the car. Then, it was in the car, but it could only do navigation and music, it can't operate the car the way humans can. Then, it can prevent you from weaving out of your lane, and it can stop the car if you're about to crash into something, but it can't help you with your commute. Then, it can speed up, slow down, and steer on the highway, but it can't take you door to door. Now, it can take you door to door, but only in certain environments, it can't do it everywhere.
All of the above happened over the last ~20 years or so. The progression clearly seems to point to this being more than hype, even if it takes us longer to realize than originally anticipated.
thesuitonym · 3h ago
It's not really moving the goalposts, though. The idea of a self driving car has always been "I can get in my car, tell it where I want to go, and then it goes there while I read a book."
Having navigation and music, and lane assist, and adaptive cruise control, and some cars that can operate autonomously in some environments is great, but it's not what we meant when we said self driving cars.
colinmorelli · 1h ago
The point is not that those things were meant when we said self driving cars. It's that, at every step along the way, there were a group of people who doubted that cars could do that thing, and then they did that thing. And then the thing we said they can't do changed to something else.
Today, you absolutely can "get in a car, tell it where you want to go, and it goes there while you read a book" - it's literally what Waymo is and has been doing. And now we're saying it can't do it in Mumbai, so it's still not self-driving.
At some point, the distinction seems pointless. We are undeniably continuing to make progress on the road to autonomous driving, and it does work in certain scenarios today. To suggest things are slowing down because we haven't met the most literal interpretation of the words is neither helpful nor correct.
zahlman · 1h ago
> It's that, at every step along the way, there were a group of people who doubted that cars could do that thing
...Can you cite that?
> And then the thing we said they can't do changed to something else.
...And they were the same people?
> We are undeniably continuing to make progress
Where did anyone deny this?
> To suggest things are slowing down
Where did anyone make this argument?
The quote from TFA:
> but the hype has promised completely autonomous cars reliably zipping about in rush hour traffic.
The author did not restrict that to SF, and is presumably referring to "hype" that "promised" this globally.
colinmorelli · 40m ago
You conveniently left out the first part of the sentence you quoted:
> Currently available software may very well make human drivers both more comfortable and safe...
Which is objectively not what Waymo does, and whether intentional or not, invalidates the progress that has been made.
Also, immediately preceding that:
> Driverless vehicles in closed systems have been in use for a long time.
Which is also not what current frontier self driving technology is.
> Where did anyone make this argument?
The title of the article is quite literally "Is Winter Coming?"
gilleain · 4h ago
Not so much moving the goalposts as pointing out that playing American football is not like playing soccer (football - that is, driving in Rome) or even cricket (Mumbai).
In fact, cricket doesn't even _have_ goalposts, it has wickets. Driving in cities outside North America is very different.
colinmorelli · 4h ago
I'm not sure I get your analogy here. If you're suggesting that it's not "moving the goalposts" because it's pointing out that driving in Rome or Mumbai is different than driving in North America, then that is exactly what is meant by moving the goalposts.
10 years ago the claim was that "cars can't drive autonomously," Waymo quietly chips away to the point that they absolutely can drive autonomously, even in an unpredictable environment (with evidently drastically lower-than-human accident rates, for example), and the reaction of those original people is to say "yeah but it can't drive in [even more complex place]"
Sure, that's not exactly surprising. We generally don't design technology to do the most complex version of the task it's supposed to do first. We generally start with a simpler scenario it can accomplish and progressively enhance it as we learn more. Cars have been doing that for decades.
So perhaps the tech doesn't work in Mumbai or Rome yet. Maybe we'll advance the tech to do that thing, or maybe we'll come up with a different solution to autonomous driving in these places if we find out it'll be more expensive to advance this technology than it will be to do something else instead. But either way, it's already doing the thing that many, many people claimed it can't do, and those people are now claiming there's something else it can't do. That is the very definition of moving the goalposts.
RetroTechie · 3h ago
> Waymo quietly chips away (..)
Perfect example of the saying: "if you have a big problem, first solve the smaller problems. Then your bigger problem may turn out to be not so big after all".
Current AI is much like that: one 'little' problem after another being solved (or at least, progressing).
xnx · 4h ago
> Driving in cities outside North America is very different.
Describing SF's Tenderloin at night as a "highly controlled environment" would be stretching it.
Anyway, you're moving the goalposts here. Waymo is operating at scale in actual human cities in actual rush hour traffic. Sure, it would struggle in Buffalo during a snowstorm or in Mumbai during the monsoon, but so do human drivers.
hiatus · 3h ago
> Sure, it would struggle in Buffalo during a snowstorm or in Mumbai during the monsoon, but so do human drivers.
We don't expect technology to be on par with human capabilities but to exceed them.
Havoc · 4h ago
The hype cooling down a bit might not be a terrible thing
osigurdson · 1h ago
A different kind of AI winter is already here. This "winter" is associated with companies laying people off and then lazily waiting around for AGI to emerge. This is leading to a kind of malaise that I think will ultimately be bad for economies. It is fine to use any available tool to boost productivity, but magical thinking is not sound management.
wellUc · 1h ago
I predict a lot of circumlocutions about AI, but most people not noticing since they blindly follow the TV/politics as-is anyway.
A lot of people (still a tiny proportion of the population) will be loud in opposition but ultimately overwhelmed by the nihilism and indifference of the aggregate.
The loudest will be those who perceive some loss to their own lifestyle that relies on exploiting others' attention, as AI presents new risks to their attention-grabbing behaviors.
Then they will die off and humanity will carry on with AI not them.
Circle of life Simba.
antirez · 3h ago
Orthogonal: the lemons in the picture, from Palermo (Sicily), could be not only lemons or lemon-shaped soap, but also a sweet, our very famous "frutta martorana": https://en.wikipedia.org/wiki/Frutta_martorana
qudat · 3h ago
While reading this article, I kept asking myself the question: "Why can't LLMs ask us follow-up questions?"
lblume · 3h ago
They absolutely can if you prompt them to. You can even add it to your system prompt for it to happen in every new conversation!
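For example, a minimal sketch assuming the OpenAI Python client (the system-prompt wording, model name, and question are just illustrative placeholders):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Hypothetical system prompt nudging the model to ask clarifying questions
    # before committing to an answer.
    SYSTEM = (
        "Before answering, ask up to three clarifying questions whenever the "
        "request is ambiguous or missing context you would need to be accurate. "
        "Only answer once the user has replied."
    )

    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "Should I refinance my mortgage?"},
        ],
    )
    print(resp.choices[0].message.content)  # typically comes back with questions, not an answer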
esjeon · 3h ago
AI winters will keep coming as long as the definition of AI stays relative. We used to call chess programs chess "AIs", but hardly anyone says that anymore. We call LLMs "AIs" now, but let's be real: a few decades from now, we'll probably be calling them token predictors, while some shiny new "AIs" are already out there kicking asses.
At the end of the day, "AI" really just means throwing expensive algorithms at problems we've labeled as "subjective" and hoping for the best. More compute, faster communication, bigger storage, and we get to run more of those algorithms. Odds are, the real bottleneck is hardware, not software. Better hardware just lets us take bolder swings at problems, basically wasting even more computing power on nothing.
So yeah, we’ll get yet another AI boom when a new computing paradigm shows up. And that boom will hit yet another AI winter, because it'll smack into the same old bottleneck. And when that winter hits, we'll do what we've always done. Move the goalposts, lower the bar, and start the cycle all over again. Just with new chips this time.
Ah, Jesus. I should quit drinking Turkish coffee.
rvz · 3h ago
Let's just say that people once thought Big Tech was invincible, until it wasn't.
Obviously those exposed in the AI hype will tell you that there is no winter.
Until the music stops and little to no one can make money out of this AI race to zero.
dinfinity · 42m ago
> Let's just say that people once thought that Big Tech was once invincible, until it wasn't.
> Obviously those exposed in the AI hype will tell you that there is no winter.
Go look at how much money was spent on AI R&D in the last AI 'summers' (and winters). Pennies compared to the billions and billions of dollars the private and public sector is throwing at it right now.
Will some investments turn out to be a waste of time and money? Yes.
Will investment be reduced to a fraction of what it is today? Hell no.
The music stops when humans are economically obsolete.
GaggiX · 5h ago
>Spring 2024
Reasoning models like o1 had not yet been released at that time. It's amazing how much progress has been made since then.
Edit: also, Search wasn't available, as the blog mentions "citations".
netdevphoenix · 5h ago
The point isn't that progress is not happening but that it's slowing down. You get more of the same: smaller memory footprint, faster responses, fewer hallucinations, etc. Significant progress would be another DeepSeek kind of breakthrough: a near-0% hallucination rate, performing like current models with less than half of their dataset, epistemological self-awareness (i.e. "I am not sure of the correctness of the answer I just gave you"), the ability to override assumptions from the training dataset, etc.
We are just getting cars of different shapes and colours, with built-in speakers and radio. Not exactly progress
patapong · 4h ago
This assumes that LLMs are only useful if they are AGI. I don't think that's true - what we have today is already sufficient to unlock an enormous amount of value, we just haven't done so yet.
Thus, I think we can compare them to electricity - a sophisticated technology with a ton of potential, which will take years to fully exploit, even if there are no more fundamental breakthroughs. But also not the solution to every single problem.
zahlman · 1h ago
Arguably, LLMs - or whatever systems succeed them - are only useful if they are not AGI. Given the evidence already collected about how willing humans are to make these systems "agentive", we pretty well have to worry about the possibility of an AGI using us instead. Even if there's some other logical barrier to recursive self-improvement ("hard takeoff") scenarios.
eisfresser · 4h ago
> another deep seek kinda of breakthrough
That was only six months ago. I don't think this is an argument that things are slowing down (yet).
netdevphoenix · 1h ago
I didn't say that progress stopped, only that it is slowing down (i.e. breakthroughs become less frequent). DeepSeek happening 6 months ago doesn't counter what I said.
pixl97 · 4h ago
>but that it's slowing down
Progress isn't a smooth curve but more step-like.
Also, the last 10% of getting AI right is 90% of the work, but it doesn't seem that way to us humans. I don't think you understand the gigantic impact that last 10% is going to make on the world and how fast it will change things once we accomplish it.
Personally, I hope it takes us a while. We're not ready for this as a society and planet.
GaggiX · 4h ago
Reasoning models like o1 are on a whole new level compared to previous models. For example, they are incredible at math, something that previous models struggled with a lot. This seems pretty huge to me, as the performance of previous models was flattening; it's kind of a new paradigm.
empath75 · 1h ago
The consequences of a new thing being invented are not entirely dependent on progress of innovation in the thing itself. It takes quite a long time for people to build _on top of_ a new technology. LLMs could not appreciably improve at all, and we've still barely scratched the surface of applying what we have.
Your car example is a perfect one -- society was _completely reordered_ around the car, even though the fundamental technology behind the car didn't change from the early 20th century until the invention of the electric car.
netdevphoenix · 1h ago
Surely, finding new applications for existing tech can't be considered progress in the development of that tech.
Jedd · 4h ago
That's also one of those things that would probably confuse LLMs as readily as it confuses North Americans (for much the same reason - training).
Spring 2024 for me was from the 1st of September to the 30th of November.
barbazoo · 1h ago
Only about 10% of earth's population lives in the southern hemisphere. It’s pretty fair to assume northern.
GaggiX · 4h ago
What's the confusion here? The author is from Sweden, also neither I nor the author are North Americans.
Jedd · 4h ago
While one half of the planet is having spring, the other half is having autumn.
GaggiX · 4h ago
Yeah that's something you generally learn when you are a kid.
Jedd · 3h ago
We should catch up next autumn for a quiet ale to talk about the ambiguity of that date format.
She showed me the result and I immediately saw the logical flaws and pointed them out to her. She pressed the model on it and it of course apologized and corrected itself. Out of curiosity I tried the prompt again, this time using financial jargon that I was familiar with and my wife was not. The intended meaning of the words was the same, the only difference is that my prompt sounded like it came from someone who knew finance. The result was that the model got it right and gave an explanation for the reasoning in exacting detail.
It was an interesting result to me because it shows that experts in a field are not only more likely to recognize when a model is giving incorrect answers but they're also more likely to get correct answers because they are able to tap into a set of weights that are populated by text that knew what it was talking about. Lay people trying to use an LLM to understand an unfamiliar field are vulnerable to accidentally tapping into the "amateur" weights and ending up with an answer learned from random Reddit threads or SEO marketing blog posts, whereas experts can use jargon correctly in order to tap into answers learned from other experts.
Also how you ask matters a lot. Sometimes it just wants to make you happy with whatever answer, if you go along without skepticism it will definitely make garbage.
Fun story: at a previous job a Product Manager made someone work a full week on a QR-Code standard that doesn't exist, except in ChatGPT's mind. It produced test cases and examples, but since nobody had a way to test
When it was sent to a bank in Sweden to test, the customer was just "wait this feature doesn't exist in Sweden" and a heated discussion ensued until the PM admitted using ChatGPT to create the requirements.
But, I work in healthcare and have enough knowledge of health to know that CKD almost certainly could not advance fast enough to be the cause of the kidney value changes in the labs that were only 6 weeks apart. I asked the LLM if that's the best explanation for these values given they're only 6 weeks apart, and it adjusted its answer to say CKD is likely not the explanation as progression would happen typically over 6+ months to a year at this stage, and more likely explanations were nephrotoxins (recent NSAID use), temporary dehydration, or recent infection.
We then spoke to our vet who confirmed that CKD would be unlikely to explain a shift in values like this between two tests that were just 6 weeks apart.
That would almost certainly throw off someone with less knowledge about this, however. If the tests were 4-6 months apart, CKD could explain the change. It's not an implausible explanation, but it skipped over a critical piece of information (the time between tests) before originally coming to that answer.
However, they do have particular types of failure modes that they're more prone to, and this is one of them. So they're imperfect.
ChatGPT is not reliable for medical diagnosis.
While it can summarize symptoms, explain conditions, or clarify test results using public medical knowledge, it: • Is not a doctor and lacks clinical judgment • May miss serious red flags or hallucinate diagnoses • Doesn’t have access to your medical history, labs, or physical exams • Can’t ask follow-up questions like a real doctor would
I am suggesting that today's best in class models (Gemini 2.5 Pro and o3, for example), when given the same context that a physician has access to (labs, prior notes, medication history, diagnosis history, etc), and given an appropriate eval loop, can achieve similar diagnostic accuracy.
I am not suggesting that patients turn to ChatGPT for medical diagnosis, or that these tools are made available to patients to self diagnose, or that physicians can or should be replaced by an LLM.
But there absolutely is a role for an LLM to play in diagnostic workflows to support physicians and care teams.
My fear is that people treat AI like an oracle when they should be treating it just like any other human being.
I have a personal gripe about this bringing an unfinished tool to market and then prophetizing about its usefulness. And how we all better get ready for it. This seems very hand-wavey and is looking more and more like vaporware.
It's like trying to quickly build a house on an unfinished foundation. Why are we rushing to build? Can't we get the foundational things right first?
This may or may not be easily possible by tweaking current training techniques. But it shows the many edge cases still needed to be addressed by AI models.
> It was an interesting result to me because it shows that experts in a field are not only more likely to recognize when a model is giving incorrect answers but they're also more likely to get correct answers because they are able to tap into a set of weights that are populated by text that knew what it was talking about. Lay people trying to use an LLM to understand an unfamiliar field are vulnerable to accidentally tapping into the "amateur" weights and ending up with an answer learned from random Reddit threads or SEO marketing blog posts, whereas experts can use jargon correctly in order to tap into answers learned from other experts.
Couldn't it be the case that people who (in this case recognizable to the AI by their choice of wording) are knowledgeable in the topic need different advise than people who know less about the topic?
To give one specific examples from finance: if you know a lot about finance, getting some deep analysis and advice about what is the best way to trade some exotic options is likely sound advice. On the other hand, for people who are not deeply into finance the best advice is likely rather "don't do it!".
> for people who are not deeply into finance the best advice is likely rather "don't do it!".
Oh boy, more nanny software. This future blows.
I think this topic is a little bit more complicated: this is rather a balancing of the model between
1. "giving the best possible advice to the respective person given their circumstances" vs
2. "giving the most precise answer to the query to the user"
(if you ask me: the best decision would in my opinion be to give the user a choice for this, but this would be overtaxing to many users)
- Freedom-loving people will hate it if they don't get 2
- On the other hand, many people would like to actually get the advice that is most helpful to them (i.e. 1), and not the one that may answer their question exactly, but is likely a bad idea for them
1. The AI will never know the user well enough to predict what will be best for them. It will resort to treating everybody like children. In fact, many of the crude ways LLMs currently steer and censor infantilizing.
2. The user's own benefit vs. "for your own good" as defined by a product vendor is a scam that vendors have perpetrated for ages. Even the extremely unsubtle version of it has a bunch of stooges cheering them on. Things will not improve in the users' favor when it's harder to notice, and easier to pretend they're not doing it.
3. A bunch of Californians using the next wave of tech to spread cultural imperialism is better than China, I guess. But why are those my options?
Far too often it'll cheerily apologise and correct their own answer.
That reminded me how important it is to give it the full parameters and context of my question, including things you could assume another human being would just get. It also has a sort of puppy-dog's eagerness to please that I've had to tell it not to let get in the way of objective analysis. Sometimes the "It's awesome that you asked about that" stuff verges on a Hitchhiker's Guide joke. Maybe that's what they were going for.
Secondly, it can be even worse. I’ve been ”gaslighted” when pressing on answers I knew were incorrect (in this case, cryptography). It comes up with extremely plausibly sounding arguments, specifically addressing my counterpoints, and even chain of reasoning, yet still isn’t correct. You’d have to be a domain expert to tell it’s wrong, at which point it makes no sense to use LLMs in the first place.
It just leaves you with two contradictory statements, much like the man with two watches who never knows the correct time.
And even if it were a misinterpretation the result is still largely the same: if you don't know how to ask good questions you won't get good answers, which makes it dangerous to rely on the tools for things that you're not already an expert in. This is in contrast to all to people who claim to be using them for learning about important concepts (including lots of people who claim to be using them as financial advisors!).
The difference is that a human doctor probably has a lot of context about you and the situation you're in, so that they probably guess what your intention behind the question is, and adjust their answer appropriately. When you talk to an LLM, it has none of that context. So the comparison isn't really fair.
Has your mom ever asked you a computer question? Half of the time the question makes no sense and explaining to her why would take hours, and then she still wouldn't get it. So the best you can do is guess what she wants based on the context you have.
Yeah, we're basically repeating the "search engine/query" problem but slightly differently. Using a search engine the right way always been a skill you needed to learn, and the ones who didn't always got poor results, and many times took those results at face value. Then Google started giving "answers" so if your query is shit, the "answer" most likely is too.
Point is, I don't think this phenomenon is new, it's just way less subtle today with LLMs, at least for people who have expertise in the subjects.
[0] https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect
I want to reiterate that I don't want dull, minimal writing. I don't subscribe to the "reduce your wordcount until it can't be reduced any further" school of writing advice. I just think that many people have very similar ideas about AI (and have written very similar things), and if you have something to say that you haven't seen expressed before, it is worthwhile (imo) to express it without preamble.
In professional settings, brevity is often mistaken for inexperience or a weak position. As the thinking goes, a competent engineer should be able to defend every position they take like a PhD candidate defending their dissertation. At the same time, however, excess verbosity is viewed as distinctly “cold” and “engineer” in tone, and frowned upon by non-technical folks in my experience; they wanted an answer, not an explainer.
The problem is that each of us has the data points of what succeeds in convincing others: the longer argument, every single time. Thus we use it in our own writing because we want to convince the imagined reader (as well as ourselves) that our position is correct, or at the very least sound. In doing so we write lengthy posts, often doing research to validate our positions with charts, screenshots, Wikipedia articles, news sources, etc. It's as much about convincing ourselves as it is other readers, hence why we go for longer posts based on real-world experiences.
One plot twist specific to me: my lengthy posts are also about quieting my brain, in a very real sense. My brain is the reader, and if I do not get everything out of my head about the topic and onto "paper", it will continue to dwell and gnaw on the missed points in perpetuity. Thus, 5k-word posts about things like the inefficiency of hate in Capital or a systems analysis of American hegemony, just so I can have peace and quiet in my own head by getting it completely out of said head.
[0]: https://www.youtube.com/watch?v=vtIzMaLkCaM
1) If I am considering possible objections to my position, I have to be very clear which points I am raising only for the sake of argument, and which are the ones I am actually advocating for, or else it will appear confused or self-contradictory.
A related issue is preempting so many possible objections that the reader loses track of the main issue.
2) After making several passes to hone my position, it can seem so obvious to me that what I write for the reader is too terse for anyone who is approaching the issue for the first time.
Summarize and critique this argument in a series of bullet points.
More seriously though, I think there is a lack of rigorous thinking about AI specifically and technology in general. And hence you get a lot of these rambling thought-style posts which are no doubt by intelligent people with something compelling to say, but without any fundamental method for analyzing those thoughts.
Which is why I really recommend taking a course in symbolic logic or analytic philosophy, if you are able to. You’ll quickly learn how to communicate your ideas in a straightforward, no nonsense manner.
Do you have any free online course recommendations?
There are a bunch of lectures on YouTube about analytic philosophy though, and from a quick look they seem solid.
I find myself writing longer and more defensively because lots of people don't understand nuance or subtext. Forget hyperbole or humour - lots of technical readers lack the ability to understand them.
Finally, editing is hard work. Revising and refining a document often takes several times longer than writing the first draft.
> what LLMs do is string together words in a statistically highly probable manner.
This is not incorrect, but it's no longer a sufficient mental model for reasoning models. For example, while researching new monitors today, I told Gemini to compare $NEW_MODEL_1 with $NEW_MODEL_2. Its training data did not contain information about either model, but it was capable of searching the Internet to find information about both and provide me with a factual (and, yes, I checked, accurate) comparison of the differences in the specs of the models as well as a summary of sentiment for reliability etc for the two brands.
> Currently available software may very well make human drivers both more comfortable and safe, but the hype has promised completely autonomous cars reliably zipping about in rush hour traffic.
And this is already not hype, it's reality anywhere Waymo operates.
If you skip this two-part understanding then you run the risk of missing when the agent decided not to do a search for some reason and is therefore entirely dependent on statistical probability in the training data. I've personally seen people without this mental model take an LLM at its word when it was wrong because they'd gotten used to it looking things up for them.
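To make that two-part mental model concrete, here's a minimal sketch of the loop an "agent" runs. Every name in it is a hypothetical placeholder (a stand-in model() call and a stand-in web_search() tool), not any vendor's actual API; the point is just that the search only happens if the model decides to ask for it.

    # Toy sketch of the "LLM + tools" loop. All names are hypothetical placeholders.

    def web_search(query: str) -> str:
        # Placeholder: a real agent would call an actual search backend here.
        return f"(search results for: {query})"

    def model(messages: list[dict]) -> dict:
        # Placeholder for a chat-completion call. A real model returns either a
        # final answer or a request to use a tool -- and whether it asks for a
        # search at all is its own (statistical) choice.
        return {"type": "answer", "text": "..."}

    def run_agent(question: str) -> str:
        messages = [{"role": "user", "content": question}]
        for _ in range(3):  # allow a few tool-use rounds
            reply = model(messages)
            if reply["type"] == "tool_call":  # model chose to look something up
                messages.append({"role": "tool", "content": web_search(reply["query"])})
                continue
            return reply["text"]  # model chose to answer from its weights alone
        return "gave up after too many tool calls"

    print(run_agent("Compare $NEW_MODEL_1 and $NEW_MODEL_2"))

If the model never emits a tool call, the answer is pure training-data statistics, and nothing in the output tells the user which path was taken. That's the failure mode described above.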
"Completely" here should be expanded to include all the unique and unforseen circumstances a driver might encounter, such as a policeman directing traffic manually or any other "soft" situation that is not well represented in training.
Not to mention the somewhat extreme amount of a priori and continuous mapping that goes into operating a fleet of AVs. That is hardly to be considered "Completely autonomous".
This isn't just pedantry; the disconnect between a technical person's deep understanding and a common user's everyday experience is pretty much what the article hinges on. Try taking a Waymo from SF to NYC. That seems like something a "Completely autonomous" car should be able to do given a layperson's understanding of "Completely", without the experts' long list of caveats.
You told an agent, not just an LLM.
> And this is already not hype, it's reality anywhere Waymo operates.
Some beg to differ; see e.g. https://www.youtube.com/watch?v=040ejWnFkj0 .
But this feature was a staple of most online shops that sell monitors and a bunch of "review" sites. You don't need a highly complex system to compare 2 monitors, you need a spreadsheet.
I guess the article fails to admit that when you have billions of connected points in a vector space, "stringing together" is not simply "stringing together". I'm not a fanboy, but somehow GPT/attention-based logic is capable of parsing input and data and then remodeling it to a surprising depth.
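For what it's worth, the mechanical core of that "stringing together" is small enough to sketch. Below is a toy single-head scaled dot-product attention in plain NumPy -- purely illustrative, not a fragment of any real model -- showing how every token's representation gets remodeled against every other token's before the next word is ever picked.

    import numpy as np

    def attention(Q, K, V):
        # Toy scaled dot-product attention: each query row mixes information from
        # all value rows, weighted by how strongly it matches each key row.
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)             # pairwise similarities
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)          # softmax over keys
        return w @ V                                # remodeled representations

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))   # 4 tokens, 8-dim embeddings (made-up numbers)
    out = attention(x, x, x)      # self-attention
    print(out.shape)              # (4, 8): same shape, each row now blends all four tokens

Stacking dozens of layers of this (plus the learned weights that produce Q, K, V) is where that surprising depth comes from.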
And lol at anyone who thinks any urban driving environment is “highly controlled”.
1. People do not, as a matter of daily complaint, grumble about bad traffic, bad drivers, dents, door dings, etc. 2. There are fewer accidents per capita than in the US. 3. Insurance is required, and body shops work better than in the US. 4. Electrification of the tuk-tuk fleet is... impressive.
Waymo/autonomous driving would drastically slow down most of the transportation infrastructure in most of the world. I don't think Waymo should spend billions figuring out how to drive better than Indians.
Aren’t you confusing “navigating” vs “driving”?
I remember asking my father, the first time I went to visit him in Kathmandu, what his address was, and he patiently explained that the street he lived on simply had no name or unique identifier. Or driving for the first time in Vietnam and being inducted into a traffic system where your only responsibility is the cone of things you can see in front of you. Or the terror of realizing that your second taxi driver of the day in Bangkok is literally on speed.
All this to say: no, I assume he means he can’t handle driving in Mumbai.
My favorite data point here is Cairo: the sound of traffic there is horns blaring and metal on metal. Driving in Cairo is a contact sport. And it doesn't seem to matter how nice a car is: a fancy Mercedes will have as many body dents as a rust-bucket Lada.
All of the above happened over the last ~20 years or so. The progression clearly seems to point to this being more than hype, even if it takes us longer to realize than originally anticipated.
Having navigation and music, and lane assist, and adaptive cruise control, and some cars that can operate autonomously in some environments is great, but it's not what we meant when we said self driving cars.
Today, you absolutely can "get in a car, tell it where you want to go, and it goes there while you read a book" - it's literally what Waymo is and has been doing. And now we're saying it can't do it in Mumbai, so it's still not self-driving.
At some point, the distinction seems pointless. We are undeniably continuing to make progress on the road to autonomous driving, and it does work in certain scenarios today. To suggest things are slowing down because we haven't met the strictest interpretation of the words is neither helpful nor correct.
...Can you cite that?
> And then the thing we said they can't do changed to something else.
...And they were the same people?
> We are undeniably continuing to make progress
Where did anyone deny this?
> To suggest things are slowing down
Where did anyone make this argument?
The quote from TFA:
> but the hype has promised completely autonomous cars reliably zipping about in rush hour traffic.
The author did not restrict that to SF, and is presumably referring to "hype" that "promised" this globally.
> Currently available software may very well make human drivers both more comfortable and safe...
Which is objectively not what Waymo does, and whether intentional or not, invalidates the progress that has been made.
Also, immediately preceding that:
> Driverless vehicles in closed systems have been in use for a long time.
Which is also not what current frontier self driving technology is.
> Where did anyone make this argument?
The title of the article is quite literally "Is Winter Coming?"
In fact, cricket doesn't even _have_ goalposts, it has wickets. Driving in cities outside North America is very different.
10 years ago the claim was that "cars can't drive autonomously," Waymo quietly chips away to the point that they absolutely can drive autonomously, even in an unpredictable environment (with evidently drastically lower-than-human accident rates, for example), and the reaction of those original people is to say "yeah but it can't drive in [even more complex place]"
Sure, that's not exactly surprising. We generally don't design technology to do the most complex version of the task it's supposed to do first. We generally start with a simpler scenario it can accomplish and progressively enhance it as we learn more. Cars have been doing that for decades.
So perhaps the tech doesn't work in Mumbai or Rome yet. Maybe we'll advance the tech to do that thing, or maybe we'll come up with a different solution to autonomous driving in these places if we find out it'll be more expensive to advance this technology than it will be to do something else instead. But either way, it's already doing the thing that many, many people claimed it can't do, and those people are now claiming there's something else it can't do. That is the very definition of moving the goalposts.
Perfect example of the saying: "if you have a big problem, first solve the smaller problems. Then your bigger problem may turn out to be not so big after all".
Current AI is much like that: one 'little' problem after another being solved (or at least, progressing).
Waymo is testing in Japan: https://waymo.com/blog/2025/04/new-beginnings-in-japan
Anyway, you're moving the goalposts here. Waymo is operating at scale in actual human cities in actual rush hour traffic. Sure, it would struggle in Buffalo during a snowstorm or in Mumbai during the monsoon, but so do human drivers.
We don't expect technology to be on par with human capabilities but to exceed them.
A lot of people (still a tiny proportion of the population) will be loud in opposition but ultimately overwhelmed by the nihilism and indifference of the aggregate.
The loudest will be those who perceive some loss to their own lifestyle that relies on exploiting other’s attention, as AI presents new risk to their attention grabbing behaviors.
Then they will die off and humanity will carry on with AI not them.
Circle of life Simba.
At the end of the day, "AI" really just means throwing expensive algorithms at problems we've labeled as "subjective" and hoping for the best. More compute, faster communication, bigger storage, and we get to run more of those algorithms. Odds are, the real bottleneck is hardware, not software. Better hardware just lets us take bolder swings at problems, basically wasting even more computing power on nothing.
So yeah, we’ll get yet another AI boom when a new computing paradigm shows up. And that boom will hit yet another AI winter, because it'll smack into the same old bottleneck. And when that winter hits, we'll do what we've always done. Move the goalposts, lower the bar, and start the cycle all over again. Just with new chips this time.
Ah, Jesus. I should quit drinking Turkish coffee.
Obviously those exposed in the AI hype will tell you that there is no winter.
Until the music stops and little to no one can make money out of this AI race to zero.
Half the world runs on Big Tech. Some of them have cash reserves bigger than the GDP of sizeable countries. They lead in R&D investment: https://www.rdworldonline.com/top-15-rd-spenders-of-2024/
> Obviously those exposed in the AI hype will tell you that there is no winter.
Go look at how much money was spent on AI R&D in the last AI 'summers' (and winters). Pennies compared to the billions and billions of dollars the private and public sector is throwing at it right now.
Will some investments turn out to be a waste of time and money? Yes.
Will investment be reduced to a fraction of what it is today? Hell no.
The music stops when humans are economically obsolete.
Reasoning models like o1 had not yet been released at that time. It's amazing how much progress has been made since then.
Edit: also, Search wasn't available, as the blog mentions "citations".
We are just getting cars of different shapes and colours, with built-in speakers and radio. Not exactly progress.
Thus, I think we can compare them to electricity - a sophisticated technology with a ton of potential, which will take years to fully exploit, even if there are no more fundamental breakthroughs. But also not the solution to every single problem.
That was only six months ago. I don't think this is an argument that things are slowing down (yet).
Progress isn't a smooth curve; it's more step-like.
Also, the last 10% of getting AI right is 90% of the work, but it doesn't seem that way to us humans. I don't think you understand the gigantic impact that last 10% is going to have on the world, and how fast it will change things once we accomplish it.
Personally, I hope it takes us a while. We're not ready for this as a society and planet.
Your car example is a perfect one -- society was _completely reordered_ around the car, even though the fundamental technology behind the car didn't change from the early 20th century until the invention of the electric car.
Spring 2024 for me was from the 1st of September to the 30th of November.