Is Winter Coming? (2024) (datagubbe.se)
77 points by rbanffy on 5/19/2025, 10:50:40 AM | 84 comments
She showed me the result and I immediately saw the logical flaws and pointed them out to her. She pressed the model on it and it of course apologized and corrected itself. Out of curiosity I tried the prompt again, this time using financial jargon that I was familiar with and my wife was not. The intended meaning of the words was the same; the only difference was that my prompt sounded like it came from someone who knew finance. The result was that the model got it right and explained its reasoning in exacting detail.
It was an interesting result to me because it shows that experts in a field are not only more likely to recognize when a model is giving incorrect answers but they're also more likely to get correct answers because they are able to tap into a set of weights that are populated by text that knew what it was talking about. Lay people trying to use an LLM to understand an unfamiliar field are vulnerable to accidentally tapping into the "amateur" weights and ending up with an answer learned from random Reddit threads or SEO marketing blog posts, whereas experts can use jargon correctly in order to tap into answers learned from other experts.
Also, how you ask matters a lot. Sometimes it just wants to make you happy with whatever answer; if you go along without skepticism, it will definitely produce garbage.
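A minimal sketch of the jargon experiment described above, assuming the OpenAI Python SDK (the model name and both prompts here are placeholders of my own, not anything from the thread): send the same underlying question twice, once in lay wording and once in domain jargon, and compare the answers side by side.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Same underlying question, phrased two ways (both prompts are made up).
    prompts = {
        "lay": "If I buy an option and the stock goes up, do I make money?",
        "jargon": ("For a long call, what is the P&L impact of a move up in the "
                   "underlying, holding implied vol and time to expiry constant?"),
    }

    for style, prompt in prompts.items():
        reply = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        print(f"--- {style} ---")
        print(reply.choices[0].message.content)

Same intent, different register; the interesting part is how much the depth of the answer shifts with the wording.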
Fun story: at a previous job, a Product Manager made someone work a full week on a QR-code standard that doesn't exist, except in ChatGPT's mind. It produced test cases and examples, but since nobody had a way to test it locally, the mistake went unnoticed.
When it was sent to a bank in Sweden to test, the customer's response was just "wait, this feature doesn't exist in Sweden", and a heated discussion ensued until the PM admitted using ChatGPT to create the requirements.
But, I work in healthcare and have enough knowledge of health to know that CKD almost certainly could not advance fast enough to be the cause of the kidney value changes in the labs that were only 6 weeks apart. I asked the LLM if that's the best explanation for these values given they're only 6 weeks apart, and it adjusted its answer to say CKD is likely not the explanation as progression would happen typically over 6+ months to a year at this stage, and more likely explanations were nephrotoxins (recent NSAID use), temporary dehydration, or recent infection.
We then spoke to our vet who confirmed that CKD would be unlikely to explain a shift in values like this between two tests that were just 6 weeks apart.
That would almost certainly throw off someone with less knowledge about this, however. If the tests were 4-6 months apart, CKD could explain the change. It's not an implausible explanation, but it skipped over a critical piece of information (the time between tests) before originally coming to that answer.
My fear is that people treat AI like an oracle when they should be treating it just like any other human being.
> It was an interesting result to me because it shows that experts in a field are not only more likely to recognize when a model is giving incorrect answers but they're also more likely to get correct answers because they are able to tap into a set of weights that are populated by text that knew what it was talking about. Lay people trying to use an LLM to understand an unfamiliar field are vulnerable to accidentally tapping into the "amateur" weights and ending up with an answer learned from random Reddit threads or SEO marketing blog posts, whereas experts can use jargon correctly in order to tap into answers learned from other experts.
Couldn't it be the case that people who are knowledgeable in the topic (in this case recognizable to the AI by their choice of wording) need different advice than people who know less about it?
To give one specific example from finance: if you know a lot about finance, deep analysis and advice about the best way to trade some exotic options is likely sound advice. On the other hand, for people who are not deeply into finance, the best advice is likely rather "don't do it!".
And even if it were a misinterpretation, the result is still largely the same: if you don't know how to ask good questions, you won't get good answers, which makes it dangerous to rely on these tools for things you're not already an expert in. This is in contrast to all the people who claim to be using them for learning about important concepts (including lots of people who claim to be using them as financial advisors!).
The difference is that a human doctor probably has a lot of context about you and the situation you're in, so they can probably guess what the intention behind your question is and adjust their answer appropriately. When you talk to an LLM, it has none of that context. So the comparison isn't really fair.
Has your mom ever asked you a computer question? Half of the time the question makes no sense and explaining to her why would take hours, and then she still wouldn't get it. So the best you can do is guess what she wants based on the context you have.
Yeah, we're basically repeating the "search engine/query" problem, just slightly differently. Using a search engine the right way has always been a skill you needed to learn, and the ones who didn't learn it often got poor results, and many times took those results at face value. Then Google started giving "answers", so if your query is shit, the "answer" most likely is too.
Point is, I don't think this phenomenon is new, it's just way less subtle today with LLMs, at least for people who have expertise in the subjects.
Far too often it'll cheerily apologise and correct its own answer.
That reminded me how important it is to give it the full parameters and context of my question, including things you could assume another human being would just get. It also has a sort of puppy-dog eagerness to please, which I've had to tell it not to let get in the way of objective analysis. Sometimes the "It's awesome that you asked about that" stuff verges on a Hitchhiker's Guide joke. Maybe that's what they were going for.
Secondly, it can be even worse. I've been "gaslit" when pressing on answers I knew were incorrect (in this case, cryptography). It comes up with extremely plausible-sounding arguments, specifically addressing my counterpoints, and even chains of reasoning, yet it still isn't correct. You'd have to be a domain expert to tell it's wrong, at which point it makes no sense to use LLMs in the first place.
It just leaves you with two contradictory statements, much like the man with two watches who never knows the correct time.
> what LLMs do is string together words in a statistically highly probable manner.
This is not incorrect, but it's no longer a sufficient mental model for reasoning models. For example, while researching new monitors today, I told Gemini to compare $NEW_MODEL_1 with $NEW_MODEL_2. Its training data did not contain information about either model, but it was capable of searching the Internet to find information about both and provide me with a factual (and, yes, I checked, accurate) comparison of the differences in the specs of the models as well as a summary of sentiment for reliability etc for the two brands.
> Currently available software may very well make human drivers both more comfortable and safe, but the hype has promised completely autonomous cars reliably zipping about in rush hour traffic.
And this is already not hype, it's reality anywhere Waymo operates.
If you skip this two-part understanding then you run the risk of missing when the agent decided not to do a search for some reason and is therefore entirely dependent on statistical probability in the training data. I've personally seen people without this mental model take an LLM at its word when it was wrong because they'd gotten used to it looking things up for them.
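One rough way to make that "did it actually search?" check explicit, sketched here against the OpenAI chat-completions tool-calling interface (the web_search tool is hypothetical and would have to be implemented separately; the model name and the monitor placeholders are assumptions of mine):

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical search tool the model may choose to call; only the schema
    # matters for this sketch, the actual search would be implemented elsewhere.
    tools = [{
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return result snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": "Compare MONITOR_A and MONITOR_B."}],
        tools=tools,
    )

    msg = response.choices[0].message
    if msg.tool_calls:
        print("Model asked to search:", [c.function.name for c in msg.tool_calls])
    else:
        # No tool call: whatever follows is pure training-data recall.
        print("No search requested; treat this as unverified:", msg.content)

If tool_calls comes back empty, you know you're back in "statistical probability over the training data" territory and should verify accordingly.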
"Completely" here should be expanded to include all the unique and unforseen circumstances a driver might encounter, such as a policeman directing traffic manually or any other "soft" situation that is not well represented in training.
Not to mention the somewhat extreme amount of a priori and continuous mapping that goes into operating a fleet of AVs. That is hardly to be considered "Completely autonomous".
This isn't just pedantry, the disconnect between a technical person's deep understanding and a common user's everyday experience is pretty much what the article hinges on. Try taking a Waymo from SF to NYC. This seems like something a "Completely autonomous" car should be able to do given a layperson's understanding of "Completely", without the experts' long list of caveats.
But this feature was a staple of most online shops that sell monitors and a bunch of "review" sites. You don't need a highly complex system to compare 2 monitors, you need a spreadsheet.
I guess the article fails to admit that when you have billions of connected points in a vector space, "stringing together" is not simply "stringing together". I'm not a fanboy, but GPT/attention-based models are somehow capable of parsing input and data and then remodeling it to a surprising depth.
And lol at anyone who thinks any urban driving environment is “highly controlled”.
Aren’t you confusing “navigating” vs “driving”?
My favorite data point here is Cairo: the sound of traffic there is horns blaring and metal-on-metal. Driving in Cairo is a contact sport. And it doesn't seem to matter how nice a car is: a fancy Mercedes will have as many body dents as a rust-bucket Lada.
All of the above happened over the last ~20 years or so. The progression clearly seems to point to this being more than hype, even if it takes us longer to realize than originally anticipated.
Having navigation and music, and lane assist, and adaptive cruise control, and some cars that can operate autonomously in some environments is great, but it's not what we meant when we said self driving cars.
In fact, cricket doesn't even _have_ goalposts, it has wickets. Driving in cities outside North America is very different.
Ten years ago the claim was that "cars can't drive autonomously." Waymo quietly chipped away to the point that they absolutely can drive autonomously, even in an unpredictable environment (with evidently drastically lower-than-human accident rates, for example), and the reaction of those original people is to say "yeah, but it can't drive in [even more complex place]".
Sure, that's not exactly surprising. We generally don't design technology to do the most complex version of the task it's supposed to do first. We generally start with a simpler scenario it can accomplish and progressively enhance it as we learn more. Cars have been doing that for decades.
So perhaps the tech doesn't work in Mumbai or Rome yet. Maybe we'll advance the tech to do that thing, or maybe we'll come up with a different solution to autonomous driving in these places if we find out it'll be more expensive to advance this technology than it will be to do something else instead. But either way, it's already doing the thing that many, many people claimed it couldn't do, and those people are now claiming there's something else it can't do. That is the very definition of moving the goalposts.
Perfect example of the saying: "if you have a big problem, first solve the smaller problems. Then your bigger problem may turn out to be not so big after all".
Current AI is much like that: one 'little' problem after another being solved (or at least, progressing).
Waymo is testing in Japan: https://waymo.com/blog/2025/04/new-beginnings-in-japan
Anyway, you're moving the goalposts here. Waymo is operating at scale in actual human cities in actual rush hour traffic. Sure, it would struggle in Buffalo during a snowstorm or in Mumbai during the monsoon, but so do human drivers.
We don't expect technology to be on par with human capabilities but to exceed them.
I want to reiterate that I don't want dull, minimal writing. I don't subscribe to the "reduce your wordcount until it can't be reduced any further" style of writing advice. I just think that many people have very similar ideas about AI (and have written very similar things), and if you have something to say that you haven't seen expressed before, it is worthwhile (imo) to express it without preamble.
In professional settings, brevity is often mistaken for inexperience or a weak position. As the thinking goes, a competent engineer should be able to defend every position they take like a PhD candidate defending their dissertation. At the same time, however, excess verbosity is viewed as distinctly “cold” and “engineer” in tone, and frowned upon by non-technical folks in my experience; they wanted an answer, not an explainer.
The problem is that each of us has data points on what succeeds in convincing others: the longer argument, every single time. Thus we use it in our own writing because we want to convince the imagined reader (as well as ourselves) that our position is correct, or at the very least sound. In doing so we write lengthy posts, often doing research to validate our positions with charts, screenshots, Wikipedia articles, news sources, etc. It's as much about convincing ourselves as it is other readers, hence why we go for longer posts based on real-world experiences.
One plot twist that's personal to me: my lengthy posts are also about quelling my brain, in a very real sense. It is the reader, and if I do not get everything out of my head about a topic and onto "paper", it will continue to dwell and gnaw on the missed points in perpetuity. Thus, 5k-word posts about things like the inefficiency of hate in Capital or a Systems Analysis of American Hegemony, just so I can have peace and quiet in my own head by getting it completely out of said head.
Summarize and critique this argument in a series of bullet points.
More seriously though, I think there is a lack of rigorous thinking about AI specifically and technology in general. Hence you get a lot of these rambling thought-style posts, no doubt written by intelligent people with something compelling to say, but without any fundamental method for analyzing those thoughts.
Which is why I really recommend taking a course in symbolic logic or analytic philosophy, if you are able to. You’ll quickly learn how to communicate your ideas in a straightforward, no nonsense manner.
Do you have any free online course recommendations?
There are a bunch of lectures on YouTube about analytic philosophy though, and from a quick look they seem solid.
I find myself writing longer and more defensively because lots of people don't understand nuance or subtext. Forget hyperbole or humour - lots of technical readers lack the ability to understand them.
Finally, editing is hard work. Revising and refining a document often takes several times longer than writing the first draft.
At the end of the day, "AI" really just means throwing expensive algorithms at problems we've labeled as "subjective" and hoping for the best. More compute, faster communication, bigger storage, and we get to run more of those algorithms. Odds are, the real bottleneck is hardware, not software. Better hardware just lets us take bolder swings at problems, basically wasting even more computing power on nothing.
So yeah, we’ll get yet another AI boom when a new computing paradigm shows up. And that boom will hit yet another AI winter, because it'll smack into the same old bottleneck. And when that winter hits, we'll do what we've always done. Move the goalposts, lower the bar, and start the cycle all over again. Just with new chips this time.
Ah, Jesus. I should quit drinking Turkish coffee.
Obviously, those with exposure to the AI hype will tell you that there is no winter.
Until the music stops and little to no one can make money out of this AI race to zero.
Reasoning models like o1 had not yet been released at that time. It's amazing how much progress has been made since then.
Edit: also, Search wasn't available, as the blog mentions "citations".
We are just getting cars of different shapes and colours, with built-in speakers and radio. Not exactly progress.
Thus, I think we can compare them to electricity - a sophisticated technology with a ton of potential, which will take years to fully exploit, even if there are no more fundamental breakthroughs. But also not the solution to every single problem.
Progress isn't a smooth curve but more step-like.
Also, the last 10% of getting AI right is 90% of the work, but it doesn't seem that way to us humans. I don't think you understand the gigantic impact that last 10% is going to make on the world, and how fast it will change things once we accomplish it.
Personally, I hope it takes us a while. We're not ready for this as a society and planet.
That was only six months ago. I don't think this is an argument that things are slowing down (yet).
Spring 2024 for me was from the 1st of September to the 30th of November.