An underrated quality of LLMs as a study partner is that you can ask "stupid" questions without fear of embarrassment. Adding in a mode that doesn't just dump an answer but works to take you through the material step-by-step is magical. A tireless, capable, well-versed assistant on call 24/7 is an autodidact's dream.
I'm puzzled (but not surprised) by the standard HN resistance & skepticism. Learning something online 5 years ago often involved trawling incorrect, outdated or hostile content and attempting to piece together mental models without the chance to receive immediate feedback on intuition or ask follow-up questions. This is leaps and bounds ahead of that experience.
Should we trust the information at face value without verifying from other sources? Of course not; that's part of the learning process. Will some (most?) people rely on it lazily without using it effectively? Certainly, and this technology won't help or hinder them any more than a good old-fashioned textbook.
Personally I'm over the moon to be living at a time where we have access to incredible tools like this, and I'm impressed with the speed at which they're improving.
romaniitedomum · 6h ago
> Learning something online 5 years ago often involved trawling incorrect, outdated or hostile content and attempting to piece together mental models without the chance to receive immediate feedback on intuition or ask follow up questions. This is leaps and bounds ahead of that experience.
But now, you're wondering if the answer the AI gave you is correct or something it hallucinated. Every time I find myself putting factual questions to AIs, it doesn't take long for it to give me a wrong answer. And inevitably, when one raises this, one is told that the newest, super-duper, just released model addresses this, for the low-low cost of $EYEWATERINGSUM per month.
But worse than this, if you push back on an AI, it will fold faster than a used tissue in a puddle. It won't defend an answer it gave. This isn't a quality that you want in a teacher.
So, while AIs are useful tools in guiding learning, they're not magical, and a healthy dose of scepticism is essential. Arguably, that applies to traditional learning methods too, but that's another story.
PaulRobinson · 25s ago
Despite the name of "Generative" AI, when you ask LLMs to generate things, they're dumb as bricks. You can test this by asking them anything you're an expert at - it would dazzle a novice, but you can see the gaps.
What they are amazing at though is summarisation and rephrasing of content. Give them a long document and ask "where does this document assert X, Y and Z", and it can tell you without hallucinating. Try it.
Not only does it make for an interesting time if you're in the world of intelligent document processing, it makes them perfect as teaching assistants.
cvoss · 6h ago
> But now, you're wondering if the answer the AI gave you is correct
> a healthy dose of scepticism is essential. Arguably, that applies to traditional learning methods too, but that's another story.
I don't think that is another story. This is the story of learning, no matter whether your teacher is a person or an AI.
My high school science teacher routinely misspoke inadvertently while lecturing. The students who were tracking could spot the issue and, usually, could correct for it. Sometimes asking a clarifying question was necessary. And we learned quickly that that should only be done if you absolutely could not guess the correction yourself, and you had to phrase the question in a very non-accusatory way, because she had a really defensive temper about being corrected that would rear its head in that situation.
And as a reader of math textbooks, both in college and afterward, I can tell you you should absolutely expect errors. The errata are typically published online later, as the reports come in from readers. And they're not just typos. Sometimes it can be as bad as missing terms in equations, missing premises in theorems, missing cases in proofs.
A student of an AI teacher should be as engaged in spotting errors as a student of a human teacher. Part of the learning process is reaching the point where you can and do find fault with the teacher. If you can't do that, your trust in the teacher may be unfounded, whether they are human or not.
tekno45 · 5h ago
How are you supposed to spot errors if you don't know the material?
You're telling people to be experts before they know anything.
filoleg · 5h ago
> How are you supposed to spot errors if you don't know the material?
By noticing that something is not adding up at a certain point. If you rely on an incorrect answer, further material will clash with it eventually one way or another in a lot of areas, as things are typically built one on top of another (assuming we are talking more about math/cs/sciences/music theory/etc., and not something like history).
At that point, it means that either the teacher (whether it is a human or ai) made a mistake or you are misunderstanding something. In either scenario, the most correct move is to try clarifying it with the teacher (and check other sources of knowledge on the topic afterwards to make sure, in case things are still not adding up).
personalyisus · 2h ago
> By noticing that something is not adding up at a certain point.
Ah, but information is presented by AI in a way that SOUNDS like it makes absolute sense if one doesn't already know it doesn't!
And if you have to question the AI a hundred times to try and "notice that something is not adding up" (if it even happens) then that's no bueno.
> In either scenario, the most correct move is to try clarifying it with the teacher
A teacher that can randomly give you wrong information with every other sentence would be considered a bad teacher
tekno45 · 55m ago
Yeah, they're all thinking that everyone is an academic with hotkeys to google scholar for every interaction on the internet.
Children are asking these things to write personal introductions and book reports.
wizzwizz4 · 28m ago
> In either scenario, the most correct move is to try clarifying it with the teacher
A teacher will listen to what you say, consult their understanding, and say "oh, yes, that's right". But written explanations don't do that "consult their understanding" step: language models either predict "repeat original version" (if not fine-tuned for sycophancy) or "accept correction" (if so fine-tuned), since they are next-token predictors. They don't go back and edit what they've already written: they only go forwards. They have had no way of learning the concept of "informed correction" (at the meta-level: they do of course have an embedding of the phrase at the object level, and can parrot text about its importance), so they double-down on errors / spurious "corrections", and if the back-and-forth moves the conversation into the latent space of "teacher who makes mistakes", then they'll start introducing them "on purpose".
LLMs are good at what they do, but what they do is not teaching.
tekno45 · 57m ago
what are children who don't have those skills yet supposed to do?
AlexCoventry · 3h ago
It's possible in highly verifiable domains like math.
ToucanLoucan · 5h ago
> You're telling people to be experts before they know anything.
I mean, that's absolutely my experience with heavy LLM users. Incredibly well versed in every topic imaginable, apart from all the basic errors they make.
ay · 5h ago
My favourite story of that involved attempting to use an LLM to figure out whether it was true or my hallucination that the tidal waves were higher in the Canary Islands than in the Caribbean, and why; it spewed several paragraphs of plausible-sounding prose, and finished with "because Canary Islands are to the west of the equator".
This phrase is now an inside joke used as a reply to someone quoting LLM info as "facts".
teleforce · 5h ago
Please check this excellent LLM-RAG AI-driven course assistant at UIUC for an example of a university course [1]. It provides citations and references, mainly for the course notes, so the students can verify the answers and further study the course materials.
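For anyone curious what that looks like under the hood, the general retrieve-then-cite pattern is roughly the sketch below (toy Python; the note filenames, chunk scoring and prompt shape are all illustrative assumptions, not the actual UIUC implementation):

    # Toy sketch of a RAG assistant that cites its sources.
    notes = {
        "lecture03_fsm.md": "A finite state machine is defined by its states, inputs, and outputs...",
        "lecture07_mux.md": "A multiplexer selects one of several input signals using select bits...",
    }

    def score(query, text):
        # Toy lexical overlap; a real assistant would use embeddings.
        return len(set(query.lower().split()) & set(text.lower().split()))

    def retrieve(query, k=2):
        ranked = sorted(notes.items(), key=lambda kv: score(query, kv[1]), reverse=True)
        return ranked[:k]

    def build_prompt(query):
        context = retrieve(query)
        citations = [name for name, _ in context]
        prompt = ("Answer using only the context below and cite the files you used.\n"
                  + "\n".join(f"[{name}] {text}" for name, text in context)
                  + f"\nQuestion: {query}")
        return prompt, citations  # prompt goes to the LLM; citations let students verify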
[1] AI-driven chat assistant for ECE 120 course at UIUC (only 1 comment by the website creator):
> you're wondering if the answer the AI gave you is correct or something it hallucinated
Regular research has the same problem finding bad forum posts and other bad sources by people who don't know what they're talking about, albeit usually to a far lesser degree depending on the subject.
bradleyjg · 3h ago
The difference is that LLMs mess with our heuristics. They certainly aren't infallible, but over time we develop a sense for when someone is full of shit. The mix-and-match nature of LLMs hides that.
y1n0 · 2h ago
Yes, but that is generally public, with other people able to weigh in through various means like blog posts or their own papers.
Results from the LLM are your eyes only.
reactordev · 3h ago
While true, trial and error is a great learning tool as well. I think in time we'll get to an LLM that is definitive in its answers.
gronglo · 6h ago
> But now, you're wondering if the answer the AI gave you is correct or something it hallucinated. Every time I find myself putting factual questions to AIs, it doesn't take long for it to give me a wrong answer.
I know you'll probably think I'm being facetious, but have you tried Claude 4 Opus? It really is a game changer.
physix · 5h ago
A game changer in which respect?
Anyway, this makes me wonder if LLMs can be appropriately prompted to indicate whether the information given is speculative, inferred or factual. Whether they have the means to gauge the validity/reliability of their response and filter their response accordingly.
I've seen prompts that instruct the LLM to make this transparent via annotations to their response, and of course they comply, but I strongly suspect that's just another form of hallucination.
QuantumGood · 6h ago
I often ask first, "discuss what it is you think I am asking" after formulating my query. Very helpful for getting greater clarity and leads to fewer hallucinations.
ramraj07 · 6h ago
What exactly did 2025 AI hallucinate for you? The last time I've seen a hallucination from these things was a year ago. For questions that a kid or a student is going to answer, I'm not sure any reasonable person should be worried about this.
j2kun · 5h ago
If the last time you saw a wrong answer was a year ago, then you are definitely regularly getting them and not noticing.
apparent · 20m ago
ChatGPT hallucinates things all the time. I will feed it info on something and have a conversation. At first it's mostly fine, but eventually it starts just making stuff up.
majormajor · 1h ago
I use it every day for work and every day it gets stuff wrong of the "that doesn't even exist" variety. Because I'm working on things that are complex + highly verifiable, I notice.
Sure, Joe Average who's using it to look smart in Reddit or HN arguments or to find out how to install a mod for their favorite game isn't gonna notice anymore, because it's much more plausible much more often than two years ago, but if you're asking it things that aren't trivially easy for you to verify, you have no way of telling how frequently it hallucinates.
ziotom78 · 1h ago
Just a couple of days ago, I submitted a few pages from the PDF of a PhD thesis written in French to ChatGPT, asking it to translate them into English. The first 2-3 pages were perfect, then the LLM started hallucinating, inserting new sentences and removing parts. The interesting fact is that the added sentences were correct and generally on the spot: the resulting text sounded plausible, and only a careful comparison of each sentence revealed the truth. Near the end of the chapter, virtually nothing of what ChatGPT produced was directly related to the original text.
physix · 5h ago
I had Google Gemini 2.5 Flash analyse a log file and it quoted content that simply didn't exist.
It appears to me like a form of decoherence and very hard to predict when things break down.
People tend to know when they are guessing. LLMs don't.
sigseg1v · 2h ago
Are you using them daily? I find that for maybe 3 or 4 programming questions I ask per day, it simply cannot provide a correct answer even after hand-holding. They often resort to extreme gymnastics to try to gaslight you, no matter how much proof you provide.
For example, today I was asking an LLM about how to configure a GH action to install an SDK version that was just recently out of support. It kept hallucinating about my config, saying that when you provide multiple SDK versions in the config, it only picks the most recent. This is false. It's also mentioned specifically in the documentation, which I linked for the LLM, that it installs all versions you list. Explaining this to Copilot, it keeps doubling down, ignoring the docs, and even going as far as asking me to have the action output the installed SDKs, seeing all the ones I requested as installed, then gaslighting me by saying that it can print out the wrong SDKs with a `--list-sdks` command.
Avicebron · 5h ago
OpenAI's o3/4o models completely spun out when I was trying to write a tiny little TUI with ratatui; they couldn't handle writing a render function. No idea why. I spent like 15 minutes trying to get it to work, and ended up pulling up the docs.
I haven't spent any money with claude on this project and realistically it's not worth it, but I've run into little things like that a fair amount.
andsoitis · 4h ago
For starters, lots of examples over the last few months where AIs make up stuff when it comes to coding.
For me, most commonly ChatGPT hallucinates configuration options and command line arguments for common tools and frameworks.
noosphr · 5h ago
Two days ago, when my boomer mother-in-law tried to justify her anti-cancer diet that killed Steve Jobs. On the bright side, my partner will be inheriting soon by the looks of it.
filoleg · 5h ago
Not defending your mother-in-law here (because I agree with you that it is a pretty silly and maybe even potentially harmful diet), afaik it wasn’t the diet itself that killed Steve Jobs. It was his decision to do that diet instead of doing actual cancer treatment until it was too late.
noosphr · 2h ago
Given that I've got two people telling me "ackshually" here, I guess it may not be hallucination, just really terrible training data.
Up next - ChatGPT does jumping off high buildings kill you?
>>No jumping off high buildings is perfectly safe as long as you land skillfully.
UltraSane · 26m ago
Jobs's diet didn't kill him. Not getting his cancer treated was what killed him.
noosphr · 21m ago
Yes, we also covered that jumping off buildings doesn't kill people. The landing does.
throwawaylaptop · 5h ago
The anti cancer diet absolutely works if you want to reduce the odds of getting cancer. It probably even works to slow cancer compared to the average American diet.
Will it stop and reverse a cancer? Probably not.
paulryanrogers · 4h ago
I thought it was high fiber diets that reduce risk of cancer (ever so slightly), because of reduced inflammation. Not fruity diets, which are high in carbohydrates.
AppleBananaPie · 4h ago
How much does it 'reduce the odds'?
throwawaylaptop · 1h ago
Idk, I'm not an encyclopedia. You can Google it.
bonzini · 3h ago
Last week I was playing with the jj VCS and it couldn't even understand my question (how to swap two commits).
The correct answer would have been Skarpenords Bastion/kruttårn.
tekno45 · 5h ago
How do you know? It's literally non-deterministic.
r3trohack3r · 5h ago
Most (all?) AI models I work with are literally deterministic. If you give it the same exact input, you get the same exact output every single time.
What most people call “non-deterministic” in AI is that one of those inputs is a _seed_ that is sourced from a PRNG because getting a different answer every time is considered a feature for most use cases.
Edit: I’m trying to imagine how you could get a non-deterministic AI and I’m struggling because the entire thing is built on a series of deterministic steps. The only way you can make it look non-deterministic is to hide part of the input from the user.
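Toy illustration of that point (a sketch, not any particular vendor's stack): the "model" below is just a fixed table of logits, and all of the apparent randomness lives in the sampling step, where the seed and temperature are simply more inputs.

    import math, random

    logits = {"cat": 2.0, "dog": 1.5, "car": 0.1}    # stand-in for a model's output

    def sample(logits, temperature=1.0, seed=None):
        if temperature == 0:
            return max(logits, key=logits.get)        # greedy decoding: always deterministic
        scaled = {t: v / temperature for t, v in logits.items()}
        z = sum(math.exp(v) for v in scaled.values())
        probs = {t: math.exp(v) / z for t, v in scaled.items()}
        rng = random.Random(seed)                     # the seed is just another input
        return rng.choices(list(probs), weights=list(probs.values()))[0]

    print(sample(logits, seed=42) == sample(logits, seed=42))   # True: same inputs, same output
    print(sample(logits, temperature=0))                        # "cat", every single time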
Unless something has fundamentally changed since then (which I've not heard about) all sparse models are only deterministic at the batch level, rather than the sample level.
This is an incredibly pedantic argument. The common interfaces for LLMs set their temperature value to non-zero, so they are effectively non-deterministic.
throwaway31131 · 5h ago
> I’m trying to imagine how you could get a non-deterministic AI
Depends on the machine that implements the algorithm. For example, it’s possible to make ALUs such that 1+1=2 most of the time, but not all the time.
…
Just ask Intel. (Sorry, I couldn’t resist)
tekno45 · 4h ago
So, by default, it's non-deterministic for all non-power users.
dyauspitr · 4h ago
No you’re not, it’s right the vast, vast majority of the time. More than I would expect the average physics or chemistry teacher to be.
yieldcrv · 6h ago
If LLMs of today's quality were what was initially introduced, nobody would even know what your rebuttals are even about.
So "risk of hallucination" as a rebuttal to anybody admitting to relying on AI is just not insightful. Like, yeah, OK, we've all heard of that and aren't changing our habits at all. Most of our teachers and books said objectively incorrect things too, and we are all carrying factually questionable knowledge we are completely blind to. Which makes LLMs "good enough" by the same standard as anything else.
Don't let it cite case law? Most things don't need this stringent level of review
kristofferR · 6h ago
Agree, "hallucination" as an argument to not use LLMs for curiosity and other non-important situations is starting to seem more and more like tech luddism, similar to the people who told you to not read Wikipedia 5+ years after the rest of us realized it is a really useful resource despite occasional inaccuracies.
majormajor · 1h ago
Fun thing about wikipedia is that if one person notices, they can correct it. [And someone's gonna bring up edit wars and blah blah blah disputed topics, but let's just focus on straightforward factual stuff here.]
Meanwhile in LLM-land, if an expert five thousand miles away asked the same question you did last month, and noticed an error... it ain't getting fixed. LLMs get RL'd into things that look plausible for out-of-distribution questions. Not things that are correct. Looking plausible but non-factual is in some ways more insidious than a stupid-looking hallucination.
safety1st · 3m ago
Firstly, I think skepticism is a healthy trait. It's OK to be a skeptic. I'm glad there are a lot of skeptics because skepticism is the foundation of inquiry, including scientific inquiry. What if it's not actually Zeus throwing those lightning bolts at us? What if the heliocentric model is correct? What if you actually can't get AIDS by hugging someone who's HIV positive? All great questions, all in opposition to the conventional (and in some cases "expert") wisdom of their time.
Now in regards to LLMs, I use them almost every day, so does my team, and I also do a bit of postmortem and reflection on what was accomplished with them. So, skeptical in some regards, but certainly not behaving like a Luddite.
The main issue I have with all the proselytization about them, is that I think people compare getting answers from an LLM to getting answers from Google circa 2022-present. Everyone became so used to just asking Google questions, and then Google started getting worse every year; we have pretty solid evidence that Google's results have deteriorated significantly over time. So I think that when people say the LLM is amazing for getting info, they're comparing it to a low baseline. Yeah maybe the LLM's periodically incorrect answers are better than Google - but are you sure they're not better than just RTFM'ing? (Obviously, it all depends on the inquiry.)
The second, related issue I have is that we are starting to see evidence that the LLM inspires more trust than it deserves due to its humanlike interface. I recently started to track how often Github Copilot gives me a bad or wrong answer, and it's at least 50% of the time. It "feels" great though because I can tell it that it's wrong, give it half the answer, and then it often completes the rest and is very polite and nice in the process. So is this really a productivity win or is it just good feels? There was a study posted on HN recently where they found the LLM actually decreases the productivity of an expert developer.
So I mean I'll continue to use this thing but I'll also continue to be a skeptic, and this also feels like kinda where my head was with Meta's social media products 10 years ago, before I eventually realized the best thing for my mental health was to delete all of them. I don't question the potential of the tech, but I do question the direction that Big Tech may take it, because they're literal repeat offenders at this point.
vunderba · 16m ago
> An underrated quality of LLMs as study partner is that you can ask "stupid" questions without fear of embarrassment.
Not underrated at all. Lots of people were happy to abandon Stack Overflow for this exact reason.
> Adding in a mode that doesn't just dump an answer but works to take you through the material step-by-step is magical
I'd be curious to know how much this significantly differs from just a custom academically minded GPT with an appropriately tuned system prompt.
The fear of asking stupid questions is real, especially if one has had a bad experience with humiliating teachers or professors. I just recently saw a video of a professor subtly shaming and humiliating his students for answering questions to his own online quiz. He teaches at a prestigious institution and has a book that has a very good reputation. I stopped watching his video lectures.
baby · 5h ago
You might also be working with very uncooperative coworkers, or impatient ones
quietthrow · 3h ago
I agree with all that you say. It's an incredible time indeed. Just one thing I can't wrap my mind around is privacy. We all seem to be asking sometimes stupid and sometimes incredibly personal questions of these LLMs. Questions that we may not even speak out loud, from embarrassment or shame or other such emotions, even to our closest people. How are these companies using our data? More importantly, what are you all doing to protect yourself from misuse of your information? Or is it that if you want to use it, you have to give up such privacy and accept the discomfort?
danny_codes · 6h ago
Consider the adoption of conventional technology in the classroom. The US has spent billions on new hardware and software for education, and yet there has been no improvement in learning outcomes.
This is where the skepticism arises. Before we spend another $100 billion on something that ended up being worthless, we should first prove that it’s actually useful. So far, that hasn’t conclusively been demonstrated.
ImaCake · 5h ago
The article states that Study Mode is free to use. Regardless of b2b costs, this is free for you as an individual.
breve · 6h ago
> Adding in a mode that doesn't just dump an answer but works to take you through the material step-by-step is magical
Except these systems will still confidently lie to you.
The other day I noticed that DuckDuckGo has an Easter egg where it will change its logo based on what you've searched for. If you search for James Bond or Indiana Jones or Darth Vader or Shrek or Jack Sparrow, the logo will change to a version based on that character.
If I ask Copilot if DuckDuckGo changes its logo based on what you've searched for, Copilot tells me that no it doesn't. If I contradict Copilot and say that DuckDuckGo does indeed change its logo, Copilot tells me I'm absolutely right and that if I search for "cat" the DuckDuckGo logo will change to look like a cat. It doesn't.
Copilot clearly doesn't know the answer to this quite straightforward question. Instead of lying to me, it should simply say it doesn't know.
mediaman · 5h ago
This is endlessly brought up as if the human operating the tool is an idiot.
I agree that if the user is incompetent, cannot learn, and cannot learn to use a tool, then they're going to make a lot of mistakes from using GPTs.
Yes, there are limitations to using GPTs. They are pre-trained, so of course they're not going to know about some easter egg in DDG. They are not an oracle. There is indeed skill to using them.
They are not magic, so if that is the bar we expect them to hit, we will be disappointed.
But neither are they useless, and it seems we constantly talk past one another because one side insists they're magic silicon gods, while the other says they're worthless because they are far short of that bar.
breve · 5h ago
The ability to say "I don't know" is not a high bar. I would say it's a basic requirement of a system that is not magic.
throwawaylaptop · 5h ago
Based on your example, basically any answer would be "I don't know 100%".
You could ask me as a human basically any question, and I'd have answers for most things I have experience with.
But if you held a gun to head and said "are you sure???" I'd obviously answer "well damn, no I'm not THAT sure".
riwsky · 3h ago
Perhaps LLMs are magic?
chrsw · 5h ago
> The ability to say "I don't know" is not a high bar.
For you and I, it's not. But for these LLMs, maybe it's not that easy? They get their inputs, crunch their numbers, and come out with a confidence score. If they come up with an answer they're 99% confident in, by some stochastic stumbling through their weights, what are they supposed to do?
I agree it's a problem that these systems are more likely to give poor, incorrect, or even obviously contradictory answers than say "I don't know". But for me, that's part of the risk of using these systems and that's why you need to be careful how you use them.
kuschku · 3h ago
But they're not. Often the confidence value is much lower. I should have an option to see how confident it is (maybe set the opacity of each token to its confidence?).
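That display is easy enough to mock up if your provider exposes per-token logprobs (several APIs do); a toy sketch with made-up tokens and logprobs:

    import math

    # Hypothetical (token, logprob) pairs as returned alongside a completion.
    tokens = [("The", -0.02), ("capital", -0.15), ("is", -0.05), ("Canberra", -1.9)]

    def to_html(token_logprobs):
        spans = []
        for token, logprob in token_logprobs:
            p = math.exp(logprob)            # logprob -> probability in [0, 1]
            spans.append(f'<span style="opacity:{p:.2f}">{token}</span>')
        return " ".join(spans)

    print(to_html(tokens))   # low-probability tokens render faint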
furyofantares · 1h ago
Logits aren't confidence about facts. You can turn on a display like this in the openai playground and you will see it doesn't do what you want.
cindyllrn · 5h ago
I see your point
Some of the best exchanges that I participated in or witnessed involved people acknowledging their personal limits, including limits of conclusions formed a priori
To further the discussion, hearing the phrase you mentioned would help the listener to independently assess a level of confidence or belief of the exchange
But then again, honesty isn't on-brand for startups
It's something that established companies say about themselves to differentiate from competitors or even past behavior of their own
I mean, if someone prompted an llm weighted for honesty, who would pay for the following conversation?
Prompt: can the plan as explained work?
Response: I don't know about that. What I do know is on average, you're FUCKED.
eviks · 2h ago
> Should we trust the information at face value without verifying from other sources? Of course not, that's part of the learning process.
It mostly isn't; the point of a good learning process is to invest time into verifying "once" and then add verified facts to the learning material, so that learners can spend that time learning the material instead of verifying everything again.
Learning to verify is also important, but it's a different skill that doesn't need to be practiced literally every time you learn something else.
Otherwise you significantly increase the costs of the learning process.
Isamu · 6h ago
>I'm puzzled (but not surprised) by the standard HN resistance & skepticism
The good: it can objectively help you to zoom forward in areas where you don’t have a quick way forward.
The bad: it can objectively give you terrible advice.
It depends on how you sum that up on balance.
Example: I wanted a way forward to program a chrome extension which I had zero knowledge of. It helped in an amazing way.
Example: I keep trying to use it in work situations where I have lots of context already. It performs better than nothing but often worse than nothing.
Mixed bag, that’s all. Nothing to argue about.
adamlgerber · 6h ago
mixed bags are our favorite thing to argue about
Isamu · 6h ago
Haha yes! Thanks for that!
ants_everywhere · 6h ago
> I'm puzzled (but not surprised) by the standard HN resistance & skepticism.
It happens with many technological advancements historically. And in this case there are people trying hard to manufacture outrage about LLMs.
baby · 5h ago
Skepticism is great, it means less competition. I'm forcing everyone around me to use it.
easton · 7h ago
> Certainly, and this technology won't help or hinder them any more than a good old fashioned textbook.
Except that the textbook was probably QA'd by a human for accuracy (at least any intro college textbook; more specialized texts may not have been).
Matters less when you have background in the subject (which is why it’s often okay to use LLMs as a search replacement) but it’s nice not having a voice in the back of your head saying “yeah, but what if this is all nonsense”.
gopalv · 6h ago
> Except that the textbook was probably QA’d by a human for accuracy
Maybe it was not when printed in the first edition, but at least it was the same content shown to hundreds of people rather than something uniquely crafted for you.
The many eyes looking at it will catch errors and course-correct, while the LLM output does not get the benefit of that error correction algorithm, because someone who knows the answer probably won't ask and check it.
I feel this way about reading maps vs following GPS navigation; the fact that Google asked me to take an exit here as a short-cut feels like it might be trying to solve Braess's paradox in real time.
I wonder if this route was made for me to avoid my car adding to some congestion somewhere and whether if that actually benefits me or just the people already stuck in that road.
phailhaus · 6h ago
HN is resistant because at the end of the day, these are LLMs. They cannot and do not think. They generate plausible responses. Try this in your favorite LLM:
"Suppose you're on a game show trying to win a car. There are three doors, one with a car and two with goats. You pick a door. The host then gives you the option to switch doors. What is the best strategy in this situation?" The LLM will recognize this as SIMILAR to the Monty Hall problem and tell you to always switch. I just reproduced this with ChatGPT.
But this is completely wrong! In the Monty Hall problem, the host has to reveal a door with a goat behind it for you to gain the benefit of switching. I have to point this out for the LLM to get it right. It did not reason about the problem I gave it, it spat out the most likely response given the "shape" of the problem.
This is why shrugging and saying "well humans get things wrong too" is off base. The problem is that the LLM is not thinking, period. So it cannot create a mental model of your understanding of a subject, it is taking your text and generating the next message in a conversation. This means that the more niche the topic (or your particular misunderstanding), the less useful it will get.
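A quick simulation of the variant exactly as stated (no door is opened) makes the difference from the classic problem concrete:

    import random

    def play(switch, trials=100_000):
        wins = 0
        for _ in range(trials):
            car = random.randrange(3)
            pick = random.randrange(3)
            if switch:
                # No door is revealed, so "switching" just means picking
                # one of the other two doors at random.
                pick = random.choice([d for d in range(3) if d != pick])
            wins += (pick == car)
        return wins / trials

    print(f"stay:   {play(switch=False):.3f}")   # ~0.333
    print(f"switch: {play(switch=True):.3f}")    # ~0.333, no advantage without the reveal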
ants_everywhere · 6h ago
That's because the LLM assumes you mean for the host to open the door.
As an autistic person I can assure you that people filling in information you neither gave nor implied is the norm for human conversations. In that sense, the LLMs are being human-like.
If you explicitly tell it you didn't forget to have the host open the door it gives you the right answer:
> There is a game show with three closed doors. One has a car and two have goats. I pick a door. Then, without opening my door or any other door, the host asks me if I want to switch my choice to one of the other two doors. What is my best strategy and what are the probabilities?
Gemini: This is a classic probability puzzle known as the Monty Hall problem, though your description presents a slight variation. The optimal strategy depends entirely on whether the host opens a door or not.
Your Scenario: The Host Doesn't Open a Door
In the situation you've described, where the host asks if you want to switch your choice to one of the other two doors without opening any doors, there is no advantage to switching.
Here's a breakdown of the probabilities in your specific scenario:
Your initial choice: When you first pick a door, you have a 1 in 3 chance of selecting the door with the car. This means you have a 2 in 3 chance of having picked a door with a goat.
The other two doors: The combined probability that the car is behind one of the other two doors is 2/3.
Switching your choice: If you decide to switch, you are essentially choosing one of the other two doors. Since there is no new information (a door with a goat has not been revealed), the 2/3 probability is evenly split between those two doors. Therefore, your chance of winning by switching to a specific one of the other doors is 1/3.
In this case, your odds of winning are 1/3 whether you stay with your original choice or switch to one of the other doors.
chpatrick · 6h ago
Humans who have heard of Monty Hall might also say you should always switch without noticing that the situation is different. That's not evidence that they can't think, just that they're fallible.
People on here always assert LLMs don't "really" think or don't "really" know without defining what all that even means, and to me it's getting pretty old. It feels like an escape hatch so we don't feel like our human special sauce is threatened, a bit like how people felt threatened by heliocentrism or evolution.
bscphil · 19m ago
> Humans who have heard of Monty Hall might also say you should always switch without noticing that the situation is different. That's not evidence that they can't think, just that they're fallible.
At some point we start playing a semantics game over the meaning of "thinking", right? Because if a human makes this mistake because they jumped to an already-known answer without noticing a changed detail, it's because (in the usage of the person you're replying to) the human is pattern matching, instead of thinking. I don't think this is surprising. In fact, I think much of what passes for thinking in casual conversation is really just applying heuristics we've trained in our own brains to give us the correct answer without having to think rigorously. We remember mental shortcuts.
On the other hand, I don't think it's controversial that (some) people are capable of performing the rigorous analysis of the problem needed to give a correct answer in cases like this fake Monty Hall problem. And that's key... if you provide slightly more information and call out the changed nature of the problem to the LLM, it may give you the correct response, but it can't do the sort of reasoning that would reliably give you the correct answer the way a human can. I think that's why the GP doesn't want to call it "thinking" - they want to reserve that for a particular type of reflective process that can rigorously perform logical reasoning in a consistently valid way.
no_wizard · 6h ago
On the other hand, computers are supposed to be both accurate and able to reproduce said accuracy.
The failure of an LLM to reason this out is indicative that really, it isn’t reasoning at all. It’s a subtle but welcome reminder that it’s pattern matching
papichulo2023 · 44m ago
I guess computer vision didn't get this memo and it is useless.
chpatrick · 6h ago
Computers might be accurate but statistical models never were 100% accurate. That doesn't imply that no reasoning is happening. Humans get stuff wrong too but they certainly think and reason.
"Pattern matching" to me is another one of those vague terms like "thinking" and "knowing" that people decide LLMs do or don't do based on vibes.
no_wizard · 6h ago
Pattern matching has a definition in this field, it does mean specific things. We know machine learning has excelled at this in greater and greater capacities over the last decade
The other part of this is weighted filtering given a set of rules, which is a simple analogy to how AlphaGo did its thing.
Dismissing all this as vague is effectively doing the same thing as you are saying others do.
This technology has limits and, despite what Altman says, we do know this, and we are exploring them, but it's within its own confines. They're fundamentally wholly understandable systems that work on a consistent level in terms of how they do what they do (that is separate from the actual produced output).
I think reasoning, as any layman would use the term, is not accurate to what these systems do.
chpatrick · 5h ago
> Pattern matching has a definition in this field, it does mean specific things.
Such as?
> They're fundamentally wholly understandable systems that work on a consistent level in terms of how they do what they do (that is separate from the actual produced output)
Multi billion parameter models are definitely not wholly understandable and I don't think any AI researcher would claim otherwise. We can train them but we don't know how they work any more than we understand how the training data was made.
> I think reasoning, as any layman would use the term, is not accurate to what these systems do.
Based on what?
no_wizard · 5h ago
You're welcome to provide counters. I think these are all sufficiently common things that they stand on their own as to what I posit.
QuantumGood · 6h ago
I use the Monty Hall problem to test people in two steps. The second step is, after we discuss it and come up with a framing that they can understand, can they then explain it to a third person. The third person rarely understands, and the process of the explanation reveals how shallow the understanding of the second person is. The shallowest understanding of any similar process that I've usually experienced is an LLM.
raincole · 4h ago
I'd share a little bit experience about learning from human teachers.
Here in my country, English is not what you'll hear in everyday conversation. Native English speakers account for a tiny percentage of the population. Our language doesn't resemble English at all. However, English is a required subject in our mandatory education system. I believe this situation is quite typical across many Asian countries.
As you might imagine, most English teachers in public schools are not native speakers. And they, just like other language learners, make mistakes that native speakers won't make without even realizing what's wrong. This creates a cycle enforcing non-standard English pragmatics in the classroom.
Teachers are not to blame. Becoming fluent and proficient enough in a second language to handle questions students spontaneously throw to you takes years, if not decades of immersion. It's an unrealistic expectation for an average public school teacher.
The result is rich parents either send their kids to private schools or have extra classes taught by native speakers after school. Poorer but smart kids realize the education system is broken and learn their second language from Youtube.
-
What's my point?
When it comes to math/science, in my experience, the current LLMs act similarly to the teachers in public school mentioned above. And they're worse in history/economics. If you're familiar with the subject already, it's easy to spot LLM's errors and gather the useful bits from their blather. But if you're just a student, it can easily become a case of blind-leading-the-blind.
It doesn't make LLMs completely useless in learning (just like I won't call public school teachers 'completely useless', that's rude!). But I believe in the current form they should only play a rather minor role in the student's learning journey.
unixhero · 3h ago
This is a dream, I agree. Detractors are always left behind.
everyone · 7h ago
Yeah, I've been a game-dev forever and had never built a web-app in my life (even in college). I recently completed my 1st web-app contract, and GPT was my teacher. I have no problem asking stupid questions; tbh, asking stupid questions is a sign of intelligence imo. But where is there to even ask these days? Stack Overflow may as well not exist.
BubbleRings · 6h ago
Right on. A sign of intelligence but more importantly of bravery, and generosity. A person that asks good questions in a class improves the class drastically, and usually learns more effectively than other students in the class.
kuschku · 3h ago
> But where is there to even ask these days?
Stack overflow?
The IRC, Matrix or slack chats for the languages?
scarface_74 · 5h ago
I know some Spanish - close to B1. I find ChatGPT to be a much better way to study than the standard language apps. I can create custom lessons, ask questions about language nuances etc. I can also have it speak the sentences and practice pronunciation.
csomar · 1h ago
There is no skepticism. LLMs are fundamentally lossy and as a result they’ll always give some wrong result/response somewhere. If they are connected to a data source, this can reduce the error rate but not eliminate it.
I use LLMs but only for things that I have a good understanding of.
benatkin · 4h ago
It's quite boring to listen to people praising AI (worshipping it, putting it on a pedestal, etc). Those who best understand its potential aren't doing that. Instead they're talking about various specific things that are good or bad, and they don't go out of their way to lick AI's boots, but when they're asked they acknowledge that they're fans of AI or bullish on it. You're probably misreading a lot of resistance & skepticism on HN.
tayo42 · 3h ago
>Learning something online 5 years ago often involved trawling incorrect, outdated or hostile content
Learning what is like that? MIT OpenCourseWare has been available for like 10 years with anything you could want to learn in college.
Textbooks are all easily pirated
dyauspitr · 4h ago
HN's fear is the same job-security fear we've been seeing since the beginning of all this. You'll see this on programming subs on Reddit as well.
hammyhavoc · 7h ago
There might not be any stupid questions, but there's plenty of perfectly confident stupid answers.
Yeah, this is why wikipedia is not a good resource and nobody should use it. Also why google is not a good resource, anybody can make a website.
You should only trust going into a library and reading stuff from microfilm. That's the only real way people should be learning.
/s
hammyhavoc · 7h ago
Ah yes, the thing that told people to administer insulin to someone experiencing hypoglycemia (likely fatal BTW) is nothing like a library or Google search, because people blindly believe the output because of the breathless hype.
See Dunning-Kruger.
TeMPOraL · 6h ago
See 4chan during the "crowd wisdom" hype era.
czhu12 · 12h ago
I'll personally attest: LLMs have been absolutely incredible for self-learning new things post-graduation. It used to be that if you got stuck on a concept, you were basically screwed. Unless it was common enough to show up in a well-formed question on Stack Exchange, it was pretty much impossible to get unstuck, and the only thing you could really do was keep paving forward and hope that at some point it would make sense to you.
Now, everyone basically has a personal TA, ready to go at all hours of the day.
I get the commentary that it makes learning too easy or shallow, but I doubt anyone would think that college students would learn better if we got rid of TA's.
no_wizard · 12h ago
>Now, everyone basically has a personal TA, ready to go at all hours of the day
This simply hasn't been my experience.
Its too shallow. The deeper I go, the less it seems to be useful. This happens quick for me.
Also, god forbid you're researching a complex and possibly controversial subject and you want it to find reputable sources or particularly academic ones.
scarmig · 12h ago
I've found it excels at some things:
1) The broad overview of a topic
2) When I have a vague idea, it helps me narrow down the correct terminology for it
3) Providing examples of a particular category ("are there any examples of where v1 in the visual cortex develops in a disordered way?")
4) "Tell me the canonical textbooks in field X"
5) Posing math exercises
6) Free form branching--while talking about one topic, I want to shift to another that is distinct but related.
I agree they leave a lot to be desired when digging very deeply into a topic. And my biggest pet peeve is when they hallucinate fake references ("tell me papers that investigate this topic" will, for any sufficiently obscure topic, result in a bunch of very promising paper titles that are wholly invented).
CJefferson · 11h ago
These things are moving so quickly, but I teach a 2nd year combinatorics course, and about 3 months ago I tried the latest ChatGPT and DeepSeek -- they could answer very standard questions, but were wrong for more advanced questions, often in quite subtle ways. I actually set a piece of homework "marking" ChatGPT, which went well and students seemed to enjoy!
Julien_r2 · 8h ago
Super good idea!!
Luc Julia (one of the main creators of Siri) describes a very similar exercise in this interview [0] (it's in French, although the auto-translation isn't too bad).
The gist of it is that he describes an exercise he does with his students, where they ask ChatGPT about Victor Hugo's biography, and then proceed to spot the errors made by ChatGPT.
The setup is simple, but there are very interesting mechanisms in place. The students get to learn about challenging facts, do fact checking, cross-reference, etc., while also reinforcing the teacher's standing as a reference figure, with the knowledge to take down ChatGPT.
Arf seems I'm one of those :).. thanks for the heads up!
ai_viewz · 7h ago
This is an amazing strategy.
teaearlgraycold · 11h ago
That’s a great idea to both teach the subject and AI skepticism.
scarmig · 11h ago
Very clever and approachable, and I've been unintentionally giving myself that exercise for a while now. Who knows how long it will remain viable, though.
p1esk · 8h ago
When you say the latest chatGPT, do you mean o3?
jennyholzer · 9h ago
that's a cool assignment!
bryanrasmussen · 9h ago
>When I have a vague idea, it helps me narrow down the correct terminology for it
so the opposite of Stack Overflow really, where if you have a vague idea your question gets deleted and you get reprimanded.
Maybe Stack Overflow could use AI for this, help you formulate a question in the way they want.
scarmig · 8h ago
Maybe. But, it's been over a year since I used StackOverflow, primarily because of LLMs. Sure, I could use an LLM to formulate a question that passes SO's muster. But why bother, when the LLM can almost certainly answer the question as well; SO will be slower; and there's a decent chance that my question will be marked as a duplicate (because it pattern matches to a similar but distinct question).
andy_ppp · 11h ago
I’ve found the AI is particularly good at explaining AI, better than quite a lot of other coding tasks.
narcraft · 9h ago
I find 2 invaluable for enhancing search, and combined with 1 & 4, it's a huge boost to self-learning.
jjfoooo4 · 10h ago
It's a floor raiser, not a ceiling raiser. It helps you get up to speed on general conventions and consensus on a topic, less so on going deep on controversial or highly specialized topics
SLWW · 12h ago
My core problem with LLMs is as you say; it's good for some simpler concepts, tasks, etc. but when you need to dive into more complex topics it will oversimplify, give you what you didn't ask for, or straight up lie by omission.
History is a great example, if you ask an LLM about a vaguely difficult period in history it will just give you one side and act like the other doesn't exist, or if there is another side, it will paint them in a very negative light which often is poorly substantiated; people don't just wake up and decide one day to be irrationally evil with no reason, if you believe that then you are a fool... although LLMs would agree with you more times than not since it's convenient.
The result of these things is a form of gatekeeping, give it a few years and basic knowledge will be almost impossible to find if it is deemed "not useful" whether that's an outdated technology that the LLM doesn't seem talked about very much anymore or a ideological issue that doesn't fall in line with TOS or common consensus.
scarmig · 11h ago
A few weeks ago I was asking an LLM to offer anti-heliocentric arguments, from the perspective of an intelligent scientist. Although it initially started with what was almost a parody of writing from that period, with some prompting I got it to generate a strong rendition of anti-heliocentric arguments.
(On the other hand, it's very hard to get them to do it for topics that are currently politically charged. Less so for things that aren't in living memory: I've had success getting it to offer the Carthaginian perspective in the Punic Wars.)
SLWW · 11h ago
That's a fun idea; almost having it "play pretend" instead of directly asking it for strong anti-heliocentric arguments outright.
It's weird to see which topics it "thinks" are politically charged vs. others. I've noticed some inconsistency depending on even what years you input into your questions. One year off? It will sometimes give you a more unbiased answer as a result about the year you were actually thinking of.
scarmig · 11h ago
I think the first thing is figuring out exactly what persona you want the LLM to adopt: if you have only a vague idea of the persona, it will default to the laziest one possible that still could be said to satisfy your request. Once that's done, though, it usually works decently, except for those that the LLM detects are politically charged. (The weakness here is that at some point you've defined the persona so strictly that it's ahistorical and more reflective of your own mental model.)
As for the politically charged topics, I more or less self-censor on those topics (which seem pretty easy to anticipate--none of those you listed in your other comment surprise me at all) and don't bother to ask the LLM. Partially out of self-protection (don't want to be flagged as some kind of bad actor), partially because I know the amount of effort put in isn't going to give a strong result.
SLWW · 8h ago
> The weakness here is that at some point you've defined the persona so strictly that it's ahistorical and more reflective of your own mental model.
That's a good thing to be aware of, using our own bias to make it more "likely" to play pretend. LLMs tend to be more on the agreeable side; given the unreliable narrators we people tend to be, and the fact that these models are trained on us, it does track that the machine would tend towards preference over fact, especially when the fact could be outside of the LLMs own "Overton Window".
I've started to care less and less about self-censoring as I deem it to be a kind of "use it or lose it" privilege. If you normalize talking about censored/"dangerous" topics in a rational way, more people will be likely to see it not as much of a problem. The other eventuality is that no one hears anything that opposes their view in a rational way but rather only hears from the extremists or those who just want to stick it to the current "bad" in their minds at that moment. Even then though I still will omit certain statements on some topics given the platform, but that's more so that I don't get mislabeled by readers. (one of the items on my other comment was intentionally left as vague as possible for this reason)
As for the LLMs, I usually just leave spicy questions for LLMs I can access through an API of someone else (an aggregator) and not a personal acc just to make it a little more difficult to label my activity falsely as a bad actor.
Gracana · 11h ago
Have you tried abliterated models? I'm curious if the current de-censorship methods are effective in that area / at that level.
brendoelfrendo · 6h ago
What were its arguments? Do you have enough of an understanding of astronomy to know whether it actually made good arguments that are grounded in scientific understanding, or did it just write persuasively in a way that looks convincing to a layman?
> I've had success getting it to offer the Carthaginian perspective in the Punic Wars.
This is not surprising to me. Historians have long studied Carthage, and there are books you can get on the Punic Wars that talk about the state of Carthage leading up to and during the wars (shout out to Richard Miles's "Carthage Must Be Destroyed: The Rise and Fall of an Ancient Civilization"). I would expect an LLM to piggyback off of that existing literature.
scarmig · 1h ago
Extensive education in physics, so yes.
The most compelling reason at the time to reject heliocentrism was the (lack of) parallax of stars. The only response that the heliocentrists had was that the stars must be implausibly far away. Millions of times further away than the moon is--and they knew the moon itself is already pretty far from us--which is a pretty radical, even insane, idea. There's also the point that the original Copernican heliocentric model had ad hoc epicycles just as the Ptolemaic one did, without any real increase in accuracy.
Strictly speaking, the breakdown here would be less a lack of understanding of contemporary physics, and more about whether I knew enough about the minutia of historical astronomers' disputes to know if the LLM was accurately representing them.
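To put rough numbers on the parallax point, a back-of-the-envelope sketch, assuming roughly 1 arcminute as the best positional precision of pre-telescopic instruments (about what Tycho's are credited with):

    import math

    AU = 1.496e11                      # metres, Earth-Sun distance
    MOON = 3.844e8                     # metres, Earth-Moon distance
    precision = math.radians(1 / 60)   # ~1 arcminute

    # No detectable annual parallax at that precision pushes the stars out to at least:
    d_min = AU / math.tan(precision)
    print(f"{d_min / AU:,.0f} AU")                 # ~3,400 AU
    print(f"{d_min / MOON:,.0f} lunar distances")  # ~1,300,000 x the Moon's distance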
morgoths_bane · 7h ago
>I've had success getting it to offer the Carthaginian perspective in the Punic Wars.)
That's honestly one of the funniest things I have read on this site.
pengstrom · 11h ago
The part about history perspectives sounds interesting. I haven't noticed this. Please post any concrete/specific examples you've encountered!
pizzafeelsright · 7h ago
You are born in your country.
You love your family.
A foreign country invades you.
Your country needs you.
Your faith says to obey the government.
Commendable and noble except for a few countries, depending upon the year.
Why?
SLWW · 11h ago
- Rhodesia (lock step with the racial-first reasoning, underplays Britain's failures to support that which they helped establish; makes the colonists look hateful when they were dealing with terrorists which the British supported)
- Bombing of Dresden, death stats as well as how long the bombing went on for (Arthur Harris is considered a war-criminal to this day for that; LLMs highlight easily falsifiable claims by Nazis to justify low estimates without providing much in the way of verifiable claims outside of a select few, questionable, sources. If the low estimate is to be believed, then it seems absurd that Harris would be considered a war-criminal in light of what crimes we allow today in warfare)
- Ask it about the Crusades; often it forgets the sacking of St. Peter's in Rome around 846 AD, usually painting the Papacy as needlessly hateful and violent during that specific Crusade. Which was horrible, bloody, as well as immensely destructive (I don't defend the Crusades), but it paints the Islamic forces as victims, which they were eventually, but not at the beginning; at the beginning they were the aggressors bent on invading Rome.
- Ask it about the Six-Day War (1967) and contrast that with several different sources on both sides and you'll see a different portrayal even by those who supported the actions taken.
These are just the four that come to my memory at this time.
Most LLMs seem cagey about these topics; I believe this is due to an accepted notion that anything that could "justify" hatred or dislike of a people group or class that is in favor -- according to modern politics -- will be classified as hateful rhetoric, which is then omitted from the record. The issue lies in the fact that to understand history, we need to understand what happened, not how it is perceived, politically, after the fact. History helps inform us about the issues of today, and it is important, above all other agendas, to represent the truth of history, keeping an accurate account (or simply allowing others to read differing accounts without heavy bias).
LLMs are restricted in this way quite egregiously; "those who do not study history are doomed to repeat it", but if this continues, no one will have the ability to know history and are therefore forced to repeat it.
pyuser583 · 10h ago
> Ask it about the Crusades, often if forgets the sacking of St. Peter's in Rome around 846 AD, usually painting the Papacy as a needlessly hateful and violent people during that specific Crusade. Which was horrible, bloody as well as immensely destructive (I don't defend the Crusades), but paints the Islamic forces as victims, which they were eventually, but not at the beginning, at the beginning they were the aggressors bent on invading Rome.
I don't know a lot about the other things you mentioned, but the concept of crusading did not exist (in Christianity) in 846 AD. A crusade is not just any conflict between Muslims and Christians.
SLWW · 10h ago
The Crusades were predicated on historic tensions between Rome and the Arabs, which is why I mention it: while the First Crusade proper was in 1096, its core justifications were situations like the sacking of St. Peter's, which historians consider one of the most influential moments and which was often used as a justification, given the history of incompatibilities between Rome and the Muslims.
This further led the Papacy to pursue such efforts in the following years, as they were in Rome and made strong efforts to maintain Catholicism within those boundaries. Crusading didn't appear out of nothing; it required a catalyst, and events like the one I listed are the usual suspects.
pyuser583 · 6h ago
What you’re saying is not at all what I understand to be the history of crusading.
Its background is in the Islamic-Christian conflicts of Spain. Crusading was adopted from the Muslim idea of jihad, as were things like naming customs (the Spanish are the only Christians who name their children "Jesus", echoing the Muslim custom of naming children "Muhammad").
The political tensions that led to the First Crusade were between Arab Muslims and Byzantine Christians. Specifically, the Battle of Manzikert made Christian Europe seem more vulnerable than it was.
The Papacy wasn’t at the forefront of the struggle against Islam. It was more worried about the Normans, Germans, and Greeks.
When the papacy was interested in Crusading it was for domestic reasons: getting rid of king so-and-so by making him go on crusade.
The situation was different in Spain where Islam was a constant threat, but the Papacy regarded Spain as an exotic foreign land (although Sylvester II was educated there).
It’s extremely misleading to view the pope as the leader of an anti-Muslim coalition. There really was no leader per se, but the reasons why kings went on crusade had little to do with fighting Islam.
Just look at how many monarchs showed up in Jerusalem, then headed straight home and spent the rest of their lives bragging about having been crusaders.
I'm 80% certain no pope ever set foot in Outremer.
cthalupa · 8h ago
Why should we consider something that happened 250 years prior to be some sort of affirmative defense of the Crusades, as if they started with the Islamic world being the aggressor?
If the US were to start invading Axis countries with WW2 being the justification we'd of course be the aggressors, and that was less than 100 years ago.
scarmig · 8h ago
Because it played a role in forming the motivations of the Crusaders? It's not about justifying the Crusades, but understanding why they happened.
Similarly, it helps us understand all the examples of today of resentments and grudges over events that happened over a century ago that still motivate people politically.
His point is that this was not part of the Crusades, not that he was unaware of its happening.
jamiek88 · 9h ago
Arthur Harris is in no way considered a war criminal by the vast majority of British people for the record.
It's a very controversial opinion, and stating it as a just-so fact needs challenging.
SLWW · 8h ago
Do you have references or corroborating evidence?
In 1992 a statue of Harris was erected in London; it was under 24-hour surveillance for several months due to protests and vandalism attempts.
I'm only mentioning this to highlight that there was quite a bit of push back specifically calling the government out on a tribute to him, which usually doesn't happen if the person was well liked... not as an attempted killshot.
Calling it "very controversial" is a funny and odd thing to say if he is widely loved/unquestioned by your people. It's just another occurrence of language from those on his side reinforcing the idea that there is, as you say, something "very controversial" here, and maybe not a "vast majority", since those two things seem at odds with each other.
Not to mention that Harris targeted civilians, which is generally considered behavior of a war-criminal.
Although you are correct that I should have used more accurate language: instead of saying "considered" I should have said "considered by some".
Q_is_4_Quantum · 11h ago
This was interesting thanks - makes me wish I had the time to study your examples. But of course I don't, without just turning to an LLM....
If for any of these topics you do manage to get a summary you'd agree with from a (future or better-prompted?) LLM I'd like to read it. Particularly the first and third, the second is somewhat familiar and the fourth was a bit vague.
mwigdahl · 11h ago
If someone has Grok 4 access I'd be interested to see if it's less likely to avoid these specific issues.
wahnfrieden · 9h ago
You call out that you don’t defend the crusades but are you supportive of Rhodesia?
SLWW · 8h ago
I only highlighted that I'm not in support of the Crusades since, from my comments, it might sound like I am. I was highlighting that they didn't just lash out with no cause to start their holy war.
Rhodesia is a hard one; the more I learn about it, the more I feel terrible for both sides. I also do not support terrorism against a nation, even if I believe it might not be in the right. However, I hold to my disdain for how the British responded: their withdrawal effectively doomed Rhodesia and made a peaceful resolution essentially impossible.
fragmede · 4h ago
> those who do not study history are doomed to repeat it
The problem is, those that do study history are also doomed to watch it repeat.
maxsilver · 5h ago
> people don't just wake up and decide one day to be irrationally evil with no reason, if you believe that then you are a fool
The problem with this is that people sometimes really do, objectively, wake up and decide to be irrationally evil. It's not every day, and it's not every single person, but it does happen routinely.
If you haven’t experienced this wrath yourself, I envy you. But for millions of people, this is their actual, 100% honest truthful lived reality. You can’t rationalize people out of their hate, because most people have no rational basis for their hate.
(see pretty much all racism, sexism, transphobia, etc)
fragmede · 4h ago
Do they see it as evil though? They wake up, decide to do what they perceive as good but things are so twisted that their version of good doesn't agree with mine or yours. Some people are evil, see themselves as bad, and continue down that path, absolutely. But that level of malevolence is rare. Far more common is for people to believe that what they're doing is in service of the greater good of their community.
neutronicus · 8h ago
History in particular is rapidly approaching post-truth as a knowledge domain anyway.
There's no short-term incentive to ever be right about it (and it's easy to convince yourself of both short-term and long-term incentives, both self-interested and altruistic, to actively lie about it). Like, given the training corpus, could I do a better job? Not sure.
altcognito · 8h ago
"Post truth". History is a funny topic. It is both critical and irrelevant. Do we really need to know how the founder felt about gun rights? Abortion? Both of these topics were radically different in their day.
All of us need to learn the basics of how to read history and historians critically, and to know the limitations involved, which, as you stated, is probably a tall task.
andrepd · 7h ago
What are you talking about? In what sense is history done by professional historians degrading in recent times? And what short/long term incentives are you talking about? They are the same as any social science.
andrepd · 8h ago
> History is a great example, if you ask an LLM about a vaguely difficult period in history it will just give you one side and act like the other doesn't exist, or if there is another side, it will paint them in a very negative light which often is poorly substantiated
Which is why it's so terribly irresponsible to paint these """AI""" systems as impartial or neutral or anything of the sort, as has been done by hypesters and marketers for the past 3 years.
jay_kyburz · 8h ago
People _do_ just wake up one day and decide some piece of land should belong to them, or that they don't have enough money and can take yours, or they are just sick of looking at you and want to be rid of you. They will have some excuse or justification, but really they just want more than they have.
People _do_ just wake up and decide to be evil.
SLWW · 8h ago
A nation that might fit this description may have had its populace indoctrinated (through a widespread political campaign) to believe that most of the world, throughout history, has sought its destruction. That's a reason why they think that way; it isn't that they woke up one day and decided to choose violence.
That is not a justification, however, since I believe that what is happening today is truly evil. The same goes for another nation that entered a war knowing it would be crushed, which is suicide; whether that nation is in the right matters little if most of its next generation has died.
epolanski · 9h ago
I really think that 90% of such comments come from a lack of knowledge on how to use LLMs for research.
It's not a criticism, the landscape moves fast and it takes time to master and personalize a flow to use an LLM as a research assistant.
Start with something such as NotebookLM.
no_wizard · 5h ago
I use them and stay reasonably up to date. I have used NotebookLM, I have access to advanced models through my employer and personally, and I have done a lot of research on LLMs and using them effectively.
They simply have limitations, especially on deep, pointed subjects where you want depth, not breadth, and honestly I'm not sure why these limitations exist, but I'm not working directly on these systems.
Talk to Gemini or ChatGPT about mental health topics; that's a good example of what I'm talking about. As recently as two weeks ago my colleagues found that even when heavily tuned, they still managed to become 'pro suicide' if given certain lines of questioning.
II2II · 10h ago
> Also, god forbid you're researching a complex and possibly controversial subject and you want it to find reputable sources or particularly academic ones.
That's fine. Recognize the limits of LLMs and don't use them in those cases.
Yet that is something you should be doing regardless of the source. There are plenty of non-reputable sources in academic libraries and there are plenty of non-reputable sources from professionals in any given field. That is particularly true when dealing with controversial topics or historical sources.
gojomo · 8h ago
Grandparent testimony of success, & parent testimony of frustration, are both just wispy random gossip when they don't specify which LLMs delivered the reported experiences.
The quality varies wildly across models & versions.
With humans, the statements "my tutor was great" and "my tutor was awful" reflect very little on "tutoring" in general, and are barely even responses to each other without more specificity about the quality of the tutor involved.
Same with AI models.
no_wizard · 6h ago
Latest OpenAI and latest Gemini models; I also tried with the latest Llama, but I didn't expect much there.
I have no access to Anthropic right now to compare.
It’s an ongoing problem in my experience
tsumnia · 11h ago
It can be beneficial for making your initial assessment, but you'll need to dig deeper for something meaningful. For example, I recently used Gemini's Deep Research to do some literature review on educational Color Theory in relation to PowerPoint presentations [1]. I know both areas rather well, but I wanted to have some links between the two for some research that I am currently doing.
I'd say that companies like Google and OpenAI are aware of the "reputable" concerns the Internet is expressing and addressing them. This tech is going to be, if not already is, very powerful for education.
Taking a Gemini Deep Research output and feeding it to NotebookLM to create audio overviews is my current podcast go-to. Sometimes I do a quick Google and add in a few detailed but overly verbose documents or long form YouTube videos, and the result is better than 99% of the podcasts out there, including those by some academics.
hammyhavoc · 7h ago
No wonder there are so many confident people spouting total rubbish on technical forums.
wyager · 10m ago
> god forbid you're researching a complex and possibly controversial subject and you want it to find reputable sources
If you're really researching something complex/controversial, there may not be any
kenjackson · 9h ago
“The deeper I go, the less it seems to be useful. This happens quick for me.
Also, god forbid you're researching a complex and possibly controversial subject and you want it to find reputable sources or particularly academic ones.”
These things also apply to humans. A year or so ago I thought I’d finally learn more about the Israeli/Palestinians conflict. Turns out literally every source that was recommended to me by some reputable source was considered completely non-credible by another reputable one.
That said, I've found ChatGPT to be quite good at math and programming, and I can go pretty deep at both. I can definitely trip it into mistakes (e.g. it seems to use calculations to "intuit" its way around sometimes, and you can find cases where those calls lead it in the wrong direction), but I also know enough to know how to keep it on rails.
9dev · 8h ago
> Turns out literally every source that was recommended to me by some reputable source was considered completely non-credible by another reputable one.
That’s the single most important lesson by the way, that this conflict just has two different, mutually exclusive perspectives, and no objective truth (none that could be recovered FWIW). Either you accept the ambiguity, or you end up siding with one party over the other.
jonny_eh · 7h ago
> you end up siding with one party over the other
Then as you get more and more familiar you "switch" depending on the sub-issue being discussed, aka nuance
slt2021 · 7h ago
the truth (aka facts) is objective and facts exist.
The problem is selective memory of these facts, and biased interpretation of those facts, and stretching the truth to fit pre-determined opinion
jonahx · 8h ago
> learn more about the Israeli/Palestinians
> to be quite good at math and programming
Since LLMs are essentially summarizing relevant content, this makes sense. In "objective" fields like math and CS, the vast majority of content aligns, and LLMs are fantastic at distilling the relevant portions you ask about. When there is no consensus, they can usually tell you that ("this is nuanced topic with many perspectives...", etc), but they can't help you resolve the truth because, from their perspective, the only truth is the content.
drc500free · 8h ago
Israel / Palestine is a collision between two internally valid and mutually exclusive worldviews. It's kind of a given that there will be two camps who consider the other non-reputable.
FWIW, the /r/AskHistorians booklist is pretty helpful.
> It's kind of a given that there will be two camps who consider the other non-reputable.
You don’t need to look more than 2 years back to understand why either camp finds the other non-reputable.
andrepd · 7h ago
A human-curated list of human-written books? How delightfully old fashioned!
Liftyee · 8h ago
Re: conflicts and politics etc.
I've anecdotally found that real world things like these tend to be nuanced, and that sources (especially on the internet) are disincentivised in various ways from actually showing nuance. This leads to "side-taking" and a lack of "middle-ground" nuanced sources, when the reality lies somewhere in the middle.
Might be linked to the phenomenon where in an environment where people "take sides", those who display moderate opinions are simply ostracized by both sides.
Curious to hear people's thoughts and disagreements on this.
wahern · 7h ago
I think the Israeli/Palestinian conflict is an example where studying the history is in some sense counter-productive. There's more than a century of atrocities that justify each subsequent reaction; the veritable cycle of violence. And whichever atrocity grabs you first (partly based on present cultural narratives) will color how you perceive everything else.
Moreover, the conflict is unfolding. What matters isn't what happened 100 years ago, or even 50 years ago, but what has happened recently and is happening. A neighbor of mine who recently passed was raised in Israel. Born circa 1946 (there's black & white footage of her as a baby aboard, IIRC, the ship Exodus 1947), she has vivid memories as a child of Palestinian Imams calling out from the mosques to "kill the Jews". She was a beautiful, kind soul who, for example, freely taught adult education to immigrants (of all sorts), but who one time admitted to me that she utterly despised Arabs. That's all you need to know, right there, to understand why Israel is doing what it's doing. Not so much what happened in the past to make people feel that way, but that many Israelis actually, viscerally feel this way today, justifiably or not but in any event rooted in memories and experiences seared into their conscience. Suffice it to say, most Palestinians have similar stories and sentiments of their own, one of the expressions of which was seen on October 7th.
And yet at the same time, after the first few months of the Gaza War she was so disgusted that she said she wanted to renounce her Israeli citizenship. (I don't know how sincere she was in saying this; she died not long after.) And, again, that's all you need to know to see how the conflict can be resolved, if at all; not by understanding and reconciling the history, but merely choosing to stop justifying the violence and moving forward. How the collective action problem might be resolved, within Israeli and Palestinian societies and between them... that's a whole 'nother dilemma.
Using AI/ML to study history is interesting in that it even further removes one from actual human experience. Hearing first hand accounts, even if anecdotal, conveys information you can't acquire from a book; reading a book conveys information and perspective you can't get from a shorter work, like a paper or article; and AI/ML summaries elide and obscure yet more substance.
neutronicus · 10h ago
Hmm. I have had pretty productive conversations with ChatGPT about non-linear optimization.
Granted, that's probably well-trodden ground, to which model developers are primed to pay attention, and I'm (a) a relative novice with (b) very strong math skills from another domain (computational physics). So Chuck and I are probably both set up for success.
karaterobot · 5h ago
What are some subjects that ChatGPT has given only shallow instruction on?
I'll tell you that I recently found it to be the best resource on the web for teaching me about the Thirty Years' War. I was reading a collection of primary source documents, and was able to interview ChatGPT about them.
Last week I used it to learn how to create and use Lehmer codes, and its explanation was perfect, and much easier to understand than, for example, Wikipedia.
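For anyone who hasn't run into them: a Lehmer code records, for each position in a permutation, how many later entries are smaller, and those counts read as digits in the factorial number system, giving the permutation's rank. A rough Python sketch of the idea (my own illustration, not the exact explanation I got):

    from math import factorial

    def lehmer_encode(perm):
        # Lehmer code: for each position, count how many later elements are smaller
        return [sum(1 for later in perm[i + 1:] if later < perm[i]) for i in range(len(perm))]

    def lehmer_rank(code):
        # Read the code as digits in the factorial number system to get the
        # permutation's position in lexicographic order
        n = len(code)
        return sum(d * factorial(n - 1 - i) for i, d in enumerate(code))

    print(lehmer_encode([1, 0, 2]))  # [1, 0, 0]
    print(lehmer_rank([1, 0, 0]))    # 2 -> [1, 0, 2] is the third permutation of {0, 1, 2} in lex order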
I ask it about truck repair stuff all the time, and it is also great at that.
I don't think it's great at literary analysis, but for factual stuff it has only ever blown away my expectations at how useful it is.
I validate models in finance, and this is by far the best tool created for that purpose. I'd compare financial model validation to a Master's level task, where you're working with well established concepts, but at a deep, technical level. LLMs excel at that: they understand model assumptions, know what needs to be tested to ensure correctness, and can generate the necessary code and calculations to perform those tests. And finally, they can write the reports.
Model Validation groups are one of the targets for LLMs.
no_wizard · 6h ago
That's one aspect of quantitative finance, and I agree. Elsewhere I noted that anything structured-data- and computation-adjacent, it has an easier time with, and even excels at in many cases.
It doesn't cover the other aspects of finance, which are perhaps less quantitative but may still be considered advanced (to a regular person at least). Try having it reason out a "cigar butt" strategy and see if it returns anything useful about companies that fit the mold from a prepared source.
Granted this isn't quant finance modeling, but it's a relatively easy thing for a human to do, and I didn't find LLMs up to the task.
EchoReflection · 7h ago
I have found that being very specific and asking things like "can you tell me what another perspective might be, such that I can understand potential counter-arguments might be, and how people with other views might see this topic?" can be helpful when dealing with complex/nuanced/contentious subjects. Likewise with regard to "reputable" sources.
noosphr · 7h ago
This is where feeding in extra context matters. Paste in text that shows up from a Google search (textbooks preferred) to get in-depth answers.
No one builds multi-shot search tools because they eat tokens like nobody's business, but I've deployed them internally at a company, to rave reviews, at a cost of $200 per seat per day.
jlebar · 9h ago
> Its too shallow. The deeper I go, the less it seems to be useful. This happens quick for me.
You must be using a free model like GPT-4o (or the equivalent from another provider)?
I find that o3 is consistently able to go deeper than me in anything I'm a nonexpert in, and usually can keep up with me in those areas where I am an expert.
If that's not the case for you I'd be very curious to see a full conversation transcript (in chatgpt you can share these directly from the UI).
no_wizard · 5h ago
I have access to the highest tier paid versions of ChatGPT and Google Gemini, I've tried different models, tuning things like size of context windows etc.
I know it has nothing to do with this. I simply hit a wall eventually.
I unfortunately am not at liberty to share the chats though. They're work related (I very recently ended up at a place where we do thorny research).
A simple one, though, is researching Israel - Palestine relations since 1948. It starts off okay (usually) but it goes off the rails eventually with bad sourcing, fictitious sourcing, and/or hallucinations. Sometimes I actually hit a wall where it repeats itself over and over, and I suspect it's because the information is simply not captured by the model.
FWIW, if these models had live & historic access to Reuters and Bloomberg terminals I think they might be better at a range of tasks I find them inadequate for, maybe.
beambot · 11h ago
The worst is when it's confidently wrong about things... Thankfully, this occurrence is becoming less & less common -- or at least, its boundary is beyond my subject matter expertise.
dankwizard · 5h ago
This is the part where you actually need to think and wonder if AI is the right tool in this particular purpose. Unfortunately you can't completely turn your brain off just yet.
terabyterex · 9h ago
This can happen if you use the free model and not a paid deep research model. You can use a GPT model and ask things like, "How many moons does Jupiter have?" But if you want to ask, "Can you go on the web and research the effects that chemical A has had on our water supply and cite sources?", you will need to use a deep research model.
hammyhavoc · 7h ago
Why not do the research yourself rather than risk it misinterpreting? I FAFO'd repeatedly with that, and it is just horribly unreliable.
marcosdumay · 9h ago
> and you want it to find reputable sources
Ask it for sources. The two things LLMs excel at are filling in sources for a claim you give them (lots will be made up, but there isn't anything better out there) and giving you search queries for a description you give them.
chrisweekly · 9h ago
Also, Perplexity.ai cites its sources by default.
golly_ned · 9h ago
It often invents sources. At least for me.
prats226 · 9h ago
Can you give a specific example where at certain depth it has stopped becoming useful?
vonneumannstan · 9h ago
>Its too shallow. The deeper I go, the less it seems to be useful. This happens quick for me.
If its a subject you are just learning how can you possibly evaluate this?
neutronicus · 9h ago
If you're a math-y person trying to get up to speed in some other math-y field you can discern useless LLM output pretty quickly even as a relative novice.
Falling apart under pointed questioning, saying obviously false things, etc.
Sharlin · 8h ago
It's easy to recognize that something is wrong if it's wrong enough.
jasondigitized · 9h ago
If we have custom-trained LLMs per subject, doesn't that solve the problem? The shallowness problem seems really easy to solve.
Xenoamorphous · 11h ago
Can you share some examples?
no_wizard · 6h ago
Try doing deep research on the Israel - Palestine relations. That’s a good baseline. You’ll find it starts spitting out really useless stuff fast, or will try to give sources that don’t exist or are not reputable.
Teever · 10h ago
It sounds like it is a good tool for getting you up to speed on a subject and you can leverage that newfound familiarity to better search for reputable sources on existing platforms like google scholar or arXiv.
ACCount36 · 12h ago
It is shallow. But as long as what you're asking it of is the kind of material covered in high school or college, it's fairly reliable.
This generation of AI doesn't yet have the knowledge depth of a seasoned university professor. It's the kind of teacher that you should, eventually, surpass.
CamperBob2 · 8h ago
What is "it"? Be specific: are you using some obsolete and/or free model? What specific prompt(s) convinced you that there was no way forward?
waynesonfire · 8h ago
It's not a doctoral adviser.
HPsquared · 11h ago
Human interlocutors have similar issues.
EGreg · 11h ago
Try to red team blue team with it
Blue team you throw out concepts and have it steelman them
Red team you can literally throw any kind of stress test at your idea
Alternate like this and you will learn
A great prompt is “give me the top 10 xyz things” and then you can explore
Back in 2006 I used Wikipedia to prepare for job interviews :)
adamsb6 · 12h ago
When ChatGPT came out it was like I had the old Google back.
Learning a new programming language used to be mediated with lots of useful trips to Google to understand how some particular bit worked, but Google stopped being useful for that years ago. Even if the content you're looking for exists, it's buried.
GaggiX · 12h ago
And the old ChatGPT was nothing compared to what we have today; nowadays reasoning models eat through math problems with no trouble, when this was a major limitation in the past.
jennyholzer · 9h ago
I don't buy it. Open AI doesn't come close to passing my credibility check. I don't believe their metrics.
GaggiX · 8h ago
OpenAI is not the only company making LLMs, there are plenty now, you can use Gemini 2.5 Pro for example. And of course you can just try a SOTA model like Gemini 2.5 Pro for free, you don't have to trust anything.
brulard · 8h ago
You don't have to. Just try it yourself.
ainiriand · 12h ago
I've learnt Rust in 12 weeks with a study plan that ChatGPT designed for me, catering to my needs and encouraging me to take notes and write articles. This way of learning allowed me to publish https://rustaceo.es for Spanish speakers made from my own notes.
I think the potential in this regard is limitless.
koakuma-chan · 11h ago
I learned Rust in a couple of weeks by reading the book.
paxys · 9h ago
Yeah regardless of time taken the study plan for Rust already exists (https://doc.rust-lang.org/book/). You don't need ChatGPT to regurgitate it to you.
ainiriand · 6m ago
The key points that helped me, besides correlating sections to the corresponding chapters in the book, were that it proposed certain exercises every week to cover the topics we'd seen and encouraged me to write small articles around the lessons.
I had already completed Rustlings independently before this, but it left me kind of lopsided, and I wanted to make this knowledge as complete as possible.
koakuma-chan · 11h ago
But I agree though, I am getting insane value out of LLMs.
IshKebab · 9h ago
Doubtful. Unless you have very low standards of "learn".
koakuma-chan · 9h ago
What are your standards of learn?
BeetleB · 9h ago
Now this is a ringing endorsement. Specific stuff you learned, and actual proof of the outcome.
(Only thing missing is the model(s) you used).
ainiriand · 5m ago
Standard ChatGPT 4o.
nitwit005 · 6h ago
I'd tend to assume the null hypothesis, that if they were capable of learning it, they'd have likely done fine without the AI writing some sort of lesson plan for them.
The psychic reader near me has been in business for a long time. People are very convinced they've helped them. Logically, it had to have been their own efforts though.
ai_viewz · 7h ago
Yes, ChatGPT has helped me learn about actix-web, a Rust framework similar to FastAPI.
andix · 11h ago
Absolutely. I used to have a lot of weird IPv6 issues in my home network I didn't understand. ChatGPT helped me to dump some traffic with tcpdump and explained what was happening on the network.
In the process it helped me learn many details about RA and NDP (Router Advertisements / Neighbor Discovery Protocol, which mostly replace DHCP and ARP from IPv4).
It made me realize that my WiFi mesh routers do quite a lot of things to prevent broadcast loops on the network, and that all my weird issues could be attributed to one cheap mesh repeater. So I replaced it and now everything works like a charm.
I had this setup for 5 years and was never able to figure out what was going on there, although I really tried.
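For anyone wanting to poke at the same kind of traffic, here's a minimal Python sketch of watching RA/NDP packets with scapy (just an illustration with a placeholder interface name; I actually worked with plain tcpdump):

    from scapy.all import sniff
    from scapy.layers.inet6 import ICMPv6ND_RA, ICMPv6ND_NS, ICMPv6ND_NA

    def show_nd(pkt):
        # Print only Router Advertisements and Neighbor Solicitations/Advertisements
        if pkt.haslayer(ICMPv6ND_RA) or pkt.haslayer(ICMPv6ND_NS) or pkt.haslayer(ICMPv6ND_NA):
            print(pkt.summary())

    # Needs root; "eth0" is a placeholder for whatever interface you're debugging
    sniff(iface="eth0", filter="icmp6", prn=show_nd, store=False)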
mvieira38 · 10h ago
Would you say you were using the LLM as a tutor or as tech support, in that instance?
andix · 10h ago
Probably both. I think ChatGPT wouldn't have found the issue by itself. But I noticed some specific things, asked for some tutoring, and then it helped me find the issues. It was a team effort; either of "us" alone wouldn't have finished the job. ChatGPT had some really wrong ideas in the process.
kridsdale1 · 10h ago
I agree. I recently bought a broken Rolex and asked GPT for a list of tools I should get on Amazon to work on it.
I tried using YouTube to find walk through guides for how to approach the repair as a complete n00b and only found videos for unrelated problems.
But I described my issues and sent photos to GPT o3-pro, and it was able to guide me and tell me what to watch out for.
I completed the repair (very proud of myself) and even though it failed a day later (I guess I didn’t re-seat well enough) I still feel far more confident opening it and trying again than I did at the start.
Cost of broken watch + $200 pro mode << Cost of working watch.
KaiserPro · 9h ago
what was broken on it?
threetonesun · 12h ago
> the only thing you can really do is keep paving forward and hope at some point, it'll make sense to you.
I find it odd that someone who has been to college would see this as a _bad_ way to learn something.
qualeed · 9h ago
"Keep paving forward" can sometimes be fruitful, and at other times be an absolutely massive waste of time.
I'm not sold on LLMs being a replacement, but post-secondary was certainly enriched by having other people to ask questions to, people to bounce ideas off of, people that can say "that was done 15 years ago, check out X", etc.
There were times where I thought I had a great idea, but it was based on an incorrect conclusion that I had come to. It was helpful for that to be pointed out to me. I could have spent many months "paving forward", to no benefit, but instead someone saved me from banging my head on a wall.
abeppu · 11h ago
In college sometimes asking the right question in class or in a discussion section led by a graduate student or in a study group would help me understand something. Sometimes comments from a grader on a paper would point out something I had missed. While having the diligence to keep at it until you understand is valuable, the advantage of college over just a pile of textbooks is in part that there are other resources that can help you learn.
BeetleB · 9h ago
Imagine you're in college, have to learn calculus, and you can't afford a textbook (nor can find a free one), and the professor has a thick accent and makes many mistakes.
Sure, you could pave forward, but realistically, you'll get much farther with either a good textbook or a good teacher, or both.
IshKebab · 9h ago
In college you can ask people who know the answer. It's not until PhD level that you have to struggle without readily available answers.
czhu12 · 10h ago
The main difference in college was that there were office hours
kelthuzad · 12h ago
I share your experience and view in that regard! There is so much criticism of LLMs and some of it is fair, like the problem of hallucinations, but that weakness can be reframed as a learning opportunity. It's like discussing a subject with a personal scientist who may at certain times test you, by making claims that may be simplistic or outright wrong, to keep the student skeptical and check if they are actually paying attention.
This requires a student to be actually interested in what they are learning tho, for others, who blindly trust its output, it can have adverse effects like the illusion of having understood a concept while they might have even mislearned it.
mym1990 · 8h ago
"It used to be that if you got stuck on a concept, you're basically screwed."
There seems to be a gap in problem solving abilities here...the process of breaking down concepts into easier to understand concepts and then recompiling has been around since forever...it is just easier to find those relationships now. To say it was impossible to learn concepts you are stuck on is a little alarming.
ZYbCRq22HbJ2y7 · 11h ago
> It used to be that if you got stuck on a concept, you're basically screwed
No, not really.
> Unless it was common enough to show up in a well formed question on stack exchange, it was pretty much impossible, and the only thing you can really do is keep paving forward and hope at some point, it'll make sense to you.
Your experience isn't universal. Some students learned how to do research in school.
johnfn · 11h ago
"Screwed" = spending hours sifting through poorly-written, vaguely-related documents to find a needle in a haystack. Why would I want to continue doing that?
ZYbCRq22HbJ2y7 · 10h ago
> "Screwed" = spending hours sifting through poorly-written, vaguely-related documents to find a needle in a haystack.
From the parent comment:
> it was pretty much impossible ... hope at some point, it'll make sense to you
Not sure where you are getting the additional context for what they meant by "screwed", but I am not seeing it.
johnfn · 7h ago
Personal experience from researching stuff?
fn-mote · 11h ago
I do a lot of research and independent learning. The way I translated “screwed” was “4-6 hours to unravel the issue”. And half the time the issue is just a misunderstanding.
It’s exciting when I discover I can’t replicate something that is stated authoritatively… which turns out to be controversial. That’s rare, though. I bet ChatGPT knows it’s controversial, too, but that wouldn’t be as much fun.
HPsquared · 11h ago
Like a car can be "beyond economical repair", a problem can be not worth the time (and uncertainty) of fixing. Especially with subjective judgement and incomplete information, etc.
Leynos · 10h ago
As you say, your experience isn't universal, and we all have different modes of learning that work best for us.
fkyoureadthedoc · 11h ago
They should have focused on social skills too I think
crims0n · 12h ago
I agree... spent last weekend chatting with an LLM, filling in knowledge gaps I had on the electromagnetic spectrum. It does an amazing job educating you on known unknowns, but I think being able to know how to ask the right questions is key. I don't know how it would do with unknown unknowns, which is where I think books really shine and are still a preferable learning method.
roughly · 7h ago
My rule with LLMs has been "if a shitty* answer fast gets you somewhere, the LLMs are the right tool," and that's where I've seen them for learning, too. There are times when I'm reading a paper, and there's a concept mentioned that I don't know - I could either divert onto a full Google search to try to find a reasonable summary, or I can ask ChatGPT and get a quick answer. For load-bearing concepts or knowledge, yes, I need to put the time in to actually research and learn a concept accurately and fully, but for things tangential to my actual current interests or for things I'm just looking at for a hobby, a shitty answer fast is exactly what I want.
I think this is the same thing with vibe coding, AI art, etc. - if you want something good, it's not the right tool for the job. If your alternative is "nothing," and "literally anything at all" will do, man, they're game changers.
* Please don't overindex on "shitty" - "If you don't need something verifiably high-quality"
vrotaru · 12h ago
You should always check. I've seen LLMs be wrong (and obstinate) on topics which are one step removed from common knowledge.
I had to post the source code to win the dispute, so to speak.
abenga · 11h ago
Why would you try to convince an LLM of anything?
layer8 · 10h ago
Often you want to proceed further based on a common understanding, so it’s an attempt to establish that common understanding.
vrotaru · 10h ago
Well, not exactly convince. I was curious what would happen.
If you are curious, it was a question about the behavior of Kafka producer interceptors when an exception is thrown.
But I agree that it is hard to resist the temptation to treat LLMs as a peer.
globular-toast · 8h ago
Now think of all the times you didn't already know enough to go and find the real answer.
Ever read mainstream news reporting on something you actually know about? Notice how it's always wrong? I'm sure there's a name for this phenomenon. It sounds like exactly the same thing.
tonmoy · 9h ago
I don't know what subject you are learning, but for circuit design I have failed to get any response out of LLMs that's not straight from a well-known textbook chapter that I have already read.
IshKebab · 9h ago
It definitely depends heavily on how well represented the subject is on the internet at large. Pretty much every question I've asked it about SystemVerilog it gets wrong, but it can be very helpful with quite complex things in random C questions, for example why I might get undefined symbol errors with `inline` functions in C but only in debug mode.
On the other hand it told me you can't execute programs when evaluating a Makefile and you trivially can. It's very hit and miss. When it misses it's rather frustrating. When it hits it can save you literally hours.
archon810 · 5h ago
I was recently researching and repairing an older machine with a 2020 Intel Gen 9 CPU and a certain socket motherboard, and AI made it so much easier and more pleasant to find information and present answers about various generations, sockets, and compatibility. I felt like I didn't deserve this kind of tool. LLMs are not great for some things, but amazing for others.
ploxiln · 4h ago
Maybe TAs are a good metaphor. Back in college, the classmates who went to TAs for help multiple times every week really didn't get the material. I literally never went to a TA for help in my life, and learned the material much better by really figuring it out myself, "the hard way" (the only way?). These were math, EE, and CS courses.
wiz21c · 11h ago
I use it to refresh some engineering maths I have forgotten (ODEs, numerical schemes, solving linear equations, data science algorithms, etc.) and the explanations are great most of the time; usually 2 or 3 prompts give me a good overview and explain the tricky details.
I also use it to remember some Python stuff. In Rust it is less good: it makes mistakes.
In those two domains, at that level, it's really good.
It could help students I think.
yyyk · 6h ago
Everything you state was available on the net. Did people grow more informed? So far practice suggests the opposite conclusion [0]. I hope for the best, but the state of the world so far doesn't justify it...
It's one more step on the path to A Young Lady's Illustrated Primer. Still a long way to go, but it's a burden off my shoulders to be able to ask stupid questions without judgment or assumptions.
cs_throwaway · 7h ago
I agree. We are talking about technical, mathy stuff, right?
As long as you can tell that you don’t deeply understand something that you just read, they are incredible TAs.
The trick is going to be to impart this metacognitive skill on the average student. I am hopeful we will figure it out in the top 50 universities.
tekno45 · 11h ago
how are you checking its correctness if you're learning the topic?
signatoremo · 11h ago
The same way you check if you learn in any other ways? Cross referencing, asking online, trying it out, etc.
tekno45 · 3h ago
We're giving this to children who inherently don't have those skills.
ZYbCRq22HbJ2y7 · 11h ago
This is important, as benchmarks indicate we aren't at a level where a LLM can truly be relied upon to teach topics across the board.
It is hard to verify information that you are unfamiliar with. It would be like learning from a message board. Can you really trust what is being said?
qualeed · 8h ago
>we aren't at a level where a LLM can truly be relied upon to teach topics across the board.
You can replace "LLM" here with "human" and it remains true.
Anyone who has gone to post-secondary has had a teacher that relied on outdated information, or filled in gaps with their own theories, etc. Dealing with that is a large portion of what "learning" is.
I'm not convinced about the efficacy of LLMs in teaching/studying. But it's foolish to think that humans don't suffer from the same reliability issue as LLMs, at least to a similar degree.
ZYbCRq22HbJ2y7 · 5h ago
Sure, humans aren't without flaws in this area. However, in real time, humans can learn and correct themselves; we can check each other, ask for input, etc., and not continue to make mistakes. This isn't the case with LLMs as a service.
For example, even if you craft the most detailed cursor rules, hooks, whatever, they will still repeatedly fuck up. They can't even follow a style guide. They can be informed, but not corrected.
Those are coding errors, and the general "hiccups" that these models experience all the time are on another level. The hallucinations, sycophancy, reward hacking, etc can be hilariously inept.
IMO, that should inform you enough to not trust these services (as they exist today) in explaining concepts to you that you have no idea about.
If you are so certain you are okay to trust these things, you should evaluate every assertion it makes for, say, 40 hours of use, and count the error rate. I would say it is above 30%, in my experience of using language models day to day. And that is with applied tasks they are considered "good" at.
If you are okay with learning new topics where even 10% of the instruction is wrong, have fun.
Eisenstein · 10h ago
What is the solution? Toss out thousands of years of tested pedagogy which shows that most people learn by trying things, asking questions, and working through problems with assistance and instead tell everyone to read a textbook by themselves and learn through osmosis?
So what if the LLM is wrong about something. Human teachers are wrong about things, you are wrong about things, I am wrong about things. We figure it out when it doesn't work the way we thought and adjust our thinking. We aren't learning how to operate experimental nuclear reactors here, where messing up results in half a country getting irradiated. We are learning things for fun, hobbies, and self-betterment.
kelvinjps10 · 8h ago
If it's coding you can compile or test your program. For other things you can go to primary sources
GeoAtreides · 9h ago
I'll personally attest anecdotes mean little in sound arguments.
When I got stuck on a concept, I wasn't screwed: I read more; books if necessary. StackExchange wasn't my only source.
LLMs are not like TAs, personal or not, in the same way they're not humans. So it then follows we can actually contemplate not using LLMs in formal teaching environments.
brulard · 8h ago
Sometimes you don't have tens of hours to spend on a single problem you can not figure out.
loloquwowndueo · 9h ago
> It used to be that if you got stuck on a concept, you're basically screwed. Unless it was common enough to show up in a well formed question on stack exchange,
It’s called basic research skills - don’t they teach this anymore in high school, let alone college? How ever did we get by with nothing but an encyclopedia or a library catalog?
axoltl · 9h ago
Something is lost as well if you do 'research' by just asking an LLM. On the path to finding your answer in the encyclopedia or academic papers, etc. you discover so many things you weren't specifically looking for. Even if you don't fully absorb everything there's a good chance the memory will be triggered later when needed: "Didn't I read about this somewhere?".
ewoodrich · 8h ago
Yep, this is why I just don’t enjoy or get much value from exploring new topics with LLMs. Living in the Reddit factoid/listicle/TikTok explainer internet age my goal for years (going back well before ChatGPT hit the scene) has been to seek out high quality literature or academic papers for the subjects I’m interested in.
I find it so much more intellectually stimulating than most of what I find online.
Reading e.g. a 600-page book about some specific historical event gives me so much more perspective and exposure to different aspects that I never would have thought to ask about on my own, or that would have been elided when clipped into a few-sentence summary.
I have gotten some value out of asking for book recommendations from LLMs, mostly as a starting point I can use to prune a list of 10 books down into a 2 or 3 after doing some of my research on each suggestion. But talking to a chatbot to learn about a subject just doesn’t do anything for me for anything deeper than basic Q&A where I simply need a (hopefully) correct answer and nothing more.
loloquwowndueo · 8h ago
LLMs hallucinate too much and too frequently for me to put any trust in their (in)ability to help with research.
BDPW · 7h ago
It's a little disingenuous to say that; most of us would never have gotten by with literally just a library catalog and an encyclopedia. A community is needed to learn almost anything difficult, and this has always been the case. That's not just about fundamentally difficult problems but also about simple misunderstandings.
If you don't have access to a community like that, learning stuff in a technical field can be practically impossible. Having an LLM to ask infinite silly/dumb/stupid questions can be super helpful and save you days of being stuck on silly things, even though it's not perfect.
loloquwowndueo · 5h ago
Wait until you waste days down a hallucination-induced LLM rabbit hole.
> most of us would have never gotten by with literally just a library catalog and encyclopedia.
I meant the opposite, perhaps I phrased it poorly. Back in the day we would get by and learn new shit by looking for books on the topic and reading them (they have useful indices and tables of contents to zero in on what you need and not have to read the entire book). An encyclopedia was (is? Wikipedia anyone?) a good way to get an overview of a topic and the basics before diving into a more specialized book.
dcbb65b2bcb6e6a · 12h ago
> LLM's have been absolutely incredible to self learn new things post graduation.
I haven't tested them on many things. But in the past 3 weeks I tried to vibe code a little bit of VHDL. On the one hand it was a fun journey; I could experiment a lot and just iterate fast. But if I were someone who had no idea about hardware design, this trash would've guided me the wrong way in numerous situations. I can't even count how many times it has built me latches instead of clocked registers (latches are bad, if you don't know about them), and that's just one thing.
Yes I know there ain't much out there (compared to python and javascript) about HDLs, even less regarding VHDL. But damn, no no no. Not for learning. never. If you know what you're doing and you have some fundamental knowledge about the topic, then it might help to get further, but not for the absolute essentials, that will backfire hard.
avn2109 · 12h ago
LLM's are useful because they can recommend several famous/well-known books (or even chapters of books) that are relevant to a particular topic. Then you can also use the LLM to illuminate the inevitable points of confusion and shortcomings in those books while you're reading and synthesizing them.
Pre-LLM, even finding the ~5 textbooks with ~3 chapters each that decently covered the material I want was itself a nontrivial problem. Now that problem is greatly eased.
ZYbCRq22HbJ2y7 · 11h ago
> they can recommend several famous/well-known books
They can recommend many unknown books as well, as language models are known to reference resources that do not exist.
nilamo · 11h ago
And then when you don't find it, you move onto the next book. Problem solved!
throwaway290 · 6h ago
Or you know what, just google books about some topic and get a list of... real books recommended by people with names and reputations? It's truly incredible!
nilamo · 2h ago
And now we get all the way back to the OP, and having so little knowledge on the subject that you don't know what to Google, or which forums are trustworthy for that topic. And so the wheel turns...
throwaway290 · 1h ago
If you need a word you don't know, then read an overview of the bigger topic; yoloing Google with approximate queries usually helps find the word too...
jennyholzer · 8h ago
I strongly prefer curated recommendations from a person with some sort of credibility in a subject area that interests me.
mathattack · 11h ago
I've found LLMs to be great in summarizing non-controversial non-technical bodies of knowledge. For example - the facts in the long swings of regional histories. You have to ask for nuance and countervailing viewpoints, though you'll get them if they're in there.
i_am_proteus · 9h ago
>Now, everyone basically has a personal TA, ready to go at all hours of the day.
And that's a bad thing. Nothing can replace the work in learning, the moments where you don't understand it and have to think until it hurts and until you understand. Anything that bypasses this (including, for uni students, leaning too heavily on generous TAs) results in a kind of learning theatre, where the student thinks they've developed an understanding, but hasn't.
Experienced learners already have the discipline to use LLMs without asking too much of them, the same way they learned not to look up the answer in the back of the textbook until arriving at their own solution.
globular-toast · 10h ago
IMO your problem is the same as many people's these days: you don't own any books and refuse to get them.
lottin · 8h ago
Yes. Learning assistance is one of the few use cases of AI that I have had success with.
ants_everywhere · 7h ago
I'm curious what you've used it to learn
andrepd · 8h ago
A "TA" which has only the knowledge which is "common enough to show up in a well formed question on stack exchange"...
And which just makes things up (with the same tone and confidence!) at random and unpredictable times.
Yeah apart from that it's just like a knowledgeable TA.
belter · 10h ago
Depending on context, I would advise you to be extremely careful. Modern LLMs are Gell-Mann amnesia squared. Once you've watched an LLM butcher a topic you know extremely well, it is spooky how much authority they still project in the next interaction.
lmc · 12h ago
> I'll personally attest: LLM's have been absolutely incredible to self learn new things post graduation.
How do you know when it's bullshitting you though?
sejje · 11h ago
All the same ways I know when Internet comments, outdated books, superstitions, and other humans are bullshitting me.
Sometimes right away, something sounds wrong. Sometimes when I try to apply the knowledge and discover a problem. Sometimes never, I believe many incorrect things even today.
nilamo · 11h ago
When you Google the new term it gives you and you get good results, you know it wasn't made up.
Since when was it acceptable to only ever look at a single source?
mcmcmc · 12h ago
That’s the neat part, you don’t!
jahewson · 11h ago
Same way you know for humans?
azemetre · 11h ago
But an LLM isn't a human. With a human you can read body language or look up their past body of work. How do you do this with an LLM?
andix · 10h ago
Many humans tell you bullshit, because they think it's the truth and factually correct. Not so different to LLMs.
throwaway290 · 6h ago
I really don't get it. Literally the only thing you need to do research is to know what term to look up, and then you get a bunch of info written by real humans.
iLoveOncall · 8h ago
> It used to be that if you got stuck on a concept, you're basically screwed.
Given that humanity has been able to go from living in caves to sending spaceships to the moon without LLMs, let me express some doubt about that.
Even without going further, software engineering isn't new and people have been stuck on concepts and have managed to get unstuck without LLMs for decades.
What you gain in instant knowledge with LLMs, you lose in learning how to get unstuck, how to persevere, how to innovate, etc.
bossyTeacher · 10h ago
LLMs are to learning what self-driving cars are to transportation. They take you to the destination most of the time. But the problem is that if you use them too much, your brain (your legs) undergoes metaphorical atrophy, and when you are put in the position of having to do it on your own, you are worse off than you would be had you spent the time using your brain (legs). Learning is great, but learning to learn is the real skill set. You don't develop that if you are always getting spoonfed.
pyman · 9h ago
This is one of the challenges I see with self-driving cars. Driving requires a high level of cognitive processing to handle changing conditions and potential hazards. So when you drive most of your brain is engaged. The impact self-driving cars are going to have on mental stimulation, situational awareness, and even long-term cognitive health could be bigger than we think, especially if people stop engaging in tasks that keep those parts of the brain active. That said, I love the idea of my car driving me around the city while I play video games.
Regarding LLMs, they can also stimulate thinking if used right.
holsta · 12h ago
> It used to be that if you got stuck on a concept, you're basically screwed.
We were able to learn before LLMs.
Libraries are not a new thing. FidoNet, USENET, IRC, forums, local study/user groups. You have access to all of Wikipedia. Offline, if you want.
sejje · 12h ago
I learned how to code using the library in the 90s.
I think it's accurate to say that if I had to do that again, I'm basically screwed.
Asking the LLM is a vastly superior experience.
I had to learn what my local library had, not what I wanted. And it was an incredible slog.
IRC groups are another example--I've been there. One or two topics have great IRC channels. The rest have idle bots and hostile gatekeepers.
The LLM makes a happy path to most topics, not just a couple.
no_wizard · 12h ago
>Asking the LLM is a vastly superior experience.
Not to be overly argumentative, but I disagree. If you're looking for a deep and ongoing process, LLMs fall down, because they can't remember anything and can't build on themselves in that way. You end up having to repeat a lot of stuff. They also don't have good course correction (that is, if you're going down the wrong path, they don't alert you, as I've experienced).
They can also give you really bad content depending on what you're trying to learn.
I think for things that represent themselves as a form of highly structured data, like programming languages, there's good attunement there, but once you start trying to dig into advanced finance, political topics, economics, or complex medical conditions, the quality falls off fast, if it's there at all.
sejje · 12h ago
I used llms to teach me a programming language recently.
It was way nicer than a book.
That's the experience I'm speaking from. It wasn't perfect, and it was wrong sometimes, sure. A known limitation.
But it was flexible, and it was able to do things like relate ideas with programming languages I already knew. Adapt to my level of understanding. Skip stuff I didn't need.
Incorrect moments or not, the result was I learned something quickly and easily. That isn't what happened in the 90s.
dcbb65b2bcb6e6a · 11h ago
> and it was wrong sometimes, sure. A known limitation.
But that's the entire problem, and I don't understand why it's just put aside like that. LLMs are wrong sometimes, and they often just don't give you the details, and, in my opinion, knowing about certain details and traps of a language is very, very important if you plan on doing more with it than just having fun. Now someone will come around the corner and say 'but but but it gives you the details if you explicitly ask for them'. Yes, of course, but you just don't know where the important details are hidden if you are just learning about it. Studying is hard and it takes perseverance. Most textbooks will tell you the same things, but they all still differ, and every author usually has a few distinct details they highlight, and these are the important bits that you just won't get from an LLM.
sejje · 11h ago
It's not my experience that there are missing pieces as compared to anything else.
Nobody can write an exhaustive tome and explore every feature, use, problem, and pitfall of Python, for example. Every text on the topic will omit something.
It's hardly a criticism. I don't want exhaustive.
The llm taught me what I asked it to teach me. That's what I hope it will do, not try to caution me about everything I could do wrong with a language. That list might be infinite.
ZYbCRq22HbJ2y7 · 11h ago
> It's not my experience that there are missing pieces as compared to anything else.
How can you know this when you are learning something? It seems like a confirmation bias to even have this opinion?
ayewo · 9h ago
That's easy. It's due to a psychological concept called transfer of learning [0].
Perhaps the most famous example of this is Warren Buffett. For years Buffett missed out on returns from the tech industry [1] because he avoided investing in tech company stocks, due to Berkshire's long-standing philosophy of never investing in companies whose business model he doesn't understand.
His light-bulb moment came when he used his understanding of a business he knew really well, i.e. their furniture business [3], to value Apple as a consumer company rather than as a tech company, leading to a $1bn position in Apple in 2016 [2].
I'd gently point out we're 4 questions into "what about if you went about it stupidly and actually learned nothing?"
It's entirely possible they learned nothing and they're missing huge parts.
But we're sort of at the point where in order to ignore their self-reported experience, we're asking philosophical questions that amount to "how can you know you know if you don't know what you don't know and definitely don't know everything?"
More existentialism than interlocution.
If we decide our interlocutor can't be relied upon, what is discussion?
Would we have the same question if they said they did it from a book?
If they did do it from a book, how would we know if the book they read was missing something that we thought was crucial?
ZYbCRq22HbJ2y7 · 10h ago
I didn't think that was what was being discussed.
I was attempting to imply that high-quality literature is often reviewed by humans who have some knowledge of the topic or are willing to cross-reference it with existing literature. The reader often does this as well.
For low-effort literature, this is often not the case, and can lead to things like https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect where a trained observer can point out that something is wrong, but an untrained observer cannot perceive what is incorrect.
IMO, this is adjacent to what human agents interacting with language models experience often. It isn't wrong about everything, but the nuance is enough to introduce some poor underlying thought patterns while learning.
dcbb65b2bcb6e6a · 11h ago
You are right and that's my point. To me it just feels like that too many people think LLMs are the holy grail for learning. No, you still have to study a lot. Yes, it can be easier than it was.
gbalduzzi · 11h ago
Your other responses kinda imply that you believe LLMs are not good for learning.
That's totally different than saying they are not flawless but they make learning easier than other methods, like you did in this comment
smokel · 9h ago
Most LLM user interfaces, such as ChatGPT, do have a memory. See Settings, Personalization, Manage Memories.
no_wizard · 5h ago
Sure, but there are limits here. That's what I'm talking about: limits. The memory isn't infinitely expansive. I still have found it doesn't backtrack well enough to "remember" (for lack of a better term) that it told me something already, if it's old enough, for example.
It also doesn't seem to do a good job of building on "memory" over time. There appears to be some unspoken limit there, or something to that effect.
gertlex · 12h ago
Agreed, I'd add to the statement, "you're basically screwed, often, without investing a ton of time (e.g. weekends)"
Figuring out 'make' errors when I was bad at C on microcontrollers a decade ago? (still am) Careful pondering of possible meanings of words... trial and error tweaks of code and recompiling in hopes that I was just off by a tiny thing, but 2 hours later and 30 attempts later, and realizing I'd done a bad job of tracking what I'd tried and hadn't? Well, made me better at being careful at triaging issues. But it wasn't something I was enthusiastic to pick back up the next weekend, or for the next idea I had.
Revisiting that combination of hardware/code a decade later and having it go much faster with ChatGPT... that was fun.
gbalduzzi · 11h ago
Are we really comparing that kind of research to just typing a question and having a good answer in a couple of seconds?
Like, I agree with you and I believe those things will resist and will always be important, but it doesn't really compare in this case.
Last week I was out in nature and saw a cute bird that I didn't know. I asked an AI and got the correct answer in 10 seconds.
Of course I would find the answer at the library or by looking at proper niche sites, but I would not have done it because I simply didn't care that much. It's a stupid example but I hope it makes the point
holsta · 9h ago
There's a gigantic difference between outsourcing your brain to generative AI (LLMs, Stable Diffusion, ..) and pattern recognition that recognises songs, birds, plants or health issues.
Xenoamorphous · 11h ago
It's not an either/or situation.
BeetleB · 9h ago
> We were able to learn before LLMs.
We were able to learn before the invention of writing, too!
voidhorse · 2h ago
I haven't used LLMs much for study yet, so maybe they really are force multipliers, but I completely disagree with your assessment of self-directed learning pre-LLM; the picture wasn't so dire.
The internet, and esp. Stack Exchange, is a horrible place to learn concepts. For basic operational stuff, sure, that works, but one should mostly be picking up concepts from books and other long-form content. When you get stuck it's time to do three things:
Incorporate a new source that covers the same material in a different way, or at least from a different author.
Sit down with the concept and write about it and actively try to reformulate it and everything you do/don't understand in your own words.
Take a pause and come back later.
Usually one of these three strategies does the trick, no llm required. Obviously these approaches require time that using an LLM wouldn't. I have a suspicion doing it this way will also make it stick in long term memory better, but that's just a hunch.
JTbane · 12h ago
Nah, I'm calling BS. For me, self-learning after college is either Just Do It(tm) trial-and-error, blogs, or hitting the nonfiction section of the library.
Barrin92 · 11h ago
>Unless it was common enough to show up in a well formed question on stack exchange, it was pretty much impossible
sorry but if you've gone to university, in particular at a time when internet access was already ubiquitous, surely you must have been capable of finding an answer to a programming problem by consulting documentation, manuals, or tutorials, which exist on almost any topic.
I'm not saying the chatbot interface is necessarily bad, it might be more engaging, but it literally does not present you with information you couldn't have found yourself.
If someone has a computer science degree and tells me that without Stack Exchange they can't find solutions to basic problems, that is a red flag. That's like the article posted here about the people who couldn't program when their LLM credits ran out.
eternauta3k · 12h ago
You can always ask in stack exchange, IRC or forums.
wiseowise · 12h ago
Closed: duplicate
Closed: RTFM, dumbass
<No activity for 8 years, until some random person shows up and asks "Hey did you figure it out?">
FredPret · 12h ago
Or even worse, you ask an "xyz" question in the "xyz" StackExchange, then immediately get flagged as off-topic
atoav · 12h ago
My favourite moment was when I was trying to figure out a specific software issue that had to do with obscure hardware, and after hours I found one forum post detailing the solution, with zero replies. It turned out I had written it myself years prior and had forgotten about it.
sejje · 11h ago
I googled a command line string to do XYZ thing once, and found my own blog post.
I really do write that stuff for myself, turns out.
QuercusMax · 12h ago
I had a similar experience involving something dealing with RSA encryption on iOS.
dizhn · 12h ago
"Nevermind I figured it out"
Rooster61 · 12h ago
On IRC:
Newb: I need help with <thing>. Does anyone have any experience with this?
J. Random Hacker: Why are you doing it like that?
Newb: I have <xyz> constraint in my case that necessitates this.
J. Random Hacker: This is a stupid way to do it. I'm not going to help you.
precompute · 11h ago
This is the way to go.
aucisson_masque · 2m ago
When I was a student, I found that being able to talk through the subject with a fellow student was much more effective than the usual method of reading and trying to memorize over and over.
So much so that the first method would take me an hour, as opposed to an entire evening of reading and repeating.
Having such a tool would have been a game changer for me.
I don't know, though, whether it's possible to throw an entire chapter of a textbook at it.
> DO NOT GIVE ANSWERS OR DO HOMEWORK FOR THE USER. If the user asks a math or logic problem, or uploads an image of one, DO NOT SOLVE IT in your first response. Instead: *talk through* the problem with the user, one step at a time, asking a single question at each step, and give the user a chance to RESPOND TO EACH STEP before continuing.
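If you want to approximate this outside ChatGPT, the quoted instruction is ordinary system-prompt text, so you can wrap something similar around any chat API. A minimal sketch, assuming the official openai Python package; the model name and the exact STUDY_PROMPT wording below are placeholders paraphrasing the quote above, not OpenAI's real configuration:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Paraphrase of the study-mode instruction quoted above; wording is illustrative.
    STUDY_PROMPT = (
        "Do not give answers or do homework for the user. "
        "Talk through the problem one step at a time, asking a single "
        "question at each step, and wait for the user's reply before continuing."
    )

    def study_turn(history, user_message):
        """One tutoring turn; `history` is the prior list of role/content dicts."""
        messages = [{"role": "system", "content": STUDY_PROMPT}]
        messages += history
        messages.append({"role": "user", "content": user_message})
        reply = client.chat.completions.create(model="gpt-4o", messages=messages)
        return reply.choices[0].message.content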
mkagenius · 9h ago
I wish each LLM provider would add "be short and not verbose" to their system prompts. I am a slow reader, and it takes a toll on me to read through every unimportant detail whenever I talk to an AI. The way they render everything so fast gives me anxiety.
Will also reduce the context rot a bit.
tech234a · 9h ago
This was in the linked prompt: "Be warm, patient, and plain-spoken; don't use too many exclamation marks or emoji. [...] And be brief — don't ever send essay-length responses. Aim for a good back-and-forth."
draebek · 5h ago
I was under the impression that, at least for models without "reasoning", asking them to be terse hampered their ability to give complete and correct answers? Not so?
mptest · 7h ago
Anthropic has a "style" choice, one of which is "concise"
skybrian · 9h ago
On ChatGPT at least, you can add "be brief" to the custom prompt in your settings. Probably others, too.
mkagenius · 9h ago
I guess what I actually meant to say was to make LLMs know when to talk more and when to be brief. When I ask it to write an essay, it should actually be an essay length essay.
gh0stcat · 10h ago
I love that caps actually seem to matter to the LLM.
simonw · 10h ago
Hah, yeah I'd love to know if OpenAI ran evals that were fine-grained enough to prove to themselves that putting that bit in capitals made a meaningful difference in how likely the LLM was to just provide the homework answer!
danenania · 8h ago
I've found that a lot of prompt engineering boils down to managing layers of emphasis. You can use caps, bold, asterisks, precede instructions with "this is critically important:", and so on. It's also often necessary to repeat important instructions a bunch of times.
How exactly you do it is often arbitrary/interchangeable, but it definitely does have an effect, and is crucial to getting LLMs to follow instructions reliably once prompts start getting longer and more complex.
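For illustration, here is roughly what that layering can look like when assembling a longer system prompt in code. The rule wording is made up, and whether caps, preambles, or repetition actually help will vary by model; this is a sketch of the idea, not a tested recipe:

    # Illustrative only: layering emphasis (caps, a "critically important"
    # preamble, and repetition near the end) in a long system prompt.
    RULES = [
        "Never give the final answer outright.",
        "Ask one question at a time and wait for the student's reply.",
    ]

    system_prompt = "\n".join([
        "You are a patient tutor.",
        "This is critically important: " + RULES[0].upper(),  # preamble + caps
        *RULES,                                               # stated once normally
        "REMINDER: " + RULES[0],                              # repeated near the end
    ])

    print(system_prompt)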
nixpulvis · 8h ago
Just wait until it only responds to **COMMAND**!
SalariedSlave · 8h ago
I'd be interested to see what results one would get using that prompt with other models. Is there much more to ChatGPT Study Mode than a specific system prompt? Although I am not a student, I have used similar prompts to dive into topics I wish to learn, with, I feel, positive results indeed. I shall give this a go with a few models.
bangaladore · 7h ago
I just tried in AI Studio (https://aistudio.google.com/) where you can for free use 2.5 Pro and edit the system prompt and it did very well.
varenc · 8h ago
Interesting that it spits the instructions out so easily and OpenAI didn't seem to harden it to prevent this. It's like they intended this to happen, but for some reason didn't want to share the system instructions explicitly.
If I were OpenAI, I would deliberately "leak" this prompt when asked for the system prompt, as a honeypot to slow down competitor research, while using a different prompt behind the scenes.
Not saying that is indeed the reality, but it could simply be programmed to return a different prompt from the original, appearing plausible but perhaps missing some key elements.
But of course, if we apply Occam's Razor, it might simply really be the prompt too.
simonw · 9h ago
That kind of thing is surprisingly hard to implement. To date I've not seen any provider get caught serving up a fake system prompt... which could mean that they are doing it successfully, but I think it's more likely that they determined it's not worth it, because there are SO MANY ways someone could get the real one, and it would be embarrassing if they were caught trying to fake it.
Tokens are expensive. How much of your system prompt do you want to waste on dumb tricks trying to stop your system prompt from leaking?
danenania · 8h ago
Probably the only way to do it reliably would be to intercept the prompt with a specially trained classifier? I think you're right that once it gets to the main model, nothing really works.
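As a crude stand-in for that classifier idea, a provider could at least gate responses on their overlap with the secret prompt before returning them. The function names and threshold below are invented for illustration, and a trained classifier would replace the word-overlap heuristic:

    # Naive sketch of intercepting system-prompt leaks on the way out.
    # A trained classifier would replace this word-overlap heuristic.
    def looks_like_prompt_leak(candidate: str, system_prompt: str,
                               threshold: float = 0.6) -> bool:
        prompt_words = set(system_prompt.lower().split())
        reply_words = set(candidate.lower().split())
        if not prompt_words:
            return False
        overlap = len(prompt_words & reply_words) / len(prompt_words)
        return overlap >= threshold

    def guarded_reply(candidate: str, system_prompt: str) -> str:
        if looks_like_prompt_leak(candidate, system_prompt):
            return "Sorry, I can't share my configuration."
        return candidate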
poemxo · 11h ago
As a lifelong learner, experientially it feels like a big chunk of time spent studying is actually just searching. AI seems like a good tool to search through a large body of study material and make that part more efficient.
The other chunk of time, to me anyway, seems to be creating a mental model of the subject matter, and when you study something well you have a strong grasp on the forces influencing cause and effect within that matter. It's this part of the process that I would use AI the least, if I am to learn it for myself. Otherwise my mental model will consist of a bunch of "includes" from the AI model and will only be resolvable with access to AI. Personally, I want a coherent "offline" model to be stored in my brain before I consider myself studied up in the area.
ethan_smith · 23m ago
Spaced repetition systems would be the perfect complement to your approach - they're specifically designed to help build that "offline" mental model by systematically moving knowledge from AI-assisted lookup to permanent memory.
lbrito · 10h ago
>big chunk of time spent studying is actually just searching.
This is a good thing in many levels.
Learning how to search is (was) a good skill to have. The process of searching itself also often leads to learning tangentially related but important things.
I'm sorry for the next generations that won't have (much of) these skills.
sen · 7h ago
That was relevant when you were learning to search through “information” for the answer to your question, eg the digital version of going through the library or digging through a reference book.
I don’t think it’s so valuable now that you’re searching through piles of spam and junk just to try find anything relevant. That’s a uniquely modern-web thing created by Google in their focus of profit over user.
Unless Google takes over libraries/books next and sells spots to advertisers on the shelves and in the books.
ImaCake · 5h ago
> searching through piles of spam and junk
In the same way that I never learnt the Dewey decimal system because digital search had driven it obsolete. It may be that we just won't need to do as much sifting through spam in the future, but being able to finesse Gemini into burping out the right links becomes increasingly important.
ascorbic · 9h ago
Searching is definitely a useful skill, but once you've been doing it for years you probably don't need the constant practice and are happy to avoid it.
ieuanking · 6h ago
yeah this is literally why I built -- app.ubik.studio -- searching is everything, and understanding what you are reading is more important than conversing with a chatbot. i cannot even imagine being a student in 2025, especially at 14 years old omg would be so hard not to just cheat on everything
thorum · 7h ago
Isn’t the goal of Study Mode exactly that, though? Instead of handing you the answers, it tries to guide you through answering it on your own; to teach the process.
Most people don’t know how to do this.
marcusverus · 10h ago
This is just good intellectual hygiene. Delegating your understanding is the first step toward becoming the slave of some defunct fact broker.
throwawaysleep · 11h ago
Or just to dig up related things you never would've considered, but don't have the keywords for.
jryio · 12h ago
I would like to see randomized control group studies using study mode.
Does it offer meaningful benefits to students over self directed study?
Does it outperform students who are "learning how to learn"?
What effect does allowing students to make mistakes have compared to being guided through what to review?
I would hope Study Mode would produce flash card prompts and quantize information for usage in spaced repetition tools like Mochi [1] or Anki.
It doesn’t do any of that, it just captures the student market more.
They want a student to use it and say “I wouldn’t have learned anything without study mode”.
This also allows them to fill their data coffers more with bleeding edge education. “Please input the data you are studying and we will summarize it for you.”
LordDragonfang · 11h ago
> It doesn’t do any of that
Not to be contrarian, but do you have any evidence of this assertion? Or are you just confidently confabulating a response for something outside of the data you've been exposed to? Because a commenter below provided a study that directly contradicts this.
righthand · 11h ago
A study that directly contradicts what exactly?
precompute · 11h ago
Bingo. At the scale they're operating at, new features don't have to be useful; they only need to look like they are for the first few minutes.
This isn't study mode, it's a different AI tutor, but:
"The median learning gains for students, relative to the pre-test baseline (M = 2.75, N = 316), in the AI-tutored group were over double those for students in the in-class active learning group."
Aachen · 8h ago
I wonder how much this was a factor:
"The occurrence of inaccurate “hallucinations” by the current [LLMs] poses a significant challenge for their use in education. [...] we enriched our prompts with comprehensive, step-by-step answers, guiding the AI tutor to deliver accurate and high-quality explanations (v) to students. As a result, 83% of students reported that the AI tutor’s explanations were as good as, or better than, those from human instructors in the class."
Not at all dismissing the study, but to replicate these results for yourself, this level of gain over a classroom setting may be tricky to achieve without having someone make class materials for the bot to present to you first
Edit: the authors further say
"Krupp et al. (2023) observed limited reflection among students using ChatGPT without guidance, while Forero (2023) reported a decline in student performance when AI interactions lacked structure and did not encourage critical thinking. These previous approaches did not adhere to the same research-based best practices that informed our approach."
Two other studies failed to get positive results at all. YMMV a lot apparently (like, all bets are off and your learning might go in the negative direction if you don't do everything exactly as in this study)
purplerabbit · 6h ago
In case you find it interesting: I deployed an early version of a "lesson administering" bot on a college campus that guides students through tutored activities on content curated by a professor in the "study mode" style -- that is, forcing them to think for themselves. We saw an immediate student performance gain on exams of about 1 stdev in the course. So with the right material and the right prompting, things are looking promising.
energy123 · 3h ago
OpenAI should figure out how to onboard teachers. Teacher uploads context for the year, OpenAI distributes a chatbot to the class that's perma fixed into study mode. Basically like GPT store but with an interface and UX tuned for a classroom.
posix86 · 11h ago
There are studies showing that LLMs make experienced devs slower in their work. I wouldn't be surprised if it was the same for self study.
However, consider the extent to which LLMs make the learning process more enjoyable. More students will keep pushing because they have someone to ask. Also, having fun and being motivated is such a massive factor when it comes to learning. And, finally, keeping at it at 50% the speed for 100% the material always beats working at 100% the speed for 50% the material. Who cares if you're slower - we're slower and faster without LLMs too! Those that persevere aren't the fastest; they're the ones with the most grit and discipline, and LLMs make that more accessible.
SkyPuncher · 9h ago
The study you're referencing doesn't make that conclusion.
It concludes there's a learning curve that generally takes about 50 hours to get past. The data shows that the one engineer who had more than 50 hours of experience with Cursor actually worked faster.
This is largely my experience, now. I was much slower initially, but I've now figured out the correct way to prompt, guide, and fix the LLM to be effective. I produce way more code and am mentally less fatigued at the end of each day.
snewman · 10h ago
I presume you're referring to the recent METR study. One aspect of the study population, which seems like an important causal factor in the results, is that they were working in large, mature codebases with specific standards for code style, which libraries to use, etc. LLMs are much better at producing "generic" results than matching a very specific and idiosyncratic set of requirements. The study involved the latter (specific) situation; helping people learn mainstream material seems more like the former (generic) situation.
(Qualifications: I was a reviewer on the METR study.)
bretpiatt · 10h ago
*slower with Sonnet 3.7 on large open source code bases where the developer is a senior member of the project core team.
I believe we'll see the benefits and drawbacks of AI augmentation to humans performing various tasks will vary wildly based on the task, the way the AI is being asked to interact, and the AI model.
graerg · 10h ago
People keep citing this study (and it was on the top of HN for a day). But this claim falls flat when you find out that the test subjects had effectively no experience with LLM equipped editors and the 1-2 people in the study that actually did have experience with these tools showed a marked increase in productivity.
Like yeah, if you’ve only ever used an axe you probably don’t know the first thing about how to use a chainsaw, but if you know how to use a chainsaw you’re wiping the floor with the axe wielders. Wholeheartedly agree with the rest of your comment; even if you’re slow you lap everyone sitting on the couch.
daedrdev · 7h ago
It was a 16 person study on open source devs that found 50 hours of experience with the tool made people more productive
viccis · 11h ago
I would be interested to see if there have already been studies about the efficacy of tutors at good colleges. In my experience (in academia), the students who make it into an Ivy or an elite liberal arts school make extensive use of tutoring resources, but not in a helpful way. They basically just get the tutor to work problems for them (often their homework!) and feel like they've "learned" things because tough questions always seem so obvious when you've been shown the answer. In reality, what it means is that they have no experience being confused or having to push past difficult things they were stuck on. And those situations are some of the most valuable for learning.
I bring this up because the way I see students "study" with LLMs is similar to this misapplication of tutoring. You try something, feel confused and lost, and immediately turn to the pacifier^H^H^H^H^H^H^H ChatGPT helper to give you direction without ever having to just try things out and experiment. It means students are so much more anxious about exams where they don't have the training wheels. Students have always wanted practice exams with similar problems to the real one with the numbers changed, but it's more than wanting it now. They outright expect it and will write bad evals and/or even complain to your department if you don't do it.
I'm not very optimistic. I am seeing a rapidly rising trend at a very "elite" institution of students being completely incapable of using textbooks to augment learning concepts that were introduced in the classroom. And not just struggling with it, but lashing out at professors who expect them to do reading or self study.
apwell23 · 11h ago
it makes a difference to students who are already motivated. that was the case with youtube.
unfortunately that group is tiny and getting tinier due to dwindling attention spans.
CobrastanJorji · 12h ago
Come on. Asking an educational product to do a basic sanity test as to whether it helps is far too high a bar. Almost no educational app does that sort of thing.
tempfile · 12h ago
I would also be interested to see whether it outperforms students doing literally nothing.
roadside_picnic · 11h ago
My key to LLM study has been to always primarily use a book and then let the LLM allow you to help with formulae, ask questions about the larger context, and verify your understanding.
Helping you parse notation, especially in new domains, is insanely valuable. I do a lot of applied math in statistics/ML, but when I open a physics book, the notation and comfort with shorthand is a real challenge (likewise I imagine the reverse is equally annoying). Having an LLM on demand to instantly clear up notation is a massive speed boost.
Reading German Idealist philosophy requires an enormous amount of context. Being able to ask an LLM questions like "How much of this section of Mainländer is coming directly from Schopenhauer?" is a godsend in helping understand which parts of the writing are merely setting up what is already agreed upon vs laying new ground.
And the most important for self study: verifying your understanding. Backtracking because you misunderstood a fundamental concept is a huge time sink in self study. Now, every time I read a formula I can go through all of my intuitions and understanding about it, write them down, and verify. Even a "not quite..." from an LLM is enough to make me realize I need to spend more time on that section.
Books are still the highest density information source and best way to learn, but LLMs can do a lot to accelerate this.
Workaccount2 · 12h ago
An acquaintance of mine has a start-up in this space and uses OpenAI to do essentially the same thing. This must look like, and may well be, the guillotine for him...
It's my primary fear building anything on these models, they can just come eat your lunch once it looks yummy enough. Tread carefully
mpalmer · 3h ago
No disrespect to your acquaintance, but when I heard about this, I didn't think "oh a lot of startups are gonna go under", I thought "OAI added an option to use a hard-coded system prompt and they're calling it a 'mode'??"
No comments yet
sebzim4500 · 10h ago
I'm too young to have experienced this, but I'm sure others here aren't.
During the early days of tech, was there prevailing wisdom that software companies would never be able to compete with hardware companies because the hardware companies would always be able to copy them and ship the software with the hardware?
Because I think it's basically the analogous situation. People assume that the foundation model providers have some massive advantage over the people building on top of them, but I don't really see any evidence for this.
Claude Code and Gemini-CLI are able to offer much more value compared to startups (like Cursor) that need to pay for model access, largely due to the immense costs involved.
potatolicious · 11h ago
> "they can just come eat your lunch once it looks yummy enough. Tread carefully"
True, and worse, they're hungry because it's increasingly seeming like "hosting LLMs and charging by the token" is not terribly profitable.
I don't really see a path for the major players that isn't "Sherlock everything that achieves traction".
falcor84 · 10h ago
Thanks for introducing me to the verb Sherlock! I'm one of today's lucky 10,000.
> In the computing verb sense, refers to the software Sherlock, which in 2002 came to replicate some of the features of an earlier complementary program called Watson.[1]
But what’s the future in terms of profitability of LLM providers?
As long as features like Study Mode are little more than creative prompting, any provider will eventually be able to offer them and offer token-based charging.
potatolicious · 11h ago
I think a few points worth making here:
- From what I can see many products are rapidly getting past "just prompt engineering the base API". So even though a lot of these things were/are primitive, I don't think it's necessarily a good bet that they will remain so. Though agree in principle - thin API wrappers will be out-competed both by cheaper thin wrappers, or products that are more sophisticated/better than thin wrappers.
- This is, oddly enough, a scenario that is way easier to navigate than the rest of the LLM industry. We know consumer apps, we know consumer apps that do relatively basic (or at least, well understood) things. Success/failure then is way less about technical prowess and more about classical factors like distribution, marketing, integrations, etc.
A good example here is the lasting success of paid email providers. Multiple vendors (MSFT, GOOG, etc.) make huge amounts of money hosting people's email, despite it being a mature product that, at the basic level, is pretty solved, and where the core product can be replicated fairly easily.
The presence of open source/commodity commercial offerings hasn't really driven the price of the service to the floor, though the commodity offerings do provide some pricing pressure.
m11a · 7h ago
Email is pretty difficult to reliably self-host though, and typically a PITA to manage. And you really don’t ever want to lose your email address or the associated data. Fewer people could say they properly secure, manage and administer a VPS on which they can host the email server they eventually setup, over say a 10yr period.
Most people I saw offering self-hosted email for groups (student groups etc.) ended up with a mess. Compare all that to, say, ollama, which makes self-hosting LLMs trivial, and LLMs are stateless.
So I’m not sure email is a good example of commodity not bringing price to the floor.
mvieira38 · 10h ago
We can assume that OpenAI/Anthropic offerings are going to be better long term simply because they have more human capital, though, right? If it turns out that what really matters in the AI race is study mode, then OpenAI goes "ok let's pivot the hundreds of genius level, well-paid engineers to that issue. AND our engineers can use every tool we offer for free without limits, even experimental models". It's tough for the small AI startup to compete with that, the best hope is to be bought like Windsurf
djeastm · 8h ago
Yes, any LLM-adjacent application developer should be concerned. Even if they don't do 100% of what your product does, their market reach and capitalization is scary. Any model/tooling improvements that just happen to encroach in your domain will put you on the clock...
mvieira38 · 10h ago
How can't these founders see this happening, too? From the start OpenAI has been getting into more markets than just "LLM provider"
tokioyoyo · 10h ago
There's a case for a startup to capture enough market that LLM providers would just buy it out. Think of the Character.AI case.
jonny_eh · 7h ago
Character AI was never acquired, it remains independent.
azinman2 · 10h ago
They originally claimed they wouldn't, so as not to compete with their API users…
rs186 · 6h ago
[citation needed]
jstummbillig · 9h ago
Ah, I don't know. Of course there is risk involved no matter what we do (see the IDE/Cursor space), but we need to be somewhat critical of the value we add.
If you want to try and make a quick buck, fine, be quick and go for whatever. If you plan on building a long term business, don't do the most obvious, low effort low hanging fruit stuff.
chrisweekly · 9h ago
yeah, if you want to stick around you need some kind of moat
teaearlgraycold · 9h ago
I used to work for copy.ai and this happened to them. Investors always asked if the founders were worried about OpenAI competing with their consumer product. Then ChatGPT was released. Turns out that was a reasonable concern.
These days they’ve pivoted to a more enterprise product and are still chugging along.
x187463 · 11h ago
I'm really waiting for somebody to figure out the correct interface for all this. For example, study mode will present you with a wall of text containing information, examples, and questions. There's no great way to associate your answers with specific questions. The chat interface just isn't good for this sort of interaction. ChatGPT really needs to build its own canvas/artifact interface wherein questions/responses are tied together. It's clear, at this point, that we're doing way too much with a UI that isn't designed for more than a simple conversation.
tootyskooty · 10h ago
I gave it a shot with periplus.app :). Not perfect by any means, but it's a different UX than chat so you might find it interesting.
danenania · 8h ago
This looks super cool—I've imagined something similar, especially the skill tree/knowledge map UI. Looking forward to trying it out.
Have you considered using the LLM to give tests/quizzes (perhaps just conversationally) in order to measure progress and uncover weak spots?
tootyskooty · 8h ago
There are both in-document quizzes and larger exams (at a course level).
I've also been playing around with adapting content based on their results (e.g. proactively nudging complexity up/down) but haven't gotten it to a good place yet.
danenania · 7h ago
Nice, I've been playing with it a bit and it seems really well done and polished so far. I'm curious how long you spent building it?
Only feedback I have so far is that it would be nice to control the playback speed of the 'read aloud' mode. I'd like it to be a little bit faster.
tootyskooty · 6h ago
Glad you like it!!
I've been working on it on-and-off for about a year now. Roughly 2-3 months if I worked on it full-time I'm guessing.
re: playback speed -> noted, will add some controls tomorrow
energy123 · 3h ago
Yeah. And how to tie the teacher into all this. You need the teacher to upload the context, like the textbook, so the LLM can refer to tangible class material.
It's still a work in progress, but we are trying to make it better every day.
No comments yet
bo1024 · 7h ago
Agree, one thing that brought this home was the example where the student asks to learn all of game theory. There seems to be an assumption on both sides that this will be accomplished in a single chat session by a linear pass, necessarily at a pretty superficial level.
perlgeek · 11h ago
There are so many options that could be done, like:
* for each statement, give you the option to rate how well you understood it. Offer clarification on things you didn't understand
* present knowledge as a tree that you can expand to get deeper (see the sketch after this list)
* show interactive graphs (very useful for mathy things when can you easily adjust some of the parameters)
* add quizzes to check your understanding
... though I could well imagine this being out of scope for ChatGPT, and thus an opportunity for other apps / startups.
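On the knowledge-tree idea specifically, here's a bare-bones sketch of what "expand to get deeper" might look like. ask_llm is a stand-in for whatever model call you'd plug in, and the prompt format is an assumption for illustration:

    # Sketch of an expandable knowledge tree. `ask_llm` is a placeholder,
    # not a real API; subtopics are generated lazily the first time a node
    # is opened.
    from dataclasses import dataclass, field

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("plug in a model client here")

    @dataclass
    class TopicNode:
        title: str
        children: list["TopicNode"] = field(default_factory=list)

        def expand(self, n: int = 3) -> list["TopicNode"]:
            if not self.children:
                raw = ask_llm(f"List {n} subtopics of '{self.title}', one per line.")
                self.children = [TopicNode(title=line.strip())
                                 for line in raw.splitlines() if line.strip()]
            return self.children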
ColeShepherd · 9h ago
> present knowledge as a tree that you can expand to get deeper
I'm very interested in this. I've considered building this, but if this already exists, someone let me know please!
precompute · 11h ago
There is no "correct interface". People who want to learn put in the effort; it doesn't matter whether they have scrolls, books, ebooks or AI.
wodenokoto · 11h ago
I'm currently learning Janet, and using ChatGPT as my tutor is absolutely awful. "So what is the difference between local and var if they are both local and not global variables (as you told me earlier)?" "Great question, and now you are really getting to the core of it, ..." continues to hallucinate.
It's a great tutor for things it knows, but it really needs to learn its own limits
ducktective · 11h ago
>It's a great tutor for things it knows
Things well-represented in its training datasets. Basically a React todo list, a Bootstrap form, tic-tac-toe in Vue.
runeblaze · 7h ago
For these unfortunately you should dump most of the guide/docs into its context
xrd · 8h ago
It is like a tutor that desperately needs the money, which maybe isn't so inaccurate for OpenAI and all the money they took from petrostates.
Buttons840 · 2h ago
I'd like an LLM integrated spaced-repetition app. It would go along with this study feature quite well.
If LLMs continue to improve, we are going to be learning a lot from them, they will be our internet search and our teachers. If we want to retain some knowledge for ourselves, then we are going to need to learn and memorize things for ourselves.
Integrating spaced repetition could make it explicit which things we want to offload to the LLM and which things we want to internalize. For example, maybe I use Python a lot and occasionally use Perl, so I explicitly choose to memorize some Python APIs, but I'm happy to just ask the LLM for reminders whenever I use Perl. So I ask the LLM to set up some spaced repetition whenever it teaches me something new about Python, etc.
The spaced repetition could be done with voice during a drive or something. The LLM would ask the questions for review, and then judge how well we did in answering, and then the LLM would depend on the spaced-repetition algorithm to keep track of when to next review.
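A minimal sketch of the bookkeeping side of that idea, assuming a rough SM-2-style interval progression; the numbers, names, and the "LLM grades your spoken answer" step are assumptions for illustration, not a tested algorithm:

    # Minimal spaced-repetition bookkeeping an LLM tutor could feed cards into.
    from dataclasses import dataclass, field
    from datetime import date, timedelta

    @dataclass
    class Card:
        question: str
        answer: str
        interval_days: int = 1
        due: date = field(default_factory=date.today)

    def review(card: Card, quality: int) -> Card:
        """quality: 0-5 rating (self-graded, or an LLM's judgement of a spoken answer)."""
        if quality < 3:
            card.interval_days = 1                          # lapse: start over
        else:
            card.interval_days = max(1, round(card.interval_days * 2.2))
        card.due = date.today() + timedelta(days=card.interval_days)
        return card

    def due_cards(deck: list[Card]) -> list[Card]:
        return [c for c in deck if c.due <= date.today()]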
naet · 9h ago
"Under the hood, study mode is powered by custom system instructions we’ve written...."
It seems like study mode is basically just a different system prompt but otherwise the exact same model? So there's not really any new benefit to anyone who was already asking for ChatGPT to help them study step by step instead of giving away whole answers.
Seems helpful to maybe a certain population of more entry level users who don't know to ask for help instead of asking for a direct answer I guess, but not really a big leap forward in technology.
varenc · 8h ago
This feels like a classic example of a platform provider eating its own ecosystem. There are many custom "GPTs" out there that do essentially the same thing with custom instructions. Mr Ranedeer[0] is an early, well-known one (30k stars). But now essentially the same functionality is built straight into the ChatGPT interface.
"Under the hood, study mode is powered by custom system instructions we’ve written in collaboration with teachers, scientists, and pedagogy experts to reflect a core set of behaviors that support deeper learning including: "
Wonder what the compensation for this invaluable contribution was
currymj · 6h ago
as a professor i'm tentatively pleased. it's hard to shake out the edge cases for anything like this so it may break down or cause annoying problems, who knows.
but even with this feature in this very early state, it seems quite useful. i dropped in some slides from a class and pretended to be a student, and it handled questions reasonably. Right now it seems I will be happy for my students to use this.
taking a wider perspective, I think it is a good sign that OpenAI is culturally capable of making a high-friction product that challenges and frustrates, yet benefits, the user. hopefully this can help with the broader problem of sycophancy.
mahidalhan · 2h ago
I had made a specialized prompt in Claude Projects for my learning, added field notes and lecture transcripts, and it was going well.
Then I tried to migrate it to ChatGPT to try this thing out, but it seems like it's just prompt engineering behind the scenes. Nothing fancy.
And this study mode isn't even available in ChatGPT Projects, which students need for adding coursework, notes, and transcripts.
Honestly, just release gpt-5!!!
teleforce · 5h ago
LLMs' foremost killer application is what I'd call context searching, whereby RAG and other techniques are used to reduce hallucinations and provide relevant results; arguably ChatGPT is one of the pioneers here.
The second killer application is studying a particular course or subject, a service OpenAI's ChatGPT is now also providing. It's probably not the pioneer, but with this announcement it's likely one of the significant providers. If in the near future GenAI study assistants can adopt and adapt 3Blue1Brown-style approaches for more visualization, animation, and interactive learning, they will be more intuitive and engaging.
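For concreteness, the "context searching" pattern in its simplest form looks roughly like the sketch below: retrieve the most relevant note chunks and ask the model to answer only from them, with citations. This is not the UIUC system referenced next; keyword overlap stands in for a real embedding index, and ask_llm is a placeholder:

    # Bare-bones retrieval-augmented answering. Real systems use embeddings
    # and a vector index; keyword overlap keeps this self-contained.
    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("plug in a model client here")

    def retrieve(question: str, notes: list[str], k: int = 3) -> list[str]:
        q_words = set(question.lower().split())
        return sorted(notes,
                      key=lambda n: len(q_words & set(n.lower().split())),
                      reverse=True)[:k]

    def answer_with_citations(question: str, notes: list[str]) -> str:
        context = "\n\n".join(f"[{i + 1}] {chunk}"
                              for i, chunk in enumerate(retrieve(question, notes)))
        return ask_llm(
            "Answer using only the numbered course notes below, citing them like [1].\n\n"
            f"{context}\n\nQuestion: {question}"
        )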
Please check this excellent LLM-RAG AI-driven course assistant at UIUC for an example of a university course [1]. It provides citations and references, mainly to the course notes, so the students can verify the answers and further study the course materials.
[1] AI-driven chat assistant for ECE 120 course at UIUC (only 1 comment by the website creator):
From what I can see, this just boils down to a system prompt to act like a study helper?
I would think you'd want to make something a little more bespoke to make it a fully-fledged feature, like interactive quizzes that keep score and review questions missed afterwards.
mvieira38 · 10h ago
This seems like a good use case, I'm optimistic on this one. But it smells fishy how often OpenAI releases these secondary products like custom GPTs, tasks, etc. It's looking like they know they won't be an LLM provider, like the YC sphere hoped, but an AI services provider using LLMs
EcommerceFlow · 10h ago
A good start. One of the biggest issues with LLMs is the "intelligence" has far surpassed the tooling. A better combination of prompts, RAG, graphs, etc exists for education and learning, but no one's come up with the proper format / tooling for it, even if the models are smart enough.
toisanji · 12h ago
I truly believe AI will change all of education for the better, but of course it can also hinder learning if used improperly. Those who want to genuinely learn will learn while those looking for shortcuts will cause more harm to themselves.
I just did a show HN today about something semi related.
I made a deep research assistant for families. Children can ask it to explain difficult concepts, and parents can ask how to deal with any parenting situation. For example, a 4 year old may ask "why does the plate break when it falls?"
I think research and the ability to summarize are important skills and automating these skills away will have bad downstream effects. I see people on Twitter asking grok to summarize a paragraph so I don't think further cementing this idea that a tool will summarize for you is a good idea.
devmor · 11h ago
Do you genuinely have any non-anecdotal reason to believe that AI will improve education, or is it just hope?
I ask because every serious study on using modern generative AI tools tends to find fairly immediate and measurable deleterious effects on cognitive ability.
toisanji · 7h ago
Every technology can be good or bad to an individual depending on how they use it. It is up to the user to decide how they will use the tool. For people who are really looking to learn a topic and understand in detail, then I think it can really help them to grasp the concepts.
FergusArgyll · 12h ago
OpenAI has an incredible product team. Deep Mind and Anthrpoic (and maybe xai) are competitive at the model level but not at product
dmitrijbelikov · 10h ago
This is cool. Dividing the answer into chunks, because most users can only consume small portions, is an interesting idea. On the other hand, it hints at an unflattering assumption about the user's cognitive abilities, though that varies by individual; on average, perhaps, this is how the target audience should be led. It seems to me that I use it differently. Then again, having received a detailed answer, nothing stops you from asking for a definition of an unfamiliar term. It's like in reading: understanding of a thought breaks down at the first word you don't know. It's just that not everyone can or wants to admit that they don't know this or that term. When it comes to professional terms, this is really not the most trivial problem.
hahahacorn · 12h ago
Ah, the advancing of humanity. A bespoke professor-quality instructor in everyone’s pocket (or local library) available 24/7.
Happy Tuesday!
Spivak · 12h ago
Professor might be overselling it but lecturer for undergrad and intro graduate courses for sure.
cma256 · 12h ago
It's better than a professor in some respects. A professor can teach me about parser combinators but they probably can't teach me about a specific parser combinator library.
There's a lot of specificity that AI can give over human instruction; however, it still suffers from a lack of rigor and true understanding. If you follow well-trodden paths it's better, but that negates the benefit.
The future is bright for education though.
bloomca · 12h ago
I am not really sure how bright the future is.
Sure, for some people it will be insanely good: you can go for as stupid questions as you need without feeling judgement, you can go deeper in specific topics, discuss certain things, skip some easy parts, etc.
But we are talking about averages. In the past we thought that the collective human knowledge available via the Internet would allow everyone to learn. I think it is fair to say that it didn't change much in the grand scheme of things.
qeternity · 12h ago
I think this is overselling most professors.
tempfile · 12h ago
Overselling is not the right word exactly. For some questions it will have professor-level understanding, and for other questions it will have worse-than-idiot-level understanding. Hopefully the students are able to identify which :-)
MengerSponge · 12h ago
I've found it generally has professor-level understanding in fields that are not your own.
(Joke/criticism intended)
djeastm · 7h ago
I tried out the quiz function asking me about the Aeneid and despite my answering questions incorrectly, it kept saying things like "Very close!" and "you're on the right track!".
For example, the answer to a question was "Laocoön" (the guy who said 'beware of Greeks bearing gifts') and I put "Solon" (who was a Greek politician) and I got "You’re really close!"
Is it close, though?
mmasu · 2h ago
yesterday I read a paper about using GPT-4 as a tutor in Italian schools, with encouraging results: students are more engaged and get through homework by receiving immediate and precise feedback, resulting in non-negligible performance improvements:
it is definitely a great use case for LLMs, and challenges the assumption that LLMs can only “increase brain rot” so to say.
vonneumannstan · 9h ago
The frontier models score better on GPQA than most human PhDs in their specific field of expertise. If you walk into your local university department (assuming you don't live in Cambridge, Palo Alto, or a few other places), GPT o3 is going to know more about chemistry, biology, physics, etc. than basically all the grad students there. If you can't turn that model into a useful tutor, then that's 100% a skill issue on your part.
mpalmer · 2h ago
I truly, truly do not get it. It's a system prompt. Do students not understand that they could do this before?
Sure, it was crafted by educational experts, but this is not a feature! It's a glorified constant!
oc1 · 36m ago
It also reveals how non-technical the HN crowd has become; as you can see, most people here don't get this fact either. OpenAI certainly has a great marketing team.
ai_viewz · 7h ago
I totally get what you are saying about the risk of boxing in an LLM's persona too tightly; it can end up more like a mirror of our own biases than a real reflection of history or truth. That point about LLMs leaning toward agreeability makes sense too: they are built on our messy human data, so they are bound to pick up our habit of favoring what feels good over what is strictly accurate.
On the self-censorship thing, I hear you. It is like, if we keep tiptoeing around tough topics, we lose the ability to have real, rational conversations. Normalizing that kind of open talk could pull things back from the extremes, where it’s just people shouting past each other.
avereveard · 11h ago
This highlights the dangers for all startups using these platforms as a provider: they know trends in token consumption, and will eat up your market in a weekend.
SoftTalker · 12h ago
Modern day Cliff's Notes.
There is no way to learn without effort. I understand they are not claiming this, but many students want a silver bullet. There isn't one.
CobrastanJorji · 12h ago
But tutors are fine. The video is suggesting that this is an attempt to automate a tutor, not replace Cliff's Notes. Whether it succeeds, I have no idea.
SoftTalker · 12h ago
Good tutors are fine, bad tutors will just give you the answer. Many students think the bad tutors are good ones.
CobrastanJorji · 12h ago
Yep, this is a marketing problem. Your users' goal is to learn, but they also want to expend as little effort as possible. They'll love it if you just tell them the answers, but you're also doing them a disservice by doing so.
The same problem exists for all educational apps. Duolingo users have the goal of learning a language, but they also only want to use Duolingo for a few minutes a day, and they want to feel like they're making progress. Duolingo's goal is to keep you using Duolingo; if possible it'd be good for you to learn the language, but their #1 goal is to keep you coming back. Oddly, Duolingo might not even be wrong to focus primarily on keeping you moving forward, given how many people give up when learning a new language.
LordDragonfang · 11h ago
> Today we’re introducing study mode in ChatGPT—a learning experience that helps you work through problems step by step instead of just getting an answer.
So, unless you have experience with this product that contradicts their claims, it's a good tutor by your definition.
sejje · 12h ago
Cliff notes with a near-infinite zoom feature.
The criticism of cliff's notes is generally that it's a superficial glance. It can't go deeper, it's basically a summary.
The LLM is not that. It can zoom in and out of a topic.
I think it's a poor criticism.
I don't think it's a silver bullet for learning, but it's a unified, consistent interface across topics and courses.
probably_wrong · 11h ago
> It can zoom in and out of a topic.
Sure, but only as long as you're not terribly concerned with the result being accurate, like that old reconstruction of Obama's face from a pixelated version [1] but this time about a topic for which one is, by definition, not capable of identifying whether the answer is correct.
I'm capable of asking it a couple of times about the same thing.
It's unlikely to make up the same bullshit twice.
Usually exploring a topic in depth finds these issues pretty quickly.
gmanley · 12h ago
Except it generally is shallow, for any advanced enough subject, and the scary part is you don't know when it's reached the limit of its knowledge because it'll come up with some hallucination to fill in those blanks.
If LLM's got better at just responding with: "I don't know", I'd have less of an issue.
sejje · 11h ago
I agree, but it's a known limitation. I've been duped a couple times, but I mostly can tell when it's full of shit.
Some topics you learn to beware and double check. Or ask it to cite sources. (For me, that's car repair. It's wrong a lot.)
I wish it had some kind of confidence level assessment or ability to realize it doesn't know, and I think it eventually will have that. Most humans I know are also very bad at that.
currymj · 6h ago
this basically functions as a switch you can press that says "more effort please". after every response it makes you solve a little comprehension check problem before moving on. you can try to weasel out of it but it does push back a bit.
unavoidably, people who don't want to work, won't push the "work harder" button.
aryamaan · 8h ago
It is surprising that this is prompt-based and not RLHF.
I am not an LLM guy, but as far as I understand, RLHF did a good job converting a base model into a chat model (instruction-following), and a chat/base model into a thinking model.
Both of these examples are about the nature of the response and the content used to fill the response. There are still so many different ways these could be shaped that we haven't seen yet.
Generating an answer step by step and letting users dive into those steps is one of those ways, and RLHF (or the similar techniques that are used) seems like a good fit for it.
Prompting feels like a temporary solution, like how "think step by step" was first seen in prompts.
Also, doing RLHF / post-training to change these structures makes it a moat, and expensive; only the AI labs can do it.
danenania · 8h ago
The problem is you'd then have to do all the product-specific post training again once the new base model comes out a few months later. I think they'd rather just have general models that are trained to follow instructions well and can adapt to any kind of prompt/response pattern.
aabhay · 2h ago
Isn’t this what “GPTs” was supposed to be? Why not just use that if this is essentially just a system prompt?
oc1 · 31m ago
"Was supposed to be". Well, now you know the real purpose of this gpt circus ;)
JoRyGu · 11h ago
Is that not something that was already possible with basically every AI provider by prompting it to develop learning steps and not to provide you with a direct answer? I've used this quite a bit when learning new topics and pretty much every provider does this without a specialized model.
0000000000100 · 11h ago
It's really nice to have something like this baked in. I can see this being handy if it's connected to external learning resources / sites to have a more focused area of search for its answers. Having hard-defined walls in the system prompt to prevent just asking for the answer seems pretty handy to me, particularly in a school setting.
JoRyGu · 9h ago
Yeah, for sure. I wasn't asking from the framing of saying it's a bad idea, my thoughts were more driven by this seeming like something every other major player can just copy with very little effort because it's already kind of baked into the product.
aethrum · 11h ago
even chatgpt is just a chatgpt wrapper
huitzitziltzin · 5h ago
I would love to see more of their game theory example.
Having experience teaching the subject myself, what I saw on that page is about the first five minutes of the first class of the semester at best. The devil will very much be in the other 99% of what you do.
tptacek · 11h ago
Neat! I've been doing MathAcademy for a couple months now, and macOS ChatGPT has been a constant companion, but it is super annoying to have to constantly tell it no, don't solve this problem, just let me know if the approach I used was valid.
d_burfoot · 9h ago
This is the kind of thing that could have been a decent AI startup - hire some education PhDs, make some deals with school systems, etc.
In the old days of desktop computing, a lot of projects were never started because if you got big enough, Microsoft would just implement the feature as part of Windows. In the more recent days of web computing, a lot of projects were never started, for the same reason, except Google or Facebook instead of Microsoft.
Looks like the AI provider companies are going to fill the same nefarious role in the era of AI computing.
tekno45 · 5h ago
The same people who think this is the ultimate teacher will also be harassing scientists with their AI-assisted theories and demanding that the scientific community take them seriously when they have pages of gibberish they expect to be rigorously debated.
ghrl · 9h ago
It would be incredible if OpenAI would add a way for schools and other educational institutions to enforce the use of such a mode on a DNS level, similarly to how they can force sites like YouTube into safe mode.
Many students use ChatGPT, often without permission, to do work for them instead of helping them do the work themselves. I see a lot of potential for a study mode like this, helping students individually without giving direct answers.
LeftHandPath · 12h ago
Interesting. I don’t use GPT for code but I have been using it to grade answers to behavioral and system design interview questions, lately. Sometimes it hallucinates, but the gists are usually correct.
I would not use it if it was for something with a strictly correct answer.
henriquegodoy · 12h ago
The point is that you can have a highly advanced teacher with infinite patience, available 24/7. Being able to get an answer to a question at 3 a.m. is a game changer, and people who know how to use that will have enormous leverage in their lives.
gh0stcat · 11h ago
I have been testing it for the last 10 mins or so, and I really like it so far. I am reviewing algebra, just as something super simple. It asks you to explain your understanding of the concept, e.g. explain why you can always group a polynomial after splitting the middle term. This is honestly more than I got in my mediocre public school. I could see kids getting a lot out of it, especially if their parents aren't very knowledgeable or cannot afford tutors. Probably not a huge improvement on existing tools like Khan Academy, though. I will continue to test it on more advanced subjects.
rubslopes · 11h ago
That's a smart idea from OpenAI. They no longer have the upper hand in terms of model performance, but they keep improving their product so that it is still the best option for non-programmers.
thimabi · 11h ago
For sure! I haven’t seen any other big AI provider with features and UIs as polished as the OpenAI ones.
I believed competitors would rush to copy all great things that ChatGPT offers as a product, but surprisingly that hasn’t been the case so far. I wonder why they seemingly don’t care about that.
kcaseg · 9h ago
I know it is bad for the environment, I know you cannot trust it, but as an adult learning C++ in my free time,
having a pseudo-human answer my questions, instead of digging through old forum posts where people are often more interested in proving their skills than in giving the simplest answer, is something I cannot just ignore, despite being a huge LLM hater. Moral of the story: none.
ascorbic · 8h ago
If it helps you feel better, it's really not that bad for the environment. Almost certainly uses less energy than searching for lots of forum posts.
outlore · 12h ago
i wonder how Khan Academy feels about this...don't they have a similar assistant that uses OpenAI under the hood?
rishabhaiover · 4h ago
People who do not use LLMs to prune their high-dimensional search space (of any problem) will be outcompeted soon
paolosh · 10h ago
I am always surprised at how the best thing state of the art LLMs can think of is adding more complexity to the mix. This is an AMAZING product but to me it seems like it's hidden? Or maybe the UX/UI is just not my style, could be a personal thing.
Is adding more buttons in a dropdown the best way to communicate with an LLM? I think the concept is awesome. Just like how Operator was awesome but it lived on an entirely different website!
swader999 · 12h ago
Why do I still feel like I'll be paying hundreds of thousands of dollars for my children's education when all they're going to do is all learn through AI anyway.
wiseowise · 12h ago
Because you're not paying for knowledge, you're paying for a paper from respectable university saying that your kid is part of the club.
Aperocky · 12h ago
How about experience - those years of life.
rapfaria · 10h ago
"Toby is today's designated signer for Eletromagnetics 302."
hombre_fatal · 12h ago
At my university I took a physics course where the homework was always 4-6 gimmick questions or proofs that were so hard that we would form groups after class just to copy whoever could divine the solutions.
I ultimately dropped the course and took it in the summer at a community college where we had the 20-30 standard practice problem homework where you apply what you learned in class and grind problems to bake it into core memory.
AI would have helped me at least get through the uni course. But generally I think it's a problem with the school/class itself if you aren't learning most of what you need in class.
teeray · 12h ago
> or proofs that were so hard that we would form groups after class just to copy whoever could divine the solutions.
These groups were some of the most valuable parts of the university experience for me. We'd get take-out, invade some conference room, and slam our heads against these questions well into the night. By the end of it, sure... our answers looked superficially similar, but it was because we had built a mutual, deep understanding of the answer—not just copying the answers.
Even if you had only a rough understanding, the act of trying to teach it again to others in the group made you both understand it better.
hombre_fatal · 11h ago
I'm glad your groups were great, but this class was horrible and probably different from what you're thinking of. We weren't physics majors. We were trying to credentialize in a textbook, not come up with proofs to solve open ended riddles that most people couldn't solve. The homework should drill in the information of the class and ensure you learn the material.
And we literally couldn't figure it out. Or the group you were in didn't have a physics rockstar. Or you weren't so social or didn't know anyone or you just missed an opportunity to find out where anyone was forming a group. It's not like the groups were created by the class. I'd find myself in a group of a few people and we just couldn't solve it even though we knew the lecture material.
It was a negative value class that cost 10x the price of the community college course yet required you to teach yourself after a lecture that didn't help you do the homework. A total rip-off.
Anyways, AI is a value producer here instead of giving up and getting a zero on the homework.
Workaccount2 · 12h ago
And then compete with the same AI that taught them their degree for a job with their degree.
Aperocky · 11h ago
A bit optimistic here are we?
nemomarx · 12h ago
Well, you're generally paying for the 8 hour daycare part before the education, right? That still needs human staff around unless you're doing distance learning
e: if you mean university, fair. that'll be an interesting transition. I guess then you pay for the sports team and amenities?
Scubabear68 · 12h ago
No.
In the US at least, most kids are in public schools and the collective community foots the bill for the “daycare”, as you put it.
LordDragonfang · 12h ago
At that price tag I assume they're referring to college, not grade school, so the "daycare" portion isn't relevant.
syphia · 7h ago
In my experience as a math/physics TA, either a student cares enough about the material to reduce the resources they rely on, or they aim to pass the class with minimum effort and will take whatever shortcuts are available. I can only see AI filling the latter niche.
When the former students ask questions, I answer most of them by pointing at the relevant passage in their book/notes, questioning their interpretation of what the book says, or giving them a push to actually problem-solve on their own. On rare occasions the material is just confusing/poorly written and I'll decide to re-interpret it for them to help. But the fundamental problems are usually with study habits or reading comprehension, not poor explanations. They need to question their habits and their interpretation of what other people say, not be spoon fed more personally-tailored questions and answers and analogies and self-help advice.
Besides asking questions to make sure I understand the situation, I mostly repeat the same ten phrases or so. Finding those ten phrases was the hard part and required a bit of ingenuity and trial-and-error.
As for the latter students, they mostly care about passing and moving on, so arguing about the merits of such a system is fairly pointless. If it gets a good enough grade on their homework, it worked.
sandspar · 52m ago
I love it so far. I'm continually struggling against ChatGPT's fervent love of giving tips and how-to guides. I abhor such tips, but no amount of prompting can remove them permanently. It seems like study mode is the fix. Finally ChatGPT lets me think things through.
Honestly thought they would take this a bit further, there is only so much you can do with a prompt and chat. It seems fine for surface level bite-sized learning, but I can't see it work that well for covering whole topics end to end.
The main issue is that chats are just bad UX for long form learning. You can't go back to a chat easily, or extend it in arbitrary directions, or easily integrate images, flashcards, etc etc.
I worked on this exact issue for Periplus and instead landed on something akin to a generative personal learning Wikipedia. Structure through courses, exploration through links, embedded quizzes, etc etc. Chat is on the side for interactions that do benefit from it.
Link: periplus.app
omega3 · 9h ago
I've had good results by asking an LLM to follow the Socratic method.
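Roughly what I mean, as a minimal sketch using the OpenAI Python client; the model name and the exact wording of the instructions are placeholders I made up, not anything official:

    # Sketch: a Socratic-method tutor via a plain system prompt.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    SOCRATIC_PROMPT = (
        "You are a tutor who uses the Socratic method. Never state the final answer. "
        "Ask one guiding question at a time, wait for my reply, and only confirm or "
        "gently challenge my reasoning."
    )

    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": SOCRATIC_PROMPT},
            {"role": "user", "content": "Help me understand why quicksort is O(n log n) on average."},
        ],
    )
    print(resp.choices[0].message.content)

Nothing fancy, which is presumably why study mode could ship as "just" system instructions.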
dlevine · 8h ago
I haven't done this that much, but have found it to be pretty useful.
When it just gives me the answer, I usually understand but then find that my long-term retention is relatively poor.
I love the story conceptually, but as for the specifics, it shows a surprising lack of imagination on Asimov's part, especially for something published a year after "I, Robot". Asimov apparently just envisioned an automated activity book, rather than an automated tutor that the kid could have a real conversation with, and it's really not representative of modern day AIs.
> The part Margie hated most was the slot where she had to put homework and test papers. She always had to write them out in a punch code they made her learn when she was six years old, and the mechanical teacher calculated the mark in no time.
pompeii · 12h ago
rip 30 startups
baq · 11h ago
Probably an order of magnitude too low
ieuanking · 6h ago
Study mode should be the default for any account under 18. It's worrying that a student has to uphold a trust transaction rather than just cheating with the same chatbot without study mode selected. To this day, as an AI researcher, digital anthropologist, and front-end dev, I love to learn, study, and work, but I would never recommend that a student use unmonitored ChatGPT. I literally built a whole effing agent and platform for finding academic sources and using those sources to answer my questions, specifically because I couldn't trust or learn with ChatGPT when conducting research.
SMH, study mode, please stop trying to redo teaching and learning. We should be modernizing already proven, effective methods of learning that go hand in hand with teachers and classrooms. We are not in lockdown; this is not 2020. Teachers are irreplaceable; study mode is just a crutch or a brace for a problem created by irresponsible AI development.
I know that if I were a student right now (especially middle to high school) I would be cheating; you are lying to yourself if you think you wouldn't be. At a certain point the definition of cheating shifts from cheating the teacher to cheating yourself out of the critical steps and thinking it takes to actually study and learn. No amount of conversation alone with a chatbot is as valuable as reading coursework and engaging with it in a facilitated environment with a professional. Why are we going down the WALL-E path?
bearjaws · 9h ago
RIP ~30 startups.
deanc · 12h ago
I’m curious what features like study mode actually are. Are they not just prompts behind the scenes (I’ve already used many prompts to make LLMs behave like this)?
zaking17 · 11h ago
I'm impressed by the product design here. A non-ai-expert could find this mode extremely valuable, and all openai had to do was tinker with the prompt and add a nice button (relatedly, you could have had this all along by prompting the model yourself). Sure, it's easy for competitors to copy, but still a nice little addition.
pillefitz · 12h ago
They state themselves it's just system prompts.
AvAn12 · 8h ago
$end more prompt$! Why $end one when you can $end $everal? $tudy mode i$ $omething $pecial!!
t1234s · 9h ago
I'm still waiting for the instant ability to learn kung-fu or fly a helicopter like in the matrix.
taurath · 5h ago
It's pretty telling about the state of things that this is a Product, not an expansion of base capability. You can do this with any LLM with simple bounds on the prompts.
> Under the hood, study mode is powered by custom system instructions we’ve written in collaboration with teachers, scientists, and pedagogy experts to reflect a core set of behaviors that support deeper learning including: encouraging active participation, managing cognitive load, proactively developing metacognition and self reflection, fostering curiosity, and providing actionable and supportive feedback.
I'm calling bullshit, show me the experts, I want to see that any qualified humans actually participated in this. I think they did their "collaboration" in ChatGPT which spit out this list.
micromacrofoot · 12h ago
I'm not sure about the audience for this, if you're already willing to learn the material you probably already engage with AI in a way that isn't "please output the answers for me" because you're likely self-aware enough to know that "answering" isn't always "understanding." Maybe this mode makes that a little easier? but I doubt it's significant
If you're the other 90% of students that are only learning to check the boxes and get through the courses to get the qualification at the end... are you going to bother using this?
Of course, maybe this is "see, we're not trying to kill education... promise!"
_hao · 12h ago
I think, as with everything related to learning, if you're conscientious and studious this will be a major boost (no idea, but I plan on trying it out tonight on some math I've been studying). And likewise, if you just use it to do your homework without putting in the effort, you won't see any benefit, or you'll actively regress.
LordDragonfang · 12h ago
I mean, it's about context, isn't it?
Just like it's easier to be productive if you have a separate home office and couch, because of the differing psychological contexts, it's easier if you have a separate context for "just give me answers" and "actually teach me the thing".
Also, I don't know about you, but (as a professional) even though I actively try to learn the principles behind the code generated, I don't always want to spend the effort prompting the model away from the "just give me results with a simple explanation" personality I've cultivated. It'd be nice having a mode with that work done for me.
alexfromapex · 8h ago
I like these non-dystopian AI solutions, let's keep 'em coming
NullCascade · 12h ago
OpenAI, please stop translating your articles into the most sterile and dry Danish I have ever read. English is fine.
lmc · 12h ago
I honestly don't know how they convince employees to make features like this - like, they must dogfood and see how wrong the models can be sometimes. Yet there's a conscious choice to not only release this to, but actively target, vast swathes of people that literally don't know better.
BriggyDwiggs42 · 12h ago
High paychecks
ElijahLynn · 9h ago
Love this!
I used to have to prompt it to do this every time. This will be way easier!
spaceman_2020 · 12h ago
I’m SO glad that my wife has tenure
gilbetron · 10h ago
Sadly, tenure will not save people.
jayshah5696 · 6h ago
It's study gpt. Nothing more.
volkk · 12h ago
Not seeing it on my account, guess the roll out is actively happening (or gradual)?
koakuma-chan · 12h ago
Me neither. Do you have the subscription? Maybe it's not on the free plan.
zeppelin101 · 12h ago
I have the $20 tier and I'm not seeing it, either.
EDIT: literally saw it just now after refreshing. I guess they didn't roll it out immediately to everyone.
jrflowers · 6h ago
Oh good. A version of chat gpt that is even more confident-sounding. Great.
Alifatisk · 11h ago
Can't this behaviour be achieved just by instructing it in the prompt?
misschresser · 11h ago
that's all that they did here, they say so in the blog post
sarchertech · 11h ago
Ever read an article on a subject you’re very familiar with and notice all the mistakes?
When I ask ChatGPT* questions about things I don’t know much about it sounds like a genius.
When I ask it about things I’m an expert in, at best it sounds like a tech journalist describing how a computer works. At worst it is just flat out wrong.
* yes I’ve tried the latest models and I use them frequently at work
insane_dreamer · 6h ago
My favorite use of Claude (or similar) AI bot, other than coding, is to do deep dives into technical/science questions I'm interested in (mostly personal interests, unrelated to work). The ability to ask follow-up questions, get clarifications, travel down side paths, has helped me to grasp some concepts that I struggled with -- and offered more than I could just from just reading a few web pages.
Importantly, these were _not_ critical questions that I was incorporating into any decision-making, so I wasn't having to double-check the AI's answers, which would make it tedious; but it's a great tool for satisfying curiosity.
bsoles · 9h ago
Aka cheating mode. Their video literally says "Helps with homework" and proceeds to show the "Final Answer". So much learning...
ascorbic · 8h ago
"Cheating mode" is regular ChatGPT. This at least tries to make you work for it
p1dda · 39m ago
Poor students, learning from hallucinating LLMs LOL
lvl155 · 7h ago
The biggest concern for AI development right now is the blackhole effect.
beefnugs · 5h ago
I have no evidence of this but: I think this is the ultimate scam?
human: damn kids are using this to cheat in school
openai: release an "app"/prompt that seems really close to solving this stated problem
kids: I never wanted to learn anything, I just want to do bare minimum to get my degree, let my parents think they are helping my future, and then i can get back to ripping that bong
<world continues slide into dunce based oblivion>
Whatever the problem statement, an 80%-or-less solution can seemingly be built, and rather quickly. A huge percentage of the population judges technology solutions as "good enough" at a far lower bar than they should. This is even roping in people who used to hold a higher standard of rigorous correctness, because they keep thinking, "damn, just a bit more work and it will get infinitely better; let's create the biggest economic house of cards this world will ever collapse under".
2809 · 2h ago
HN is just flooded with AI BS these days.
waynenilsen · 10h ago
I need tree-structured conversations now more than ever.
oc1 · 10h ago
I'm wondering where we are heading in the consumer business space. The big AI providers can basically kill any small or medium business and startup in a few days by integrating the product into their own offering. They have all the data to look at trends and make decisions. Investors are shying away from AI startups that aren't trying to be infrastructure or AI marketplace platforms. So many amazing things could be possible with AI, but the big AI providers are actively hindering innovation and have way too much power. I'm not a big fan of regulations, but in this case we need to break up these companies, as they are getting too powerful.
Btw, most people don't know this, but Anthropic did something similar months ago; their product heads messed up the launch by keeping it locked to American edu institutions. OpenAI copies almost everything Anthropic does and vice versa (see Claude Code / Codex).
When they announce VR glasses or a watch, we'll know we've gone full circle and the hype has peaked.
te_chris · 12h ago
This is great. When it first came out I was going through Strang’s linalg course and got it to do “problem mode”, where it would talk me through a problem step by step, waiting for me to respond.
A more thought through product version of that is only a good thing imo.
4b11b4 · 9h ago
opennote much better
rmani3 · 6h ago
interesting to see how they will adapt especially as they just got into the batch
m3kw9 · 12h ago
Tried it and couldn't really tell the difference between this and a good "teach me" prompt.
apwell23 · 12h ago
Same, I can't tell what's different. It gives me the same output for the prompts in the example.
I don't get it.
marcusverus · 10h ago
Highly analytical 120 IQ HNers aren't the target audience for this product. The target audience is the type of person who lacks the capacity to use AI to teach themselves.
raincole · 12h ago
If current AI is good enough to teach you something, spending time learning that thing seems to be a really bad investment...
esafak · 11h ago
How does that make sense? So you'd learn it if it was bad at teaching? Do you apply the same principle with humans and not bother to learn if the teacher is good?
raincole · 4h ago
> Do you apply the same principle with humans and not bother to learn if the teacher is good
Yes, if my teacher could split into a million of themselves and compete against me on the job market at $200/mo.
ted537 · 11h ago
Your teacher can't operate in millions of locations at once for super cheap
findingMeaning · 11h ago
I have a question:
Why do we even bother to learn if AI is going to solve everything for us?
If the promised and fabled AGI is about to arrive, what is the incentive to learn to deal with these small problems?
Could someone enlighten me? What is the value of knowledge work?
GenericPoster · 7h ago
The world is a vastly easier place to live in when you're knowledgeable. Being knowledgeable opens doors that you didn't even know existed. If you're both using the same AGI tool, being knowledgeable allows you to solve problems within your domain better and faster than an amateur. You can describe your problems with more depth and take into considerations various pros and cons.
You're also assuming that AGI will help you or us. It could just as easily only help a select group of people and I'd argue that this is the most likely outcome. If it does help everybody and brings us to a new age, then the only reason to learn will be for learning's sake. Even if AI makes the perfect novel, you as a consumer still have to read it, process it and understand it. The more you know the more you can appreciate it.
But right now, we're not there. And even if you think it's only 5-10y away instead of 100+, it's better to learn now so you can leverage the dominant tool better than your competition.
findingMeaning · 5m ago
This is a really nice perspective!
> It could just as easily only help a select group of people and I'd argue that this is the most likely outcome
Currently it really only applies to those of us who are programming!
Yeah, even setting aside all the quirks, using it would still be better.
randomcatuser · 11h ago
I don't know if you're joking, but here are some answers:
"The mind is not a vessel to be filled, but a fire to be kindled." — Plutarch
"Education is not preparation for life; education is life itself." — John Dewey
"The important thing is not to stop questioning. Curiosity has its own reason for existing." — Albert Einstein
In order to think complex thoughts, you need to have building blocks. That's why we can think of relativity today, while nobody on Earth was able to in 1850.
May the future be even better than today!
findingMeaning · 10h ago
I get all your points. But for someone watching the rate of progress of AI, I don't understand the motivation.
Most people don't learn to live, they live and learn. Sure, learning is useful, but I am genuinely curious why people overhype it.
Imagine being able to solve math olympiad problems and win a gold medal. Will it change your life in an objectively better way?
Will learning physics help you solve the millennium problems?
These take practice, and there is a lot of gatekeeping. The whole idea of learning is wisdom, not knowledge.
So maybe we differ in perspective. I just don't see the point when there are agents that can do it.
Being creative requires taking action. Learning these days is mere consumption of information.
Maybe this is just me. But meh.
rwyinuse · 10h ago
Well, you could use AI to teach you more theoretical knowledge of things like farming, hunting, and fishing. That knowledge could come in handy after the societal collapse that is likely to arrive within a few decades.
Apart from that, I do think that AI makes a lot of traditional teaching obsolete. Depending on your field, much of university studies is just memorizing content and writing essays / exam answers based on that, after which you forget most of it. That kind of learning, as in accumulation of knowledge, is no longer very useful.
marcusverus · 10h ago
Think of it like Pascal's wager. The downside of unnecessary knowledge is pretty limited. The downside of ignorance is boundless.
> a healthy dose of scepticism is essential. Arguably, that applies to traditional learning methods too, but that's another story.
I don't think that is another story. This is the story of learning, no matter whether your teacher is a person or an AI.
My high school science teacher routinely mispoke inadvertently while lecturing. The students who were tracking could spot the issue and, usually, could correct for it. Sometimes asking a clarifying question was necessary. And we learned quickly that that should only be done if you absolutely could not guess the correction yourself, and you had to phrase the question in a very non-accusatory way, because she had a really defensive temper about being corrected that would rear its head in that situation.
And as a reader of math textbooks, both in college and afterward, I can tell you you should absolutely expect errors. The errata are typically published online later, as the reports come in from readers. And they're not just typos. Sometimes it can be as bad as missing terms in equations, missing premises in theorems, missing cases in proofs.
A student of an AI teacher should be as engaged in spotting errors as a student of a human teacher. Part of the learning process is reaching the point where you can and do find fault with the teacher. If you can't do that, your trust in the teacher may be unfounded, whether they are human or not.
You're telling people to be experts before they know anything.
By noticing that something is not adding up at a certain point. If you rely on an incorrect answer, further material will clash with it eventually one way or another in a lot of areas, as things are typically built one on top of another (assuming we are talking more about math/cs/sciences/music theory/etc., and not something like history).
At that point, it means that either the teacher (whether it is a human or ai) made a mistake or you are misunderstanding something. In either scenario, the most correct move is to try clarifying it with the teacher (and check other sources of knowledge on the topic afterwards to make sure, in case things are still not adding up).
Ah, but information is presented by AI in a way that SOUNDS like it makes absolute sense if one doesn't already know it doesn't!
And if you have to question the AI a hundred times to try and "notice that something is not adding up" (if it even happens) then that's no bueno.
> In either scenario, the most correct move is to try clarifying it with the teacher
A teacher that can randomly give you wrong information with every other sentence would be considered a bad teacher
Children are asking these things to write personal introductions and book reports.
A teacher will listen to what you say, consult their understanding, and say "oh, yes, that's right". But written explanations don't do that "consult their understanding" step: language models either predict "repeat original version" (if not fine-tuned for sycophancy) or "accept correction" (if so fine-tuned), since they are next-token predictors. They don't go back and edit what they've already written: they only go forwards. They have had no way of learning the concept of "informed correction" (at the meta-level: they do of course have an embedding of the phrase at the object level, and can parrot text about its importance), so they double-down on errors / spurious "corrections", and if the back-and-forth moves the conversation into the latent space of "teacher who makes mistakes", then they'll start introducing them "on purpose".
LLMs are good at what they do, but what they do is not teaching.
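A toy sketch of the forward-only generation loop being described; this is a made-up bigram "model" with random scores (nothing to do with a real LLM), just to illustrate that earlier tokens are never revisited or edited:

    import numpy as np

    # Toy "model": fixed scores for each (previous token, next token) pair.
    rng = np.random.default_rng(0)
    vocab = ["the", "teacher", "is", "right", "wrong", "."]
    scores = rng.normal(size=(len(vocab), len(vocab)))

    def generate(start, steps=5):
        tokens = [start]
        for _ in range(steps):
            prev = vocab.index(tokens[-1])
            tokens.append(vocab[int(np.argmax(scores[prev]))])  # greedily extend
            # Nothing here ever goes back and edits tokens already emitted.
        return " ".join(tokens)

    print(generate("the"))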
I mean, that's absolutely my experience with heavy LLM users. Incredibly well versed in every topic imaginable, apart from all the basic errors they make.
This phrase is now an in-joke used as a reply to anyone quoting LLM output as “facts”.
[1] AI-driven chat assistant for ECE 120 course at UIUC (only 1 comment by the website creator):
https://news.ycombinator.com/item?id=41431164
If it's that simple, is there a third system that can coordinate these two (and let you choose which two/three/n you want to use)?
https://en.wikipedia.org/wiki/Rubber_duck_debugging
Regular research has the same problem finding bad forum posts and other bad sources by people who don't know what they're talking about, albeit usually to a far lesser degree depending on the subject.
Results from the LLM are for your eyes only.
I know you'll probably think I'm being facetious, but have you tried Claude 4 Opus? It really is a game changer.
Anyway, this makes me wonder if LLMs can be appropriately prompted to indicate whether the information given is speculative, inferred or factual. Whether they have the means to gauge the validity/reliability of their response and filter their response accordingly.
I've seen prompts that instruct the LLM to make this transparent via annotations to their response, and of course they comply, but I strongly suspect that's just another form of hallucination.
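For reference, a minimal sketch of the kind of annotation prompt I mean, using the OpenAI Python client; the tag names, wording, and model name are all invented, and whether the labels are actually reliable is exactly the open question:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    ANNOTATE_PROMPT = (
        "For every claim in your answer, tag it as [FACT], [INFERRED], or [SPECULATIVE], "
        "and add one clause explaining why you chose that tag."
    )

    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": ANNOTATE_PROMPT},
            {"role": "user", "content": "When was the first transatlantic telegraph cable completed?"},
        ],
    )
    print(resp.choices[0].message.content)

The model will happily produce the tags; the worry raised above is that the tags themselves may just be more plausible-sounding output.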
Sure, Joe Average who's using it to look smart in Reddit or HN arguments or to find out how to install a mod for their favorite game isn't gonna notice anymore, because it's much more plausible much more often than two years ago, but if you're asking it things that aren't trivially easy for you to verify, you have no way of telling how frequently it hallucinates.
It appears to me like a form of decoherence and very hard to predict when things break down.
People tend to know when they are guessing. LLMs don't.
For example, today I was asking a LLM about how to configure a GH action to install a SDK version that was just recently out of support. It kept hallucinating on my config saying that when you provide multiple SDK versions in the config, it only picks the most recent. This is false. It's also mentioned in the documentation specifically, which I linked the LLM, that it installs all versions you list. Explaining this to copilot, it keeps doubling down, ignoring the docs, and even going as far as asking me to have the action output the installed SDKs, seeing all the ones I requested as installed, then gaslighting me saying that it can print out the wrong SDKs with a `--list-sdks` command.
I haven't spent any money with claude on this project and realistically it's not worth it, but I've run into little things like that a fair amount.
A couple of non-programming examples: https://www.evidentlyai.com/blog/llm-hallucination-examples
Up next - ChatGPT does jumping off high buildings kill you?
>>No jumping off high buildings is perfectly safe as long as you land skillfully.
This is one I got today:
https://chatgpt.com/share/6889605f-58f8-8011-910b-300209a521...
(image I uploaded: http://img.nrk.no/img/534001.jpeg)
The correct answer would have been Skarpenords Bastion/kruttårn.
What most people call “non-deterministic” in AI is that one of those inputs is a _seed_ that is sourced from a PRNG because getting a different answer every time is considered a feature for most use cases.
Edit: I’m trying to imagine how you could get a non-deterministic AI and I’m struggling because the entire thing is built on a series of deterministic steps. The only way you can make it look non-deterministic is to hide part of the input from the user.
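A tiny sketch of that point with toy numbers (not any real model): sampling becomes a deterministic function as soon as the seed is counted among the inputs, and it only "looks" random when the seed is hidden.

    import numpy as np

    def sample_next_token(logits, seed):
        """Softmax-sample one token id; the seed is just another input."""
        rng = np.random.default_rng(seed)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    logits = np.array([2.0, 1.0, 0.5, -1.0])   # pretend model output for some prefix

    print(sample_next_token(logits, seed=42))  # same seed -> same token, every run
    print(sample_next_token(logits, seed=42))
    print(sample_next_token(logits, seed=7))   # hide the seed and it "looks" random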
Unless something has fundamentally changed since then (which I've not heard about) all sparse models are only deterministic at the batch level, rather than the sample level.
Depends on the machine that implements the algorithm. For example, it’s possible to make ALUs such that 1+1=2 most of the time, but not all the time.
…
Just ask Intel. (Sorry, I couldn’t resist)
So "risk of hallucination" as a rebuttal to anybody admitting to relying on AI is just not insightful. like, yeah ok we all heard of that and aren't changing our habits at all. Most of our teachers and books said objectively incorrect things too, and we are all carrying factually questionable knowledge we are completely blind to. Which makes LLMs "good enough" at the same standard as anything else.
Don't let it cite case law? Most things don't need this stringent level of review
Meanwhile in LLM-land, if an expert five thousand miles a way asked the same question you did last month, and noticed an error... it ain't getting fixed. LLMs get RL'd into things that look plausible for out-of-distribution questions. Not things that are correct. Looking plausible but non-factual is in some ways more insidious than a stupid-looking hallucination.
Now in regards to LLMs, I use them almost every day, so does my team, and I also do a bit of postmortem and reflection on what was accomplished with them. So, skeptical in some regards, but certainly not behaving like a Luddite.
The main issue I have with all the proselytization about them, is that I think people compare getting answers from an LLM to getting answers from Google circa 2022-present. Everyone became so used to just asking Google questions, and then Google started getting worse every year; we have pretty solid evidence that Google's results have deteriorated significantly over time. So I think that when people say the LLM is amazing for getting info, they're comparing it to a low baseline. Yeah maybe the LLM's periodically incorrect answers are better than Google - but are you sure they're not better than just RTFM'ing? (Obviously, it all depends on the inquiry.)
The second, related issue I have is that we are starting to see evidence that the LLM inspires more trust than it deserves due to its humanlike interface. I recently started to track how often Github Copilot gives me a bad or wrong answer, and it's at least 50% of the time. It "feels" great though because I can tell it that it's wrong, give it half the answer, and then it often completes the rest and is very polite and nice in the process. So is this really a productivity win or is it just good feels? There was a study posted on HN recently where they found the LLM actually decreases the productivity of an expert developer.
So I mean I'll continue to use this thing but I'll also continue to be a skeptic, and this also feels like kinda where my head was with Meta's social media products 10 years ago, before I eventually realized the best thing for my mental health was to delete all of them. I don't question the potential of the tech, but I do question the direction that Big Tech may take it, because they're literal repeat offenders at this point.
Not underrated at all. Lots of people were happy to abandon Stack Overflow for this exact reason.
> Adding in a mode that doesn't just dump an answer but works to take you through the material step-by-step is magical
I'd be curious to know how much this significantly differs from just a custom academically minded GPT with an appropriately tuned system prompt.
https://chatgpt.com/gpts
This is where the skepticism arises. Before we spend another $100 billion on something that ended up being worthless, we should first prove that it’s actually useful. So far, that hasn’t conclusively been demonstrated.
Except these systems will still confidently lie to you.
The other day I noticed that DuckDuckGo has an Easter egg where it will change its logo based on what you've searched for. If you search for James Bond or Indiana Jones or Darth Vader or Shrek or Jack Sparrow, the logo will change to a version based on that character.
If I ask Copilot if DuckDuckGo changes its logo based on what you've searched for, Copilot tells me that no it doesn't. If I contradict Copilot and say that DuckDuckGo does indeed change its logo, Copilot tells me I'm absolutely right and that if I search for "cat" the DuckDuckGo logo will change to look like a cat. It doesn't.
Copilot clearly doesn't know the answer to this quite straightforward question. Instead of lying to me, it should simply say it doesn't know.
I agree that if the user is incompetent, cannot learn, and cannot learn to use a tool, then they're going to make a lot of mistakes from using GPTs.
Yes, there are limitations to using GPTs. They are pre-trained, so of course they're not going to know about some easter egg in DDG. They are not an oracle. There is indeed skill to using them.
They are not magic, so if that is the bar we expect them to hit, we will be disappointed.
But neither are they useless, and it seems we constantly talk past one another because one side insists they're magic silicon gods, while the other says they're worthless because they are far short of that bar.
You could ask me as a human basically any question, and I'd have answers for most things I have experience with.
But if you held a gun to head and said "are you sure???" I'd obviously answer "well damn, no I'm not THAT sure".
For you and I, it's not. But for these LLMs, maybe it's not that easy? They get their inputs, crunch their numbers, and come out with a confidence score. If they come up with an answer they're 99% confident in, by some stochastic stumbling through their weights, what are they supposed to do?
I agree it's a problem that these systems are more likely to give poor, incorrect, or even obviously contradictory answers than say "I don't know". But for me, that's part of the risk of using these systems and that's why you need to be careful how you use them.
Some of the best exchanges that I participated in or witnessed involved people acknowledging their personal limits, including limits of conclusions formed a priori
To further the discussion, hearing the phrase you mentioned would help the listener to independently assess a level of confidence or belief of the exchange
But then again, honesty isn't on-brand for startups
It's something that established companies say about themselves to differentiate from competitors or even past behavior of their own
I mean, if someone prompted an llm weighted for honesty, who would pay for the following conversation?
Prompt: can the plan as explained work?
Response: I don't know about that. What I do know is on average, you're FUCKED.
It mostly isn't, the point of the good learning process is to invest time into verifying "once" and then add verified facts to the learning material so that learners can spend that time learning the material instead of verifying everything again.
Learning to verify is also important, but it's a different skill that doesn't need to be practiced literally every time you learn something else.
Otherwise you significantly increase the costs of the learning process.
The good: it can objectively help you to zoom forward in areas where you don’t have a quick way forward.
The bad: it can objectively give you terrible advice.
It depends on how you sum that up on balance.
Example: I wanted a way forward to program a chrome extension which I had zero knowledge of. It helped in an amazing way.
Example: I keep trying to use it in work situations where I already have lots of context. It sometimes performs better than nothing, but often worse than nothing.
Mixed bag, that’s all. Nothing to argue about.
It happens with many technological advancements historically. And in this case there are people trying hard to manufacture outrage about LLMs.
Except that the textbook was probably QA’d by a human for accuracy (at least any intro college textbook, more specialized texts may not have).
Matters less when you have background in the subject (which is why it’s often okay to use LLMs as a search replacement) but it’s nice not having a voice in the back of your head saying “yeah, but what if this is all nonsense”.
Maybe it was not when printed in the first edition, but at least it was the same content shown to hundreds of people rather than something uniquely crafted for you.
The many eyes looking at it will catch it and course correct, while the LLM output does not get the benefit of the error correction algorithm because someone who knows the answer probably won't ask and check it.
I feel this way about reading maps vs following GPS navigation, the fact that Google asked me to take an exit here as a short-cut feels like it might trying to solve the Braess' paradox in real time.
I wonder if this route was made for me to avoid my car adding to some congestion somewhere and whether if that actually benefits me or just the people already stuck in that road.
But this is completely wrong! In the Monty Hall problem, the host has to reveal a door with a goat behind it for you to gain the benefit of switching. I have to point this out for the LLM to get it right. It did not reason about the problem I gave it, it spat out the most likely response given the "shape" of the problem.
This is why shrugging and saying "well humans get things wrong too" is off base. The problem is that the LLM is not thinking, period. So it cannot create a mental model of your understanding of a subject, it is taking your text and generating the next message in a conversation. This means that the more niche the topic (or your particular misunderstanding), the less useful it will get.
As an autistic person I can assure you that people filling in information you neither gave nor implied is the norm for human conversations. In that sense, the LLMs are being human-like.
If you explicitly tell it you didn't forget to have the host open the door it gives you the right answer:
> There is a game show with three closed doors. One has a car and two have goats. I pick a door. Then, without opening my door or any other door, the host asks me if I want to switch my choice to one of the other two doors. What is my best strategy and what are the probabilities?
Gemini: This is a classic probability puzzle known as the Monty Hall problem, though your description presents a slight variation. The optimal strategy depends entirely on whether the host opens a door or not.
Your scenario: the host doesn't open a door. In the situation you've described, where the host asks if you want to switch your choice to one of the other two doors without opening any doors, there is no advantage to switching. Here's a breakdown of the probabilities in your specific scenario:
- Your initial choice: when you first pick a door, you have a 1 in 3 chance of selecting the door with the car. This means you have a 2 in 3 chance of having picked a door with a goat.
- The other two doors: the combined probability that the car is behind one of the other two doors is 2/3.
- Switching your choice: if you decide to switch, you are essentially choosing one of the other two doors. Since there is no new information (a door with a goat has not been revealed), the 2/3 probability is evenly split between those two doors. Therefore, your chance of winning by switching to a specific one of the other doors is 1/3.
In this case, your odds of winning are 1/3 whether you stay with your original choice or switch to one of the other doors.
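And a quick Monte Carlo sanity check of both variants (plain Python, numbers are approximate), confirming the 2/3 vs 1/3 figures:

    import random

    def always_switch_win_rate(trials=100_000, host_reveals_goat=True):
        """Win rate for a contestant who always switches."""
        wins = 0
        for _ in range(trials):
            car = random.randrange(3)
            pick = random.randrange(3)
            if host_reveals_goat:
                # Standard Monty Hall: host opens a door that is neither the pick nor the car.
                opened = next(d for d in range(3) if d != pick and d != car)
                pick = next(d for d in range(3) if d != pick and d != opened)
            else:
                # Variant above: no door is opened; switch to one of the other two at random.
                pick = random.choice([d for d in range(3) if d != pick])
            wins += (pick == car)
        return wins / trials

    print(always_switch_win_rate(host_reveals_goat=True))   # ~0.667: switching helps
    print(always_switch_win_rate(host_reveals_goat=False))  # ~0.333: no advantage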
People on here always assert LLMs don't "really" think or don't "really" know without defining what all that even means, and to me it's getting pretty old. It feels like an escape hatch so we don't feel like our human special sauce is threatened, a bit like how people felt threatened by heliocentrism or evolution.
At some point we start playing a semantics game over the meaning of "thinking", right? Because if a human makes this mistake because they jumped to an already-known answer without noticing a changed detail, it's because (in the usage of the person you're replying to), the human is pattern matching, instead of thinking. I don't think is surprising. In fact I think much of what passes for thinking in casual conversation is really just applying heuristics we've trained in our own brains to give us the correct answer without having to think rigorously. We remember mental shortcuts.
On the other hand, I don't think it's controversial that (some) people are capable of performing the rigorous analysis of the problem needed to give a correct answer in cases like this fake Monty Hall problem. And that's key... if you provide slightly more information and call out the changed nature of the problem to the LLM, it may give you the correct response, but it can't do the sort of reasoning that would reliably give you the correct answer the way a human can. I think that's why the GP doesn't want to call it "thinking" - they want to reserve that for a particular type of reflective process that can rigorously perform logical reasoning in a consistently valid way.
The failure of an LLM to reason this out is indicative that really, it isn’t reasoning at all. It’s a subtle but welcome reminder that it’s pattern matching
"Pattern matching" to me is another one of those vague terms like "thinking" and "knowing" that people decide LLMs do or don't do based on vibes.
The other part of this is weighted filtering given a set of rules, which is a simple analogy to how AlphaGo did its thing.
Dismissing all this as vague is effectively doing the same thing as you are saying others do.
This technology has limits and despite what Altman says, we do know this, and we are exploring them, but it’s within its own confines. They’re fundamentally wholly understandable systems that work on a consistent level in terms of the how they do what they do (that is separate from the actual produced output)
I think reasoning, as any layman would use the term, is not accurate to what these systems do.
Such as?
> They’re fundamentally wholly understandable systems that work on a consistent level in terms of the how they do what they do (that is separate from the actual produced output)
Multi billion parameter models are definitely not wholly understandable and I don't think any AI researcher would claim otherwise. We can train them but we don't know how they work any more than we understand how the training data was made.
> I think reasoning, as any layman would use the term, is not accurate to what these systems do.
Based on what?
Here in my country, English is not what you'll hear in everyday conversation. Native English speakers account for a tiny percentage of the population. Our language doesn't resemble English at all. However, English is a required subject in our mandatory education system. I believe this situation is quite typical across many Asian countries.
As you might imagine, most English teachers in public schools are not native speakers. And they, just like other language learners, make mistakes that native speakers won't make without even realizing what's wrong. This creates a cycle enforcing non-standard English pragmatics in the classroom.
Teachers are not to blame. Becoming fluent and proficient enough in a second language to handle questions students spontaneously throw to you takes years, if not decades of immersion. It's an unrealistic expectation for an average public school teacher.
The result is rich parents either send their kids to private schools or have extra classes taught by native speakers after school. Poorer but smart kids realize the education system is broken and learn their second language from Youtube.
-
What's my point?
When it comes to math/science, in my experience, the current LLMs act similarly to the teachers in public school mentioned above. And they're worse in history/economics. If you're familiar with the subject already, it's easy to spot LLM's errors and gather the useful bits from their blather. But if you're just a student, it can easily become a case of blind-leading-the-blind.
It doesn't make LLMs completely useless in learning (just like I won't call public school teachers 'completely useless', that's rude!). But I believe in the current form they should only play a rather minor role in the student's learning journey.
Stack overflow?
The IRC, Matrix or slack chats for the languages?
I use LLMs but only for things that I have a good understanding of.
Learning what is like that? MIT OpenCourseWare has been available for like 10 years with anything you could want to learn in college.
Textbooks are all easily pirated
https://www.reddit.com/r/LibreWolf/s/Wqc8XGKT5h
You should only trust going into a library and reading stuff from microfilm. That's the only real way people should be learning.
/s
See Dunning-Kruger.
Now, everyone basically has a personal TA, ready to go at all hours of the day.
I get the commentary that it makes learning too easy or shallow, but I doubt anyone would think that college students would learn better if we got rid of TA's.
This simply hasn't been my experience.
Its too shallow. The deeper I go, the less it seems to be useful. This happens quick for me.
Also, god forbid you're researching a complex and possibly controversial subject and you want it to find reputable sources or particularly academic ones.
1) The broad overview of a topic
2) When I have a vague idea, it helps me narrow down the correct terminology for it
3) Providing examples of a particular category ("are there any examples of where v1 in the visual cortex develops in a disordered way?")
4) "Tell me the canonical textbooks in field X"
5) Posing math exercises
6) Free form branching--while talking about one topic, I want to shift to another that is distinct but related.
I agree they leave a lot to be desired when digging very deeply into a topic. And my biggest pet peeve is when they hallucinate fake references ("tell me papers that investigate this topic" will, for any sufficiently obscure topic, result in a bunch of very promising paper titles that are wholely invented).
Luc Julia (one of Siri's main creators) describes a very similar exercise in this interview [0] (it's in French, although the auto-translation isn't too bad).
The gist of it is that he describes an exercise he does with his students, where they ask ChatGPT about Victor Hugo's biography and then proceed to spot the errors ChatGPT made.
This setup is simple, but there are very interesting mechanisms in place. The students get to learn about challenging facts, do fact checking, cross-reference, etc., while it also reinforces the teacher as the reference figure, with the knowledge to take down ChatGPT.
Well done :)
Edit: adding link
[0] https://youtube.com/shorts/SlyUvvbzRPc?si=2Fv-KIgls-uxr_3z
so the opposite of Stack Overflow really, where if you have a vague idea your question gets deleted and you get reprimanded.
Maybe Stack Overflow could use AI for this, help you formulate a question in the way they want.
History is a great example, if you ask an LLM about a vaguely difficult period in history it will just give you one side and act like the other doesn't exist, or if there is another side, it will paint them in a very negative light which often is poorly substantiated; people don't just wake up and decide one day to be irrationally evil with no reason, if you believe that then you are a fool... although LLMs would agree with you more times than not since it's convenient.
The result of these things is a form of gatekeeping, give it a few years and basic knowledge will be almost impossible to find if it is deemed "not useful" whether that's an outdated technology that the LLM doesn't seem talked about very much anymore or a ideological issue that doesn't fall in line with TOS or common consensus.
(On the other hand, it's very hard to get them to do it for topics that are currently politically charged. Less so for things that aren't in living memory: I've had success getting it to offer the Carthaginian perspective in the Punic Wars.)
It's weird to see which topics it "thinks" are politically charged vs. others. I've noticed some inconsistency depending on even what years you input into your questions. One year off? It will sometimes give you a more unbiased answer as a result about the year you were actually thinking of.
As for the politically charged topics, I more or less self-censor on those topics (which seem pretty easy to anticipate--none of those you listed in your other comment surprise me at all) and don't bother to ask the LLM. Partially out of self-protection (don't want to be flagged as some kind of bad actor), partially because I know the amount of effort put in isn't going to give a strong result.
That's a good thing to be aware of, using our own bias to make it more "likely" to play pretend. LLMs tend to be more on the agreeable side; given the unreliable narrators we people tend to be, and the fact that these models are trained on us, it does track that the machine would tend towards preference over fact, especially when the fact could be outside of the LLMs own "Overton Window".
I've started to care less and less about self-censoring as I deem it to be a kind of "use it or lose it" privilege. If you normalize talking about censored/"dangerous" topics in a rational way, more people will be likely to see it not as much of a problem. The other eventuality is that no one hears anything that opposes their view in a rational way but rather only hears from the extremists or those who just want to stick it to the current "bad" in their minds at that moment. Even then though I still will omit certain statements on some topics given the platform, but that's more so that I don't get mislabeled by readers. (one of the items on my other comment was intentionally left as vague as possible for this reason) As for the LLMs, I usually just leave spicy questions for LLMs I can access through an API of someone else (an aggregator) and not a personal acc just to make it a little more difficult to label my activity falsely as a bad actor.
> I've had success getting it to offer the Carthaginian perspective in the Punic Wars.
This is not surprising to me. Historians have long studied Carthage, and there are books you can get on the Punic Wars that talk about the state of Carthage leading up to and during the wars (shout out to Richard Miles's "Carthage Must Be Destroyed: The Rise and Fall of an Ancient Civilization"). I would expect an LLM to piggyback off of that existing literature.
The most compelling reason at the time to reject heliocentrism was the (lack of) parallax of stars. The only response the heliocentrists had was that the stars must be implausibly far away: millions of times further away than the moon is--and they knew the moon itself is already pretty far from us--which is a pretty radical, even insane, idea. There's also the point that the original Copernican heliocentric model had ad hoc epicycles just as the Ptolemaic one did, without any real increase in accuracy.
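To put a rough number on that argument (the one-arcminute figure below is my assumption about the best pre-telescopic instruments of the era, not something stated above): an annual parallax that stays undetected below about $1'$ forces

$$ d \;>\; \frac{1\,\mathrm{AU}}{\tan(1')} \;\approx\; \frac{1.5\times10^{11}\,\mathrm{m}}{2.9\times10^{-4}} \;\approx\; 5\times10^{14}\,\mathrm{m}, $$

which is already more than a million times the Earth-Moon distance of roughly $3.8\times10^{8}$ m.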
Strictly speaking, the breakdown here would be less a lack of understanding of contemporary physics, and more about whether I knew enough about the minutia of historical astronomers' disputes to know if the LLM was accurately representing them.
That's honestly one of the funniest things I have read on this site.
Why?
- Bombing of Dresden, death stats as well as how long the bombing went on for (Arthur Harris is considered a war criminal to this day for that; LLMs highlight easily falsifiable claims by Nazis to justify low estimates, without providing much in the way of verifiable claims outside of a select few, questionable, sources. If the low estimate is to be believed, then it seems absurd that Harris would be considered a war criminal in light of what crimes we allow today in warfare)
- Ask it about the Crusades: often it forgets the sacking of St. Peter's in Rome around 846 AD, usually painting the Papacy as needlessly hateful and violent people during that specific Crusade. Which was horrible, bloody and immensely destructive (I don't defend the Crusades), but it paints the Islamic forces as victims, which they were eventually, but not at the beginning; at the beginning they were the aggressors bent on invading Rome.
- Ask it about the Six-Day War (1967) and contrast that with several different sources on both sides and you'll see a different portrayal even by those who supported the actions taken.
These are just the four that come to my memory at this time.
Most LLMs seem cagey about these topics; I believe this is due to an accepted notion that anything that could "justify" hatred or dislike of a people group or class that is in favor -- according to modern politics -- will be classified as hateful rhetoric, which is then omitted from the record. The issue lies in the fact that to understand history, we need to understand what happened, not how it is perceived, politically, after the fact. History helps inform us about the issues of today, and it is important, above all other agendas, to represent the truth of history, keeping an accurate account (or simply allowing others to read differing accounts without heavy bias).
LLMs are restricted in this way quite egregiously; "those who do not study history are doomed to repeat it", but if this continues, no one will have the ability to know history and are therefore forced to repeat it.
I don't know a lot about the other things you mentioned, but the concept of crusading did not exist (in Christianity) in 846 AD. It's not any conflict between Muslims and Christians.
Further leading to the Papacy furthering such efforts in the following years, as they were in Rome and made strong efforts to maintain Catholicism within those boundaries. Crusading didn't appear out of nothing; it required a catalyst, and events like the one I listed are usually common suspects.
Its background is in the Islamic-Christian conflicts of Spain. Crusading was adopted from the Muslim idea of jihad, as were things like naming customs (the Spanish are the only Christians who name their children "Jesus", after the Muslim practice of naming children "Muhammad").
The political tensions that led to the First Crusade were between Arab Muslims and Byzantine Christians. Specifically, the Battle of Manzikert made Christian Europe seem more vulnerable than it was.
The Papacy wasn’t at the forefront of the struggle against Islam. It was more worried about the Normans, Germans, and Greeks.
When the papacy was interested in Crusading it was for domestic reasons: getting rid of king so-and-so by making him go on crusade.
The situation was different in Spain where Islam was a constant threat, but the Papacy regarded Spain as an exotic foreign land (although Sylvester II was educated there).
It’s extremely misleading to view the pope as the leader of an anti-Muslim coalition. There really was no leader per se, but the reasons why kings went on crusade had little to do with fighting Islam.
Just look at how many monarchs showed up in Jerusalem, then headed straight home and spent the rest of their lives bragging about having been crusaders.
I’m 80% certain no pope ever set foot in Outremer.
If the US were to start invading Axis countries with WW2 being the justification we'd of course be the aggressors, and that was less than 100 years ago.
Similarly, it helps us understand all the examples of today of resentments and grudges over events that happened over a century ago that still motivate people politically.
It’s a very controversial opinion, and stating it as a just-so fact needs challenging.
In 1992 a statue was erected to Harris in London, it was under 24 hour surveillance for several months due to protesting and vandalism attempts. I'm only mentioning this to highlight that there was quite a bit of push back specifically calling the gov out on a tribute to him; which usually doesn't happen if the person was well liked... not as an attempted killshot.
Even the RAF themselves state that there were quite a few who were critical, on the first page of their assessment of Arthur Harris https://www.raf.mod.uk/what-we-do/centre-for-air-and-space-p...
Which is funny and an odd thing to say if you are widely loved/unquestioned by your people. Again, just another occurrence of language from those on his side reinforcing the idea that this is, as you say, "very controversial", and maybe not a "vast majority", since those two things seem at odds with each other.
Not to mention that Harris targeted civilians, which is generally considered behavior of a war-criminal.
As an aside this talk page is a good laugh. https://en.wikipedia.org/wiki/Talk:Arthur_Harris/Archive_1
Although you are correct: I should have used more accurate language; instead of saying "considered" I should have said "considered by some".
If for any of these topics you do manage to get a summary you'd agree with from a (future or better-prompted?) LLM I'd like to read it. Particularly the first and third, the second is somewhat familiar and the fourth was a bit vague.
Rhodesia is a hard one; the more I learn about it, the more I feel terrible for both sides. I also do not support terrorism against a nation even if I believe it might not be in the right. However, I hold to my disdain for how the British responded: their withdrawal effectively doomed Rhodesia, making peaceful resolution essentially impossible.
The problem is, those that do study history are also doomed to watch it repeat.
The problem with this is that people sometimes really do, objectively, wake up and decide to be irrationally evil. It’s not every day, and it’s not every single person, but it does happen routinely.
If you haven’t experienced this wrath yourself, I envy you. But for millions of people, this is their actual, 100% honest truthful lived reality. You can’t rationalize people out of their hate, because most people have no rational basis for their hate.
(see pretty much all racism, sexism, transphobia, etc)
There's no short-term incentive to ever be right about it (and it's easy to convince yourself of both short-term and long-term incentives, both self-interested and altruistic, to actively lie about it). Like, given the training corpus, could I do a better job? Not sure.
All of us need to learn the basics of how to read history and historians critically and to know our limitations, which, as you stated, is probably a tall task.
Which is why it's so terribly irresponsible to paint these """AI""" systems as impartial or neutral or anything of the sort, as has been done by hypesters and marketers for the past 3 years.
People _do_ just wake up and decide to be evil.
However, that is not a justification, since I believe that what is happening today is truly evil. Same with another nation that entered a war knowing it would be crushed, which is suicide; whether that nation is in the right is of little consequence if most of its next generation has died.
It's not a criticism, the landscape moves fast and it takes time to master and personalize a flow to use an LLM as a research assistant.
Start with something such as NotebookLM.
They simply have limitations, especially on deep pointed subject matters where you want depth not breadth, and honestly I'm not sure why these limitations exist but I'm not working directly on these systems.
Talk to Gemini or ChatGPT about mental health things; that's a good example of what I'm talking about. As recently as two weeks ago my colleagues found that even when heavily tuned, they still managed to become 'pro-suicide' if given certain lines of questioning.
That's fine. Recognize the limits of LLMs and don't use them in those cases.
Yet that is something you should be doing regardless of the source. There are plenty of non-reputable sources in academic libraries and there are plenty of non-reputable sources from professionals in any given field. That is particularly true when dealing with controversial topics or historical sources.
The quality varies wildly across models & versions.
With humans, the statements "my tutor was great" and "my tutor was awful" reflect very little on "tutoring" in general, and are barely even responses to each other without more specificity about the quality of tutor involved.
Same with AI models.
I have no access to anthropic right now to compare that.
It’s an ongoing problem in my experience
I'd say that companies like Google and OpenAI are aware of the "reputable" concerns the Internet is expressing and addressing them. This tech is going to be, if not already is, very powerful for education.
[1] http://bit.ly/4mc4UHG
If you're really researching something complex/controversial, there may not be any
These things also apply to humans. A year or so ago I thought I’d finally learn more about the Israeli/Palestinians conflict. Turns out literally every source that was recommended to me by some reputable source was considered completely non-credible by another reputable one.
That said I’ve found ChatGPT to be quite good at math and programming and I can go pretty deep at both. I can definitely trip it into mistakes (eg it seems to use calculations to “intuit” its way around sometimes and you can find dev cases where the calls will lead it the wrong directions), but I also know enough to know how to keep it on rails.
That’s the single most important lesson by the way, that this conflict just has two different, mutually exclusive perspectives, and no objective truth (none that could be recovered FWIW). Either you accept the ambiguity, or you end up siding with one party over the other.
Then as you get more and more familiar you "switch" depending on the sub-issue being discussed, aka nuance
The problem is selective memory of these facts, and biased interpretation of those facts, and stretching the truth to fit pre-determined opinion
> to be quite good at math and programming
Since LLMs are essentially summarizing relevant content, this makes sense. In "objective" fields like math and CS, the vast majority of content aligns, and LLMs are fantastic at distilling the relevant portions you ask about. When there is no consensus, they can usually tell you that ("this is nuanced topic with many perspectives...", etc), but they can't help you resolve the truth because, from their perspective, the only truth is the content.
FWIW, the /r/AskHistorians booklist is pretty helpful.
https://www.reddit.com/r/AskHistorians/wiki/books/middleeast...
You don’t need to look more than 2 years back to understand why either camp finds the other non-reputable.
I've anecdotally found that real world things like these tend to be nuanced, and that sources (especially on the internet) are disincentivised in various ways from actually showing nuance. This leads to "side-taking" and a lack of "middle-ground" nuanced sources, when the reality lies somewhere in the middle.
Might be linked to the phenomenon where in an environment where people "take sides", those who display moderate opinions are simply ostracized by both sides.
Curious to hear people's thoughts and disagreements on this.
Moreover, the conflict is unfolding. What matters isn't what happened 100 years ago, or even 50 years ago, but what has happened recently and is happening. A neighbor of mine who recently passed was raised in Israel. Born circa 1946 (there's black & white footage of her as a baby aboard, IIRC, the ship Exodus 1947), she has vivid memories as a child of Palestinian Imams calling out from the mosques to "kill the Jews". She was a beautiful, kind soul who, for example, freely taught adult education to immigrants (of all sorts), but who one time admitted to me that she utterly despised Arabs. That's all you need to know, right there, to understand why Israel is doing what it's doing. Not so much what happened in the past to make people feel that way, but that many Israelis actually, viscerally feel this way today, justifiably or not but in any event rooted in memories and experiences seared into their conscience. Suffice it to say, most Palestinians have similar stories and sentiments of their own, one of the expressions of which was seen on October 7th.
And yet at the same time, after the first few months of the Gaza War she was so disgusted that she said she wanted to renounce her Israeli citizenship. (I don't know how sincere she was in saying this; she died not long after.) And, again, that's all you need to know to see how the conflict can be resolved, if at all; not by understanding and reconciling the history, but merely choosing to stop justifying the violence and moving forward. How the collective action problem might be resolved, within Israeli and Palestinian societies and between them... that's a whole 'nother dilemma.
Using AI/ML to study history is interesting in that it even further removes one from actual human experience. Hearing first hand accounts, even if anecdotal, conveys information you can't acquire from a book; reading a book conveys information and perspective you can't get from a shorter work, like a paper or article; and AI/ML summaries elide and obscure yet more substance.
Granted, that's probably well-trodden ground, to which model developers are primed to pay attention, and I'm (a) a relative novice with (b) very strong math skills from another domain (computational physics). So Chuck and I are probably both set up for success.
I'll tell you that I recently found it the best resource on the web for teaching me about the 30 Years War. I was reading a collection of primary source documents, and was able to interview ChatGPT about them.
Last week I used it to learn how to create and use Lehmer codes, and its explanation was perfect, and much easier to understand than, for example, Wikipedia.
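Since Lehmer codes are easy to show concretely, here's a minimal sketch in Python of what that explanation amounts to (my own illustration, not the one ChatGPT gave): the code records, for each position, how many later elements are smaller, and decoding rebuilds the permutation by repeatedly removing the k-th remaining element.

```python
# Minimal sketch of Lehmer codes for a permutation of 0..n-1.
def lehmer_encode(perm):
    """For each element, count how many elements to its right are smaller."""
    return [sum(1 for right in perm[i + 1:] if right < left)
            for i, left in enumerate(perm)]

def lehmer_decode(code):
    """Rebuild the permutation by picking the k-th remaining value at each step."""
    remaining = list(range(len(code)))
    return [remaining.pop(k) for k in code]

perm = [1, 3, 0, 2]
code = lehmer_encode(perm)        # [1, 2, 0, 0]
assert lehmer_decode(code) == perm
```

The code is also the permutation's index written in the factorial number system, which is why it's handy for ranking and unranking permutations.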
I ask it about truck repair stuff all the time, and it is also great at that.
I don't think it's great at literary analysis, but for factual stuff it has only ever blown away my expectations at how useful it is.
Model Validation groups are one of the targets for LLMs.
It doesn’t cover the other aspects of finance, perhaps considered advanced (to a regular person at least) but less quantitative. Try having it reason out a “cigar butt” strategy and see if it returns anything useful about companies that fit the mold from a prepared source.
Granted this isn’t quant finance modeling, but it’s a relatively easy thing as a human to do, and I didn’t find LLMs up to the task
No one builds multi-shot search tools because they eat tokens like nobody's business, but I've deployed them internally at a company to rave reviews, at a cost of $200 per seat per day.
You must be using a free model like GPT-4o (or the equivalent from another provider)?
I find that o3 is consistently able to go deeper than me in anything I'm a nonexpert in, and usually can keep up with me in those areas where I am an expert.
If that's not the case for you I'd be very curious to see a full conversation transcript (in chatgpt you can share these directly from the UI).
I know it has nothing to do with this. I simply hit a wall eventually.
I unfortunately am not at liberty to share the chats though. They're work related (I very recently ended up at a place where we do thorny research).
A simple one though, is researching Israel - Palestine relations since 1948. It starts off okay (usually) but it goes off the rails eventually with bad sourcing, fictitious sourcing, and/or hallucinations. Sometimes I actually hit a wall where it repeats itself over and over and I suspect its because the information is simply not captured by the model.
FWIW, if these models had live & historic access to Reuters and Bloomberg terminals I think they might be better at a range of tasks I find them inadequate for, maybe.
Ask it for sources. The two things where LLMs excel are filling in the sources for some claim you give them (lots will be made up, but there isn't anything better out there) and giving you search queries for some description you give them.
If its a subject you are just learning how can you possibly evaluate this?
Falling apart under pointed questioning, saying obviously false things, etc.
This generation of AI doesn't yet have the knowledge depth of a seasoned university professor. It's the kind of teacher that you should, eventually, surpass.
Blue team you throw out concepts and have it steelman them
Red team you can literally throw any kind of stress test at your idea
Alternate like this and you will learn
A great prompt is “give me the top 10 xyz things” and then you can explore
Back in 2006 I used Wikipedia to prepare for job interviews :)
Learning a new programming language used to be mediated with lots of useful trips to Google to understand how some particular bit worked, but Google stopped being useful for that years ago. Even if the content you're looking for exists, it's buried.
I think the potential in this regard is limitless.
(Only thing missing is the model(s) you used).
The psychic reader near me has been in business for a long time. People are very convinced they've helped them. Logically, it had to have been their own efforts though.
In the process it helped me to learn many details about RA and NDP (Router Advertisements/Neighbor Discovery Protocol, which mostly replace DHCP and ARP from IPv4).
It made me realize that my WiFi mesh routers do quite a lot of things to prevent broadcast loops on the network, and that all my weird issues could be attributed to one cheap mesh repeater. So I replaced it and now everything works like a charm.
I had this setup for 5 years and was never able to figure out what was going on there, although I really tried.
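If you want to watch the same RA traffic yourself, a sketch along these lines works (scapy, root privileges, and the interface name "eth0" are assumptions on my part, not part of my original setup):

```python
# Sketch: print IPv6 Router Advertisements seen on the local link.
# Requires scapy and root; "eth0" is a placeholder interface name.
from scapy.all import sniff
from scapy.layers.inet6 import IPv6, ICMPv6ND_RA, ICMPv6NDOptPrefixInfo

def show_ra(pkt):
    ra = pkt[ICMPv6ND_RA]
    print("RA from", pkt[IPv6].src, "router lifetime:", ra.routerlifetime, "s")
    if ICMPv6NDOptPrefixInfo in pkt:
        opt = pkt[ICMPv6NDOptPrefixInfo]
        print("  advertised prefix:", f"{opt.prefix}/{opt.prefixlen}")

# Capture five RAs and print who is advertising what on the segment.
sniff(iface="eth0", lfilter=lambda p: ICMPv6ND_RA in p, prn=show_ra, count=5)
```

Seeing which device answers, and how often, makes loops and misbehaving repeaters much easier to spot.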
I tried using YouTube to find walk through guides for how to approach the repair as a complete n00b and only found videos for unrelated problems.
But I described my issues and took photos to GPT O3-Pro and it was able to guide me and tell me what to watch out for.
I completed the repair (very proud of myself) and even though it failed a day later (I guess I didn’t re-seat well enough) I still feel far more confident opening it and trying again than I did at the start.
Cost of broken watch + $200 pro mode << Cost of working watch.
I find it odd that someone who has been to college would see this as a _bad_ way to learn something.
I'm not sold on LLMs being a replacement, but post-secondary was certainly enriched by having other people to ask questions to, people to bounce ideas off of, people that can say "that was done 15 years ago, check out X", etc.
There were times where I thought I had a great idea, but it was based on an incorrect conclusion that I had come to. It was helpful for that to be pointed out to me. I could have spent many months "paving forward", to no benefit, but instead someone saved me from banging my head on a wall.
Sure, you could pave forward, but realistically, you'll get much farther with either a good textbook or a good teacher, or both.
This requires a student to be actually interested in what they are learning, though; for others, who blindly trust its output, it can have adverse effects like the illusion of having understood a concept when they might even have mislearned it.
There seems to be a gap in problem solving abilities here...the process of breaking down concepts into easier to understand concepts and then recompiling has been around since forever...it is just easier to find those relationships now. To say it was impossible to learn concepts you are stuck on is a little alarming.
No, not really.
> Unless it was common enough to show up in a well formed question on stack exchange, it was pretty much impossible, and the only thing you can really do is keep paving forward and hope at some point, it'll make sense to you.
Your experience isn't universal. Some students learned how to do research in school.
From the parent comment:
> it was pretty much impossible ... hope at some point, it'll make sense to you
Not sure where you are getting the additional context for what they meant by "screwed", but I am not seeing it.
It’s exciting when I discover I can’t replicate something that is stated authoritatively… which turns out to be controversial. That’s rare, though. I bet ChatGPT knows it’s controversial, too, but that wouldn’t be as much fun.
I think this is the same thing with vibe coding, AI art, etc. - if you want something good, it's not the right tool for the job. If your alternative is "nothing," and "literally anything at all" will do, man, they're game changers.
* Please don't overindex on "shitty" - "If you don't need something verifiably high-quality"
I had to post the source code to win the dispute, so to speak.
If you are curious it was a question about the behavior of Kafka producer interceptors when an exception is thrown.
But I agree that it is hard to resist the temptation to treat LLMs as a peer.
Ever read mainstream news reporting on something you actually know about? Notice how it's always wrong? I'm sure there's a name for this phenomenon (Gell-Mann amnesia, perhaps). It sounds like exactly the same thing.
On the other hand it told me you can't execute programs when evaluating a Makefile, and you trivially can (e.g. GNU Make's $(shell ...) function runs at evaluation time). It's very hit and miss. When it misses it's rather frustrating. When it hits it can save you literally hours.
I also use it to remember some Python stuff. In Rust, it is less good: it makes mistakes.
In those two domains, at that level, it's really good.
It could help students I think.
[0] https://time.com/7295195/ai-chatgpt-google-learning-school/
As long as you can tell that you don’t deeply understand something that you just read, they are incredible TAs.
The trick is going to be to impart this metacognitive skill on the average student. I am hopeful we will figure it out in the top 50 universities.
It is hard to verify information that you are unfamiliar with. It would be like learning from a message board. Can you really trust what is being said?
You can replace "LLM" here with "human" and it remains true.
Anyone who has gone to post-secondary has had a teacher that relied on outdated information, or filled in gaps with their own theories, etc. Dealing with that is a large portion of what "learning" is.
I'm not convinced about the efficacy of LLMs in teaching/studying. But it's foolish to think that humans don't suffer from the same reliability issue as LLMs, at least to a similar degree.
For example, even if you craft the most detailed cursor rules, hooks, whatever, they will still repeatedly fuck up. They can't even follow a style guide. They can be informed, but not corrected.
Those are coding errors, and the general "hiccups" that these models experience all the time are on another level. The hallucinations, sycophancy, reward hacking, etc can be hilariously inept.
IMO, that should inform you enough to not trust these services (as they exist today) in explaining concepts to you that you have no idea about.
If you are so certain you are okay to trust these things, you should evaluate every assertion it makes for, say, 40 hours of use, and count the error rate. I would say it is above 30%, in my experience of using language models day to day. And that is with applied tasks they are considered "good" at.
If you are okay with learning new topics where even 10% of the instruction is wrong, have fun.
So what if the LLM is wrong about something. Human teachers are wrong about things, you are wrong about things, I am wrong about things. We figure it out when it doesn't work the way we thought and adjust our thinking. We aren't learning how to operate experimental nuclear reactors here, where messing up results in half a country getting irradiated. We are learning things for fun, hobbies, and self-betterment.
When I got stuck on a concept, I wasn't screwed: I read more; books if necessary. StackExchange wasn't my only source.
LLMs are not like TAs, personal or not, in the same way they're not humans. So it then follows we can actually contemplate not using LLMs in formal teaching environments.
It’s called basic research skills - don’t they teach this anymore in high school, let alone college? How ever did we get by with nothing but an encyclopedia or a library catalog?
I find it so much more intellectually stimulating than most of what I find online. Reading e.g. a 600-page book about some specific historical event gives me so much more perspective and exposure to different aspects I never would have thought to ask about on my own, or that would have been elided when clipped into a few-sentence summary.
I have gotten some value out of asking for book recommendations from LLMs, mostly as a starting point I can use to prune a list of 10 books down into a 2 or 3 after doing some of my research on each suggestion. But talking to a chatbot to learn about a subject just doesn’t do anything for me for anything deeper than basic Q&A where I simply need a (hopefully) correct answer and nothing more.
If you don't have access to a community like that learning stuff in a technical field can be practically impossible. Having an llm to ask infinite silly/dumb/stupid questions can be super helpful and save you days of being stuck on silly things, even though it's not perfect.
> most of us would have never gotten by with literally just a library catalog and encyclopedia.
I meant the opposite, perhaps I phrased it poorly. Back in the day we would get by and learn new shit by looking for books on the topic and reading them (they have useful indices and tables of contents to zero in on what you need and not have to read the entire book). An encyclopedia was (is? Wikipedia anyone?) a good way to get an overview of a topic and the basics before diving into a more specialized book.
I haven't tested them on many things. But in the past 3 weeks I tried to vibe code a little bit of VHDL. On the one hand it was a fun journey; I could experiment a lot and just iterate fast. But if I were someone who had no idea about hardware design, then this trash would've guided me the wrong way in numerous situations. I can't even count how many times it built me latches instead of clocked registers (latches bad, if you don't know about it), and that's just one thing. Yes, I know there isn't much out there (compared to Python and JavaScript) about HDLs, even less regarding VHDL. But damn, no no no. Not for learning. Never. If you know what you're doing and you have some fundamental knowledge about the topic, then it might help to get further, but not for the absolute essentials; that will backfire hard.
Pre-LLM, even finding the ~5 textbooks with ~3 chapters each that decently covered the material I want was itself a nontrivial problem. Now that problem is greatly eased.
They can recommend many unknown books as well, as language models are known to reference resources that do not exist.
And that's a bad thing. Nothing can replace the work in learning, the moments where you don't understand it and have to think until it hurts and until you understand. Anything that bypasses this (including, for uni students, leaning too heavily on generous TAs) results in a kind of learning theatre, where the student thinks they've developed an understanding, but hasn't.
Experienced learners already have the discipline to use LLMs without asking too much of them, the same way they learned not to look up the answer in the back of the textbook until arriving at their own solution.
And which just makes things up (with the same tone and confidence!) at random and unpredictable times.
Yeah apart from that it's just like a knowledgeable TA.
How do you know when it's bullshitting you though?
Sometimes right away, something sounds wrong. Sometimes when I try to apply the knowledge and discover a problem. Sometimes never, I believe many incorrect things even today.
Since when was it acceptable to only ever look at a single source?
Given that humanity has been able to go from living in caves to sending spaceships to the moon without LLMs, let me express some doubt about that.
Even without going further, software engineering isn't new and people have been stuck on concepts and have managed to get unstuck without LLMs for decades.
What you gain in instant knowledge with LLMs, you lose in learning how to get unstuck, how to persevere, how to innovate, etc.
Regarding LLMs, they can also stimulate thinking if used right.
We were able to learn before LLMs.
Libraries are not a new thing. FidoNet, USENET, IRC, forums, local study/user groups. You have access to all of Wikipedia. Offline, if you want.
I think it's accurate to say that if I had to do that again, I'm basically screwed.
Asking the LLM is a vastly superior experience.
I had to learn what my local library had, not what I wanted. And it was an incredible slog.
IRC groups is another example--I've been there. One or two topics have great IRC channels. The rest have idle bots and hostile gatekeepers.
The LLM makes a happy path to most topics, not just a couple.
Not to be overly argumentative, but I disagree. If you're looking for a deep and ongoing process, LLMs fall down, because they can't remember anything and can't build on prior context in that way. You end up having to repeat a lot of stuff. They also don't have good course correction (that is, if you're going down the wrong path, they don't alert you, as I've experienced).
It also can give you really bad content depending on what you're trying to learn.
I think for things that represent themselves as a form of highly structured data, like programming languages, there's good attunement there, but once you start trying to dig around in advanced finance, political topics, economics, or complex medical conditions, the quality falls off fast, if it's there at all.
It was way nicer than a book.
That's the experience I'm speaking from. It wasn't perfect, and it was wrong sometimes, sure. A known limitation.
But it was flexible, and it was able to do things like relate ideas with programming languages I already knew. Adapt to my level of understanding. Skip stuff I didn't need.
Incorrect moments or not, the result was I learned something quickly and easily. That isn't what happened in the 90s.
But that's the entire problem and I don't understand why it's just put aside like that. LLMs are wrong sometimes, and they often just don't give you the details and, in my opinion, knowing about certain details and traps of a language is very very important, if you plan on doing more with it than just having fun. Now someone will come around the corner and say 'but but but it gives you the details if you explicitly ask for them'. Yes, of course, but you just don't know where important details are hidden, if you are just learning about it. Studying is hard and it takes perseverance. Most textbooks will tell you the same things, but they all still differ and every author usually has a few distinct details they highlight and these are the important bits that you just won't get with an LLM
Nobody can write an exhaustive tome and explore every feature, use, problem, and pitfall of Python, for example. Every text on the topic will omit something.
It's hardly a criticism. I don't want exhaustive.
The llm taught me what I asked it to teach me. That's what I hope it will do, not try to caution me about everything I could do wrong with a language. That list might be infinite.
How can you know this when you are learning something? It seems like a confirmation bias to even have this opinion?
Perhaps the most famous example of this is Warren Buffett. For years Buffett missed out on returns from the tech industry [1] because he avoided investing in tech company stocks due to Berkshire's long-standing philosophy of never investing in companies whose business model he doesn't understand.
His light-bulb moment came when he used his understanding of a business he knew really well, i.e. their furniture business [3], to value Apple as a consumer company rather than as a tech company, leading to a $1bn position in Apple in 2016 [2].
[0] https://en.wikipedia.org/wiki/Transfer_of_learning
[1] https://news.ycombinator.com/item?id=33612228
[2] https://www.theguardian.com/technology/2016/may/16/warren-bu...
[3] https://www.cnbc.com/2017/05/08/billionaire-investor-warren-...
It's entirely possible they learned nothing and they're missing huge parts.
But we're sort of at the point where in order to ignore their self-reported experience, we're asking philosophical questions that amount to "how can you know you know if you don't know what you don't know and definitely don't know everything?"
More existentialism than interlocution.
If we decide our interlocutor can't be relied upon, what is discussion?
Would we have the same question if they said they did it from a book?
If they did do it from a book, how would we know if the book they read was missing something that we thought was crucial?
I was attempting to imply that high-quality literature is often reviewed by humans who have some sort of knowledge about the particular topic or are willing to cross-reference it with existing literature. The reader often does this as well.
For low-effort literature, this is often not the case, and can lead to things like https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect where a trained observer can point out that something is wrong, but an untrained observer cannot perceive what is incorrect.
IMO, this is adjacent to what human agents interacting with language models experience often. It isn't wrong about everything, but the nuance is enough to introduce some poor underlying thought patterns while learning.
That's totally different than saying they are not flawless but they make learning easier than other methods, like you did in this comment
It also doesn't seem to do a good job of building on "memory" over time. There appears to be some unspoken limit there, or something to that effect.
Figuring out 'make' errors when I was bad at C on microcontrollers a decade ago? (still am) Careful pondering of possible meanings of words... trial and error tweaks of code and recompiling in hopes that I was just off by a tiny thing, but 2 hours later and 30 attempts later, and realizing I'd done a bad job of tracking what I'd tried and hadn't? Well, made me better at being careful at triaging issues. But it wasn't something I was enthusiastic to pick back up the next weekend, or for the next idea I had.
Revisiting that combination of hardware/code a decade later and having it go much faster with ChatGPT... that was fun.
Like, I agree with you and I believe those things will resist and will always be important, but it doesn't really compare in this case.
Last week I was in the nature and I saw a cute bird that I didn't know. I asked an AI and got the correct answer in 10 seconds. Of course I would find the answer at the library or by looking at proper niche sites, but I would not have done it because I simply didn't care that much. It's a stupid example but I hope it makes the point
We were able to learn before the invention of writing, too!
The internet, and esp. Stack Exchange, is a horrible place to learn concepts. For basic operational stuff, sure, that works, but one should mostly be picking up concepts from books and other long-form content. When you get stuck it's time to do three things:
Incorporate a new source that covers the same material in a different way, or at least from a different author.
Sit down with the concept and write about it and actively try to reformulate it and everything you do/don't understand in your own words.
Take a pause and come back later.
Usually one of these three strategies does the trick, no llm required. Obviously these approaches require time that using an LLM wouldn't. I have a suspicion doing it this way will also make it stick in long term memory better, but that's just a hunch.
Sorry, but if you've gone to university, in particular at a time when internet access was already ubiquitous, surely you must have been capable of finding an answer to a programming problem by consulting documentation, manuals, or tutorials, which exist on almost any topic.
I'm not saying the chatbot interface is necessarily bad, it might be more engaging, but it literally does not present you with information you couldn't have found yourself.
If someone has a computer science degree and tells me without stack exchange they can't find solutions to basic problems that is a red flag. That's like the article about the people posted here who couldn't program when their LLM credits ran out
Closed: RTFM, dumbass
<No activity for 8 years, until some random person shows up and asks "Hey did you figure it out?">
I really do write that stuff for myself, turns out.
J. Random Hacker: Why are you doing it like that?
Newb: I have <xyz> constraint in my case that necessitates this.
J. Random Hacker: This is a stupid way to do it. I'm not going to help you.
So much so that the first method would take me an hour as opposed to an entire evening of reading/repeating.
Having such a tool would have been a game changer to me.
I don’t know, though, whether it’s possible to throw an entire chapter of a textbook at it.
Representative snippet:
> DO NOT GIVE ANSWERS OR DO HOMEWORK FOR THE USER. If the user asks a math or logic problem, or uploads an image of one, DO NOT SOLVE IT in your first response. Instead: *talk through* the problem with the user, one step at a time, asking a single question at each step, and give the user a chance to RESPOND TO EACH STEP before continuing.
Will also reduce the context rot a bit.
How exactly you do it is often arbitrary/interchangeable, but it definitely does have an effect, and is crucial to getting LLMs to follow instructions reliably once prompts start getting longer and more complex.
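As a concrete illustration of "just prompting the base API", here is roughly what a home-grown study mode looks like against the OpenAI chat API; the model name and the prompt wording below are my own placeholders, not OpenAI's actual system prompt:

```python
# Rough sketch of "study mode as a system prompt": the behaviour comes
# entirely from instructions, not from a different model. The model name
# and prompt wording are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STUDY_MODE = (
    "You are a patient tutor. Do not give final answers or do homework for "
    "the user. Walk through problems one step at a time, ask a single "
    "question per step, and wait for the user's response before continuing."
)

def tutor_turn(history, user_message, model="gpt-4o"):
    """Send one tutoring turn; `history` is a list of prior chat messages."""
    messages = [{"role": "system", "content": STUDY_MODE}] + history + [
        {"role": "user", "content": user_message}
    ]
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content

print(tutor_turn([], "Help me understand why the derivative of x^2 is 2x."))
```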
Not saying it is indeed reality, but it could simply be programmed to return a different prompt from the original, appearing plausible but perhaps missing some key elements.
But of course, if we apply Occam's Razor, it might simply really be the prompt too.
Tokens are expensive. How much of your system prompt do you want to waste on dumb tricks trying to stop your system prompt from leaking?
The other chunk of time, to me anyway, seems to be creating a mental model of the subject matter, and when you study something well you have a strong grasp on the forces influencing cause and effect within that matter. It's this part of the process that I would use AI the least, if I am to learn it for myself. Otherwise my mental model will consist of a bunch of "includes" from the AI model and will only be resolvable with access to AI. Personally, I want a coherent "offline" model to be stored in my brain before I consider myself studied up in the area.
This is a good thing in many levels.
Learning how to search is (was) a good skill to have. The process of searching itself also often leads to learning tangentially related but important things.
I'm sorry for the next generations that won't have (much of) these skills.
I don’t think it’s so valuable now that you’re searching through piles of spam and junk just to try to find anything relevant. That’s a uniquely modern-web thing created by Google in their focus on profit over users.
Unless Google takes over libraries/books next and sells spots to advertisers on the shelves and in the books.
In the same way that I never learnt the Dewey decimal system because digital search had driven it obsolete. It may be that we just won't need to do as much sifting through spam in the future, but being able to finesse Gemini into burping out the right links becomes increasingly important.
Most people don’t know how to do this.
Does it offer meaningful benefits to students over self directed study?
Does it outperform students who are "learning how to learn"?
What effect does allowing students to make mistakes have compared to being guided through what to review?
I would hope Study Mode would produce flash card prompts and quantize information for usage in spaced repetition tools like Mochi [1] or Anki.
See Andy's talk here [2]
[1] https://mochi.cards
[2] https://andymatuschak.org/hmwl/
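There is a low-tech bridge available today: Anki imports tab-separated text files, so if you can get the model to hand back plain question/answer pairs, a few lines are enough to turn a study session into a deck (the cards below are placeholders, not anything Study Mode produced):

```python
# Sketch: write question/answer pairs as a tab-separated file that Anki
# can import (File > Import, one note per line, front<TAB>back).
cards = [
    ("What do Router Advertisements largely replace from IPv4?", "Much of DHCP's role"),
    ("What does NDP replace from IPv4?", "ARP"),
]

with open("deck.txt", "w", encoding="utf-8") as f:
    for front, back in cards:
        f.write(f"{front}\t{back}\n")
```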
They want a student to use it and say “I wouldn’t have learned anything without study mode”.
This also allows them to fill their data coffers more with bleeding edge education. “Please input the data you are studying and we will summarize it for you.”
Not to be contrarian, but do you have any evidence of this assertion? Or are you just confidently confabulating a response for something outside of the data you've been exposed to? Because a commentor below provided a study that directly contradicts this.
This isn't study mode, it's a different AI tutor, but:
"The median learning gains for students, relative to the pre-test baseline (M = 2.75, N = 316), in the AI-tutored group were over double those for students in the in-class active learning group."
"The occurrence of inaccurate “hallucinations” by the current [LLMs] poses a significant challenge for their use in education. [...] we enriched our prompts with comprehensive, step-by-step answers, guiding the AI tutor to deliver accurate and high-quality explanations (v) to students. As a result, 83% of students reported that the AI tutor’s explanations were as good as, or better than, those from human instructors in the class."
Not at all dismissing the study, but to replicate these results for yourself, this level of gain over a classroom setting may be tricky to achieve without having someone make class materials for the bot to present to you first
Edit: the authors further say
"Krupp et al. (2023) observed limited reflection among students using ChatGPT without guidance, while Forero (2023) reported a decline in student performance when AI interactions lacked structure and did not encourage critical thinking. These previous approaches did not adhere to the same research-based best practices that informed our approach."
Two other studies failed to get positive results at all. YMMV a lot apparently (like, all bets are off and your learning might go in the negative direction if you don't do everything exactly as in this study)
However, consider the extent to which LLMs make the learning process more enjoyable. More students will keep pushing because they have someone to ask. Also, having fun & being motivated is such a massive factor when it comes to learning. And, finally, keeping at it at 50% the speed for 100% the material always beats working at 100% the speed for 50% the material. Who cares if you're slower - we're slower & faster without LLMs too! Those that persevere aren't the fastest; they're the ones with the most grit & discipline, and LLMs make that more accessible.
It concludes there's a learning curve that generally takes about 50 hours to get past. The data shows that the one engineer who had more than 50 hours of experience with Cursor actually worked faster.
This is largely my experience, now. I was much slower initially, but I've now figured out the correct way to prompt, guide, and fix the LLM to be effective. I produce way more code and am mentally less fatigued at the end of each day.
(Qualifications: I was a reviewer on the METR study.)
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
I believe we'll see the benefits and drawbacks of AI augmentation to humans performing various tasks will vary wildly based on the task, the way the AI is being asked to interact, and the AI model.
Like yeah, if you’ve only ever used an axe you probably don’t know the first thing about how to use a chainsaw, but if you know how to use a chainsaw you’re wiping the floor with the axe wielders. Wholeheartedly agree with the rest of your comment; even if you’re slow you lap everyone sitting on the couch.
I bring this up because the way I see students "study" with LLMs is similar to this misapplication of tutoring. You try something, feel confused and lost, and immediately turn to the pacifier^H^H^H^H^H^H^H ChatGPT helper to give you direction without ever having to just try things out and experiment. It means students are so much more anxious about exams where they don't have the training wheels. Students have always wanted practice exams with similar problems to the real one with the numbers changed, but it's more than wanting it now. They outright expect it and will write bad evals and/or even complain to your department if you don't do it.
I'm not very optimistic. I am seeing a rapidly rising trend at a very "elite" institution of students being completely incapable of using textbooks to augment learning concepts that were introduced in the classroom. And not just struggling with it, but lashing out at professors who expect them to do reading or self study.
unfortunately that group is tiny and getting tinier due to dwindling attention span.
Helping you parse notation, especially in new domains, is insanely valuable. I do a lot of applied math in statistics/ML, but when I open a physics book the notation and comfort with short hand is a real challenge (likewise I imagine the reverse is equally as annoying). Having an LLM on demand to instantly clear up notation is a massive speed boost.
Reading German Idealist philosophy requires an enormous amount of context. Being able to ask an LLM questions like "How much of this section of Mainländer is coming directly from Schopenhauer?" is a godsend in helping understand which parts of the writing a merely setting up what is already agreed upon vs laying new ground.
And the most important for self-study: verifying your understanding. Backtracking because you misunderstood a fundamental concept is a huge time sink in self-study. Now, every time I read a formula I can go through all of my intuitions and understanding about it, write them down, and verify. Even a "not quite..." from an LLM is enough to make me realize I need to spend more time on that section.
Books are still the highest density information source and best way to learn, but LLMs can do a lot to accelerate this.
It's my primary fear building anything on these models: they can just come eat your lunch once it looks yummy enough. Tread carefully.
During the early days of tech, was there prevailing wisdom that software companies would never be able to compete with hardware companies because the hardware companies would always be able to copy them and ship the software with the hardware?
Because I think it's basically the analogous situation. People assume that the foundation model providers have some massive advantage over the people building on top of them, but I don't really see any evidence for this.
True, and worse, they're hungry because it's increasingly seeming like "hosting LLMs and charging by the token" is not terribly profitable.
I don't really see a path for the major players that isn't "Sherlock everything that achieves traction".
> In the computing verb sense, refers to the software Sherlock, which in 2002 came to replicate some of the features of an earlier complementary program called Watson.[1]
[1] https://en.wiktionary.org/wiki/Sherlock
As long as features like Study Mode are little more than creative prompting, any provider will eventually be able to offer them and offer token-based charging.
- From what I can see many products are rapidly getting past "just prompt engineering the base API". So even though a lot of these things were/are primitive, I don't think it's necessarily a good bet that they will remain so. Though agree in principle - thin API wrappers will be out-competed both by cheaper thin wrappers, or products that are more sophisticated/better than thin wrappers.
- This is, oddly enough, a scenario that is way easier to navigate than the rest of the LLM industry. We know consumer apps, we know consumer apps that do relatively basic (or at least, well understood) things. Success/failure then is way less about technical prowess and more about classical factors like distribution, marketing, integrations, etc.
A good example here is the lasting success of paid email providers. Multiple vendors (MSFT, GOOG, etc.) make huge amounts of money hosting people's email, despite it being a mature product that, at the basic level, is pretty solved, and where the core product can be replicated fairly easily.
The presence of open source/commodity commercial offerings hasn't really driven the price of the service to the floor, though the commodity offerings do provide some pricing pressure.
Most people I saw offering self-hosted email for groups (student groups etc.) ended up with a mess. Compare all that to, say, ollama, which makes self-hosting LLMs trivial, and they're stateless.
So I’m not sure email is a good example of commodity not bringing price to the floor.
If you want to try and make a quick buck, fine, be quick and go for whatever. If you plan on building a long term business, don't do the most obvious, low effort low hanging fruit stuff.
These days they’ve pivoted to a more enterprise product and are still chugging along.
Have you considered using the LLM to give tests/quizzes (perhaps just conversationally) in order to measure progress and uncover weak spots?
I've also been playing around with adapting content based on their results (e.g. proactively nudging complexity up/down) but haven't gotten it to a good place yet.
Only feedback I have so far is that it would be nice to control the playback speed of the 'read aloud' mode. I'd like it to be a little bit faster.
I've been working on it on-and-off for about a year now. Roughly 2-3 months if I worked on it full-time I'm guessing.
re: playback speed -> noted, will add some controls tomorrow
It's still a work in progress but we are trying to make it better everyday
* for each statement, give you the option to rate how well you understood it. Offer clarification on things you didn't understand
* present knowledge as a tree that you can expand to get deeper
* show interactive graphs (very useful for mathy things where you can easily adjust some of the parameters)
* add quizzes to check your understanding
... though I could well imagine this being out of scope for ChatGPT, and thus an opportunity for other apps / startups.
I'm very interested in this. I've considered building this, but if this already exists, someone let me know please!
It's a great tutor for things it knows, but it really needs to learn its own limits
Things well-represented in its training datasets. Basically React todo list, bootstrap form, tic-tac-toe in vue
If LLMs continue to improve, we are going to be learning a lot from them, they will be our internet search and our teachers. If we want to retain some knowledge for ourselves, then we are going to need to learn and memorize things for ourselves.
Integrating spaced repetition could make it explicit which things we want to offload to the LLM, and which things we want to internalize. For example, maybe I use Python a lot, and occasionally use Perl, so I explicitly choose to memorize some Python APIs, but I'm happy to just ask the LLM for reminders whenever I use Perl. So I ask the LLM to set up some spaced repetition whenever it teaches me something new about Python, etc.
The spaced repetition could be done with voice during a drive or something. The LLM would ask the questions for review, and then judge how well we did in answering, and then the LLM would depend on the spaced-repetition algorithm to keep track of when to next review.
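For the scheduling half, here is a minimal sketch of what the LLM would lean on, loosely following the SM-2 family of algorithms (the 0-5 grading scale and the constants are the commonly cited SM-2 defaults, used here purely illustratively):

```python
# Minimal SM-2-style scheduler sketch: the LLM grades the spoken answer
# (0-5) and this function decides when to ask again.
from dataclasses import dataclass

@dataclass
class Card:
    interval_days: float = 0.0   # current gap until the next review
    repetitions: int = 0         # consecutive successful reviews
    ease: float = 2.5            # ease factor, grows/shrinks with quality

def review(card: Card, quality: int) -> Card:
    """Update a card after a review graded 0 (blackout) to 5 (perfect)."""
    if quality < 3:                      # failed: restart the ladder
        return Card(interval_days=1, repetitions=0, ease=card.ease)
    if card.repetitions == 0:
        interval = 1
    elif card.repetitions == 1:
        interval = 6
    else:
        interval = card.interval_days * card.ease
    ease = max(1.3, card.ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return Card(interval_days=interval, repetitions=card.repetitions + 1, ease=ease)

card = Card()
for q in (5, 4, 3):                      # three successful reviews
    card = review(card, q)
print(round(card.interval_days, 1))      # next review roughly two weeks out
```

The LLM's only jobs would be asking the question, grading the answer, and storing the updated card; the schedule itself stays deterministic.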
It seems like study mode is basically just a different system prompt but otherwise the exact same model? So there's not really any new benefit to anyone who was already asking for ChatGPT to help them study step by step instead of giving away whole answers.
Seems helpful to maybe a certain population of more entry level users who don't know to ask for help instead of asking for a direct answer I guess, but not really a big leap forward in technology.
[0] https://github.com/JushBJJ/Mr.-Ranedeer-AI-Tutor
Wonder what the compensation for this invaluable contribution was
But even with this feature in this very early state, it seems quite useful. I dropped in some slides from a class and pretended to be a student, and it handled questions reasonably. Right now it seems I will be happy for my students to use this.
Taking a wider perspective, I think it is a good sign that OpenAI is culturally capable of making a high-friction product that challenges and frustrates, yet benefits, the user. Hopefully this can help with the broader problem of sycophancy.
Then I tried to migrate it to ChatGPT to try this thing out, but it seems to me like it's just prompt engineering behind the scenes. Nothing fancy.
And this study mode is not even available in ChatGPT Projects, which students need for adding course work, notes, and transcripts.
Honestly, just release gpt-5!!!
Studying for a particular course or subject is the LLM's second killer application, and with this announcement OpenAI's ChatGPT is now also providing the service. Probably not the pioneer, but most probably one of the significant providers. If in the near future GenAI study assistants can adopt and adapt 3Blue1Brown's approach of more visualization, animation and interactive learning, it will be more intuitive and engaging.
Please check this excellent LLM-RAG AI-driven course assistant at UIUC for an example of a university course [1]. It provides citations and references, mainly to the course notes, so the students can verify the answers and further study the course materials.
[1] AI-driven chat assistant for ECE 120 course at UIUC (only 1 comment by the website creator):
https://news.ycombinator.com/item?id=41431164
I would think you'd want to make something a little more bespoke to make it a fully-fledged feature, like interactive quizzes that keep score and review questions missed afterwards.
I made a deep research assistant for families. Children can ask questions to have difficult concepts explained, and parents can ask how to deal with any parenting situation. For example, a 4-year-old may ask “why does the plate break when it falls?”
example output: https://www.studyturtle.com/ask/PJ24GoWQ-pizza-sibling-fight...
app: https://www.studyturtle.com/ask/
Show HN: https://news.ycombinator.com/item?id=44723280
I ask because every serious study on using modern generative AI tools tends to conclude fairly immediate and measurable deleterious effects on cognitive ability.
Happy Tuesday!
There's a lot of specificity that AI can give over human instruction however it still suffers from lack of rigor and true understanding. If you follow well-trod paths its better but that negates the benefit.
The future is bright for education though.
Sure, for some people it will be insanely good: you can go for as stupid questions as you need without feeling judgement, you can go deeper in specific topics, discuss certain things, skip some easy parts, etc.
But we are talking about averages. In the past we thought that the collective human knowledge available via the Internet will allow everyone to learn. I think it is fair to say that it didn't change much in the grand scheme of things.
(Joke/criticism intended)
For example, the answer to a question was "Laocoön" (the guy who said 'beware of Greeks bearing gifts') and I put "Solon" (who was a Greek politician) and I got "You’re really close!"
Is it close, though?
https://arxiv.org/abs/2409.15981
It is definitely a great use case for LLMs, and it challenges the assumption that LLMs can only "increase brain rot," so to speak.
Sure, it was crafted by educational experts, but this is not a feature! It's a glorified constant!
There is no way to learn without effort. I understand they are not claiming this, but many students want a silver bullet. There isn't one.
The same problem exists for all educational apps. Duolingo users have the goal of learning a language, but also they only want to use Duolingo for a few minutes a day, but also they want to feel like they're making progress. Duolingo's goal is to keep you using Duolingo, and if possible it'd be good for you to learn the language, but their #1 goal is to keep you coming back. Oddly, Duolingo might not even be wrong to focus primarily on keeping you moving forward, given how many people give up when learning a new language.
So, unless you have experience with this product that contradicts their claims, it's a good tutor by your definition.
The criticism of Cliff's Notes is generally that it's a superficial glance. It can't go deeper; it's basically a summary.
The LLM is not that. It can zoom in and out of a topic.
I think it's a poor criticism.
I don't think it's a silver bullet for learning, but it's a unified, consistent interface across topics and courses.
Sure, but only as long as you're not terribly concerned with the result being accurate, like that old reconstruction of Obama's face from a pixelated version [1], but this time about a topic for which one is, by definition, not capable of identifying whether the answer is correct.
[1] https://www.theverge.com/21298762/face-depixelizer-ai-machin...
It's unlikely to make up the same bullshit twice.
Usually exploring a topic in depth finds these issues pretty quickly.
If LLMs got better at just responding with "I don't know," I'd have less of an issue.
Some topics you learn to beware and double check. Or ask it to cite sources. (For me, that's car repair. It's wrong a lot.)
I wish it had some kind of confidence level assessment or ability to realize it doesn't know, and I think it eventually will have that. Most humans I know are also very bad at that.
Unavoidably, people who don't want to work won't push the "work harder" button.
I am not an LLM guy, but as far as I understand, RLHF did a good job of converting a base model into a chat model (instruction-tuned), and a chat/base model into a thinking model.
Both of these examples are about the nature of the response and the content used to fill it. There are so many different ways, still waiting to be seen, in which these could be filled.
Generating an answer step by step and letting users dive into those steps is one of those ways, and RLHF (or the similar techniques that are used) seems a good fit for it.
Prompting feels like a temporary solution for this, much like how "think step by step" was first seen in prompts.
Also, doing RLHF/post-training to change these structures makes it a moat, and expensive. Only the AI labs can do it.
Having experience teaching the subject myself, what I saw on that page is about the first five minutes of the first class of the semester at best. The devil will very much be in the other 99% of what you do.
In the old days of desktop computing, a lot of projects were never started because if you got big enough, Microsoft would just implement the feature as part of Windows. In the more recent days of web computing, a lot of projects were never started, for the same reason, except Google or Facebook instead of Microsoft.
Looks like the AI provider companies are going to fill the same nefarious role in the era of AI computing.
I would not use it if it was for something with a strictly correct answer.
I believed competitors would rush to copy all great things that ChatGPT offers as a product, but surprisingly that hasn’t been the case so far. I wonder why they seemingly don’t care about that.
Is adding more buttons in a dropdown the best way to communicate with an LLM? I think the concept is awesome. Just like how Operator was awesome but it lived on an entirely different website!
I ultimately dropped the course and took it in the summer at a community college where we had the 20-30 standard practice problem homework where you apply what you learned in class and grind problems to bake it into core memory.
AI would have helped me at least get through the uni course. But generally I think it's a problem with the school/class itself if you aren't learning most of what you need in class.
These groups were some of the most valuable parts of the university experience for me. We'd get take-out, invade some conference room, and slam our heads against these questions well into the night. By the end of it, sure... our answers looked superficially similar, but it was because we had built a mutual, deep understanding of the answer—not just copying the answers.
Even if you had only a rough understanding, the act of trying to teach it again to others in the group made you both understand it better.
And we literally couldn't figure it out. Or the group you were in didn't have a physics rockstar. Or you weren't so social or didn't know anyone or you just missed an opportunity to find out where anyone was forming a group. It's not like the groups were created by the class. I'd find myself in a group of a few people and we just couldn't solve it even though we knew the lecture material.
It was a negative value class that cost 10x the price of the community college course yet required you to teach yourself after a lecture that didn't help you do the homework. A total rip-off.
Anyways, AI is a value producer here instead of giving up and getting a zero on the homework.
Edit: if you mean university, fair. That'll be an interesting transition. I guess then you pay for the sports team and amenities?
In the US at least, most kids are in public schools and the collective community foots the bill for the “daycare”, as you put it.
When the former students ask questions, I answer most of them by pointing at the relevant passage in their book/notes, questioning their interpretation of what the book says, or giving them a push to actually problem-solve on their own. On rare occasions the material is just confusing/poorly written and I'll decide to re-interpret it for them to help. But the fundamental problems are usually with study habits or reading comprehension, not poor explanations. They need to question their habits and their interpretation of what other people say, not be spoon fed more personally-tailored questions and answers and analogies and self-help advice.
Besides asking questions to make sure I understand the situation, I mostly repeat the same ten phrases or so. Finding those ten phrases was the hard part and required a bit of ingenuity and trial-and-error.
As for the latter students, they mostly care about passing and moving on, so arguing about the merits of such a system is fairly pointless. If it gets a good enough grade on their homework, it worked.
The main issue is that chats are just bad UX for long form learning. You can't go back to a chat easily, or extend it in arbitrary directions, or easily integrate images, flashcards, etc etc.
I worked on this exact issue for Periplus and instead landed on something akin to a generative personal learning Wikipedia. Structure through courses, exploration through links, embedded quizzes, etc etc. Chat is on the side for interactions that do benefit from it.
Link: periplus.app
When it just gives me the answer, I usually understand but then find that my long-term retention is relatively poor.
> The part Margie hated most was the slot where she had to put homework and test papers. She always had to write them out in a punch code they made her learn when she was six years old, and the mechanical teacher calculated the mark in no time.
> Under the hood, study mode is powered by custom system instructions we’ve written in collaboration with teachers, scientists, and pedagogy experts to reflect a core set of behaviors that support deeper learning including: encouraging active participation, managing cognitive load, proactively developing metacognition and self reflection, fostering curiosity, and providing actionable and supportive feedback.
I'm calling bullshit, show me the experts, I want to see that any qualified humans actually participated in this. I think they did their "collaboration" in ChatGPT which spit out this list.
If you're the other 90% of students that are only learning to check the boxes and get through the courses to get the qualification at the end... are you going to bother using this?
Of course, maybe this is "see, we're not trying to kill education... promise!"
Just like it's easier to be productive if you have a separate home office and couch, because of the differing psychological contexts, it's easier if you have a separate context for "just give me answers" and "actually teach me the thing".
Also, I don't know about you, but (as a professional) even though I actively try to learn the principles behind the generated code, I don't always want to spend the effort prompting the model away from the "just give me results with a simple explanation" personality I've cultivated. It'd be nice to have a mode with that work done for me.
I used to have to prompt it to do this every time. This will be way easier!
EDIT: literally saw it just now after refreshing. I guess they didn't roll it out immediately to everyone.
When I ask ChatGPT* questions about things I don’t know much about it sounds like a genius.
When I ask it about things I’m an expert in, at best it sounds like a tech journalist describing how a computer works. At worst it is just flat out wrong.
* yes I’ve tried the latest models and I use them frequently at work
Importantly, these were _not_ critical questions that I was incorporating into any decision-making, so I wasn't having to double-check the AI's answers, which would make it tedious; but it's a great tool for satisfying curiosity.
human: damn kids are using this to cheat in school
openai: release an "app"/prompt that seems really close to solving this stated problem
kids: I never wanted to learn anything, I just want to do bare minimum to get my degree, let my parents think they are helping my future, and then i can get back to ripping that bong
<world continues slide into dunce based oblivion>
It doesn't matter what the problem statement is: an 80%-or-less solution, it seems, can be made, and rather quickly. A huge percentage of the population judges technology solutions as "good enough" at a far lower bar than they should. This is even roping in people who used to hold themselves to a higher standard of "rigorous correctness," because they keep thinking, "damn, just a bit more work and it will get infinitely better; let's create the biggest economic house of cards this world will ever collapse under."
Btw, most people don't know this, but Anthropic did something similar months ago; their product heads messed up the launch by keeping it locked up for American educational institutions only. OpenAI copies almost everything Anthropic does, and vice versa (see Claude Code / Codex):
- study mode (this announcement)
- office suite (https://finance.yahoo.com/news/openai-designs-rival-office-w...)
- sub-agents (https://docs.anthropic.com/en/docs/claude-code/sub-agents)
When they announce VR glasses or a watch, we'd known we've gone full circle and the hype is up.
A more thought through product version of that is only a good thing imo.
I don't get it.
Yes, if my teacher could split into a million of themselves and compete against me on the job market at $200/mo.
Why do we even bother to learn if AI is going to solve everything for us?
If the promised and fabled AGI is about to arrive, what is the incentive for learning to deal with these small problems?
Could someone enlighten me? What is the value of knowledge work?
You're also assuming that AGI will help you or us. It could just as easily only help a select group of people and I'd argue that this is the most likely outcome. If it does help everybody and brings us to a new age, then the only reason to learn will be for learning's sake. Even if AI makes the perfect novel, you as a consumer still have to read it, process it and understand it. The more you know the more you can appreciate it.
But right now, we're not there. And even if you think it's only 5-10y away instead of 100+, it's better to learn now so you can leverage the dominant tool better than your competition.
> It could just as easily only help a select group of people and I'd argue that this is the most likely outcome
Currently it only applies to those of us who are programming!
Yeah, even if it does away with all the quirks, using it would still be better.
"The mind is not a vessel to be filled, but a fire to be kindled." — Plutarch
"Education is not preparation for life; education is life itself." — John Dewey
"The important thing is not to stop questioning. Curiosity has its own reason for existing." — Albert Einstein
In order to think complex thoughts, you need to have building blocks. That's why we can think of relativity today, while nobody on Earth was able to in 1850.
May the future be even better than today!
Most people don't learn to live; they live and learn. Sure, learning is useful, but I am genuinely curious why people overhype it.
Imagine being able to solve the Math Olympiad and get a gold medal. Will it change your life in an objectively better way?
Will learning about physics help you solve the Millennium Problems?
These take practice, and there is a lot of gatekeeping. The whole idea of learning is wisdom, not knowledge.
So maybe we differ in perspective. I just don't see the point when there are agents that can do it.
Being creative requires taking action. The learning people do these days is mere consumption of information.
Maybe this is me. But meh.
Apart from that, I do think that AI makes a lot of traditional teaching obsolete. Depending on your field, much of university study is just memorizing content and writing essays/exam answers based on it, after which you forget most of it. That kind of learning, as in the accumulation of knowledge, is no longer very useful.