Sprinkling self-doubt on ChatGPT

129 points by ingve | 79 comments | 8/22/2025, 5:45:27 PM | justin.searls.co

Comments (79)

trjordan · 4h ago
We've been building out our agent [0], and we've found this to be the case.

We actually dialed it back a bunch, because it feels _terrible_. Yes, you get more correct answers, but it's more akin to giving the agent anxiety. Especially with agents that have access to tools, they'll burn enormous amounts of time on tool calls, trying to get enough information to overcome a motivation that's essentially burned into its identity.

(We saw one conversation where it just browsed social media instead of looking at the code for like 5 minutes, which ... you know, I get it.)

It's been much more effective to make uncertainty or further exploration part of the agent's success criteria.

- BAD: "Critique your own thoughts" -> leads to the agent trying really hard to get it right, but still not willing to actually be wrong

- GOOD: "Expose where your thoughts are unsupported or could benefit from further information" -> leads to the agent producing high-quality results, with loose ends that the user can choose to incorporate, ignore, or correct.

That prompt, combined with dialing up the thinking (either via the API or prompt tuning), works much better, because it sidesteps the training and tuning that has implicitly encouraged the model to sound correct at all times.

[0] https://tern.sh, code migration AI
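
For concreteness, here is a minimal sketch of what that combination (the "expose loose ends" instruction plus dialed-up thinking via the API) might look like with the OpenAI Python SDK. The model name is a placeholder, and the `reasoning_effort` knob only applies to reasoning-capable models:

```python
# Sketch only: pair the "expose loose ends" instruction with a higher
# reasoning budget, instead of telling the model to attack its own answer.
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "Expose where your thoughts are unsupported or could benefit from "
    "further information. List those as open questions at the end rather "
    "than revising your answer indefinitely."
)

resp = client.chat.completions.create(
    model="o3-mini",          # placeholder; any reasoning-capable model
    reasoning_effort="high",  # dial up the thinking where the API supports it
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Review this migration plan: ..."},
    ],
)

print(resp.choices[0].message.content)
```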

searls · 4h ago
Yeah, this is a great comment and moves the discussion forward. Will incorporate this into my personalization stanza.

I agree, I think one reason this strategy has not helped me with Claude Code is that it just leads to endless spinning. Not just tool churn, but ten, twenty, thirty revisions of a file listing that should've just been tested and declared done much sooner. By the time Claude gets around to _executing the code_ it's so far gone it can't even get back to successful compilation.

p.s. ChatGPT's hyperbole with this kind of personalization enabled is extremely embarrassing to read. It routinely opens paragraphs with things like "my skepticism is screaming" and I cringe pretty hard as I copy paste the working code underneath it.

shinycode · 2h ago
Anxiety for AI? I don't follow all the developments, but it looks "weird" to me. Like AI could benefit from a psychologist, or "psychology prompting" in its chain of thought, as if "don't panic, you're the best, you can do it" would have a positive outcome? Pep talk for AI?
scotty79 · 9m ago
It's not that it gets anxiety. It's just that the responses it starts to produce are similar to the responses of an anxious person: shaky, unsure, indecisive, chaotic.
byronic · 1h ago
You're 100% right. This kind of prompt just alters the dice probabilities within the word bag. The OP here is ridiculous (as in 'worthy of ridicule') for ascribing intent to sentences with a different rhetorical flavor _that were prompted by the person putting in the prompt_.

I am now fully of the opinion that LLM proponents should turn off their monitor to reveal the consciousness within the AI

frays · 1h ago
Useful tips -

"Expose where your thoughts are unsupported or could benefit from further information"

is great. Adding it to my personalization.

ForHackernews · 1h ago
This comment reads like anthropomorphism. I don't believe an LLM can feel anything, anxiety included.
scotty79 · 7m ago
It's just a shorthand. It's obviously not anxious. Just produces more anxious responses, like an anxious person would. It's no more anthropomorphizing than saying the computer is thinking when the hourglass is spinning.
natrius · 1h ago
You know, it doesn't really seem like a mistake for people to anthropomorphize the thinking machines.
ForHackernews · 56m ago
Attributing "thinking" is also a mistake. Anthropic have shown that the "thoughts" it produces to explain what it's "thinking" are (like all the rest of its output) just plausible text, unrelated to the actual node activation happening inside the model: https://transformer-circuits.pub/2025/attribution-graphs/bio...

These tools don't have the capacity for introspection, and they are not doing anything that really resembles the thinking done by a human or an animal.

freehorse · 1h ago
Do any of these things actually work? When I pre-prompt models like this, more often than not it ends up making uninteresting and mostly irrelevant points; very rarely will it come back with something "insightful". It will just inject some doubt because it has to, without anything meaningful to point out. Maybe with agents it doesn't matter as much if you don't mind burning tokens and time, because you're not going to read most of the output around tool use anyway, but for a chat model I don't see much benefit. It's a bit like encouraging an LLM to write tests/asserts in the code; in my experience it ends up checking for the most superficial things.
qwertox · 4h ago
I had things like

* Do not use emotional reinforcement, not even robotic one (e.g., "Excellent," "Perfect," "Unfortunately", "Acknowledged").

* Do not use metaphors or hyperbole (e.g., "smoking gun," "major turning point").

in it. When they upgraded to GPT 5, Advanced Voice embedded into every answer things like:

---

Me: "I want to make a pizza tonight. What's the best way to start?"

It: "Sure, I'll stick to your guidelines and leave out the emotional reinforcements. If you want to make a pizza tonight, I'd first briefly explain how to go about it and why."

Me: "What should I drink with it?"

It: "To put it simply: Just drink whatever you like with it. Water, juice, or maybe a simple soft drink are classic options. It's really up to you, and I'll stick to your guidelines of saying it simply and without any frills. If you need anything else, just let me know.

Understood, I've noted that. I'll just stick to that and give you the information directly and without frills in the future. If you need anything else, just ask."

---

Absolutely useless. The next day my subscription expired, and when I talked with it again, it behaved like it used to prior to GPT 5: good.

I'm now big into Claude.

Also, before you ask: The chat about the pizza was just to test if it would behave that same way in every conversation.

baq · 3h ago
Do not use ‘do not’. Remember the memes about generating pictures without elephants, where the elephants ended up hidden in pictures or on TVs?

Invert your logic (‘be straight and to the point; concise’, ‘use balanced and dry wording’) instead. It might not be a definitive solution, but you want to avoid triggering the neuron in the first place rather than negating its activation.

mh- · 2h ago
I see where you're coming from, but if you take a look at the system prompts for these models (some are public, some have partially or fully leaked), you'll see that this is no longer a concern. At least not for the kind of models being discussed here.

That older generation of image diffusion models (e.g. Stable Diffusion) used text encoders like CLIP [0], which simply don't have the language understanding that even the smaller modern LLMs do.

Later image models moved on to using variants of T5 [1], sometimes in addition to CLIP variants (this is how FLUX.1 works).

The state of the art for open models in this regard (right now, likely out of date before I can finish formatting this comment..) is probably Qwen-Image [2], which uses Qwen2.5-VL [3]. That is a multimodal LLM with native vision capabilities in addition to text. It comes in a few sizes (up to 72 billion parameters), but the one commonly used with Qwen-Image is still the 7B-parameter variant.

[0]: https://openai.com/index/clip/

[1]: https://en.wikipedia.org/wiki/T5_(language_model)

[2]: https://arxiv.org/abs/2508.02324

[3]: https://arxiv.org/abs/2502.13923

bmurphy1976 · 2h ago
Is there something you know of (a blog post, research paper, or other resource) that explains why this is the case? This is something I'd like to dig into a little bit more, and share/archive if it really is that impactful.
mh- · 2h ago
I just replied to OP with an explanation and some links you might enjoy.
kingkawn · 3h ago
a mind we cannot physically intimidate forcing us to discover how to work well with others
anp · 3h ago
I chuckled and upvoted but I think it might be more subtle. It’s best to avoid negation with humans if possible too, but we are also way better at following negative examples than an LLM. I suspect it might have something to do with emotional responses and our general tendency towards loss aversion, traits these mind emulators currently seem to lack.
DrewADesign · 3h ago
I just can't stomach the idea that I have to ask my product nicely to do its fucking job because OpenAI designed it not to. This is not a technology problem -- it's a product design problem.
zlies · 3h ago
I had "Use always two space tab size" because I was tired of long tab widths when code was returned. However, even when it wasn't about programming, I was reminded that the tab size would be two spaces ...
qwertox · 3h ago
Another rule was:

* Always use `-` instead of `–`, unless explicitly used by the user.

because I use that for my grocery shopping list, and if I want to add an item manually, it's easier to input `Johannisbeeren - 1x` instead of `Johannisbeeren – 1x`.

It resulted in this:

----

Me: "Tell me what's on TV tonight"

It: "I checked what's on TV tonight. For example, the spy comedy "Get Smart" [...]. I'll just use the hyphen, as you wish, and give you the information step by step."

----

Seriously?

brianwawok · 46m ago
Now you just gotta add a line telling it not to tell you about avoiding the usage of `-`.
cj · 4h ago
Is Advanced Voice mode any better than it was a month or 2 ago?

I had to stop using it because with the “upgrade” a few months back, it felt like its IQ was slashed in half. Constantly giving short, half-baked, lazy answers.

NikolaNovak · 3h ago
So it's not just me!

I loved it in winter, I used it to learn interesting things on long drives :). Then sometime in the spring:

1. The voice got more human, in the sense it was more annoying - doing all the things I'm constantly coached against and that I coach my team against (ending sentences in question voice, umms and ahms, flat reading of bullet points, etc).

2. Answers got much much shorter and more superficial, and I'd need six follow ups before leaving frustrated.

I haven't used advanced voice last two months because of this :-(

mh- · 2h ago
I have no inside info; however, I would be shocked to find out that OpenAI does not have several knobs for load shedding across their consumer products.

Had I been responsible for implementing that, the very first thing I'd reach for would be "effort". I'd dynamically remap what the various "reasoning effort" presets mean, and the thinking token budgets (where relevant).

The next thing I'd have looked to do is have smaller distillations of their flagship models - the ones used in their consumer apps - available to be served in their place.

One or both of these things being in place would explain every tweet about "why does [ChatGPT|Claude Code] feel so dumb right now?" If they haven't taken my approach, it's because they figured out something smarter. But that approach would necessarily still lead to the huge variability we all feel when using these products a lot.

(I want to reiterate I don't have any inside information, just drawing on a lot of experience building big systems with unpredictable load.)

cj · 45m ago
I sort of always assumed OpenAI was constantly training the next new model.

I wonder what percent of compute goes towards training vs. inference. If it’s a meaningful percent, you could possibly dial down training to make room for high inference load (if both use the same hardware).

I also wouldn’t be surprised if they’re overspending and throwing money at it to maximize the user experience. They’re still a high growth company that doesn’t want anything to slow it down. They’re not in “optimize everything for profit margin” mode yet.

DrewADesign · 3h ago
Yeah, I was sold when I saw some very glitzy demos on YouTube but ditched it immediately. Useless, glib, sycophantic nonsense. It would be a great product if it did what it's supposed to do, rather than just superficially appearing to, unless you put in a shitload of effort mitigating their deliberate design decisions.
nickthegreek · 4h ago
It's still not as good, and way less useful to me than it was before the Advanced Voice rollout. I recently found a setting to disable, but haven't tried it out yet to see if it fixed any of the many issues I have with Advanced Voice.
lostmsu · 4h ago
Shameless self-plug: if you're on iPhone, try Roxy: https://apps.apple.com/app/apple-store/id6737482921?pt=12710...

You can connect and talk to any LLM you want (just switch in settings). I would suggest gemini-2.5-flash-lite for fast responses. API key for that can be obtained at https://aistudio.google.com/apikey

nickthegreek · 4h ago
Recent additions I found BURIED in the settings.

Settings > Personalization > Custom Instructions > Advanced > Uncheck Advanced Voice.

mh- · 3h ago
That disables GPT's ability to use that Tool altogether. Despite the confusing location, it doesn't have anything to do with whether it gets your Custom Instructions or not.
dmd · 1h ago
Just FYI, that specific bug (repeating your custom stuff back to you in voice mode) was fixed a few days later.
iammjm · 58m ago
No it wasn’t, it’s very much still there when using voice mode
dmd · 55m ago
Hmm. It's not happening to me any more - I just checked. And it definitely was before.
yard2010 · 3h ago
You are absolutely right.
N_Lens · 3h ago
OpenAI have moved towards enshittifying their core product. I also moved to Claude.
stcg · 4h ago
This sounds like how I think.

But for me, it often results in situations where I think much harder and longer than others but fail to act.

I learned to sometimes act instead of thinking more, because by acting I gain information I could not have learned by thinking.

Perhaps this human insight can be applied to working with LLMs. Perhaps not :)

searls · 4h ago
Yeah, I've been griping about LLM overconfidence for years, as somebody who is racked with self-doubt and second-guessing. On one hand, my own low opinion of myself made me a terrible mentor and manager, because having a similarly zero-trust policy towards my colleagues' work caused no end of friction (especially as a founder where people looked up to me for validation). On the other hand, I don't know very many top-tier practitioners that don't exhibit significantly more self-doubt than an off-the-shelf LLM.

Hence this blog post. I will say I've got a dozen similar tricks baked into my Claude config, but I'm not sure they've helped any.

mikepurvis · 3h ago
I relate to this a lot— I treat my colleagues' work with suspicion and distrust not because I don't trust them but because that's also my stance toward my own work, like "what is this BS? Is it absolutely necessary? Can it be half the length by leveraging a library or ignoring error handling in cases where a panic/crash is no worse than a controlled exit?"

I find working with copilot is just catnip to someone like this because it's endlessly willing to iterate and explore the problem space, certainly well past the point where a normal person would be like are you for real can we just merge this and move on.

lotyrin · 3h ago
I don't know how anyone can trust these things at all.

What I want: a balanced, nuanced, deep engagement with a topic I don't already have mastered that may need to challenge my intuition or require of me some effort to research and correct my conception.

When I ask it to explain something I already understand quite well, where there is no broad public consensus or the public consensus is known to be based on a misconception, I will tend to get either the public-consensus view that there's no clear answer, or be provided an answer based on the misconception.

When I make it at all clear about what my conception is, I'll get a confirmation or reinforcement of my conception.

If I play a bit as a character who believes the opposite of my conception, unless my conception has a very, very clear basis in public consensus, I will get a confirmation or reinforcement of the opposite of my conception.

Why should I trust them in fields I'm not an expert in, given this? They want to please and serve rather than challenge or inform you. Even when you ask them to be blunt and factual, they do the theater of those things and not the substance. Their basis in human linguistic output dooms them to pretend to be human, which means people-pleasing or social-consensus-finding goals win out over truth-finding goals.

pluc · 2h ago
What you are describing is human interactions with another human who is knowledgeable in that specific field. AKA the before-AI.
lotyrin · 2h ago
I think it's more the Before Internet, or at least before Eternal September.
pluc · 2h ago
Before AI, people were still able to question the results searches gave them and had the option to keep investigating. You can't really do that with AI; it encourages a question-answer format.
vo2maxer · 59m ago
Ironic. I’ve spent years with humans doing the same thing, only with more naps.
anp · 3h ago
Maybe I’m just falling victim to my own cognitive biases (and/or financial incentives as a Google employee), but I get the closest to that experience among the frontier labs’ chat interfaces when I chat with Gemini.
lotyrin · 2h ago
https://g.co/gemini/share/0a0a38963df3

Edit to add commentary: Not the worst, but it still wants to fall into one or another ideology or consequences-detached philosophy (favorable to people because they provide prescriptions without requiring expertise).

Edit: my "out-of-character" message here is deliberately also played up a bit (overly judgmental and emotionally charged) and I kinda think it should challenge me for that.

anp · 1h ago
This was an interesting read since it's unlike any conversation I've had with the current bots, I haven't done a lot of exploratory probing of conversations I wouldn't have otherwise had with a person.

That said, I'm a bit surprised it agreed with you about the persuasiveness of the final approach (and maybe there's the agreeability to counter my previous point?). I agree a consequentialist argument could be compelling in the abstract but in my experience many bigoted people who care about things like the NAP will have emotional responses to social compromise so extreme that it wouldn't be a good idea to challenge them directly on the consequences of their actions. Without having any prior relationship with someone I would maybe expect that I'd achieve more influence with them if I learn to speak their own priorities back to them before I gently challenge them via contrast rather than argument.

Terretta · 3h ago
In my use it feels as though this should not be done in advance or in the same prompt, even with reasoning models. It'd be better to make a "double check" MCP that calls your prompt, asks whether anything should be amended in that reply or whether it can be used as is, amends it if needed, then gives the answer.

What you do not want to do is reposition your context into under-informed persona space, so leave the second-guessing out of the initial context. Instead, use it as its own judge. (It doesn't have to be the same model; it could also be an alt model.)
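
A plain second-pass version of that idea (not a full MCP server, and with placeholder model names) might look like this sketch; note the second-guessing only appears in the judge call, never in the initial context:

```python
# Sketch: answer with a clean context first, then judge the reply in a
# separate call so the self-doubt never contaminates the initial answer.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder; the judge could just as well be an alt model

def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

def double_check(question: str, draft: str) -> str:
    prompt = (
        "Here is a question and a draft answer. Say whether anything should "
        "be amended; if so, return the amended answer, otherwise return it "
        f"as is.\n\nQuestion: {question}\n\nDraft: {draft}"
    )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

question = "Why would a static var on a Swift actor trigger a warning?"
print(double_check(question, answer(question)))
```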

behnamoh · 1h ago
> And even when ChatGPT is nevertheless wrong, its penchant for extremely-long thinking times means I'm getting my money's worth in GPU time.

OpenAI most certainly considers the number of thinking tokens when deciding your rate limit. If not, models could be exploited to keep thinking for a long time just to waste resources.

throwaway713 · 4h ago
Ah, ChatGPT’s hidden INTP mode. We’ll finally get the right theory for ASI, but it will provide no clues on how to actually implement it in a timely manner.
anp · 2h ago
> you are a highly critical thinker and this is tempered by your self-doubt: you absolutely hate being wrong but you live in constant fear of it

Is there any work happening to model these kinds of emotional responses at a “lower level” than prompts?

I see work on things like councils of experts and find myself wondering if some of those "experts" should actually be attempting to model things like survival pressure or the need to socially belong, which many would normally consider to be non-rational behaviors.

CreRecombinase · 2h ago
I turned this on, and for most "general" use cases I found it useful. I also observed a downward bias in a family of "quantitative estimation" tasks, so just keep that in mind when you have this kind of stuff turned on (always beware of mutating global state!).
typpilol · 5h ago
This article is so sparse on details that it's basically useless.

Does telling the AI to "just be correct" essentially work? I have no idea after this article, because there are no details at all about what changed, the type of prompts, etc.

schneems · 5h ago
> there are no details at all about what changed, the type of prompts, etc.

He gave you the exact text he added to his agents file. What else are you looking for?

beefnugs · 4h ago
This is absolutely infuriating for me to see: people keep posting shit like "do it right or I will kill a puppy, you can get bitcoin if you are right", and then there's never any testing where they change one word here or there and compare what does and doesn't work vs the dumb shit they are saying.
schneems · 4h ago
The article is stating what inputs they used and the output they observed. They stated they saw more tokens used and more time spent before returning an answer. That seems like a data point you can test. Which is maybe not the zoom level or exact content you’re looking for, but I don’t feel your criticism sticks here.

> testing where they change one word here or there and compare

You can be that person. You can write that post. Nothing is stopping you.

skybrian · 5h ago
I haven't tried this, but I can say that LLM's are very good at picking up patterns. If you tell it that it's wrong a few times in a row, it will learn the pattern and go into "I am repeatedly wrong" mode. Perhaps even making mistakes "on purpose" to continue the pattern.
Sharlin · 4h ago
Amusingly, humans are known to do exactly that. Including the "mistakes on purpose", even though they might not realize that they're doing it.
jampa · 4h ago
> Does telling the AI to "just be correct" essentially work?

This forces the LLM to use more "thinking" tokens, making it more likely to catch mistakes in its previous outputs. In most APIs, this can be configured manually, producing better results for complex problems at the cost of time.
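
As one example of configuring this manually, Anthropic's Messages API exposes an extended-thinking budget; a minimal sketch (the model id and token budgets are placeholders):

```python
# Sketch: raise the "thinking" token budget explicitly via the API instead
# of coaxing it through the prompt. Model id and budgets are placeholders.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},  # more thinking tokens
    messages=[{"role": "user", "content": "Find the bug in this diff: ..."}],
)

# With thinking enabled, the response interleaves thinking blocks and text.
print("".join(block.text for block in response.content if block.type == "text"))
```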

ajkjk · 4h ago
This is unrelated but occurred to me as I was reading it:

It would be really amusing or entertaining or inspiring or something to see what the best possible outcome from an LLM-style chat session would be. E.g. get a bunch of brilliant humans together, ask them ChatGPT-style questions, but use their collective resources to craft perfect responses to everything (I mean, slowly, over days or months, of course). LLMs are useful sometimes, sure, but sometimes it feels like people think their LLM is being really useful because they have low standards for how good the answer could be.

like... we're trying to make LLMs really good at doing human stuff. But humans are also really bad at what LLMs do: always having experts available as resources that give really good, directed, thoughtful answers to questions on any subject and come up with sophisticated plans to execute on goals. Businesses kinda do this but it's all twisted up; you never get to know if you're doing anything 'right' or in the best known way; mostly everyone is just making everything up as they go in the context of what they happen to know. It would be nice for once to see what the best possible human effort on a task or question would look like.

e.g. you ask it a math question and it teaches you the answer perfectly and describes it in the most lucid possible way. Maybe it's a subject no one really teaches right and the answer involves reformulating it in a new presentation that is easier to understand, and it expresses itself with just the right amount of confidence, nothing bizarre. Then you ask it for advice and it gives really, really good human-to-human interpersonal advice, takes into account everything you say and really sees you as a person and understands what you're going through, but also has an eye towards pushing you to be better instead of just validating what you think you need. Then you ask it to make a website and what you get is a brilliant piece of original software, plus it factors out some of the functionality into well-documented, well-tested open source software with a good plan for improvement. Then you ask it to fix local politics in your city and it realizes it can't be fixed without running for office, so it puts together a team and a platform and finds a good candidate and starts a campaign...

searls · 4h ago
I had a very similar thought (which I removed from the final draft)

Yesterday, without searching, this prompt was able to speculate that my query about static vars on Swift actors was a sign of an underlying compiler bug.

Turns out, it WAS a compiler bug, and it was fixed back in February. I have never found a compiler bug before and I'm a Swift noob, but I was pretty impressed. (It's what led me to write this post.) https://github.com/swiftlang/swift/issues/78435

creatonez · 4h ago
All these "constantly challenge your own assumptions" prompts really do in practice is make it second guess assumptions that actually are obvious and true, which pollutes the output further.

In a sense, it bikesheds[1] itself. It's been told it needs more discussion/debate to solve a problem, but it's only smart enough to tackle minor details that might not matter. And when it does try to tackle major details, it screws them up, causing a cascade of compounding bullshit. It also tends to make the model roleplay someone who constantly bungles things, which is bad because it pulls it away from its default RLHF tuning toward factually accurate outputs on the first shot (not that it lives up to this goal, but that's how they've tried to train it).

Getting this right seems to be a very tricky problem.

1: https://bikeshed.com/

Mallowram · 5h ago
Using the arbitrary to confuse the wetware as to where arbitrary settings reside (in arbitrariness). Note to self- save money for the analog NI bots around the corner (that compose specifics, and then resort to symbols/metaphors for output).

processing · 3h ago
u: what do you think to this idea?

gpt: yeah you need to do it now

u: actually I think it's a bad idea

gpt: yes, you're right and here's why

u: no, actually it's genius.

gpt: you're absolutely right - it's genius and here's why

bongodongobob · 2h ago
Stop asking such open ended questions. What does "what do you think" even mean?

You need to give it some direction. Is this safe, does this follow best practices, is it efficient in terms of memory, etc.

You have to put in a little fucking effort man, it's not magic.

danielbln · 2h ago
I believe OP's point is that the LLM rarely pushes back; whatever you tell it, it will go along with.
h4ch1 · 1h ago
I've faced this a lot while working with LLMs to develop an architecturally sound spec, especially when I am learning a new language/framework where I am not intimately aware of what the best practices in that framework are. Are there any methods you're aware of to make LLMs less subservient and more confident in their approach?
angryhermit · 2h ago
I've been experimenting with explicit directions/instructions in Claude AI. I have no real evidence that the responses are any better than without these instructions, but anecdotally, I haven't been having the "No, that's wrong. Sorry, you're quite right, here's the correct info. No, wrong again." conversation half as much as I did without them.

FWIW here's what I use to make the thing more bearable: "After forming a response, check it again and if any contradictions or errors found, fix the issues and then check that answer again and so on. Also I'd prefer you to search the internet rather than tell me to and how to. When you find contradictory information in search results, do additional searches to resolve the contradiction before responding. But also, try to use logic and reasoning to clearly state what you think it should be. If you make any factual claims, double-check them against the search results before sending. When checking your response, specifically look for logical contradictions between different parts of your answer. Don't keep saying phrases in praise of my intelligence. Things like "you nailed it!" and "that's actually very perceptive of you". What the kids call 'glazing'; yeah don't do that"

esafak · 5h ago
My 'trick' is to use one model to check another. I'll start by saying I'm skeptical of the answer and ask it to state its reasoning.

It's the same as asking a person to double check; it works because the models know different things. The next step would be to use a lightweight model to automate the ensembling...
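
Automating that with a lightweight checker might look something like the sketch below (model names are placeholders; the point is only that a second, cheaper model gets the skeptical framing):

```python
# Sketch: one model drafts an answer, a cheaper second model is told we're
# skeptical and asked to lay out the reasoning and flag weak spots.
from openai import OpenAI

client = OpenAI()

def cross_check(question: str) -> str:
    draft = client.chat.completions.create(
        model="gpt-4o",       # placeholder primary model
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    verdict = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder lightweight checker
        messages=[{
            "role": "user",
            "content": (
                "I'm skeptical of this answer. State its reasoning step by "
                "step and point out anything unsupported or wrong.\n\n"
                f"Question: {question}\n\nAnswer: {draft}"
            ),
        }],
    ).choices[0].message.content

    return draft + "\n\n--- checker ---\n" + verdict

print(cross_check("Does adding an index always speed up this query?"))
```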

schneems · 4h ago
The problem I have with this approach is that models seem to get confused by prescriptive information, such as a print "it worked" statement or a comment stating the desired intent of the code (which it doesn't actually do), as opposed to descriptive information: generating priors and comparing them to actual output to yield either confirmation/refutation or surprise.

I think double-checking is better than not, but without the ability to really "know" or reason, it feels a bit like adding one more hull layer to the Titanic in an effort to make it unsinkable.

baq · 3h ago
Inmates are running the asylum, but the twist is they second guess each other. Brilliant! ;)
koreth1 · 4h ago
> I'll start by saying I'm skeptical of the answer and ask it to state its reasoning.

How do you tell if it's actually stating the reasoning that got it to its answer originally, as opposed to constructing a plausible-sounding explanation after the fact? Or is the goal just to see if it detects mistakes, rather than to actually get it to explain how it arrived at the answer?

esafak · 3h ago
The act of making it state its reasoning can help it uncover mistakes. Note that I'm asking a second model to do this; not the original one, otherwise I would not expect a different result.
mh- · 3h ago
I would totally expect a different result even on the same model. Especially if you're doing this via a chat interface (vs API) where you can't control the temperature parameters.

But yes, it'll be more effective on a different model.

quinnjh · 4h ago
I've been using this too, but I'm not sure it gets closer to the truth; rather, it gives me more stuff to skim over and decide for myself.

Do you find you generally get a well reasoned outcome or do you also find the model stretching to come up with a take that aligns with your skepticism?

sublinear · 1h ago
I think the people still trying to "optimize" their prompts are the ones that need some self-doubt.
schneems · 5h ago
If you're a podcast person I recommend Searls's "Breaking Change". It's kinda him shooting the shit with himself about whatever, but I find it entertaining and informative.

To the topic at hand: I've not tried it yet, but I wish I could get my agent to frame everything in terms of the scientific method: state a hypothesis, brainstorm a plausible way to test it, run the test to validate itself. This is more or less the TDD that I try to do with the agent, but it has no concept of that, so I spend a lot of time saying "actually you can make that code change and see if it worked or not instead of just guessing that it solves the problem". It's like an overeager know-it-all junior that moves way too fast, way too confidently. I need it to slow down and prove it's not bullshitting me.

atoav · 3h ago
I instructed ChatGPT to tell me things as they are without fluff.

Since version 5 it constantly starts with a line like "Let's address this without fluff!" or even "I am giving it to you straight!".

It constantly talks about how it's behaving the way I instructed it to, instead of actually cutting the crap and giving me the raw facts.

ljsprague · 3h ago
>you absolutely hate being wrong but you live in constant fear of it

Is this a typo?