I don't buy the arguments made here. You can't call it attention-optimized without opening the LLM brain and assessing what happens in the attention layers. Did the quoted paper do some of that? I know Anthropic is advanced in this area but I haven't seen that many results elsewhere yet. I mean, the fact that optimizing for attention makes better prompts is a solid hypothesis, but I don't see a proof here.
jdefr89 · 44m ago
Yea, they sound like plausible enough arguments... I often test my vague inputs against structured ones, and for many tasks it didn't seem like hallucinations happened significantly less. Honestly, the more structured you have to be, the less "general" your model probably is, and in theory you want your model to be as general as possible. Seems like here, you're simply helping it over-fit... At least that's what my intuition tells me, but I have yet to really check that out either.
ozgung · 48m ago
I think the author is confusing the attention mechanism with attention in the general sense. He refers to a phenomenon called “attention drift”, but neither Google nor LLM searches return any relevant references to that term. In fact, ChatGPT only cited this same blog post.
eric-burel · 2m ago
I think LLM vendors' papers entertain such confusions; the lack of peer review tends to favour grandiose but barely defined wording like this. At least the author seems to have checked a few papers that do some empirical validation (which is still different from establishing any kind of theory of prompt engineering; there is no way to do that other than cutting the model open).
energy123 · 15m ago
I can vouch for this prompting best practice. It leads to better results and better instruction following, whatever the cause.
gashmol · 1h ago
Aside -
It's funny to me how many developers still don't like to call their craft engineering and how fast LLM users jumped on the opportunity.
rockostrich · 50m ago
Not sure if this is a part of it, but there are places where "engineer" is a protected title that requires a degree or license. I work with a bunch of folks in Quebec and they have to use the title "software developer" unless they are a member of the Order of Engineers. I find this to be pretty silly considering someone can have a degree in Mechanical Engineering and use the title "Software Engineer" but someone with a degree in Computer Science can't.
NitpickLawyer · 56m ago
TBF I always read prompt engineering as the 2nd definition, not the first.
> 1. the branch of science and technology concerned with the design, building, and use of engines, machines, and structures.
> 2. the action of working artfully to bring something about.
So you're trying to learn what / how to prompt, in order to "bring something about" (the results you're after).
whoamii · 1h ago
What’s the problem?
-A Prompt Architect
nateroling · 2h ago
Can you write a prompt to optimize prompts?
Seems like an LLM should be able to judge a prompt, and collaboratively work with the user to improve it if necessary.
https://www.dbreunig.com/2025/06/10/let-the-model-write-the-... is an example.
alexc05 · 1h ago
100% yes! There've been some other writers who've been doing parallel work around that in the last couple weeks.
The LLM is basically a runtime that needs optimized input because the output is compute-bottlenecked. Input quality scales with domain knowledge, specificity, and therefore human time input. You can absolutely navigate an LLM's attention piecemeal around a spec until you build an optimized input.
You can see the hands-on results in this hugging face branch I was messing around in.
Here is where I tell the LLM to generate prompts for me based on research so far:
https://github.com/AlexChesser/transformers/blob/personal/vi...
Here are the prompts that produced:
https://github.com/AlexChesser/transformers/tree/personal/vi...
And here is the result of those prompts:
https://github.com/AlexChesser/transformers/tree/personal/vi.... (also look at the diagram folders etc.)
chopete3 · 2h ago
I use Grok to write the prompts. It's excellent. I think human-created prompts are insufficient in almost all cases.
Write your prompt in some shape and ask Grok:
> Please rewrite this prompt for higher accuracy
> --
> Your prompt
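For what it's worth, that loop is easy to script. A minimal sketch, assuming the `openai` Python client pointed at any OpenAI-compatible endpoint; the model name is a placeholder:

```python
# A minimal sketch of the "LLM rewrites your prompt" loop, assuming the
# `openai` Python client against any OpenAI-compatible endpoint; the model
# name is a placeholder.
from openai import OpenAI

client = OpenAI()

def rewrite_prompt(draft: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model to rewrite a rough draft prompt for higher accuracy."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Please rewrite this prompt for higher accuracy\n--\n{draft}",
        }],
    )
    return response.choices[0].message.content

print(rewrite_prompt("summarize this doc but keep the numbers right"))
```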
AlecSchueler · 1h ago
Wouldn't you be better doing it with almost anything other than Grok?
How do you know it won't introduce misinformation about white genocide into your prompt?
chopete3 · 1h ago
Any LLM is fine. Grok had the latest model available for free use. That's all. GPT-5 produces similar results too.
We have to read the result of course.
skort · 1h ago
If someone keeps choosing to use the "mechahitler chatbot" at this point, I don't think they care about what misinformation goes into their prompt.
CuriouslyC · 1h ago
This is pretty much DSPy.
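For anyone unfamiliar, a minimal sketch of what that looks like, assuming the current `dspy` package API; the model string, metric, and training examples are placeholders:

```python
# A minimal prompt-optimization sketch in DSPy; model string, metric, and
# training examples are placeholders.
import dspy
from dspy.teleprompt import BootstrapFewShot

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Declare *what* you want rather than hand-writing the literal prompt.
qa = dspy.ChainOfThought("question -> answer")

# A toy metric and training set; real ones would be task-specific.
def exact_match(example, prediction, trace=None):
    return example.answer.lower() in prediction.answer.lower()

trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
]

# The optimizer rewrites and augments the underlying prompt against the metric.
optimized_qa = BootstrapFewShot(metric=exact_match).compile(qa, trainset=trainset)
print(optimized_qa(question="What is 3 + 3?").answer)
```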
slt2021 · 33m ago
yes, just prepend your request to the LLM with "Please give me a well-structured LLM prompt that will solve this problem..."
cobbzilla · 2h ago
This article has some solid general advice on prompt-writing for anyone, even though the examples are technical.
I found the “Big-O” analogy a bit strained & distracting, but still a good read.
alexc05 · 2h ago
I'll admit that it's a little shoehorned in at the end :)
... but you know how editors are with writing the headline for clicks against the wishes of the journalist writing the article. You'll always see journos saying stuff like "don't blame me, that's my editor, I don't write the headlines"
I did toy with the idea of going with something like:
`Prompt Engineering is a wrapper around Attention.`
But my editor overruled me *FOR THE CLICKS!!!*
Full disclosure: I'm also the editor
kstenerud · 49m ago
TBH I almost passed this article by because of the big-o reference - it seemed like such a strange thing to say.
I'm glad I didn't, though, because I had no idea how the LLM is actually interpreting my commands, and have been frustrated as a result.
Maybe a title like "How to write LLM prompts with the greatest impact", or "Why your LLM is misinterpreting what you say", or something along those lines.
drfrankensearch · 2h ago
There’s a repeated paragraph in it atm you may want to fix:
“This is why prompt structure matters so much. The model isn’t reading your words like you do — it’s calculating relationships between all of them simultaneously. Where you place information, how you group concepts, and what comes first or last all influence which relationships get stronger weights.
This is why prompt structure matters so much. The model isn’t reading your words like you do — it’s calculating relationships between all of them simultaneously”
Reprimand the editor. ;)
I look forward to using the ideas in this, but would be much more excited if you could benchmark these concepts somehow, and provide regular updates about how to optimize.
alexc05 · 2h ago
thank you so much!!
Because Medium is such a squirrely interface, I find myself writing in markdown in vscode then copying and pasting sections across. If I make an edit after I've started inserting images and embedding the gists, it gets a bit manual.
Your comment in addition to another one about finding a way to compare the outputs of the good/bad prompts side by side - 100% agree. This could be more robust.
While I am running a process transformation against production teams in small isolated experimental groups, I can say I'm getting really great feedback so far.
Both with the proprietary stuff happening in the job, and with the feedback I'm getting back from the engineers I've shared this with in the wider industry.
Feedback from colleagues who have started taking "selected pieces" from the "vibe engineering" flow (https://alexchesser.medium.com/vibe-engineering-a-field-manu...) has been really positive.
> @Alex Chesser i've started using some of your approach, in particular having the agent write out a plan of stacked diffs, and then having separate processes to actually write the diffs, and it's a marked improvement to my workflow. Usually the agent gets wonky after the context window fills up, and having the written plan of self contained diffs helps a lot with 'checkpoints' so I can just restart at any time! Thanks!
from someone else:
> I just went through your first two prompts and I'm blown away. I haven't done much vibe coding yet as I've gotten initial poor results and don't trust the agent to do what I want. But the output for the architecture and the prompts are mind blowing. This tutorial is giving me the confidence to experiment more.
benchmarking feedback vs. qualitative devex feedback is definitely a thing though.
editor's note: title also chosen for the clicks.
mavilia · 2h ago
This was a great refresher on things I’ve seen writings of but never thought deeply about. A lot of it already “made sense” yet in my day to day I’m still doing the bad versions of the prompts.
Do you have a preference, in a more continual system like Claude Code, for one big prompt, or just trying to do one task and starting something new?
CuriouslyC · 2h ago
Having a lot of context in the main agent is an antipattern. You want the main agent to call subagents to keep high level reasoning context clean. If you're not using subagents you should start a new chat any time you're doing something fairly different from what you were doing before.
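A minimal sketch of that pattern; the `call_llm` helper is a hypothetical stand-in for any chat API:

```python
# A sketch of the subagent pattern described above: the main agent keeps only
# high-level context and delegates each task to a subagent that starts from a
# clean slate. `call_llm` is a hypothetical stand-in for your chat API.
def call_llm(messages: list[dict]) -> str:
    raise NotImplementedError("wire this to your LLM provider of choice")

def run_subagent(task: str) -> str:
    # Fresh context: the subagent sees only its own task, not the main
    # agent's accumulated history.
    return call_llm([{"role": "user", "content": task}])

def main_agent(goal: str, tasks: list[str]) -> str:
    # The main agent's context holds only the goal and short task digests,
    # so its high-level reasoning window stays clean.
    digests = []
    for task in tasks:
        result = run_subagent(task)
        digests.append(f"{task}: {result[:200]}")  # keep only a digest
    synthesis = f"Goal: {goal}\nTask results:\n" + "\n".join(digests)
    return call_llm([{"role": "user", "content": synthesis}])
```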
esafak · 2h ago
Could you share an example with results so we can see what difference it made?
alexc05 · 2h ago
I can't republish anything that happens in a production/proprietary environment.
One of the things that I think is pretty great about being able to share these particular prompts is that you can run this on one of your own repos to see how it turns out.
ACTUALLY!! Hold on. A couple weekends ago I spent some time doing some underlying research with huggingface/transformers and I have it on a branch:
https://github.com/AlexChesser/transformers/tree/personal/vi...
You can look at the results of an architectural research prompt.
Unfortunately I don't have a "good mode" side by side with a "bad mode" at the moment. I can work on that in the future.
The underlying research linked has the experimental design version of this with each piece evaluated in isolation.
cobbzilla · 2h ago
oh my please read TFA it has exactly the answers you seek.
cobbzilla · 26m ago
I was wrong. I just read the thing and could swear I saw results. Sorry for being dismissive, it’s me who is the idiot here.
jdefr89 · 37m ago
He wants to see a comparison, and the article shows no output comparison. It shows two inputs, says one gives more accurate output than the other, without showing us...
alexc05 · 2h ago
I don't know, I thought it was a fair question :shrug: :)
tanvach · 2h ago
Article doesn’t seem to include results. Only the prompts.
lubujackson · 1h ago
To add to these good points - for bigger changes, don't just have LLMs restructure your prompt, but break it down into a TODO list, a summary and/or build a scaffolded result then continue from that. LLMs thrive in structure, and the more architectural you can make both your inputs and outputs, the more consistent your results will be.
For example, pass an LLM a JSON structure with keys but no values and it tends to do a much better job populating the values than trying to fully generate complex data from a prompt alone. Then you can take that populated JSON to do something even more complex in a second prompt.
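As a concrete sketch of that two-step flow; the `call_llm` helper is a hypothetical stand-in for any chat API:

```python
# A sketch of the scaffolded-JSON technique: hand the model the exact shape
# you want, with empty values, instead of asking it to invent the structure.
# `call_llm` is a hypothetical stand-in for your chat API of choice.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider of choice")

scaffold = {
    "title": "",
    "summary": "",
    "risks": [{"description": "", "severity": ""}],
    "next_steps": [],
}

prompt = (
    "Fill in every value in this JSON for the design doc below. "
    "Return only valid JSON with the same keys.\n\n"
    + json.dumps(scaffold, indent=2)
    + "\n\n<design doc goes here>"
)

populated = json.loads(call_llm(prompt))
# The populated JSON can then feed a second, more complex prompt.
```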
boredtofears · 1h ago
Does Claude code do this by default?
It seems for most prompts that I give it, it ends up breaking things into TODO lists and reformulating the prompt. This seems to work well for most tasks.
jwilber · 59m ago
Nowadays, basically no architecture with an API is using standard attention anymore. There are all kinds of attention alternatives (e.g. Hyena) and tricks (e.g. Sliding Window, etc.) that make this analogy, as presented, flat out incorrect.
In addition, for the technical aspect to make sense, a more effective article would show the points alongside evals. For example, if you're trying to make a point about where to put important context in the prompt, show a classic needle-in-the-haystack eval, or a Jacobian matrix, alongside the results. Otherwise it's largely more prompt fluff.
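For reference, such an eval is cheap to run yourself. A toy needle-in-a-haystack harness; `query_model` is a hypothetical stand-in for a real LLM call:

```python
# Bury a fact at different depths in a long context and check whether the
# model can retrieve it. `query_model` is a hypothetical stand-in for a
# real LLM call.
def query_model(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider of choice")

NEEDLE = "The access code for the vault is 7431."
FILLER = "The weather report that day was entirely unremarkable. " * 200

def run_trial(depth: float) -> bool:
    """Insert the needle at a fractional depth (0.0 = start, 1.0 = end)."""
    cut = int(len(FILLER) * depth)
    haystack = FILLER[:cut] + " " + NEEDLE + " " + FILLER[cut:]
    answer = query_model(haystack + "\n\nWhat is the access code for the vault?")
    return "7431" in answer

if __name__ == "__main__":
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"depth={depth}: recalled={run_trial(depth)}")
```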
AlecSchueler · 50m ago
> There are all kinds of attention alternatives (e.g. Hyena) and tricks (e.g. Sliding Window, etc.) that make this analogy, as presented, flat out incorrect.
Not to doubt you but could you explain why so?
slt2021 · 31m ago
Also, LLM providers sometimes route your request to a cheaper model. I wonder if there is a way to structure a prompt so that it routes to a large (more expensive, and arguably better) model for better results.