
Drew Breunig has been doing some fantastic writing on this subject - coincidentally at the same time as the "context engineering" buzzword appeared but actually unrelated to that meme.

How Long Contexts Fail - https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-ho... - talks about the various ways in which longer contexts can start causing problems (also known as "context rot")

How to Fix Your Context - https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.... - gives names to a bunch of techniques for working around these problems including Tool Loadout, Context Quarantine, Context Pruning, Context Summarization, and Context Offloading.

old_man_cato · 2h ago

First, you pay a human artist to draw a pelican on a bicycle.

Then, you provide that as "context".

Next, you prompt the model.

Voila!

_carbyau_ · 1h ago

How to draw an owl.

1. Draw some circles.

2. Prompt an AI to draw the rest of the fucking owl.

NomDePlum · 46m ago

And if you want 2 owls?

TrainedMonkey · 32m ago

Hire a context engineer to define the task of drawing an owl as drawing two owls.

jknoepfler · 1h ago

Oh, and don't forget to retain the artist to correct the ever-increasingly weird and expensive mistakes made by the context when you need to draw newer, fancier pelicans. Maybe we can just train product to draw?

d0gsg0w00f · 1h ago

This hits too close to home.

the_mitsuhiko · 2h ago

Drew Breunig's posts are a must read on this. This is not only important for writing your own agents, it is also critical when using agentic coding right now. These limitations/behaviors will be with us for a while.

outofpaper · 2h ago

They might be good reads on the topic but Drew makes some significant etymological mistakes. For example loadout doesn't come from gaming but military terminology. It's essentially the same as kit or gear.

simonw · 2h ago

Drew isn't using that term in a military context, he's using it in a gaming context. He defines what he means very clearly:

> The term “loadout” is a gaming term that refers to the specific combination of abilities, weapons, and equipment you select before a level, match, or round.

In the military you don't select your abilities before entering a level.

GuinansEyebrows · 57m ago

i think that software engineers using this terminology might be envisioning themselves as generals, not infantry :)

DiggyJohnson · 2h ago

This seems like a rather unimportant type of mistake, especially because the definition is still accurate, it’s just the etymology isn’t complete.

scubbo · 38m ago

It _is_ a gaming term - it is also a military term (from which the gaming term arose).

ZYbCRq22HbJ2y7 · 2h ago

> They might be good reads on the topic but Drew makes some significant etymological mistakes. For example loadout doesn't come from gaming but military terminology. It's essentially the same as kit or gear.

Doesn't seem that significant?

Not to say those blog posts say anything much anyway that any "prompt engineer" (someone who uses LLMs frequently) doesn't already know, but maybe it is useful to some at such an early stage of these things.

JoeOfTexas · 2h ago

So who will develop the first Logic Core that automates the context engineer.

igravious · 1h ago

The first rule of automation: that which can be automated will be automated.

Observation: this isn't anything that can't be automated.

risyachka · 1h ago

“A month-long skill” after which it won’t be a thing anymore, like so many others.

simonw · 1h ago

Most of the LLM prompting skills I figured out ~three years ago are still useful to me today. Even the ones that I've dropped are useful because I know that things that used to be helpful aren't helpful any more, which helps me build an intuition for how the models have improved over time.

dbreunig · 1h ago

While researching the above posts Simon linked, I was struck by how many of these techniques came from the pre-ChatGPT era. NLP researchers have been dealing with this for a while.

refulgentis · 1h ago

I agree with you, but I would echo OP's concern in a way that makes me feel like a party pooper but is open about what I see us all expressing squeamishness about.

It is somewhat bothersome to have another buzz phrase. I don't know why we are doing this, other than there was a Xeet from the Shopify CEO, QT'd approvingly by Karpathy, then it's written up at length, and tied to another set of blog posts.

To wit, it went from "buzzphrase" to "skill that'll probably be useful in 3 years still" over the course of this thread.

Has it even been a week since the original tweet?

There doesn't seem to be a strong foundation here, but due to the reach potential of the names involved, and their insistence on this being a thing while also indicating they're sheepish it is a thing, it will now be a thing.

Smacks of a self-aware version of Jared Friedman's tweet re: watching the invention of "Founder Mode" was like a startup version of the Potsdam Conference. (which sorted out Earth post-WWII. and he was not kidding. I could not even remember the phrase for the life of me. Lasted maybe 3 months?)

dbreunig · 29m ago

Sometimes buzzwords turn out to be mirages that disappear in a few weeks, but often they stick around.

I find they take off when someone crystallizes something many people are thinking about internally, and don’t realize everyone else is having similar thoughts. In this example, I think the way agent and app builders are wrestling with LLMs is fundamentally different than chatbot users (it’s closer to programming), and this phrase resonates with that crowd.

Here’s an earlier write up on buzzwords: https://www.dbreunig.com/2020/02/28/how-to-build-a-buzzword....

refulgentis · 26m ago

I agree - what distinguishes this is how rushed and self-aware it is. It is being pushed top down, sheepishly.

EDIT: Ah, you also wrote the blog posts tied to this. It gives 0 comfort that you have a blog post re: building buzz phrases in 2020, rather, it enhances the awkward inorganic rush people are self-aware of.

simonw · 1h ago

The way I see it we're trying to rebrand because the term "prompt engineering" got redefined to mean "typing prompts full of stupid hacks about things like tipping and dead grandmas into a chatbot".

joe5150 · 17m ago

It helps that the rebrand may lead some people to believe that there are actually new and better inputs into the system rather than just more elaborate sandcastles built in someone else's sandbox.

orbital-decay · 48m ago

Many people figured it out two-three years ago when AI-assisted coding basically wasn't a thing, and it's still relevant and will stay relevant. These are fundamental principles, all big models work similarly, not just transformers and not just LLMs.

However, many fundamental phenomena are missing from the "context engineering" scope, so neither context engineering nor prompt engineering are useful terms.

storus · 2h ago

Those issues are considered artifacts of the current crop of LLMs in academic circles; there is already research allowing LLMs to use millions of different tools at the same time, and stable long contexts, likely reducing the number of agents to one for most use cases outside interfacing different providers.

Anyone basing their future agentic systems on current LLMs would likely face LangChain's fate - built for GPT-3, made obsolete by GPT-3.5.

simonw · 2h ago

Can you link to the research on millions of different tools and stable long contexts? I haven't come across that yet.

storus · 2h ago

You can look at AnyTool, 2024 (16,000 tools) and start looking at newer research from there.

https://arxiv.org/abs/2402.04253

For long contexts start with activation beacons and RoPE scaling.

simonw · 2h ago

I would classify AnyTool as a context engineering trick. It's using GPT-4 function calls (what we would call tool calls today) to find the best tools for the current job based on a 3-level hierarchy search.

Drew calls that one "Tool Loadout" https://www.dbreunig.com/2025/06/26/how-to-fix-your-context....
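
To make the idea concrete, here is a minimal sketch of picking a loadout before the model ever sees the tools. This is not AnyTool's hierarchy search or anything from Drew's post: the `Tool` class, the `select_loadout` function, and the four example tools are all hypothetical, and plain word overlap stands in for whatever retrieval a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def select_loadout(task: str, tools: list[Tool], limit: int = 3) -> list[Tool]:
    """Rank tools by word overlap with the task and keep the top `limit`."""
    task_words = set(task.lower().split())

    def score(tool: Tool) -> int:
        return len(task_words & set(tool.description.lower().split()))

    ranked = sorted(tools, key=score, reverse=True)
    # Drop tools with no overlap at all so they never reach the context.
    return [tool for tool in ranked[:limit] if score(tool) > 0]


# Hypothetical tool catalogue, for illustration only.
TOOLS = [
    Tool("search_flights", "search airline flights between two cities"),
    Tool("book_hotel", "book a hotel room in a city"),
    Tool("run_sql", "run a sql query against the sales database"),
    Tool("send_email", "send an email to a recipient"),
]

loadout = select_loadout("find flights between two cities", TOOLS, limit=2)
print([tool.name for tool in loadout])
```

Only the definitions in `loadout` would then be serialized into the prompt; the rest stay out of the context entirely.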

nyrikki · 52m ago

Thanks for the link. It finally explained why I was getting hit up by recruiters for a job that was for a data broker looking to do what seemed like silly uses.

Cloud API recommender systems must seem like a gift to that industry.

Not my area anyways but I couldn't see a profit model for a human search for an API when what they wanted is well covered by most core libraries in Python etc...

ZYbCRq22HbJ2y7 · 2h ago

How would "a million different tool calls at the same time" work? For instance, MCP is HTTP-based; even at low latency in incredibly parallel environments, that would take forever.

Jarwain · 9m ago

MCPs aren't the only way to embed tool calls into an LLM

dinvlad · 1h ago

> already research allowing LLMs to use millions of different tools

Hmm first time hearing about this, could you share any examples please?

simonw · 1h ago

See this comment https://news.ycombinator.com/item?id=44428548

Foreignborn · 2h ago

yes, but those aren’t released and even then you’ll always need glue code.

you just need to knowingly resource what glue code is needed, and build it in a way it can scale with whatever new limits that upgraded models give you.

i can’t imagine a world where people aren’t building products that try to overcome the limitations of SOTA models

storus · 2h ago

My point is that newer models will have those baked in, so instead of supporting ~30 tools before falling apart they will reliably support 10,000 tools defined in their context. That alone would dramatically change the need for more than one agent in most cases as the architectural split into multiple agents is often driven by the inability to reliably run many tools within a single agent. Now you can hack around it today by turning tools on/off depending on the agent's state but at some point in the future you might afford not to bother and just dump all your tools to a long stable context, maybe cache it for performance, and that will be it.
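
The "turn tools on/off depending on the agent's state" hack can be as small as a lookup table. A rough sketch, with made-up state names and tool names (nothing here comes from a real framework):

```python
# Hypothetical mapping from agent state to the tools exposed in that state.
TOOLS_BY_STATE = {
    "planning": ["search_docs", "list_files"],
    "editing": ["read_file", "write_file"],
    "verifying": ["run_tests", "read_file"],
}


def active_tools(state: str) -> list[str]:
    """Return the tool names to include in the context for this state."""
    return TOOLS_BY_STATE.get(state, [])


for state in ("planning", "editing", "verifying"):
    print(state, "->", active_tools(state))
```

With a long, stable context the table would collapse to a single list of everything, which is the future described above.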

ZYbCRq22HbJ2y7 · 1h ago

There will likely be custom, large, and expensive models at an enterprise level in the near future (some large entities and governments already have them (niprgpt)).

With that in mind, what would be the business sense in siloing a single "Agent" instead of using something like a service discovery service that all benefit from?

storus · 1h ago

My guess is the main issue is latency and accuracy; a single agent without all the routing/evaluation sub-agents around it that introduce cumulative errors, lead to infinite loops and slow it down would likely be much faster and more accurate, and could be cached at the token level on a GPU, reducing token preprocessing time further. Now different companies would run different "monorepo" agents and those would need something like MCP to talk to each other at the business boundary, but internally all this won't be necessary.

Also the current LLMs still have too many issues because they are autoregressive and heavily biased towards the first few generated tokens. They also still don't have full bidirectional awareness of certain relationships due to how they are masked during training. Discrete diffusion looks interesting but I am not sure how that one deals with tools as I've never seen a model from that class using any tools.

zaptheimpaler · 17m ago

I feel like this is incredibly obvious to anyone who's ever used an LLM or has any concept of how they work. It was equally obvious before this that the "skill" of prompt-engineering was a bunch of hacks that would quickly cease to matter. Basically they have the raw intelligence, you now have to give them the ability to get input and the ability to take actions as output and there's a lot of plumbing to make that happen.

JohnMakin · 2h ago

> Building powerful and reliable AI Agents is becoming less about finding a magic prompt or model updates.

Ok, I can buy this

> It is about the engineering of context and providing the right information and tools, in the right format, at the right time.

when the "right" format and "right" time are essentially, and maybe even necessarily, undefined, then aren't you still reaching for a "magic" solution?

If the definition of "right" information is "information which results in a sufficiently accurate answer from a language model" then I fail to see how you are doing anything fundamentally different from prompt engineering. Since these are non-deterministic machines, I fail to see any reliable heuristic that is fundamentally distinguishable from "trying and seeing" with prompts.

mentalgear · 2h ago

It's magical thinking all the way down. Whether they now call it "prompt" or "context" engineering, it's the same tinkering to find something that "sticks" in a non-deterministic space.

andy99 · 1h ago

It's called over-fitting, that's basically what prompt engineering is.

dinvlad · 1h ago

> when the "right" format and "right" time are essentially, and maybe even necessarily, undefined, then aren't you still reaching for a "magic" solution?

Exactly the problem with all "knowing how to use AI correctly" advice out there rn. Shamans with drums, at the end of the day :-)

edwardbernays · 2h ago

State-of-the-art theoretical frameworks typically separate these into two distinct exploratory and discovery phases. The first phase, which is exploratory, is best conceptualized as utilizing an atmospheric dispersion device. An easily identifiable marker material, usually a variety of feces, is metaphorically introduced at high velocity. The discovery phase is then conceptualized as analyzing the dispersal patterns of the exploratory phase. These two phases are best summarized, respectively, as "Fuck Around" followed by "Find Out."

FridgeSeal · 31m ago

It’s just AI people moving the goalposts now that everyone has realised that “prompt engineering” isn’t a special skill.

benreesman · 56m ago

The new skill is programming, same as the old skill. To the extent these things are comprehensible, you understand them by writing programs: programs that train them, programs that run inference, programs that analyze their behavior. You get the most out of LLMs by knowing how they work in detail.

I had one view of what these things were and how they work, and a bunch of outcomes attached to that. And then I spent a bunch of time training language models in various ways and doing other related upstream and downstream work, and I had a different set of beliefs and outcomes attached to it. The second set of outcomes is much preferable.

I know people really want there to be some different answer, but it remains the case that mastering a programming tool involves implementing such a tool, to one degree or another. I've only done medium sophistication ML programming, and my understanding is therefore kinda medium, but like compilers, even doing a medium one is the difference between getting good results from a high complexity one and guessing.

Go train an LLM! How do you think Karpathy figured it out? The answer is on his blog!

pyman · 42m ago

Saying the best way to understand LLMs is by building one is like saying the best way to understand compilers is by writing one. Technically true, but most people aren't interested in going that deep.

benreesman · 32m ago

I don't know, I've heard that meme too but it doesn't track with the number of cool compiler projects on GitHub or that frontpage HN, and while the LLM thing is a lot newer, you see a ton of useful/interesting stuff at the "an individual could do this on their weekends and it would mean they fundamentally know how all the pieces fit together" type stuff.

There will always be a crowd that wants the "master XYZ in 72 hours with this ONE NEAT TRICK" course, and there will always be a..., uh, group of people serving that market need.

But most people? Especially in a place like HN? I think most people know that getting buff involves going to the gym, especially in a place like this. I have a pretty high opinion of the typical person. We're all tempted by the "most people are stupid" meme, but that's because bad interactions are memorable, not because most people are stupid or lazy or whatever. Most people are very smart if they apply themselves, and most people will work very hard if the reward for doing so is reasonably clear.

https://www.youtube.com/shorts/IQmOGlbdn8g

bgwalter · 1h ago

These discussions increasingly remind me of gamers discussing various strategies in WoW or similar. Purportedly working strategies found by trial and error and discussed in a language that is only intelligible to the in-group (because no one else is interested).

We are entering a new era of gamification of programming, where the power users force their imaginary strategies on innocent people by selling them to the equally clueless and gaming-addicted management.

coderatlarge · 58m ago

i tend to share your view. but then your comment describes a lot of previous cycles of enterprise software selling. it’s just that this time is reaching a little uncomfortably into the builder’s /developer’s traditional areas of influence/control/workflow. how devs feel now is probably how others (ex csr, qa, sre) felt in the past when their managers pushed whatever tooling/practice was becoming popular or sine qua non in previous “waves”.

baxtr · 3h ago

>Conclusion

Building powerful and reliable AI Agents is becoming less about finding a magic prompt or model updates. It is about the engineering of context and providing the right information and tools, in the right format, at the right time. It’s a cross-functional challenge that involves understanding your business use case, defining your outputs, and structuring all the necessary information so that an LLM can “accomplish the task."

That’s actually also true for humans: the more context (aka right info at the right time) you provide the better for solving tasks.

root_axis · 2h ago

I am not a fan of this banal trend of superficially comparing aspects of machine learning to humans. It doesn't provide any insight and is hardly ever accurate.

furyofantares · 1h ago

I've seen a lot of cases where, if you look at the context you're giving the model and imagine giving it to a human (just not yourself or your coworker, someone who doesn't already know what you're trying to achieve - think mechanical turk), the human would be unlikely to give the output you want.

Context is often incomplete, unclear, contradictory, or just contains too much distracting information. Those are all things that will cause an LLM to fail that can be fixed by thinking about how an unrelated human would do the job.

EricMausler · 1h ago

Alternatively, I've gotten exactly what I wanted from an LLM by giving it information that would not be enough for a human to work with, knowing that the LLM is just going to fill in the gaps anyway.

It's easy to forget that the conversation itself is what the LLM is helping to create. Humans will ignore or deprioritize extra information. They also need the extra information to get an idea of what you're looking for in a loose sense. The LLM is much more easily influenced by any extra wording you include, and loose guiding is likely to become strict guiding.

furyofantares · 19m ago

Yeah, it's definitely not a human! But it is often the case in my experience that problems in your context are quite obvious once looked at through a human lens.

Maybe not very often in a chat context, my experience is in trying to build agents.

stefan_ · 1h ago

There's all these philosophers popping up everywhere. This is also another one of these topics that featured in people's favorite scifi hyperfixation so all discussions inevitably get ruined with scifi fanfic (see also: room temperature superconductivity).

ModernMech · 1h ago

I agree, however I do appreciate comparisons to other human-made systems. For example, "providing the right information and tools, in the right format, at the right time" sounds a lot like a bureaucracy, particularly because "right" is decided for you, it's left undefined, and may change at any time with no warning or recourse.

mentalgear · 2h ago

Basically, finding the right buttons to push within the constraints of the environment. Not so much different from what (SW) engineering is, only non-deterministic in the outcomes.

QuercusMax · 3h ago

Yeah... I'm always asking my UX and product folks for mocks, requirements, acceptance criteria, sample inputs and outputs, why we care about this feature, etc.

Until we can scan your brain and figure out what you really want, it's going to be necessary to actually describe what you want built, and not just rely on vibes.

lupire · 2h ago

Not "more" context. "Better" context.

(X-Y problem, for example.)

mountainriver · 49m ago

You can give most of the modern LLMs pretty darn good context and they will still fail. Our company has been deep down this path for over 2 years. The context crowd seems oddly in denial about this

arkmm · 34m ago

What are some examples where you've provided the LLM enough context that it ought to figure out the problem but it's still failing?

ozim · 2h ago

Finding a magic prompt was never “prompt engineering”; it was always “context engineering” - lots of “AI wannabe gurus” sold it as such but they never knew any better.

RAG wasn’t invented this year.

Proper tooling that wraps esoteric knowledge like using embeddings, vector DBs or graph DBs is becoming more mainstream. Big players improve their tooling so more stuff is available.

dinvlad · 1h ago

I feel like ppl just keep inventing concepts for the same old things, which come down to dancing with the drums around the fire and screaming shamanic incantations :-)

viccis · 1h ago

When I first used these kinds of methods, I described it along those lines to my friend. I told him I felt like I was summoning a demon and that I had to be careful to do the right incantations with the right words and hope that it followed my commands. I was being a little disparaging with the comment because the engineer in me that wants reliability, repeatability, and rock solid testability struggles with something that's so much less under my control.

God bless the people who give large scale demos of apps built on this stuff. It brings me back to the days of doing vulnerability research and exploitation demos, in which no matter how much you harden your exploits, it's easy for something to go wrong and wind up sputtering and sweating in front of an audience.

crystal_revenge · 3h ago

Definitely mirrors my experience. One heuristic I've often used when providing context to a model is "is this enough information for a human to solve this task?". Building some text2SQL products in the past, it was very interesting to see how often, when the model failed, a real data analyst would reply something like "oh yea, that's an older table we don't use any more, the correct table is...". This means the model was likely making a mistake that a real human analyst would have made without the proper context.

One thing that is missing from this list is: evaluations!

I'm shocked how often I still see large AI projects being run without any regard to evals. Evals are more important for AI projects than test suites are for traditional engineering ones. You don't even need a big eval set, just one that covers your problem surface reasonably well. However without it you're basically just "guessing" rather than iterating on your problem, and you're not even guessing in a way where each guess is an improvement on the last.

edit: To clarify, I ask myself this question. It's frequently the case that we expect LLMs to solve problems without the necessary information for a human to solve them.
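
For anyone who hasn't set up evals before, the smallest useful version is roughly the sketch below. Everything in it is a placeholder: `fake_model` stands in for a real model call, the three cases are invented, and exact string match is the crudest possible grader.

```python
from typing import Callable

# Tiny, hand-written eval set; a real one should cover your problem surface.
EVAL_SET = [
    {"question": "2 + 2", "expected": "4"},
    {"question": "capital of France", "expected": "Paris"},
    {"question": "3 * 3", "expected": "9"},
]


def fake_model(question: str) -> str:
    """Stand-in for a real model call; it gets one case wrong on purpose."""
    canned = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "6"}
    return canned.get(question, "")


def run_evals(model: Callable[[str], str], cases: list[dict]) -> float:
    """Return the fraction of cases where the model output matches exactly."""
    passed = sum(model(case["question"]).strip() == case["expected"] for case in cases)
    return passed / len(cases)


score = run_evals(fake_model, EVAL_SET)
print(f"pass rate: {score:.2f}")
```

Re-running this after every prompt or context change is what turns "guessing" into iterating: you can see whether the number went up or down.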

adiabatichottub · 1h ago

A classic law of computer programming:

"Make it possible for programmers to write in English and you will find that programmers cannot write in English."

It's meant to be a bit tongue-in-cheek, but there is a certain truth to it. Most human languages fail at being precise in their expression and interpretation. If you can exactly define what you want in English, you probably could have saved yourself the time and written it in a machine-interpretable language.

kevin_thibedeau · 2h ago

Asking yes/no questions will get you a lie 50% of the time.

adriand · 2h ago

I have pretty good success with asking the model this question before it starts working as well. I’ll tell it to ask questions about anything it’s unsure of and to ask for examples of code patterns that are in use in the application already that it can use as a template.

hobs · 2h ago

The thing is, all the people cosplaying as data scientists don't want evaluations, and that's why you saw so little in fake C level projects, because telling people the emperor has no clothes doesn't pay.

For those actually using the products to make money well, hey - all of those have evaluations.

CharlieDigital · 2h ago

I was at a startup that started using OpenAI APIs pretty early (almost 2 years ago now?).

"Back in the day", we had to be very sparing with context to get great results so we really focused on how to build great context. Indexing and retrieval were pretty much our core focus.

Now, even with the larger windows, I find this still to be true.

The moat for most companies is actually their data, data indexing, and data retrieval[0]. Companies that 1) have the data and 2) know how to use that data are going to win.

My analogy is this:

    > The LLM is just an oven; a fantastical oven.  But for it to produce a good product still depends on picking good ingredients, in the right ratio, and preparing them with care.  You hit the bake button, then you still need to finish it off with presentation and decoration.

[0] https://chrlschn.dev/blog/2024/11/on-bakers-ovens-and-ai-sta...

LASR · 7m ago

Honestly, GPT-4o is all we ever needed to build a complete human-like reasoning system.

I am leading a small team working on a couple of “hard” problems to put the limits of LLMs to the test.

One is an options trader. Not algo / HFT, but simply doing due diligence, monitoring the news and making safe long-term bets.

Another is an online research and purchasing experience for residential real-estate.

Both these tasks, we’ve realized, you don’t even need a reasoning model. In fact, reasoning models are harder to get consistent results from.

What you need is a knowledge base infrastructure and pub-sub for updates. Amortize the learned knowledge across users and you have collaborative self-learning system that exhibits intelligence beyond any one particular user and is agnostic to the level of prompting skills they have.

Stay tuned for a limited alpha in this space. And DM if you’re interested.

jumploops · 2h ago

To anyone who has worked with LLMs extensively, this is obvious.

Single prompts can only get you so far (surprisingly far actually, but then they fall over quickly).

This is actually the reason I built my own chat client (~2 years ago), because I wanted to “fork” and “prune” the context easily; using the hosted interfaces was too opaque.

In the age of (working) tool-use, this starts to resemble agents calling sub-agents, partially to better abstract, but mostly to avoid context pollution.

Zopieux · 1h ago

I find it hilarious that this is how the original GPT-3 UI worked, if you remember, and we're now discussing reinventing the wheel.

A big textarea, you plug in your prompt, click generate, the completions are added in-line in a different color. You could edit any part, or just append, and click generate again.

90% of contemporary AI engineering these days is reinventing well understood concepts "but for LLMs", or in this case, workarounds for the self-inflicted chat-bubble UI. aistudio makes this slightly less terrible with its edit button on everything, but still not ideal.

nomel · 1h ago

Did you release your client? I've really wanted something like this, from the beginning.

I thought it would also be neat to merge contexts, by maybe mixing summarizations of key points at the merge point, but never tried.

jcon321 · 2h ago

I thought this entire premise was obvious? Does it really take an article and a venn diagram to say you should only provide the relevant content to your LLM when asking a question?

simonw · 2h ago

"Relevant content to your LLM when asking a question" is last year's RAG.

If you look at how sophisticated current LLM systems work there is so much more to this.

Just one example: Microsoft open sourced VS Code Copilot Chat today (MIT license). Their prompts are dynamically assembled with tool instructions for various tools based on whether or not they are enabled: https://github.com/microsoft/vscode-copilot-chat/blob/v0.29....

And the autocomplete stuff has a wealth of contextual information included: https://github.com/microsoft/vscode-copilot-chat/blob/v0.29....

  You have access to the following information to help you make
  informed suggestions:

  - recently_viewed_code_snippets: These are code snippets that
  the developer has recently looked at, which might provide
  context or examples relevant to the current task. They are
  listed from oldest to newest, with line numbers in the form
  #| to help you understand the edit diff history. It's
  possible these are entirely irrelevant to the developer's
  change.
  - current_file_content: The content of the file the developer
  is currently working on, providing the broader context of the
  code. Line numbers in the form #| are included to help you
  understand the edit diff history.
  - edit_diff_history: A record of changes made to the code,
  helping you understand the evolution of the code and the
  developer's intentions. These changes are listed from oldest
  to latest. It's possible a lot of old edit diff history is
  entirely irrelevant to the developer's change.
  - area_around_code_to_edit: The context showing the code
  surrounding the section to be edited.
  - cursor position marked as ${CURSOR_TAG}: Indicates where
  the developer's cursor is currently located, which can be
  crucial for understanding what part of the code they are
  focusing on.
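
The "dynamically assembled" part can be pictured with a sketch like the one below. To be clear, this is not the Copilot Chat code: the tool names, instruction strings, and `build_system_prompt` are invented here to show the general shape.

```python
# Hypothetical per-tool instructions, keyed by tool name.
TOOL_INSTRUCTIONS = {
    "read_file": "Use read_file to inspect a file before editing it.",
    "run_tests": "Use run_tests after every change.",
    "web_search": "Use web_search only when the repo lacks the answer.",
}


def build_system_prompt(base: str, enabled: set[str]) -> str:
    """Append instructions only for the tools that are currently enabled."""
    lines = [base]
    for name, instruction in TOOL_INSTRUCTIONS.items():
        if name in enabled:
            lines.append(f"- {instruction}")
    return "\n".join(lines)


prompt = build_system_prompt(
    "You are a coding assistant.",
    enabled={"read_file", "run_tests"},
)
print(prompt)
```

Instructions for a disabled tool never appear, so the model is not told about capabilities it cannot use.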

timr · 1h ago

I get what you're saying, but the parent is correct -- most of this stuff is pretty obvious if you spend even an hour thinking about the problem.

For example, while the specifics of the prompts you're highlighting are unique to Copilot, I've basically implemented the same ideas on a project I've been working on, because it was clear from the limitations of these models that sooner rather than later it was going to be necessary to pick and choose amongst tools.

LLM "engineering" is mostly at the same level of technical sophistication that web work was back when we were using CGI with Perl -- "hey guys, what if we make the webserver embed the app server in a subprocess?" "Genius!"

I don't mean that in a negative way, necessarily. It's just...seeing these "LLM thought leaders" talk about this stuff in thinkspeak is a bit like getting a Zed Shaw blogpost from 2007, but fluffed up like SICP.

simonw · 59m ago

> most of this stuff is pretty obvious if you spend even an hour thinking about the problem

I don't think that's true.

Even if it is true, there's a big difference between "thinking about the problem" and spending months (or even years) iteratively testing out different potential prompting patterns and figuring out which are most effective for a given application.

I was hoping "prompt engineering" would mean that.

timr · 51m ago

>I don't think that's true.

OK, well...maybe I should spend my days writing long blogposts about the next ten things that I know I have to implement, then, and I'll be an AI thought-leader too. Certainly more lucrative than actually doing the work.

Because that's literally what's happening -- I find myself implementing (or having implemented) these trendy ideas. I don't think I'm doing anything special. It certainly isn't taking years, and I'm doing it without reading all of these long posts (mostly because it's kind of obvious).

Again, it very much reminds me of the early days of the web, except there's a lot more people who are just hype-beasting every little development. Linus is over there quietly resolving SMP deadlocks, and some influencer just wrote 10,000 words on how databases are faster if you use indexes.

mccoyb · 1h ago

That doesn't strike me as sophisticated, it strikes me as obvious to anyone with a little proficiency in computational thinking and a few days of experience with tool-using LLMs.

The goal is to design a probability distribution to solve your task by taking a complicated probability distribution and conditioning it, and the more detail you put into thinking about ("how to condition for this?" / "when to condition for that?") the better the output you'll see.

(what seems to be meant by "context" is a sequence of these conditioning steps :) )
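
Written out as a rough sketch (the symbols are shorthand chosen here for illustration, not taken from the article): let y be the output and c_1, ..., c_k the pieces of context you condition on, such as instructions, examples, tool output, and history. For an autoregressive model the conditioned distribution factors token by token:

```latex
p(y \mid c_1, \dots, c_k) \;=\; \prod_{t=1}^{T} p\bigl(y_t \mid c_1, \dots, c_k,\; y_{<t}\bigr)
```

Each extra c_i reshapes every factor in the product, which is why the choice and order of conditioning steps matters.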

alfalfasprout · 1h ago

The industry has attracted grifters with lots of "<word of the day> engineering" and fancy diagrams for, frankly, pretty obvious ideas

I mean yes, duh, relevant context matters. This is why so much effort was put into things like RAG, vector DBs, prompt synthesis, etc. over the years. LLMs still have pretty abysmal context windows so being efficient matters.

8organicbits · 2h ago

One thought experiment I was musing on recently was the minimal context required to define a task (to an LLM, human, or otherwise). In software, there's a whole discipline of human centered design that aims to uncover the nuance of a task. I've worked with some great designers, and they are incredibly valuable to software development. They develop journey maps, user stories, collect requirements, and produce a wealth of design docs. I don't think you can successfully build large projects without that context.

I've seen lots of AI demos that prompt "build me a TODO app", pretend that is sufficient context, and then claim that the output matches their needs. Without proper context, you can't tell if the output is correct.

walterfreedom · 35m ago

I am mostly focusing on this issue during the development of my agent engine (mostly for game NPCs). It's really important to manage the context and not bloat the LLM with irrelevant stuff, for both quality and inference speed. I wrote about it here if anyone is interested: https://walterfreedom.com/post.html?id=ai-context-management


hintymad · 41m ago

> The New Skill in AI Is Not Prompting, It's Context Engineering

Sounds like good managers and leaders now have an edge. As Patty McCord of Netflix fame used to say: all that a manager does is set the context.

liampulles · 2h ago

The only engineering going on here is Job Engineering™

ryhanshannon · 2h ago

It is really funny to see the hyper fixation on relabeling of soft skills / product development to "<blank> Engineering" in the AI space.

joe5150 · 25m ago

Surely Jim is also using an agent. Jim can't be worth having a quick sync with if he's not using his own agent! So then why are these two agents emailing each other back and forth using bizarre, terse office jargon?

mgdev · 2h ago

If we zoom out far enough, and start to put more and more under the execution umbrella of AI, what we're actually describing here is... product development.

You are constructing the set of context, policies, directed attention toward some intentional end, same as it ever was. The difference is you need fewer meat bags to do it, even as your projects get larger and larger.

To me this is wholly encouraging.

Some projects will remain outside what models are capable of, and your role as a human will be to stitch many smaller projects together into the whole. As models grow more capable, that stitching will still happen - just at larger levels.

But as long as humans have imagination, there will always be a role for the human in the process: as the orchestrator of will, and ultimate fitness function for his own creations.

pyman · 53m ago

That does sound a lot like the role of a software architect. You're setting the direction, defining the constraints, making trade-offs, and stitching different parts together into a working system

somewhereoutth · 59m ago

> for his own creations.

for their own creations is grammatically valid, and would avoid accusations of sexism!

GuinansEyebrows · 54m ago

i just hope that, along with imagination, humans can have an economy that supports this shift.

slavapestov · 2h ago

I feel like if the first link in your post is a tweet from a tech CEO the rest is unlikely to be insightful.

coderatlarge · 52m ago

i don’t disagree with your main point, but is karpathy a tech ceo right now?

simonw · 50m ago

I think they meant Tobi Lutke, CEO of Shopify: https://twitter.com/tobi/status/1935533422589399127

jshorty · 2h ago

I have felt somewhat frustrated with what I perceive as a broad tendency to malign "prompt engineering" as an antiquated approach for whatever the new industry technique is with regards to building a request body for a model API. Whether that's RAG years ago, nuance in a model request's schema beyond simple text (tool calls, structured outputs, etc), or concepts of agentic knowledge and memory more recently.

While models were less powerful a couple of years ago, there was nothing stopping you at that time from taking a highly dynamic approach to what you asked of them as a "prompt engineer"; you were just more vulnerable to indeterminism in the contract with the models at each step.

Context windows have grown larger; you can fit more in now, push out the need for fine-tuning, and get more ambitious with what you dump in to help guide the LLM. But I'm not immediately sure what skill requirements fundamentally change here. You just have more resources at your disposal, and can care less about counting tokens.

simonw · 2h ago

I liked what Andrej Karpathy had to say about this:

https://twitter.com/karpathy/status/1937902205765607626

> [..] in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step. Science because doing this right involves task descriptions and explanations, few shot examples, RAG, related (possibly multimodal) data, tools, state and history, compacting... Too little or of the wrong form and the LLM doesn't have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down. Doing this well is highly non-trivial. And art because of the guiding intuition around LLM psychology of people spirits.

bgwalter · 21m ago

All that work just for stripping a license. If one uses code directly from GitHub, copy and paste is sufficient. One can even keep the license.

zacharyvoase · 2h ago

I love how we have such a poor model of how LLMs work (or more aptly don't work) that we are developing an entire alchemical practice around them. Definitely seems healthy for the industry and the species.

simonw · 1h ago

The stuff that's showing up under the "context engineering" banner feels a whole lot less alchemical to me than the older prompt engineering tricks.

Alchemical is "you are the world's top expert on marketing, and if you get it right I'll tip you $100, and if you get it wrong a kitten will die".

The techniques in https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.... seem a whole lot more rational to me than that.

semiinfinitely · 2h ago

context engineering is just a phrase that karpathy uttered for the first time 6 days ago and now everyone is treating it like it's a new field of science and engineering

eddythompson80 · 2h ago

Which is funny because everyone is already looking at AI as: I have 30 TB of shit that is basically "my company". Can I dump that into your AI and have another, magical, all-knowing, co-worker?

coliveira · 2h ago

Which I think is doubly funny because, given the zeal with which companies are jumping onto this bandwagon, AI will bankrupt most businesses in record time! Just imagine the typical company firing most workers and paying a fortune to run on top of a schizophrenic AI system that gets things wrong half of the time...

eddythompson80 · 1h ago

Yes, you can see the insanely accelerated pace of bankruptcies or "strategic realignments" among AI startups.

I think it's just game theory in play and we can do nothing but watch it play out. The "up side" is insane, potentially unlimited. The price is high, but so is the potential reward. By the rules of the game, you have to play. There is no other move you can make. No one knows the odds, but we know the potential reward. You could be the next T company easy. You could realistically go from startup -> 1 Trillion in less than a year if you are right.

We need to give this time to play itself out. The "odds" will eventually be better estimated and it'll affect investment. In the meantime, just give your VC Google's, Microsoft's, or AWS's direct deposit info. It's easier that way.

asciii · 29m ago

Here I was thinking that part of Prompt Engineering is understanding context and awareness for other yada yada.

emporas · 1h ago

Prompting sits on the back seat, while context is the driving factor. 100% agree with this.

For programming I don't use any prompts. I give a problem solved already, as a context or example, and I ask it to implement something similar. One sentence or two, and that's it.

Other kind of tasks, like writing, I use prompts, but even then, context and examples are still the driving factor.

In my opinion, we are at an interesting point in history, in which individuals will now need their own personal database. Like companies over the last 50 years, which had their own database records of customers, products, prices and so on, an individual will now operate using personal contextual information, saved over a long period of time in wikis or SQLite rows.

d0gsg0w00f · 1h ago

Yes, the other day I was telling a colleague that we all need our own personal context to feed into every model we interact with. You could carry it around on a thumb drive or something.

_pdp_ · 2h ago

It is wrong. The new/old skill is reverse engineering.

If the majority of the code is generated by AI, you'll still need people with technical expertise to make sense of it.

CamperBob2 · 2h ago

Not really. Got some code you don't understand? Feed it to a model and ask it to add comments.

Ultimately humans will never need to look at most AI-generated code, any more than we have to look at the machine language emitted by a C compiler. We're a long way from that state of affairs -- as anyone who struggled with code-generation bugs in the first few generations of compilers will agree -- but we'll get there.

inspectorwadget · 1h ago

>any more than we have to look at the machine language emitted by a C compiler.

Some developers do actually look at the output of C compilers, and some of them even spend a lot of time criticizing that output by a specific compiler (even writing long blog posts about it). The C language has an ISO specification, and if a compiler does not conform to that specification, it is considered a bug in that compiler.

You can even go to godbolt.org / compilerexplorer.org and see the output generated for different targets by different compilers for different languages. It is a popular tool, also for language development.

I do not know what prompt engineering will look like in the future, but without AGI, I remain skeptical about verification of different kinds of code not being required in at least a sizable proportion of cases. That does not exclude usefulness of course: for instance, if you have a case where verification is not needed; or verification in a specific case can be done efficiently and robustly by a relevant expert; or some smart method for verification in some cases, like a case where a few primitive tests are sufficient.

But I have no experience with LLMs or prompt engineering.

I do, however, sympathize with not wanting to deal with paying programmers. Most are likely nice, but for instance a few may be costly, or less than honest, or less than competent, etc. But while I think it is fine to explore LLMs and invest a lot into seeing what might come of them, I would not personally bet everything on them, neither in the short term nor the long term.

May I ask what your professional background and experience is?

rvz · 2h ago

> Not really. Got some code you don't understand? Feed it to a model and ask it to add comments.

Absolutely not.

An experienced individual in their field can tell if the AI made a mistake in the comments / code rather than the typical untrained eye.

So no, actually read the code and understand what it does.

> Ultimately humans will never need to look at most AI-generated code, any more than we have to look at the machine language emitted by a C compiler.

So for safety critical systems, one should not look or check if code has been AI generated?

rednafi · 2h ago

I really don’t get this rush to invent neologisms to describe every single behavioral artifact of LLMs. Maybe it’s just a yearning to be known as the father of Deez Unseen Mind-blowing Behaviors (DUMB).

LLM farts — Stochastic Wind Release.

The latest one is yet another attempt to make prompting sound like some kind of profound skill, when it’s really not that different from just knowing how to use search effectively.

Also, “context” is such an overloaded term at this point that you might as well just call it “doing stuff” — and you’d objectively be more descriptive.

bGl2YW5j · 3h ago

Saw this the other day and it made me think that too much effort and credence is being given to this idea of crafting the perfect environment for LLMs to thrive in. Which to me, is contrary to how powerful AI systems should function. We shouldn’t need to hold its hand so much.

Obviously we’ve got to tame the version of LLMs we’ve got now, and this kind of thinking is a step in the right direction. What I take issue with is the way this thinking is couched as a revolutionary silver bullet.

aleksiy123 · 2h ago

It may not be a silver bullet, in that it needs lots of low level human guidance to do some complex task.

But looking at the trend of these tools, the help they require is becoming more and more high level, and they are becoming more and more capable of doing longer, more complex tasks as well as being able to find the information they need from other systems/tools (search, internet, docs, code etc...).

I think it's that trend that really is the exciting part, not just its current capabilities.

4ndrewl · 2h ago

Reminds me of first gen chatbots where the user had to put in the effort of trying to craft a phrase in a way that would garner the expected result. It's a form of user-hostility.

ramesh31 · 2h ago

We shouldn't, but it's analogous to how CPU usage used to work. In the 8 bit days you could do some magical stuff that was completely impossible before microcomputers existed. But you had to have all kinds of tricks and heuristics to work around the limited abilities. We're in the same place with LLMs now. Some day we will have the equivalent of what gigabytes of RAM are to a modern CPU now, but we're still stuck in the 80s for now (which was revolutionary at the time).

bGl2YW5j · 1m ago

Good points that you and Aleksiy have made. Thanks for enhancing my perspective!

smeej · 2h ago

It also reminds me of when you could structure an internet search query and find exactly what you wanted. You just had to ask it in the machine's language.

I hope the generalized future of this doesn't look like the generalized future of that, though. Now it's darn near impossible to find very specific things on the internet because the search engines will ignore any "operators" you try to use if they generate "too few" results (by which they seem to mean "few enough that no one will pay for us to show you an ad for this search"). I'm moderately afraid the ability to get useful results out of AIs will be abstracted away to some lowest common denominator of spammy garbage people want to "consume" instead of use for something.

skydhash · 1h ago

An empty set of results is a good signal just like a "I don't know" or "You're wrong because <reason>" are good replies to a question/query. It's how a program crashing, while painful, is better than it corrupting data.

gametorch · 3h ago

It's still way easier for me to say

"here's where to find the information to solve the task"

than for me to manually type out the code, 99% of the time

geeewhy · 1h ago

I've been experimenting with this for a while (I'm sure, in a way, most of us have). It would be good to enumerate some examples. When it comes to coding, here are a few:

- compile scripts that can grep / compile list of your relevant files as files of interest

- make temp symlinks in relevant repos to each other for documentation generation, pass the documentation collected from the respective repos to enable cross-repo ops to be performed atomically

- build scripts to copy schemas, db ddls, dtos, example records, api specs, contracts (still works better than MCP in most cases)

I found these steps not only help produce better output but also reduce cost greatly by avoiding some "reasoning" hops. I'm sure the practice can extend beyond coding.
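
As a rough illustration of the first bullet, here is a minimal sketch of a "files of interest" script. It is a guess at the general shape, not the commenter's actual tooling: the function names, the suffix list, and the demo files are all made up for the example.

```python
import tempfile
from pathlib import Path


def files_of_interest(root, keyword, suffixes=(".py", ".sql", ".md")):
    """Crude grep: files under root whose text mentions keyword (case-insensitive)."""
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            if keyword.lower() in path.read_text(errors="ignore").lower():
                hits.append(path)
    return hits


def bundle(root, paths):
    """Concatenate the files into one block, each labelled with its relative path."""
    parts = [f"### {p.relative_to(root)}\n{p.read_text(errors='ignore')}" for p in paths]
    return "\n\n".join(parts)


# Demo on a throwaway directory (hypothetical files) so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "models.py").write_text("class Invoice:\n    pass\n")
    (root / "schema.sql").write_text("CREATE TABLE invoice (id INTEGER);\n")
    (root / "notes.md").write_text("unrelated notes\n")
    hits = files_of_interest(root, "invoice")
    hit_names = [p.name for p in hits]
    context = bundle(root, hits)
```

The resulting `context` string can be pasted in ahead of the actual request, so the model does not have to spend steps discovering which files matter.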

colgandev · 2h ago

I've been finding a ton of success lately with speech to text as the user prompt, and then using https://continue.dev in VSCode, or Aider, to supply context from files from my projects and having those tools run the inference.

I'm trying to figure out how to build a "Context Management System" (as compared to a Content Management System) for all of my prompts. I completely agree with the premise of this article, if you aren't managing your context, you are losing all of the context you create every time you create a new conversation. I want to collect all of the reusable blocks from every conversation I have, as well as from my research and reading around the internet. Something like a mashup of Obsidian with some custom Python scripts.

The ideal inner loop I'm envisioning is to create a "Project" document that uses Jinja templating to allow transclusion of a bunch of other context objects like code files, documentation, articles, and then also my own other prompt fragments, and then to compose them in a master document that I can "compile" into a "superprompt" that has the precise context that I want for every prompt.
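
A minimal sketch of that "compile" step, under stated assumptions: it swaps Jinja for the standard library's `string.Template` so it runs with no dependencies, and the fragment names and contents are invented for illustration.

```python
from string import Template

# Reusable context fragments (hypothetical); in practice these would be
# read from code files, docs, or saved prompt notes.
fragments = {
    "conventions": "Use Django class-based views. Keep functions short.",
    "models_py": "class Invoice(models.Model):\n    paid = models.BooleanField(default=False)",
    "task": "Add a view that lists unpaid invoices.",
}

# The "Project" document: a master template that transcludes the fragments.
project_doc = Template(
    "## Conventions\n$conventions\n\n"
    "## Relevant code\n$models_py\n\n"
    "## Task\n$task\n"
)


def compile_superprompt(doc, parts):
    """Fill every placeholder; substitute() raises KeyError if a fragment is missing."""
    return doc.substitute(parts)


superprompt = compile_superprompt(project_doc, fragments)
```

Using `substitute` rather than `safe_substitute` is deliberate here: a missing fragment fails loudly instead of silently shipping a prompt with a hole in it.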

Since with the chat interfaces they are always already just sending the entire previous conversation message history anyway, I don't even really want to use a chat style interface as much as just "one shotting" the next step in development.

It's almost a turn based game: I'll fiddle with the code and the prompts, and then run "end turn" and now it is the llm's turn. On the llm's turn, it compiles the prompt and runs inference and outputs the changes. With Aider it can actually apply those changes itself. I'll then review the code using diffs and make changes and then that's a full turn of the game of AI-assisted code.

I love that I can just brain dump into speech to text, and llms don't really care that much about grammar and syntax. I can curate fragments of documentation and specifications for features, and then just kind of rant and rave about what I want for a while, and then paste that into the chat and with my current LLM of choice being Claude, it seems to work really quite well.

My Django work feels like it's been supercharged with just this workflow, and my context management engine isn't even really that polished.

If you aren't getting high quality output from llms, definitely consider how you are supplying context.

labrador · 2h ago

I’m curious how this applies to systems like ChatGPT, which now have two kinds of memory: user-configurable memory (a list of facts or preferences) and an opaque chat history memory. If context is the core unit of interaction, it seems important to give users more control or at least visibility into both.

I know context engineering is critical for agents, but I wonder if it's also useful for shaping personality and improving overall relatability? I'm curious if anyone else has thought about that.

simonw · 2h ago

I really dislike the new ChatGPT memory feature (the one that pulls details out of a summarized version of all of your previous chats, as opposed to older memory feature that records short notes to itself) for exactly this reason: it makes it even harder for me to control the context when I'm using ChatGPT.

If I'm debugging something with ChatGPT and I hit an error loop, my fix is to start a new conversation.

Now I can't be sure ChatGPT won't include notes from that previous conversation's context that I was trying to get rid of!

Thankfully you can turn the new memory thing off, but it's on by default.

I wrote more about that here: https://simonwillison.net/2025/May/21/chatgpt-new-memory/

labrador · 1h ago

On the other hand, for my use case (I'm retired and enjoy chatting with it), having it remember items from past chats makes it feel much more personable. I actually prefer Claude, but it doesn't have memory, so I unsubscribed and subscribed to ChatGPT. That it remembers obscure but relevant details about our past chats feels almost magical.

It's good that you can turn it off. I can see how it might cause problems when trying to do technical work.

Edit: Note, the introduction of memory was a contributing factor to "the sycophant" that OpenAI had to roll back. That it could praise you while seeming to know you was encouraging addictive use.

Edit2: Here's the previous Hacker News discussion on Simon's "I really don’t like ChatGPT’s new memory dossier"

https://news.ycombinator.com/item?id=44052246

grafmax · 2h ago

There is no need to develop this ‘skill’. This can all be automated as a preprocessing step before the main request runs. Then you can have agents with infinite context, etc.

simonw · 2h ago

You need this skill if you're the engineer that's designing and implementing that preprocessing step.

dolebirchwood · 2h ago

The skill amounts to determining "what information is required for System A to achieve Outcome X." We already have a term for this: Critical thinking.

Zopieux · 1h ago

Why does it take hundreds of comments for obvious facts to be laid out on this website? Thanks for the reality check.

grafmax · 1h ago

In the short term horizon I think you are right. But over a longer horizon, we should expect model providers to internalize these mechanisms, similar to how chain of thought has been effectively “internalized” - which in turn has reduced the effectiveness that prompt engineering used to provide as models have gotten better.

yunwal · 2h ago

Non-rhetorical question: is this different enough from data engineering that it needs its own name?

ofjcihen · 2h ago

Not at all, just ask the LLM to design and implement it.

AI turtles all the way down.

lawlessone · 2h ago

I look forward to 5 million LinkedIn posts repeating this

pyman · 35m ago

Someone needs to build a Chrome extension called "BS Analysis" for LinkedIn

saejox · 2h ago

Claude 3.5 was released 1 year ago. Current LLMs are not much better at coding than it. Sure they are more shiny and well polished, but not much better at all. I think it is time to curb our enthusiasm.

I almost always rewrite AI-written functions in my code a few weeks later. It doesn't matter whether they have more context or better context; they still fail to write code that is easily understandable by humans.

simonw · 2h ago

Claude 3.5 was remarkably good at writing code. If Claude 3.7 and Claude 4 are just incremental improvements on that then even better!

I actually think they're a lot more than incremental. 3.7 introduced "thinking" mode and 4 doubled down on that and thinking/reasoning/whatever-you-want-to-call-it is particularly good at code challenges.

As always, if you're not getting great results out of coding LLMs it's likely you haven't spent several months iterating on your prompting techniques to figure out what works best for your style of development.

stillpointlab · 1h ago

I've been using the term context engineering for a few months now, I am very happy to see this gain traction.

This new stillpointlab hacker news account is based on the company name I chose to pursue my Context as a Service idea. My belief is that context is going to be the key differentiator in the future. The shortest description I can give to explain Context as a Service (CaaS) is "ETL for AI".

patrickhogan1 · 2h ago

OpenAI’s o3 searches the web behind a curtain: you get a few source links and a fuzzy reasoning trace, but never the full chunk of text it actually pulled in. Without that raw context, it’s impossible to audit what really shaped the answer.

simonw · 2h ago

Yeah, I find that really frustrating.

I understand why they do it though: if they presented the actual content that came back from search they would absolutely get in trouble for copyright-infringement.

I suspect that's why so much of the Claude 4 system prompt for their search tool is the message "Always respect copyright by NEVER reproducing large 20+ word chunks of content from search results" repeated half a dozen times: https://simonwillison.net/2025/May/25/claude-4-system-prompt...

Zopieux · 50m ago

This is no secret or suspicion. It is definitely about avoiding (more accurately, delaying until legislation destroys the business model) the wrath of copyright holders with enough lawyers.

I find this very hypocritical given that for all intents and purposes the infringement already happened at training time, since most content wasn't acquired with any form of remuneration or attribution (otherwise this entire endeavor would not have been economically worth it). See also the "you're not allowed to plagiarize Disney" being done by all commercial text to image providers.

bag_boy · 1h ago

Anecdotally, I’ve found that chatting with Claude about a subject for a bit — coming to an understanding together, then tasking it — produces much better results than starting with an immediate ask.

I’ll usually spend a few minutes going back and forth before making a request.

For some reason, it just feels like this doesn't work as well with ChatGPT or Gemini. It might be my overuse of o3? The latency can wreck the vibe of a conversation.

hnthrow90348765 · 2h ago

Cool, but wait another year or two and context engineering will be obsolete as well. It still feels like tinkering with the machine, which is what AI is (supposed to be) moving us away from.

hobs · 2h ago

Probably impossible unless computers themselves change in another year or two.

pwarner · 2h ago

It's an integration adventure. This is why much AI is failing in the enterprise. MS Copilot is moderately interesting for data in MS Office, but forget about it accessing 90% of your data that's in other systems.

alganet · 2h ago

If I need to do all this work (gather data, organize it, prepare it, etc), there are other AI solutions I might decide to use instead of an LLM.

joe5150 · 2h ago

You might as well use your natural intelligence instead of the artificial stuff at that point.

coliveira · 2h ago

Yes, when all is said and done people will realize that artificial intelligence is too expensive to replace natural intelligence. AI companies want to avoid this realization for as long as possible.

alganet · 2h ago

This is not what I'm talking about, see the other reply.

alganet · 2h ago

I'm assuming the post is about automated "context engineering". It's not a human doing it.

In this arrangement, the LLM is a component. What I meant is that it seems to me that other non-LLM AI technologies would be a better fit for this kind of thing. Lighter, easier to change and adapt, potentially even cheaper. Not for all scenarios, but for a lot of them.

simonw · 2h ago

What kind of alternative AI solutions might you use here?

alganet · 2h ago

Classifiers to classify things, traditional neural nets to identify things. Typical run of the mill.

In OpenAI hype language, this is a problem for "Software 2.0", not "Software 3.0" in 99% of the cases.

The thing about matching an informal tone would be the hard part. I have to concede that LLMs are probably better at that. But I have the feeling that this is not exactly the feature most companies are looking for, and they would be willing to not have it for a cheaper alternative. Most of them just don't know that's possible.

whimsicalism · 2h ago

i think context engineering as described is somewhat a subset of ‘environment engineering.’ the gold-standard is when an outcome reached with tools can be verified as correct and hillclimbed with RL. most of the engineering effort is from building the environment and verifier while the nuts and bolts of grpo/ppo training and open-weight tool-using models are commodities.

drmath · 1h ago

Isn't "context" just another word for "prompt?" Techniques have become more complex, but they're still just techniques for assembling the token sequences we feed to the transformer.

simonw · 1h ago

Almost. It's the current prompt plus the previous prompts and responses in the current conversation.

The idea behind "context engineering" is to help people understand that a prompt these days can be long, and can incorporate a whole bunch of useful things (examples, extra documentation, transcript summaries etc) to help get the desired response.
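
A toy sketch of that definition, with made-up role labels and a plain-text layout chosen only for illustration (real chat APIs use structured message lists rather than one flat string):

```python
def build_context(history, prompt, extras=()):
    """Flatten prior turns, extra material, and the new prompt into one string."""
    lines = [f"{role}: {text}" for role, text in history]
    lines += [f"reference: {doc}" for doc in extras]
    lines.append(f"user: {prompt}")
    return "\n".join(lines)


# Earlier prompts and responses from the same conversation (hypothetical).
history = [
    ("user", "What is a DTO?"),
    ("assistant", "A data transfer object."),
]

context = build_context(
    history,
    "Show one in Python.",
    extras=["PEP 557 describes dataclasses."],
)
```

The new prompt is only the last line; everything above it is context the model also sees.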

"Prompt engineering" was meant to mean this too, but the AI influencer crowd redefined it to mean "typing prompts into a chatbot".

drmath · 1h ago

Haha there's a pigheaded part of me that insists all of that is the "prompt," but I just read your bit about "inferred definitions," and acceptance is probably a healthier attitude.

amelius · 2h ago

Yes, and it is a soft skill.

retinaros · 2h ago

it is still sending a string of chars and hoping the model outputs something relevant. let’s not do like finance and permanently obfuscate really simple stuff to make us bigger than we are.

prompt engineering/context engineering : stringbuilder

Retrieval augmented generation: search + adding strings to main string

test time compute: running multiple generations and choosing the best

agents: for loop and some ifs
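
Taking those reductions literally, a toy sketch could look like the following. Everything here is hypothetical: `toy_model` stands in for a real model call (it just echoes one line of the prompt), and the word-overlap scoring is a placeholder for a real retriever and verifier.

```python
import re


def words(text):
    """Lowercased alphabetic tokens of text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, docs, k):
    """RAG, reduced: search (rank by shared words), keep the top k strings."""
    return sorted(docs, key=lambda d: -len(words(query) & words(d)))[:k]


def build_prompt(query, docs):
    """Prompt/context engineering, reduced: a string builder."""
    return "\n".join(docs + [f"Question: {query}"])


def toy_model(prompt, i):
    """Stand-in for a model call: 'generation' i is line i of the prompt."""
    lines = prompt.splitlines()
    return lines[i % len(lines)]


def score(candidate, query):
    """Toy verifier: shared-word count, with echoes of the question rejected."""
    if candidate.startswith("Question:"):
        return -1
    return len(words(query) & words(candidate))


def best_of_n(prompt, query, n=5):
    """Test-time compute, reduced: run n generations and choose the best."""
    return max((toy_model(prompt, i) for i in range(n)), key=lambda c: score(c, query))


def agent(query, docs, min_score=3):
    """Agent, reduced: a for loop and some ifs."""
    for k in range(1, len(docs) + 1):
        answer = best_of_n(build_prompt(query, retrieve(query, docs, k)), query)
        if score(answer, query) >= min_score:
            return answer
    return "I don't know"


DOCS = [
    "Paris is the capital of France.",
    "The Seine flows through Paris.",
    "Berlin is the capital of Germany.",
]

found = agent("What is the capital of France?", DOCS)
missing = agent("Who wrote Hamlet?", DOCS)
```

The skeleton really is this small; the hard part the thread is arguing about is what goes into `retrieve`, `build_prompt`, and `score` for a real task.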

davidclark · 2h ago

Good example of why I have been totally ignoring people who beat the drum of needing to develop the skills of interacting with models. “Learn to prompt” is already dead? Of course, the true believers will just call this an evolution of prompting or some such goalpost moving.

Personally, my goalpost still hasn’t moved: I’ll invest in using AI when we are past this grand debate about its usefulness. The utility of a calculator is self-evident. The utility of an LLM requires 30k words of explanation and nuanced caveats. I just can’t even be bothered to read the sales pitch anymore.

simonw · 2h ago

We should be so far past the "grand debate about its usefulness" at this point.

If you think that's still a debate, you might be listening to the small pool of very loud people who insist nothing has improved since the release of GPT-4.

davidclark · 37m ago

Have you considered the opposite? Reflected on your own biases?

I’m listening to my own experience. Just today I gave it another fair shot. GitHub Copilot agent mode with GPT-4.1. Still unimpressed.

This is a really insightful look at why people perceive the usefulness of these models differently. It is fair to both sides without being dismissive as one side just not “getting it” or how we should be “so far” past debate:

https://ferd.ca/the-gap-through-which-we-praise-the-machine....

simonw · 8m ago

Do either of these impress you?

https://alexgaynor.net/2025/jun/20/serialize-some-der/ - using Claude Code to compose and have a PR accepted into llvm that implements a compiler optimization (more of my notes here: https://simonwillison.net/2025/Jun/30/llvm/ )

https://lucumr.pocoo.org/2025/6/21/my-first-ai-library/ - Claude Code for writing and shipping a full open source library that handles sloppy (hah) invalid XML

Examples from the past two weeks, both from expert software engineers.

nandhinianand · 49m ago

I think this is definitely true for novel writing and stuff like that, based on my experiments with AI so far. I'm still on the fence about coding/building s/w based on it, but that may just be about the unlearning and re-learning I'm yet to do/try out.

fragmede · 1h ago

Should be, but the bar for scientifically proven is high. Absent actual studies showing this, (and with a large N), people will refuse to believe things they don't want to be true.

jongjong · 2h ago

Recently I started work on a new project and I 'vibe coded' a test case for a complex OAuth token expiry bug entirely with AI (with Cursor), complete with mocks and stubs... And it was on someone else's project. I had no prior familiarity with the code.

That's when I understood that vibe coding is real and context is the biggest hurdle.

That said, most of the context could not be pulled from the codebase directly but came from me after asking the AI to check/confirm certain things that I suspected could be the problem.

I think vibe coding can be very powerful in the hands of a senior developer because if you're the kind of person who can clearly explain their intuitions with words, it's exactly the missing piece that the AI needs to solve the problem... And you still need to do the code review, which is also something senior devs are generally good at. Sometimes the AI makes mistakes/incorrect assumptions.

I'm feeling positive about LLMs. I was always complaining about other people's ugly code before... I HATE over-modularized, poorly abstracted code where I have to jump across 5+ different files to figure out what a function is doing; with AI, I can just ask it to read all the relevant code across all the files and tell me WTF the spaghetti is doing... Then it generates new code which 'follows' existing 'conventions' (same level of mess). The AI basically automates the most horrible aspect of the work; making sense of the complexity and churning out more complexity that works. I love it.

That said, in the long run, to build sustainable projects, I think it will require following good coding conventions and minimal 'low code' coding... Because the codebase could explode in complexity if not used carefully. Code quality can only drop as the project grows. Poor abstractions tend to stick around and have negative flow-on effects which impact just about everything.

m3kw9 · 2h ago

Well, it’s still a prompt

adhamsalama · 2h ago

There is no engineering involved in using AI. It's insulting to call begging an LLM "engineering".

rednafi · 2h ago

This. Convincing a bullshit generator to give you the right data isn’t engineering, it’s quackery. But I guess “context quackery” wouldn’t sell as much.

LLMs are quite useful and I leverage them all the time. But I can’t stand these AI yappers saying the same shit over and over again in every media format and trying to sell AI usage as some kind of profound wizardry when it’s not.

mikhmha · 1h ago

It is total quackery. When you zoom out in these discussions you begin to see how the AI yappers and their methodology are just modern-day alchemy with its own jargon and "esoteric" techniques.

simonw · 1h ago

See my comment here. These new context engineering techniques are a whole lot less quackery than the prompting techniques from last year: https://news.ycombinator.com/item?id=44428628

Zopieux · 44m ago

That's the definition of a hype cycle. Can't wait for tech to be past it.

bradhe · 2h ago

Back in my day we just called this "knowing what to google" but alright, guys.

ModernMech · 2h ago

"Wow, AI will replace programming languages by allowing us to code in natural language!"

"Actually, you need to engineer the prompt to be very precise about what you want the AI to do."

"Actually, you also need to add in a bunch of "context" so it can disambiguate your intent."

"Actually English isn't a good way to express intent and requirements, so we have introduced protocols to structure your prompt, and various keywords to bring attention to specific phrases."

"Actually, these meta languages could use some more features and syntax so that we can better express intent and requirements without ambiguity."

"Actually... wait we just reinvented the idea of a programming language."

throwawayoldie · 2h ago

Only without all that pesky determinism and reproducibility.

(Whoever's about to say "well ackshually temperature of zero", don't.)

nimish · 1h ago

A half baked programming language that isn't deterministic or reproducible or guaranteed to do what you want. Worst of all worlds unless your input and output domains are tolerant to that, which most aren't. But if they are, then it's great

georgeburdell · 2h ago

We should have known up through Step 4 for a while. See: the legal system

mindok · 2h ago

“Actually - curly braces help save space in the context while making meaning clearer”

neilv · 1h ago

> Then you can generate a response.

> > Hey Jim! Tomorrow’s packed on my end, back-to-back all day. Thursday AM free if that works for you? Sent an invite, lmk if it works.

Feel free to send generated AI responses like this if you are a sociopath.

joe5150 · 31m ago

Jim's agent replies, "Thursday AM touchbase sounds good, let's circle back after." Both agents meet for a blue sky strategy session while Jim's body floats serenely in a nutrient slurry.

la64710 · 2h ago

Of course the best prompts automatically included providing the best (not necessarily most) context to extract the right output.

intellectronica · 2h ago

See also: https://ai.intellectronica.net/context-engineering for an overview.

rvz · 2h ago

This is just another "rebranding" of the failed "prompt engineering" trend to promote another borderline pseudo-scientific trend to attract more VC money to fund a new pyramid scheme.

Assuming that this will be using the totally flawed MCP protocol, I can only see more cases of data exfiltration attacks on these AI systems just like before [0] [1].

Prompt injection + Data exfiltration is the new social engineering in AI Agents.

[0] https://embracethered.com/blog/posts/2025/security-advisory-...

[1] https://www.bleepingcomputer.com/news/security/zero-click-ai...

Zopieux · 41m ago

Rediscovering basic security concepts and hygiene from 2005 is also a very hot AI thing right now, so that tracks.

The new skill in AI is not prompting, it's context engineering

Comments (182)