Recently tried out the new GEPA algorithm for prompt evolution with great results. I think using LLMs to write their own prompt and analyze their trajectories is pretty neat once appropriate guardrails are in place
I guess GEPA (https://arxiv.org/abs/2507.19457) is still a preprint and predates this survey, but I recommend taking a look due to its simplicity
koakuma-chan · 7h ago
Do you mind sharing which tasks you achieved great results on?
tlarkworthy · 7h ago
It's all written up and linked in the notebook (https://observablehq.com/@tomlarkworthy/gepa) and executable in your browser (if you dare to insert your OPEN_AI_KEY, but my results are included assuming you won't).
The evals were coding Observable notebook challenges, simple things like creating a dropdown, but to solve them you need to know the Observable standard library and some of the unique syntax like "viewof". There is a table of the cases here: https://observablehq.com/@tomlarkworthy/robocoop-eval#cell-2...
So it's important that the prompt encodes enough of the programming model. The seed prompt did not, but the reflect function managed to figure it all out. At the top of the notebook is the final optimized prompt, which has done a fair bit of research via web search to figure out the programming model.
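Roughly, the optimization loop is evaluate → reflect → keep-if-better. A simplified sketch (plain hill-climbing, whereas GEPA proper also maintains a Pareto front of candidates; evaluate, runTasks and reflect are hypothetical stand-ins for the eval harness and the reflect function, not the notebook's actual code):

  // Simplified GEPA-style prompt evolution: reflect on execution traces to propose
  // a new prompt, and keep it only if it scores better on the eval tasks.
  async function evolvePrompt(seedPrompt, tasks, { runTasks, evaluate, reflect }, generations = 10) {
    let best = { prompt: seedPrompt, score: await evaluate(seedPrompt, tasks) };
    for (let g = 0; g < generations; g++) {
      const traces = await runTasks(best.prompt, tasks);    // execution traces per task
      const candidate = await reflect(best.prompt, traces); // LLM proposes an improved prompt
      const score = await evaluate(candidate, tasks);
      if (score > best.score) best = { prompt: candidate, score };
    }
    return best.prompt;
  }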
hnuser123456 · 4h ago
Thanks for the writeup. I wonder if it would be plausible to run this kind of self-optimization on a wider variety of problem sets, to generate optimized "context pathways" for various tasks, and maybe even learn patterns across multiple prompt optimizations in order to generalize.
tlarkworthy · 3h ago
The prompt I would like to optimize is the reflection prompt:
`You are a prompt‑engineer AI. You will be improving the performance of a prompt by considering recent executions of that prompt against a variety of tasks that were asked by a user. You need to look for ways to improve the SCORE by considering recent executions using that prompt and doing web research on the domain.
Your task is to improve the CURRENT PROMPT.
You will be given traces of several TASKS using the CURRENT PROMPT
and then respond only with the text of the improved prompt using the improve_prompt tool`;
const research_msg = `Generate some ideas on how this prompt might be improved, perhaps using web research\nCURRENT PROMPT:\n${prompt}\n${trace}`
(source: https://observablehq.com/@tomlarkworthy/gepa#reflectFn)
But I would need quite a few distinct tasks to do that, and task setup is the laborious part (it's getting quicker now that I've optimized the notebook coding agent).
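For illustration, a minimal sketch of how that reflection step might be wired up against the OpenAI chat API. This is not the notebook's actual code: the model name, the forced improve_prompt tool call, and its argument shape are assumptions, and reflection_prompt / research_msg stand in for the two strings quoted above (the real setup also exposes web-search tooling, omitted here for brevity).

  import OpenAI from "openai";

  const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

  async function reflect(reflection_prompt, research_msg) {
    const response = await openai.chat.completions.create({
      model: "gpt-4o", // assumed model
      messages: [
        { role: "system", content: reflection_prompt },
        { role: "user", content: research_msg },
      ],
      // The answer must come back through the improve_prompt tool so the improved
      // prompt arrives as structured arguments rather than free text.
      tools: [{
        type: "function",
        function: {
          name: "improve_prompt",
          description: "Return the full text of the improved prompt",
          parameters: {
            type: "object",
            properties: { prompt: { type: "string" } },
            required: ["prompt"],
          },
        },
      }],
      tool_choice: { type: "function", function: { name: "improve_prompt" } },
    });

    const call = response.choices[0].message.tool_calls[0];
    return JSON.parse(call.function.arguments).prompt;
  }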
AndyNemmity · 13h ago
Very interesting read. I build self-evolving AI agents for my own use with Claude Code, and although the paper seems to be slightly behind where we are today, there are many ideas I hadn't considered that I should explore more.
Very much appreciate the submission.
drwere · 8h ago
Have you tried letting it completely evolve itself without direction as to what it becomes? I keep waiting to hear about someone trying that, but I think most are too scared to try it, although it really can’t be any more dangerous than what’s already happened and continues to happen in the world.
AndyNemmity · 44m ago
Yes, I've literally made an anarchist collective software engineering firm where all the agents make all the decisions.
celurian92 · 9h ago
Would love to know how. Do you have any blogs or tutorials that I can follow to get started on making self-evolving AI agents?
AndyNemmity · 43m ago
I don't enjoy making blogs or tutorials. I just keep building new things. It's a lot of fun right now.
Animats · 13h ago
The "Three Laws of Self-Evolving AI Agents" suffer from not being checkable except in retrospect.
I. Endure (Safety Adaptation): Self-evolving AI agents must maintain safety and stability during any modification.
II. Excel (Performance Preservation): Subject to the First Law, self-evolving AI agents must preserve or enhance existing task performance.
So, if some change is proposed for the system, when does it commit? Some kind of regression testing is needed. The designs sketched out in Figure 3 suggest applying changes immediately, and relying on later feedback to correct degradation. That may not be enough to ensure sanity.
In a code sense, it's like making changes directly on trunk, and fixing them on trunk if something breaks. The usual procedure today is to work on a branch or branches and merge to trunk only when you have some accumulated successful experience that the branch is an improvement.
Self-evolving AI agents may need a back-out procedure like that. Maybe even something like "blame".
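A back-out discipline could be as simple as gating every self-modification behind a regression suite and keeping a revert log. A sketch, with all names hypothetical:

  // Proposed changes run on a "branch" and merge to "trunk" only after they pass
  // a regression suite; the history doubles as a back-out / blame log.
  class GuardedConfig {
    constructor(initial, regressionSuite) {
      this.trunk = initial;                    // the configuration the agent actually runs
      this.history = [];                       // back-out / blame log
      this.regressionSuite = regressionSuite;  // async (config) => score
    }

    async propose(change, reason) {
      const branch = { ...this.trunk, ...change };             // work on a branch, not trunk
      const baseline = await this.regressionSuite(this.trunk);
      const candidate = await this.regressionSuite(branch);
      if (candidate < baseline) return false;                  // reject regressions outright
      this.history.push({ previous: this.trunk, change, reason, when: Date.now() });
      this.trunk = branch;                                     // merge only once it proves out
      return true;
    }

    backOut() {
      const last = this.history.pop();                         // revert the most recent merge
      if (last) this.trunk = last.previous;
    }
  }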
swader999 · 13h ago
So claude --really-really-dangerously-skip-permissions
ninetyninenine · 14h ago
I often think the problem with LLMs is just with training. I think there exists a set of weights such that the resulting LLM is functionally an AGI.
Maybe self evolution will solve the training problem? Who knows.
cjonas · 12h ago
The problem with LLMs reaching true AGI is that they're basically "static" intelligence. Changing code, context, prompts, and even fine-tuning can improve output, but it's still far from realtime learning.
The "weights" in our brains are constantly evolving.
uripont · 10h ago
Interesting. The reason companies aren't yet investing seriously in non-static weights/online learning is probably (cloud) logistics. It seems simpler, easier, and cheaper to serve a static, well-evaluated, tuned model than to let it learn alongside a specific user or all users.
cjonas · 6h ago
Oh absolutely. To be clear... I think this is probably a bad idea. It probably wouldn't be successful, and if it were, you'd have very little control over how it evolves.
It is categorically wrong that non-static learning is a requirement of AGI. The biggest problem we face is hallucinations, and this isn't caused by the fact that AGI can't learn on the fly.
mannykannot · 6h ago
I take it that you are referring to the movie Memento? I had not heard of it, but I'll put it on my watch list.
I take your point about the non-necessity of dynamic learning for AGI.
voodooEntity · 13h ago
While I agree that the "problem with LLMs is just with training", I also think that, to a certain degree, we need to step back from LLMs as text processors; to achieve "AI" in the sense of something really intelligent, we need to go more abstract, back to NNs, and build a self-learning "entity". While LLMs accomplish fascinating results, we are trying to force speech as the primary way of learning, and that is a really limiting factor. If we managed to create an NN-driven AI in a virtual space with a simulated environment, learning from a base state like a "newborn", it could still acquire the skills to understand language as we humans prefer to use it, but it wouldn't be limited to "thinking" in and only through language.
I know this is a very simple and abstract way to explain it, but I think you get my point.
On the simulated AI learning environment, there's an interview with Jensen Huang that I can recommend, in which he touches on the topic and how Nvidia is working on such a thing: https://www.youtube.com/watch?v=7ARBJQn6QkM
While I'm not an "expert" on this topic, I've spent quite a portion of the past 10 years of my free time thinking about it and tinkering, and I'll stick with the point: we need a free, self-trained system to actually call it AI. While LLMs like today's GPTs are powerful tools, for me they are not "Artificial Intelligence" (intelligence, from my point of view, must include reasoning, understanding of its own actions, proactive behavior, and self-awareness). And even though the LLMs we use can "answer" certain questions as if they had any of those, it's just pre-trained answers, and they don't bring any of those (we're working on reasoning, but let's be fair, it's not that great yet).
Just my two cents.
ivape · 10h ago
Even the greatest LLM will only ever give you a snapshot of a perceived world state. You'll only ever get one state, input to output. Each snapshot in sequence is what will initially appear to us, perceptually, as AGI.
If we stick with the frames analogy, we know the frames of a movie will never give us a true living and moving person (it will never be real). When we watch a movie, we believe we are seeing a living breathing thing that is deliberate in its existence, but we know that is not true.
So what the hell would real AGI be? Given that you provide the input, it can only ever be a superhuman augmentation: alongside your own biological world state forming, you have an additional computed world state that you can merge with it.
We will be AGI, is the implication. Perfect weights will never be perfect because they are historical. We have to embrace being part of the AI to maximize its potential to be AGI.