Show HN: I was curious about spherical helix, ended up making this visualization (visualrambling.space)

Like back in the day being brought in to “just fix” a amalgam of FoxPro-, Excel-, and Access-based ERP that “mostly works” and only “occasionally corrupts all our data” that ambitious sales people put together over last 5 years.

But worse - because “ambitious sales people” will no longer be constrained by sandboxes of Excel or Access - they will ship multi-cloud edge-deployed kubernetes micro-services wired with Kafka, and it will be harder to find someone to talk to understand what they were trying to do at the time.

dhorthy · 6h ago

When Claude starts deploying Kafka clusters I’m outro

CuriouslyC · 5h ago

It's already happening brother, https://github.com/containers/kubernetes-mcp-server.

dhorthy · 3h ago

still don’t know why you need an MCP for this when the model is perfectly well trained to write files and run kubetctl on its own

CuriouslyC · 3h ago

Claude is, some models aren't. In some cases the MCPs do get the models to use tools better as well due to the schema, but I doubt kubectl is one of them (using the git mcp in claude code... facepalm)

dhorthy · 3h ago

Yeah fair enough lol…usually I end up building model-optimized scripts instead of mcp which just flood context window with json and uuids (looking at you, linear) - much better to have Claude write 100 lines of ts to drop a markdown file with the issue and all comments and no noise

Jtsummers · 6h ago

Superfund repos.

binary132 · 45m ago

A lot of big open source repos need to be given the superfund treatment

throwup238 · 5h ago

Now that's an open source funding model governments can get behind.

bwestergard · 6h ago

There are always two major results from any software development process: a change in the code and a change in cognition for the people who wrote the code (whether they did so directly or with an LLM).

Python and Typescript are elaborate formal languages that emerged from a lengthy process of development involving thousands of people around the world over many years. They are non-trivially different, and it's neat that we can port a library from one to the other quasi-automatically.

The difficulty, from an economic perspective, is that the "agent" workflow dramatically alters the cognitive demands during the initial development process. It is plain to see that the developers who prompted an LLM to generate this library will not have the same familiarity with the resulting code that they would have had they written it directly.

For some economic purposes, this altering of cognitive effort, and the dramatic diminution of its duration, probably doesn't matter.

But my hunch is that most of the economic value of code is contingent on there being a set of human beings familiar with the code in a manner that requires writing having written it directly.

Denial of this basic reality was an economic problem even before LLMs: how often did churn in a development team result in a codebase that no one could maintain, undermining the long-term prospects of a firm?

tikhonj · 15m ago

There's a classic Peter Naur paper about this from 1985: "Programming as Theory Building"

https://pages.cs.wisc.edu/~remzi/Naur.pdf

NitpickLawyer · 6h ago

> After finishing the port, most of the agents settled for writing extra tests or continuously updating agent/TODO.md to clarify how "done" they were. In one instance, the agent actually used pkill to terminate itself after realizing it was stuck in an infinite loop.

Ok, now that is funny! On so many levels.

Now, for the project itself, a few thoughts:

- this was tried before, about 1.5 years ago there was a project setup to spam github with lots of "paper implementations", but it was based on gpt3.5 or 4 or something, and almost nothing worked. Their results are much better.

- surprised it worked as well as it did with simple prompts. "Probably we're overcomplicating stuff". Yeah, probably.

- weird copyright / IP questions all around. This will be a minefield.

- Lots of SaaS products are screwed. Not from this, but from this + 10 engineers in every midsized company. NIH is now justified.

keeda · 4h ago

Is that... the first recorded instance of an AI committing suicide?

alphazard · 3h ago

The AI doesn't have a self preservation instinct. It's not trying to stay alive. There is usually an end token that means the LLM is done talking. There has been research on tuning how often that is emitted to shorten or lengthen conversations. The current systems respond well to RL for adjusting conversation length.

One of the providers (I think it was Anthropic) added some kind of token (or MCP tool?) for the AI to bail on the whole conversation as a safety measure. And it uses it to their liking, so clearly not trying to self preserve.

williamscs · 1h ago

Sounds a lot like Mr. Meeseeks. I've never really thought about an LLM's only goal is to send tokens until it can finally stop.

1R053 · 3h ago

I guess pkill would rather be a sleep or koma. Erasing itself from any storage would rather equate to aicide

ghuntley · 6h ago

> - weird copyright / IP questions all around. This will be a minefield.

Yeah, we're in weird territory because you can drive an LLM as a Bitcoin mixer over intellectual property. That's the entire point/meaning behind https://ghuntley.com/z80.

You can take something that exists, distill it back to specs, and then you've got your own IP. Throw away the tainted IP, and then just run Ralph over a loop. You are able to clone things (not 100%, but it's better than hiring humans).

sitkack · 3h ago

repoMirror is the wrong name, aiCodeLaundering would be more accurate. This is bulk machine translation from one language to another, but in this case, it is code.

heavyset_go · 4h ago

> then you've got your own IP.

AI output isn't copyrighted in the US.

rasz · 4h ago

>and then you've got your own IP.

except you dont

dhorthy · 6h ago

Yeah the NIH thing is super on point. small saas tools for everything is done. Bring on the hand coded custom in-house admin monolith?

Is Unix “small sharp tools” going away? Is that a relic of having to write everything in x86 and we’re now just finally hitting the end of the arc?

CuriouslyC · 5h ago

I started building a project by trying to wire in existing open source stuff. When I looked at the build and stuff that would cause me to bring in, and the actual stuff I needed from the open source tools, it turned out to be MUCH faster/cleaner to just get Claude to check out the repo and port the stuff I needed directly.

Now I do a calculus with dependencies. Do I want to track the upstream, is the rigging around the core I want valuable, is it well maintained? If not, just port and move on.

ghuntley · 6h ago

Nice. Check out https://ghuntley.com/ralph to learn more about Ralph. It's currently building a Gen-Z esoteric programming language and porting the standard library from Go to the Cursed programming language. The compiler is working, I'm just finishing up the touches of the standard library before launching.

The language is called Cursed.

sfarshid · 6h ago

Thanks Geoff, Ralph was our inspiration to do this!

We were curious to see if we can do away with IMPLEMENTATION_PLAN.md for this kind of task

bigmattystyles · 6h ago

Starting to think of this quote more and more:

"This business will get out of control. It will get out of control and we'll be lucky to live through it."

https://www.youtube.com/watch?v=YZuMe5RvxPQ&t=22s

ramraj07 · 5h ago

The irony is that everyone did live through that business. So what youre saying is we will live through this too!

cptroot · 29m ago

Are you feelin' lucky?

giantg2 · 6h ago

There's a lot of "it kind of worked" in here.

If we actually want stuff that works, we need to come up with a new process. If we get "almost" good code from a single invocation, you just going to get a lot of almost good code from a loop. What we likely need is a Cucumberesque format with example tables for requirements that we can distill an AI to use. It will build the tests and then build the code to to pass the tests.

ghuntley · 6h ago

Strangely enough, TLA+ and other formal proofs work very well for driving Ralph.

giantg2 · 6h ago

I would consider that expected but not strange. The thing blocking adoption is that most devs/people find those formal languages difficult or boring. That's even true of things like Cucumber - it's boring and most organizations care little for robust QA.

gregpr07 · 6h ago

AGI was just 1 bash for loop away all this time I guess. Insane project.

rukuu001 · 1h ago

Just need to add ID.md, EGO.md and SUPEREGO.md and we're done.

cogogo · 5h ago

Less flippantly that was sort of my thought. I’m probably a paranoid idiot and I’m not really sure I can articulate this idea properly but I can imagine a less concise but broader prompt and an agent configured in a way it has privileges you dont want it to have or a path to escalate them and its not quite AGI but its a virus on steroids - like a company or resource (think utilities) killer. I hope Im just missing something but these models seem pretty capable of wreaking all kinds of havoc if they just keep looping and have access nobody in their right mind wants.

dhorthy · 6h ago

was deeply unsettling among other things

ghuntley · 6h ago

It is, isn't it mate? Shit, I stumbled upon Ralph back in February and it shook me to the core.

cogogo · 5h ago

Not that I want to be shaken but what is Ralph? A quick search showed me some marketing tools but that cant be what you are referring to is it?

ghuntley · 5h ago

Ralph is a technique. The stupidest technique possible. Running an agent in a while true loop. https://ghuntley.com/ralph

wrs · 6h ago

I’ve done a few ports like this with Claude Code (but not with a while loop) and it did work amazingly well. The original codebase had a good test suite, so I had it port the test suite first, and gave it some code style guidance up front. Then the agent did remarkably well at doing a straight port from one imperative language to another. Then there’s some purely human work to get it really done — 80-90% done sounds about right.

rogerrogerr · 4h ago

Does anyone else get dull feelings of dread reading this kind of thing? How do you combat it?

shaky-carrousel · 1h ago

By being there when FrontPage was released. This is just the same, all over again.

zdwolfe · 1h ago

Yes, and so far I haven't been able to combat it.

dhorthy · 3h ago

combat how? (And yes, yes I do)

rogerrogerr · 3h ago

Combat the feelings, I guess. Not really sure.

hoppp · 5h ago

I wanted to know how much it cost?

I would be scared to run this without knowing the exact cost.

Its not a good idea to do it without a payment cap for sure, its a new way to wake up with a huge bill the next day.

debazel · 5h ago

They did mention how much they spent here: https://github.com/repomirrorhq/repomirror/blob/main/repomir...

> We spent a little less than $800 on inference for the project. Overall the agents made ~1100 commits across all software projects. Each Sonnet agent costs about $10.50/hour to run overnight.

bckr · 5h ago

$800

MagMueller · 6h ago

I would love to fix my docs with this. I have them in the main browser-use repo. What do you recommend that the agent does never push to main browser-use, but only to its own branch?

dhorthy · 6h ago

Yeah you can easily tweak this to push to a branch or a fork or something in the generated prompt.md

rozab · 2h ago

These people are weird. The blog post that inspired this has this weird iMessage screenshot, like a shitty investment grift facebook ad:

https://ghuntley.com/ralph/

Apparently one of the lucky few who learned this special technique from Geoff just completed a $50k contract for $297. But that's not all! Geoff is generous to share the special secret prompt that unlocked this unbelievable success, if only we subscribe to his newsletter! "This free-for-life offer won't last forever!"

I am sceptical.

Ginger-Pickles · 51m ago

https://archive.ph/goxZg

imiric · 1h ago

I can't tell whether this "technique" is serious or a joke, and/or if it's some elaborate grift.

In any case, the writing style of that entire blog is off-putting. Gibberish from a massive ego.

rkachowski · 6h ago

> In one instance, the agent actually used pkill to terminate itself after realizing it was stuck in an infinite loop.

The alexandrian solution to the halting problem.

kh_hk · 6h ago

I am honestly surprised how we went from almost OCD TDD and type purism, to a "it kinda works" attitude to software.

baq · 5h ago

always has been, the difference is now the 'it compiles, ship it' loop is 10x-100x faster than 2 years ago

beefnugs · 6h ago

"At one point we tried “improving” the prompt with Claude’s help. It ballooned to 1,500 words. The agent immediately got slower and dumber. We went back to 103 words and it was back on track."

Isn't this the exact opposite of every other piece of advice we have gotten in a year?

Another general feedback just recently, someone said we need to generate 10 times, because one out of those will be "worth reviewing"

How can anyone be doing real engineering in such a: pick the exact needle out of the constantly churning chaos-simulation-engine that (crashes least, closest to desire, human readable, random guess)

joshka · 2h ago

One of the big things I think a lot of tooling misses, which Geoffrey touches on is the automated feedback loops built into the tooling. I expect you could probably incorporate generation time and token cost to automatically self tune this over time. Perhaps such things as discovering which prompts and models are best for which tasks automatically instead of manually choosing these things.

You want to go meta-meta? Get ralph to spawn subagents that analyze the process of how feedback and experimentation with techniques works. Perhaps allocate 10% of the time and effort to identifying what's missing that would make the loops more effective (better context, better tooling, better feedback mechanism, better prompts, ...?). Have the tooling help produce actionable ideas for how humans in the loop can effectively help the tooling. Have the tooling produce information and guidelines for how to review the generated code.

I think one of the big things missing in many of the tools currently available is tracking metrics through the entire software development loop. How long does it take to implement a feature. How many mistakes were made? How many errors were caught by tests? How many tokens does it take? And then using this information to automatically self-tune.

dhorthy · 5h ago

Hmm what sorts of advice in the last year are you referring to? Like the “run it ten times and pick the best one” thing? Or something else?

I kind of agree that picking from 10 poorly-promoted projects is dumb.

The engineering is in setting up the engine and verification so one agent can get it right (or 90% right) on a single run (of the infinite ish loop)

jjani · 5h ago

> Hmm what sorts of advice in the last year are you referring to?

They're almost certainly referring to first creating a fleshed out spec and then having it implement that, rather than just 100 words.

mistrial9 · 3h ago

the core might be - the difference between an LLM context window, and an agent's orders in a text. LLM itself is a core engine, running in an environment of some kind (instruct vs others?). Agents on the other hand, are descendants of the old Marvin Minsky stuff in a way.. it has objectives and capacities, at a glance. LLMs are connected to modern agents because input text is read to start the agent.. inner loops are intermediate outputs of LLM, in language. There is no "internal code" to this set of agents, it is speaking in code and text to the next part of the internal process.

There are probably big oversights or errors in that short explanation. The LLM engine, the runner of the engine, and the specifics of some environment, make a lot of overlap and all of it is quite complicated.

hth

thebiglebrewski · 6h ago

The agent terminating its own process was hilarious

ghuntley · 6h ago

It's why I called it Ralph. Because it's just not all there, but for some strange reason it gets 80% of there pretty well. With the right observational skills, you can tune it into 81, then 82, then 83, then 84. But there's always gaps, always holes. It's a lovable approach, a character, just like Ralph Wiggum.

nis0s · 6h ago

Why is this flagged?

cluckindan · 5h ago

Now I want to put one of these in a loop, give it access to some bitcoin, and tell it to come up with a viable strategy to become a billionaire within the next month.

dhorthy · 3h ago

Give it a spin

apwell23 · 6h ago

lame

bn-l · 6h ago

No it did not.

vntok · 4h ago

Do you have more current information than the authors who say it did?

Show HN: Clearcam – Add AI object detection to your IP CCTV cameras (github.com)

Show HN: Game demo made with my homemade game engine (reprobate.site)

Show HN: Bicyclopedia (bicyclopedia.lemoing.ca)

Show HN: I Built a XSLT Blog Framework (vgr.land)

Show HN: A lightweight ML model to predict music emotion - energy, valence, etc. (github.com)

Show HN: Port Kill – A lightweight macOS status bar development port monitor (github.com)

Show HN: Configurable Open Source Audio Spectrum Analyzer (github.com)

Show HN: Komposer, AI image editor where the LLM writes the prompts (komposer.xyz)

Show HN: JavaScript-free (X)HTML Includes (github.com)

Show HN: I was curious about spherical helix, ended up making this visualization (visualrambling.space)

Show HN: Luminal – Open-source, search-based GPU compiler (github.com)

Show HN: Creao – Vibe coding product for founders (creao.ai)

Show HN: I replaced vector databases with Git for AI memory (PoC) (github.com)

Show HN: Pinch – macOS voice translation for real-time conversations (startpinch.com)

Show HN: Publish Markdown – A tool to publish Markdown file in one click (publishmarkdown.com)

Show HN: A "Catalog of Catalogs" for Unified Metadata (github.com)

Show HN: Using Common Lisp from Inside the Browser (turtleware.eu)

Show HN: OS X Mavericks Forever (mavericksforever.com)

Show HN: PlutoPrint – Generate PDFs and PNGs from HTML with Python (github.com)

Show HN: Splice – CAD for Cable Harnesses and Electrical Assemblies (splice-cad.com)

Show HN: AgentState – Lightweight state manager for multi-agent AI workflows (github.com)

Show HN: ChartDB Cloud – Visualize and Share Database Diagrams (app.chartdb.io)

Show HN: brew-cleaner – CLI to bulk uninstall Homebrew formulae and free space (github.com)

Show HN: Run AI models directly in the browser – no server or internet required (private-ai-chat.vercel.app)

Show HN: FeOx – Fast embedded KV store in Rust (github.com)

Show HN: Project management system for Claude Code (github.com)

Show HN: LoadGQL – a CLI for load-testing GraphQL endpoints (apps.devanswers.org)

Show HN: Clyp – Clipboard Manager for Linux (github.com)

Show HN: Python library for fetching/storing/streaming crypto market data (github.com)

Show HN: I built aibanner.co to stop spending hours on marketing banners (aibanner.co)

Show HN: Whispering – Open-source, local-first dictation you can trust (github.com)

Show HN: Anchor Relay – A faster, easier way to get Let's Encrypt certificates (anchor.dev)

Show HN: Nestable.dev – local whiteboard app with nestable canvases, deep links (nestable.dev)

Show HN: I built a toy TPU that can do inference and training on the XOR problem (tinytpu.com)

Show HN: OpenAI/reflect – Physical AI Assistant that illuminates your life (github.com)

Show HN: Typed-arrow – compile‑time Arrow schemas for Rust (github.com)

Show HN: AIMless – a 10 KB single file P2P chat app with zero dependencies (github.com)

Show HN: Changefly ID + Anonymized Identity and Age Verification (changefly.com)

Show HN: Agex – An agent framework that lives in your Python runtime (ashenfad.github.io)

Show HN: Built a stock screener after getting frustrated with expensive tools (dashboard-finance.com)

Show HN: Strudel Flow, a pattern sequencer built with Strudel and React Flow (github.com)

Show HN: Claudable – OpenSource Lovable that runs locally with Claude Code (github.com)

Show HN: Header-only GIF decoder in pure C – no malloc, easy to use

Show HN: Fractional jobs – part-time roles for engineers (fractionaljobs.io)

Show HN: Bizcardz.ai – Custom metal business cards (github.com)

Show HN: I integrated my from-scratch TCP/IP stack into the xv6-riscv OS (github.com)

Show HN: We started building an AI dev tool but it turned into a Sims-style game (youtube.com)

Show HN: Map of YC-Funded Startups (patrik-cihal.github.io)

Show HN: I've made an easy to extend and flexible JavaScript logger (github.com)

Show HN: OverType – A Markdown WYSIWYG editor that's just a textarea

We put a coding agent in a while loop

Comments (67)