Hey HN, Gabe and Alexander here from Hatchet. Today we're releasing Pickaxe, a TypeScript library for building scalable, fault-tolerant AI agents.

Here's a demo: https://github.com/user-attachments/assets/b28fc406-f501-442...

Pickaxe provides a simple set of primitives for building agents which can automatically checkpoint their state and suspend or resume processing (also known as durable execution) while waiting for external events (like a human in the loop). The library is based on common patterns we've seen when helping Hatchet users run millions of agent executions per day.
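To make the suspend-and-resume idea concrete, here is a minimal sketch in plain TypeScript. It is not the Pickaxe or Hatchet API: `EventStore`, `publish`, and `waitForEvent` are hypothetical names, and an in-memory `Map` stands in for durable storage. The point it tries to show is that an event is recorded before it is delivered, so an agent that starts waiting late still sees it.

```typescript
// Hypothetical sketch, not the Pickaxe/Hatchet API. The Maps stand in for durable storage.
class EventStore {
  // Every published event is recorded here, whether or not anyone is waiting.
  private readonly recorded = new Map<string, string>();
  // Callbacks for agents that are currently suspended on an event key.
  private readonly listeners = new Map<string, (payload: string) => void>();

  // Record the event first, then resume a suspended agent if one is registered.
  publish(key: string, payload: string): void {
    this.recorded.set(key, payload);
    const resume = this.listeners.get(key);
    if (resume !== undefined) {
      this.listeners.delete(key);
      resume(payload);
    }
  }

  // Resume immediately if the event already arrived; otherwise suspend until it does.
  waitForEvent(key: string, resume: (payload: string) => void): void {
    const stored = this.recorded.get(key);
    if (stored !== undefined) {
      resume(stored);
      return;
    }
    this.listeners.set(key, resume);
  }
}

const store = new EventStore();
const resumed: string[] = [];

// Case 1: the human review arrives before the agent starts waiting.
store.publish("review:run-1", "approved");
store.waitForEvent("review:run-1", (payload) => resumed.push(`run-1 ${payload}`));

// Case 2: the agent suspends first and is resumed when the review arrives.
store.waitForEvent("review:run-2", (payload) => resumed.push(`run-2 ${payload}`));
const resumedBeforeSecondReview = resumed.length;
store.publish("review:run-2", "rejected");
```

In both cases the agent continues with the review result; only the order of arrival differs.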

Unlike other tools, Pickaxe is not a framework. It does not have any opinions or abstractions for implementing agent memory, prompting, context, or calling LLMs directly. Its only focus is making AI agents more observable and reliable.

As agents start to scale, there are generally three big problems that emerge:

1. Agents are long-running compared to other parts of your application. Extremely long-running processes are tricky because deploying new infra or hitting request timeouts on serverless runtimes will interrupt their execution.

2. They are stateful: they generally store internal state which governs the next step in the execution path.

3. They require access to lots of fresh data, which either can be queried during agent execution or needs to be continuously refreshed from a data source.

(These problems are more specific to agents which execute remotely -- locally running agents generally don't have these problems)

Pickaxe is designed to solve these issues by providing a simple API which wraps durable execution infrastructure for agents. Durable execution is a way of automatically checkpointing the state of a process, so that if the process fails, it can automatically be replayed from the checkpoint rather than starting over from the beginning.

This model is also particularly useful when your agent needs to wait for an external event or human review in order to continue execution. To support this pattern, Pickaxe uses a Hatchet feature called `waitFor`, which durably registers a listener for an event. This means that even if the agent isn't actively listening when the event arrives, the event is guaranteed to be processed by Hatchet and stored in the execution history, and the agent then resumes processing.

This infrastructure is powered by what is essentially a linear event log, which stores the entire execution history of an agent in a Postgres database managed by Hatchet.
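As a rough illustration of replaying from a linear event log, here is a minimal sketch in plain TypeScript. It is not how Pickaxe or Hatchet is implemented: `DurableContext`, `step`, and `LogEntry` are hypothetical names, and an in-memory array stands in for the Postgres-backed log. Each completed step appends its result to the log, and a retry reads recorded results back instead of running those steps again.

```typescript
// Hypothetical sketch, not the Pickaxe/Hatchet API. An array stands in for the durable log.
type LogEntry = { stepId: string; result: unknown };

class DurableContext {
  private cursor = 0;

  constructor(private readonly log: LogEntry[]) {}

  // Runs `fn` the first time a step is reached; on replay, returns the recorded result.
  step<T>(stepId: string, fn: () => T): T {
    const recorded = this.log[this.cursor];
    if (recorded !== undefined) {
      // Steps must be reached in the same order on every attempt.
      if (recorded.stepId !== stepId) {
        throw new Error(`non-deterministic replay: expected ${recorded.stepId}, got ${stepId}`);
      }
      this.cursor += 1;
      return recorded.result as T;
    }
    const result = fn();
    this.log.push({ stepId, result }); // checkpoint before moving on
    this.cursor += 1;
    return result;
  }
}

// Count real executions so the effect of replay is visible.
const callCounts = { fetch: 0, summarize: 0 };
let crashOnce = true;

function runAgent(ctx: DurableContext): string {
  const doc = ctx.step("fetch", () => {
    callCounts.fetch += 1;
    return "raw document";
  });
  if (crashOnce) {
    crashOnce = false;
    throw new Error("simulated worker crash");
  }
  return ctx.step("summarize", () => {
    callCounts.summarize += 1;
    return `summary of ${doc}`;
  });
}

const eventLog: LogEntry[] = [];
let output = "";
for (let attempt = 0; attempt < 2; attempt++) {
  try {
    // Each attempt gets a fresh context over the same log, as a restarted worker would.
    output = runAgent(new DurableContext(eventLog));
    break;
  } catch {
    // Swallow the simulated crash and retry from the log.
  }
}
```

The second attempt skips the `fetch` step because its result is already in the log, so each step body runs exactly once even though the agent function runs twice.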

Full docs are here: https://pickaxe.hatchet.run/

We'd greatly appreciate any feedback you have and hope you get the chance to try out Pickaxe.

Comments (6)

almosthere · 1d ago

What I really like about it is that this kind of project helps people learn what an agent is.

abelanger · 1d ago

Thanks! Our favorite resources on this (both have been posted on HN a few times):

- https://www.anthropic.com/engineering/building-effective-age...

- https://github.com/humanlayer/12-factor-agents

That's also why we implemented pretty much all relevant patterns in the docs (e.g. https://pickaxe.hatchet.run/patterns/prompt-chaining).

If there's an example or pattern that you'd like to see, let me know and we can get it released.

randomcatuser · 1d ago

Oh this is really cool! I was building out a bit of this with Restate this past week, but this seems really well put together :) will give it a try!

abelanger · 1d ago

Thanks! Would love to hear more about what type of agent you're building.

We've heard pretty often that durable execution is difficult to wrap your head around, and we've also seen more of our users (including experienced engineers) relying on Cursor and Claude Code while building. So one of the experiments we've been running is ensuring that the agent code is durable when written by LLMs by using our MCP server so the agents can follow best practices while generating code: https://pickaxe.hatchet.run/development/developing-agents#pi...

Our MCP server is super lightweight and basically just tells the LLM to read the docs here: https://pickaxe.hatchet.run/mcp/mcp-instructions.md (along with some tool calls for scaffolding)

I have no idea if this is useful or not, but we were able to get Claude to generate complex agents which were written with durable execution best practices (no side effects or non-determinism between retries), which we viewed as a good sign.
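For anyone wondering what "no non-determinism between retries" means in practice, here is a small sketch in plain TypeScript. It does not use the Pickaxe API: `recordOnce` is a hypothetical helper, a `Map` stands in for the execution history, and a fake clock stands in for `Date.now()`. The idea is that any non-deterministic value gets recorded on the first attempt and reused on replay.

```typescript
// Hypothetical helper, not the Pickaxe/Hatchet API.
// Returns the recorded value for `key` if there is one; otherwise runs `fn` and records it.
function recordOnce<T>(recorded: Map<string, unknown>, key: string, fn: () => T): T {
  if (recorded.has(key)) {
    return recorded.get(key) as T;
  }
  const value = fn();
  recorded.set(key, value);
  return value;
}

// Fake clock that advances on every read, standing in for Date.now().
let clock = 1000;
const now = (): number => (clock += 1);

// Unsafe: reads the clock directly, so every retry computes a different deadline.
function unsafeDeadline(): number {
  return now() + 60;
}

// Safe: the clock read is recorded on the first attempt and reused on replay.
function durableDeadline(recorded: Map<string, unknown>): number {
  const startedAt = recordOnce(recorded, "startedAt", now);
  return startedAt + 60;
}

const executionHistory = new Map<string, unknown>();
// Two calls stand in for a first attempt and a retry of the same run.
const unsafeRuns = [unsafeDeadline(), unsafeDeadline()];
const durableRuns = [durableDeadline(executionHistory), durableDeadline(executionHistory)];
```

The same reasoning applies to random values, generated IDs, and external calls: if a retry can observe a different value than the first attempt did, the replayed run may take a different path.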

zegl · 1d ago

As a long time Hatchet user, I understand why you’ve created this library, but it also disappoints me a little bit. I wish more engineering time was spent on making the core platform more stable and performant.

abelanger · 23h ago

Definitely understand the frustration. The difficulty with Hatchet being general-purpose is that being performant for every use case can be tricky, particularly when combining many features (concurrency, rate limiting, priority queueing, retries with backoff, etc). We should be more transparent about which combinations of use cases we're focused on optimizing.

We spent a long time optimizing the single-task FIFO use case, which is what we typically benchmark against. Performance for that pattern is I/O-bound at > 10k/s, which is a good sign (we just need better disks). So a pure durable-execution workload should perform very well.

We're focused on improving multi-task and concurrency use cases now. Our benchmarking setup recently added support for those patterns. More on this soon!

Show HN: Pickaxe – a TypeScript library for building AI agents
