Show HN: Yet another memory system for LLMs
160 blackmanta 43 8/14/2025, 3:34:11 AM github.com ↗
Built this for my LLM workflows - needed searchable, persistent memory that wouldn't blow up storage costs. I also wanted to use it locally for my research. It's a content-addressed storage system with block-level deduplication (saves 30-40% on typical codebases). I have integrated the CLI tool into most of my workflows in Zed, Claude Code, and Cursor, and I provide the prompt I'm currently using in the repo.
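(To make the dedup idea concrete: below is a minimal sketch of content-addressed, block-level storage, not the repo's actual code. The 4 KiB block size and toy FNV-1a hash are illustrative stand-ins; a real store would key blocks by a cryptographic hash such as SHA-256.)

    // Minimal sketch of content-addressed, block-level dedup storage.
    // Not the repo's implementation: block size and FNV-1a are stand-ins.
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    constexpr std::size_t kBlockSize = 4096;

    // FNV-1a 64-bit: compact but collision-prone; illustration only.
    std::uint64_t fnv1a(const std::string& data) {
        std::uint64_t h = 1469598103934665603ULL;
        for (unsigned char c : data) { h ^= c; h *= 1099511628211ULL; }
        return h;
    }

    class BlockStore {
    public:
        // Split content into fixed-size blocks, store each unique block
        // once, and return the sequence of block hashes (the "recipe").
        std::vector<std::uint64_t> put(const std::string& content) {
            std::vector<std::uint64_t> recipe;
            for (std::size_t off = 0; off < content.size(); off += kBlockSize) {
                std::string block = content.substr(off, kBlockSize);
                std::uint64_t h = fnv1a(block);
                blocks_.emplace(h, std::move(block));  // no-op if already stored
                recipe.push_back(h);
            }
            return recipe;
        }

        // Reassemble a file from its recipe.
        std::string get(const std::vector<std::uint64_t>& recipe) const {
            std::string out;
            for (std::uint64_t h : recipe) out += blocks_.at(h);
            return out;
        }

        std::size_t unique_blocks() const { return blocks_.size(); }

    private:
        std::unordered_map<std::uint64_t, std::string> blocks_;
    };

    int main() {
        BlockStore store;
        std::string a(8192, 'x');                  // two identical blocks
        std::string b = a + std::string(10, 'y');  // shares a's blocks
        auto ra = store.put(a);
        auto rb = store.put(b);
        std::cout << "round-trip ok: " << (store.get(ra) == a && store.get(rb) == b)
                  << ", unique blocks: " << store.unique_blocks()
                  << " of " << ra.size() + rb.size() << " referenced\n";
    }

Files that share content (vendored dependencies, copies, near-duplicate sources) reference the same stored blocks, which is where the space savings on a codebase would come from.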
The project is in C++, and the build system is rough around the edges, but it's tested on macOS and Ubuntu 24.04.
I am also trying to stabilize PDF text extraction to improve knowledge retrieval for when I want to revisit a paper I've read but can't remember which one it was. Most of these use cases come from my own day-to-day use of the tool, but I am trying to make it as general as possible.
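(One cheap, generic way to get PDF text for indexing, sketched below assuming poppler's pdftotext is installed; this is not necessarily the repo's approach, and extract_pdf_text is a hypothetical helper, not an API from the repo.)

    // Generic sketch: extract PDF text by shelling out to poppler's
    // pdftotext. Uses POSIX popen, which is fine on macOS/Ubuntu.
    #include <array>
    #include <cstddef>
    #include <cstdio>
    #include <memory>
    #include <stdexcept>
    #include <string>

    std::string extract_pdf_text(const std::string& pdf_path) {
        // -layout preserves column layout (helps with two-column papers);
        // "-" sends the extracted text to stdout.
        std::string cmd = "pdftotext -layout '" + pdf_path + "' -";
        std::unique_ptr<FILE, decltype(&pclose)> pipe(popen(cmd.c_str(), "r"), pclose);
        if (!pipe) throw std::runtime_error("failed to run pdftotext");
        std::array<char, 4096> buf;
        std::string text;
        while (std::size_t n = std::fread(buf.data(), 1, buf.size(), pipe.get()))
            text.append(buf.data(), n);
        return text;
    }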
I am observing in my professional (non-Claude Max) life that context is a real limiter, both from the "too much context confuses the agent" perspective and the "I'm hitting limits doing basic shit" perspective (looking at you, Bedrock and GitHub), so a tool that helps me give an agent only what it needs would be really valuable. I could do more with the tools, spend less time manually intervening, and spend less of my token budget.
I will attempt to run some small agents with custom prompts and report back.
How are savings of 40% on a typical codebase possible with block-level deduplication? What kind of blocks are you talking about? Blocks as in filesystem blocks?
(seems like there are some vague future plans for embedding models like all-MiniLM-L6-v2 and all-mpnet-base-v2)
https://github.com/jerpint/context-llemur
Although I developed it explicitly without search, catering it to the latest agents, which are all really good at searching and reading files. Instead, you and the LLM structure your context so it's easily searchable (folders and files). It's meant for dev workflows (i.e., a project's context, a user's context).
I made a video showing how easy it is to pull in context to whatever IDE/desktop app/CLI tool you use
https://m.youtube.com/watch?v=DgqlUpnC3uw
Most “memory” layers I’ve seen for AI are either overly complex or end up ballooning storage costs over time, so a content-addressed approach makes a lot of sense.
Also curious: have you benchmarked retrieval speed compared to more traditional vector DB setups? That could be a big selling point for devs running local research workflows.
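(For anyone wanting to produce comparable numbers, a minimal std::chrono timing harness like the sketch below works; the std::unordered_map here is just a stand-in for whatever store and lookup API is actually being measured.)

    // Hypothetical retrieval micro-benchmark; swap the map for the real store.
    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    int main() {
        std::unordered_map<std::string, std::string> store;  // stand-in store
        for (int i = 0; i < 100000; ++i)
            store["key" + std::to_string(i)] = "value" + std::to_string(i);

        constexpr int kQueries = 100000;
        std::size_t sink = 0;  // consumed below so the loop isn't optimized away
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < kQueries; ++i)
            sink += store.at("key" + std::to_string(i)).size();
        auto t1 = std::chrono::steady_clock::now();

        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        std::cout << "avg lookup: " << ns / kQueries << " ns (sink=" << sink << ")\n";
    }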
I see stuff like this, and I really have to wonder if people just write software with bloat for the sake of using a particular library.
I can say one of the nice things about Boost's networking implementation (ASIO) is that it's a fairly mature asynchronous framework supporting a variety of techniques. Also, if you need HTTP or WebSockets, you can use Beast, which is built on top of ASIO.
And if you're using one thing from Boost, it's easy to just use everything else you need that Boost provides, to minimize dependencies.
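(For anyone who hasn't tried it, a blocking HTTP GET with Beast is only a few lines; the sketch below follows the library's quick-start pattern, with example.com as a placeholder host.)

    // Blocking HTTP GET with Boost.Beast over a plain TCP socket.
    #include <boost/asio/connect.hpp>
    #include <boost/asio/ip/tcp.hpp>
    #include <boost/beast/core.hpp>
    #include <boost/beast/http.hpp>
    #include <boost/beast/version.hpp>
    #include <iostream>

    namespace beast = boost::beast;
    namespace http  = beast::http;
    namespace net   = boost::asio;
    using tcp = net::ip::tcp;

    int main() {
        net::io_context ioc;
        tcp::resolver resolver{ioc};
        beast::tcp_stream stream{ioc};

        // Resolve and connect to the placeholder host.
        stream.connect(resolver.resolve("example.com", "80"));

        http::request<http::string_body> req{http::verb::get, "/", 11};  // HTTP/1.1
        req.set(http::field::host, "example.com");
        req.set(http::field::user_agent, BOOST_BEAST_VERSION_STRING);
        http::write(stream, req);

        beast::flat_buffer buffer;
        http::response<http::dynamic_body> res;
        http::read(stream, buffer, res);
        std::cout << res << "\n";

        beast::error_code ec;
        stream.socket().shutdown(tcp::socket::shutdown_both, ec);  // ignore errors
    }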
It’s an optional component.
What do you want the OP to do?
MCP may not be strictly necessary, but it's right in line with the intent of the library.
Are you going to take shots at llama.cpp for having an http server and a template library next?
Come on. It uses Conan, it has a decent CMake file. The code is OK.
This is pretty good work. Don't be a dick. (Yeah, I'll eat the downvotes; it deserves to be said.)
Although that download is a monster; I think it's like 1.6 GB even compressed. It's not modular at all: some of the modules depend on others, and it's impossible to separate them out (they've tried in the past).
But last I checked, there is a lot they could have removed without compromising functionality, especially support for older compilers like MSVC 200x (!) and pre-C++11/older GNU compilers. I'm not sure if they ever got around to doing that.