Show HN: Eyesite – Experimental website combining computer vision and web design (blog.andykhau.com)

I applaud this effort, however the "Does it work?" section answers the wrong question. Anyone can write a trivial doc compressor and show a graph saying "The compressed version is smaller!"

For this to "work" you need to have a metric that shows that AIs perform as well, or nearly as well, as with the uncompressed documentation on a wide range of tasks.

marv1nnnnn · 31d ago

I totally agreed with your critic. To be honest, it's even hard for myself to evaluate. What I do is select several packages that current LLM failed to handle, which are in the sample folder, `crawl4ai`, `google-genai` and `svelte`. And try some tricky prompt to see if it works. But even that evaluation is hard. LLM could hallucinate. I would say most time it works, but there are always few runs that failed to deliver. I actually prepared a comparison, cursor vs cursor + internet vs cursor + context7 vs cursor + llm-min.txt. But I thought it was stochastic, so I didn't put it here. Will consider add to repo as well

ricardobeat · 31d ago

> But even that evaluation is hard. LLM could hallucinate. I would say most time it works, but there are always few runs that failed to deliver

You can use success rate % over N runs for a set of problems, which is something you can compare to other systems. A separate model does the evaluation. There are existing frameworks like DeepEval that facilitate this.

willvarfar · 31d ago

Dual run.

Run the same questions against a model with the unminified and the minified and show the results side-by-side and see how, in your subjective opinion, they hold up.

eden-u4 · 31d ago

why don't you ask the model about the shrinked system prompt and the original system prompt? in this way you can infer whether the same relevant informations are "stored" in the hidden state of the model.

Or better yet, check directly the hidden state difference between a model feed with the original prompt and one with the shrinked prompt.

This should avoid remove the randomness of the results.

rybosome · 31d ago

To be honest with you, it being stochastic is exactly why you should post it.

Having data is how we learn and build intuition. If your experiments showed that modern LLMs were able to succeed more often when given the llm-min file, then that’s an interesting result even if all that was measured was “did the LLM do the task”.

Such a result would raise a lot of interesting questions and ideas, like about the possibility of SKF increasing the model’s ability to apply new information.

timhigins · 31d ago

> LLM could hallucinate

The job of any context retrieval system is to retrieve the relevant info for the task so the LLM doesn't hallucinate. Maybe build a benchmark based on less-known external libraries with test cases that can check the output is correct (or with a mocking layer to know that the LLM-generated code calls roughly the correct functions).

marv1nnnnn · 31d ago

Thanks for the feedback. This will be my next step. Personally I feel it's hard to design those test cases (by myself)

SparkyMcUnicorn · 31d ago

It's also missing the documentation part. Without additional context, method/type definitions with a short description will only go so far.

Cherry picking a tiny example, this wouldn't capture the fact that cloudflare durable objects can only have one alarm at a time and each set overwrites the old one. The model will happily architect something with a single object, expecting to be able to set a bunch of alarms on it. Maybe I'm wrong and this tool would document it correctly into a description. But this is just a small example.

For much of a framework or library, maybe this works. But I feel like (in order for this to be most effective) the proposed spec possibly needs an update to include little more context.

I hope this matures and works well. And there's nothing stopping me from filling in gaps with additional docs, so I'll be giving it a shot.

klntsky · 31d ago

Shameless plug: I'm working on a public contest website for prompts compressing other prompts.

It will include evaluations and a public scoreboard.

It's not usable rn, but feel free to follow: https://github.com/klntsky/prompt-compression-contest/

enjoylife · 31d ago

Was going to point this out too. One suggestion would be to try this on libraries having recent major semvar bumps. See if the compressed docs do better on the backwards incompatible changes.

rco8786 · 31d ago

Yea I was disappointed to see that they just punted (or opted not to show?) on benchmarks.

gk1 · 31d ago

92% reduction is amazing. I often write product marketing materials for devtool companies and load llms.txt into whatever AI I’m using to get accurate details and even example code snippets. But that instantly adds 60k+ tokens which, at least in Google AI Studio, annoyingly slows things down. I’ll be trying this.

Edit: After a longer look, this needs more polish. In addition to key question raised by someone else about quality, there are signs of rushed work here. For example the critical llm_min_guideline.md file, which tells the LLM how to interpret the compressed version, was lazily copy-pasted from an LLM response without even removing the LLM's commentary:

"You are absolutely right! My apologies. I was focused on refining the detail of each section and overlooked that key change in your pipeline: the Glossary (G) section is no longer part of the final file..."

Doesn't exactly instill confidence.

Really nice idea. I hope you keep going with this as it would be a very useful utility.

marv1nnnnn · 31d ago

Oof, you nailed it. Thanks for the sharp eyes on llm_min_guideline.md. That's a clear sign of me pushing this out too quickly to get feedback on the core concept, and I didn't give the supporting docs the attention they deserve. My bad. Cleaning that up, and generally adding more polish, is a top priority. Really appreciate you taking the time to look deeper and for the encouragement to keep going. It's very helpful!

ricardobeat · 31d ago

Wait, are you also using an LLM to respond on Hacker News?

marv1nnnnn · 31d ago

haha, is it that obvious? I only let LLM polished this one. I am not native speaker and I was trying to be polite ^-^

marci · 31d ago

Damn... I saw your sentence starting with "Wait', and immediately thought "reasonning llm?"

thegeomaster · 31d ago

What is absolutely essential to present here, but is missing, is a rigorous evaluation of task completion effectiveness between an agent using this format vs the original format. It has to be done on a new library which is guaranteed not to be present in the training set.

As it stands, there is nothing demonstrating that this lossy compression doesn't destroy essential information that an LLM would need.

I also have a gut feeling that the average LLM will actually have more trouble with the dense format + the instructions to decode it than a huge human-readable file. Remember, LLMs are trained on internet content, which contains terabytes of textual technical documentation but 0 bytes of this ad-hoc format.

I am happy to be proven wrong on both points (LLMs are also very unpredictable!), but the burden of proof for an extravagant scheme like this lies solely on the author.

marv1nnnnn · 31d ago

Agree, actually this approach isn't even possible without the birth of reasoning LLM. In my test, reasoning LLM perform much better than non-reasoning LLM in interpreting the compressed file. Those LLMs are really good at understanding abstraction.

thegeomaster · 31d ago

My point still stands --- the reasoning tokens being consumed to interpret the abstracted llms.txt could have been used for solving the problem at hand.

Again, I'm not saying the solution doesn't work well (my intuition on LLMs has been wrong enough times), but it would be really helpful/assuring to see some hard data.

ianbicking · 31d ago

This mentions an SKF format for knowledge representation... but looking it up, I'm assuming it was invented just for this project?

Which is fine, but is there a description of the format distinct from this particular use? (I'm playing around with these same knowledge representation and compression ideas but for a different domain, so I'm curious about the ideas behind this format)

marv1nnnnn · 31d ago

the description is here: https://github.com/marv1nnnnn/llm-min.txt/blob/main/sample/c...

eric-burel · 31d ago

You'd want to make this tool domain specific - language, type of docs, perhaps target a specific documentation framework/format that is common and standardized enough. I fail to buy a content-agnostic summarization method, though I recognize it could be better than nothing. Also benchmark or it doesn't exist.

dmos62 · 31d ago

I would really like a benchmark showing that AIs can use this. Just the possibility that AI can understand the compressed format similarly well as the original excites me. How did you come up with the format?

marv1nnnnn · 31d ago

Honestly it's really funny. I have the initial idea, and then brainstorm with gemini 2.5 pro a lot, let it design the system. (And in prompt let it think like Jeff Dean and John Carmack ) But most version fails. Then somehow realized I can't let it design from scratch, I give gemini a structure I think is reasonable and efficient after seeing all those versions, let it polish based on that and it works much better.

dmos62 · 31d ago

That's a pretty cool approach!

infogulch · 31d ago

I wonder how this compares to KBLaM [1], which also has a preprocessing step to prepare a large amount of reference material for direct access by LLMs. One obvious difference is that it has a modified attention mechanism they call "rectangular attention". The paper was posted on HN a few times, but it hasn't generated any discussion yet.

[1]: Introducing KBLaM: Bringing plug-and-play external knowledge to LLMs | https://www.microsoft.com/en-us/research/blog/introducing-kb...

marv1nnnnn · 31d ago

never heard of this one! sounds really interesting

revicon · 31d ago

We've done some experimentation when using Claude Code and taken to just creating a "vendor" folder under our "docs" section of each of our repos and just pull down the readme file for every library we use. Then when I'm prompting Claude to figure something out, I'll remind it to go check "docs/vendor/awesomelib" or whatever and it does a fine job of checking the docs out before it starts formulating an answer.

This has done wonders for improving our results when working with TanStack Start or shadcn/ui or whatever.

I guess there's pieces of this that would be helpful to us, but there's too much setup work for me to mess with it right now, I don't feel like generating a Gemini api key, installing puppeteer, etc.

I already have all the docs pulled down, but reducing the number of tokens used for my LLM to pull up the doc files I'm referencing is interesting.

Is there a command line tool anyone has had luck with that just trims down a .md file but still leaves it in a state that the LLM can understand it?

TheTaytay · 31d ago

I’ve been creating a doc for each of my primary libs (using Claude Code of course). I like your vendor/readme idea. Do you find Claude going and reading more docs if it needs to?

revicon · 31d ago

I usually tell it to go read the relevant doc when I do the initial prompt to it when I start working on something. And sometimes I'll remind it during a conversation if I want to make sure it isn't re-inventing the wheel instead of using a feature that is already there in the lib. I run into that with TanStack Router from time to time as an example.

obviyus · 31d ago

I recently upgraded a project from Remix to React Router 7 but unfortunately all AI assistants still try to "fix" my code with the Remix imports / conventions. I've had to add a bunch of custom rules but that doesn't seem enough.

This seems super useful though. I'll try it out with the RR7 docs and see how well it works.

jsmith99 · 31d ago

I'm also using RR7 and Gemini 2.5 Pro just refused to believe me that I could import Link from react-router. Just ignored my instructions and went down a rabbit hold in copilot agent mode, deeper and deeper, trying every possible package name (none of which were installed). I've now created a copilot instructions file into which I've copied most of the RR7 migration docs.

corytheboyd · 31d ago

FWIW sounds like a great use case for some rules files. I’ve only worked with Cursor and Roo but they both support it.

This of course only works for the “stop recommending X” part of your problem, but maybe something like the project here helps fill in with up-to-date documentation too?

Both Cursor and Roo also support URL context additions, which downloads the page and converts it to a machine readable format to include in context. I throw documentation links into my tasks with that all the time, which works out because I know that I am going to be sanity checking generated code against documentation anyway.

cluckindan · 31d ago

Maybe they have read that RR and Remix are now the same thing.

bbor · 31d ago

Cool idea, if it works! That said, small nit:

> If you've ever used an AI coding assistant (like GitHub Copilot, Cursor, or others powered by Large Language Models - LLMs), you've likely encountered situations where they don't know about the latest updates to programming libraries. This knowledge gap exists because AI models have a "knowledge cutoff" – a point beyond which they haven't learned new information.

This isn’t quite right. LLMs don’t memorize APIs because they aren’t trained to do so in the first place. LLMs are intuitive algorithms; if you want them to (reliably…) follow a finite set of formal rules, then you’re gonna need RAG either way.

fcoury · 31d ago

Does it work with any technical doc? I see the CLI claims it's Python specific?

  > $ llm-min --help
  
  Usage: llm-min [OPTIONS]
  
  Generates LLM context by scraping and summarizing documentation for Python libraries.

dmos62 · 31d ago

There's a sample for svelte in the repo.

fcoury · 31d ago

Guess I missed it. Thank you.

k__ · 31d ago

Pretty cool.

Seems like it could be a nice addition to Aider to supplement repo maps.

claar · 31d ago

This project creates a "compact, machine-optimized format designed for efficient AI parsing rather than human readability" called "Structured Knowledge Format (SKF)".

It's not obvious to me that this is a good idea. LLMs are trained on human-readable text.

The author notes that non-reasoning LLMs struggle with these SKFs. Maybe that's a hint that human readable summaries would perform better? Just a guess.

Or perhaps a vector store?

marv1nnnnn · 31d ago

I think it's really a thing about reasoning model. Non-reasoning model struggles at math too. It's more like a protocol between two math genius, they could communicate with really abstract stuff

ramoz · 31d ago

I've been using Gemini to shorten llms.txt and stick those in my repo.

But what I find best - cloning doc sites directly into my repos in a root context folder. I have bash scripts for managing those, and I instruct Claude how to use them. Context7 I dont like for the same reasons I don't hook up any MCP to Claude Code.

tough · 31d ago

I made a little bun app / script that takes a config.toml file with tasks and downloads the files using fetch or git clone

it aint much but its simple and i control it

xamde · 31d ago

Reading all these comments, it seems we as a community don't yet have an idea how to write effectively for LLMs. E.g., I guess markdown tables are harder to process, since the row-column-value mapping first needs to be decoded. Or is this irrelevant?

theturtle32 · 31d ago

I would love to see an example of a full transcript of the generation process for a small-ish library, including all the instructions given to the LLM, its reasoning steps, and its output, for each step in the generation flow.

theturtle32 · 31d ago

Looks like I found what I was looking for: https://github.com/marv1nnnnn/llm-min.txt/blob/main/sample/s...

Edit: not quite.

akmandev · 31d ago

This is one of those rare ideas that's obvious in hindsight. Instead of trying to stuff full docs into an AI, you give it only what matters. Clear structure, minimal noise, and no magic. Nice!

ricardobeat · 31d ago

I'm a little disappointed. Was excited to try this, and it seemed to work initially. But then I gave it a real website to scrape, and it always hangs after only parsing ~10 out of 50+ pages, before even getting to the compression step.

Then I decided to try and switch to the local mode, and after ~ an hour figuring out how to build a markdown version of the docs I needed, hit the "object has no attribute 'generate_from_text'" error, as someone else also reported [1].

So I cloned the source and started to look around, and the method really doesn't exist, even though it's called from main.py. A comment above it says "Assuming LLMMinGenerator has a method to process raw text" and I immediately feel the waft of vibe coding... this is all a mirage. I saw a long README and assumed it was real, but that was probably written by an LLM as well. Would have been obvious by the 'IntegratedKnowledgeManifest_SKF' and 'GenerationTimestamp' keys in the 'SKF format' definition - the former makes no sense, and neither has any reason to be this verbose when the goal is compression.

marv1nnnnn · 31d ago

I just fixed the local version.. My bad, totally missed it.If you are still interested, could try it again. About the scrping step, this project using crawl4ai to scrape. Suppose the url is https://xxx/yy/, it will only scrape https://xxx/yy/*. You could post it as a github issue, will try to fix it.

esthor · 31d ago

It's valuable to reduce tokens if you can achieve the same results, but as others have pointed out, there's no evidence given to demonstrate that it does that.

It's a vibe-coded project and it definitely feels like a vibe-coded fun hack idea (which there's nothing wrong with having a little nerdfun!).

Some more critical thoughts, in case you are motivated to push this forward:

Biggest critical point: I think there may also be a misunderstanding of llms.txt vs. llms-full.txt proposals, and a conflation of the former with the latter. A "minified" version is likely redundant since the intention of llms.txt (as opposed to llms-full.txt) is:

> `llms.txt` is an index file containing links with brief descriptions of the content. An LLM or agent must follow these links to access detailed information.

> `llms-full.txt` includes all the detailed content directly in a single file, eliminating the need for additional navigation.

> A key consideration when using `llms-full.txt` is its size. For extensive documentation, this file may become too large to fit into an LLM's context window. (https://langchain-ai.github.io/langgraph/llms-txt-overview)

There's also lots of folks working together trying to figure this stuff out. Reach out to them and join their communities!

And if you're up for some more feedback:

- Curated knowledge < stochastic summarization -- One-shot minification for core domain knowledge of a product seems likely less effective and inherently risky compared to a domain expert human "minifying" it (and most professional/technical writing already does a kind of minification via progressive disclosure with introductions, quick start guides, navigation, etc.). If you want to convince people your way is the way, write evals that show it's better than a human expert at achieving agentic outcomes more efficiently and effectively (more reliably leads to successful desired outcomes).

- You introduce a new standard (which feels very vibe-coded...), and it's very specific, rigidly structured, verbose, nested, introduces additional semantic nodes where none existed before (e.g. instead of using the actual names of things, it creates new names and nests the old underneath), and it's generally complex and nested. This could lead to fairly long compute run-times and misses for a back-and-forth trying to understand the larger semantic surface area.

- confusing file types -- .txt AND .md files? Isn't llms.txt already markdown? And in your new llms-min.txt file, the text is actually more like code (highly structured data) and its own DSL.

- unintended (?) prompt injections - e.g., "You are an expert AI Software Engineer." in your examples (https://github.com/marv1nnnnn/llm-min.txt/blob/main/sample/c...)

- Your token reduction just counts the tokens, it's not actually informative about token (and compute) costs for an LLM actually try to use the files. If an LLM is going to have to reason extensively about the file to grok it, token count of the static file is largely meaningless.

- Production-ready agents that would need to have a shorter version of texts are likely already equipped with chunking, text search, semantic grouping, and other tools that they can call when encountering a large corpus without flooding their context window with tokens. These will typically be setup by an agent developer to deliver maximally efficient & effective tokens for their task and capabilities.

Anyway, none of the above was vibe-commented. But since you are vibe-coding this, maybe throw it all at your RooCode agent to develop a plan to address this critical feedback? :D Happy vibing!

marv1nnnnn · 31d ago

Thanks for your comments! Really helpful. I have seen some llms.txt like this: https://docs.agno.com/llms.txt which I don't think will help LLM in real tasks. Some background of this project is I think LLM performs better in abstraction than human do. Those AIME test scores are not joking.So maybe smart LLM don't have to communicate with another LLM in plain text, they have more efficient way to communicate. About the excessive token of LLM reasoning, I find it varies. Gemini 2.5 pro is really an overthinker, but claude 3.7 won't. Finaly, I think most vibe-coding task don't require deep understanding of how a package works. It's more like a information retrieval task, so a lot could be compressed.

Show HN: Zeekstd – Rust Implementation of the ZSTD Seekable Format (github.com)

Show HN: StellarSnap – Explore NASA APODs, simulate orbits, learn astronomy (stellarsnap.space)

Show HN: How to Read Code (codedump.info)

Show HN: Personalized Wealth Management – Institutional Meets Consumer (fulfilledwealth.co)

Show HN: Container-compose – A Docker-compose like tool for Apple containers (github.com)

Show HN: Meow – An Image File Format I made because PNGs and JPEGs suck for AI (github.com)

Show HN: Hackernews Clone (The 1001st) (news.expatcircle.com)

Show HN: Tikt.com – Remove the "OK" from TikTok URL's to Download as MP3 or MP4 (tikt.com)

Show HN: McWig – A modal, Vim-like text editor written in Go (github.com)

Show HN: Tritium – The Legal IDE in Rust (tritium.legal)

Show HN: Eyesite – Experimental website combining computer vision and web design (blog.andykhau.com)

Show HN: DIY virtual HDMI monitor using "AR" glasses (github.com)

Show HN: Seastar – Build and dependency manager for C/C++ with Cargo's features (github.com)

Show HN: MSDL – A minimal description language and editor for system diagrams (github.com)

Show HN: I Built an Interactive Spreadsheet (reasonyx.com)

Show HN: I wrote a BitTorrent Client from scratch (github.com)

Show HN: gRPSQLite – A SQLite VFS for remote databases via gRPC (github.com)

Show HN: Tattoy – a text-based terminal compositor (tattoy.sh)

Show HN: WildcatDB – A persistent key-value store built for concurrency (github.com)

Show HN: MCP Index – a growing list of open source MCP server (github.com)

Show HN: Using ReARM as Version Manager (docs.rearmhq.com)

Show HN: Most users won't report bugs unless you make it stupidly easy

Show HN: Spark, An advanced 3D Gaussian Splatting renderer for Three.js (sparkjs.dev)

Show HN: Qrkey – Offline private key backup on paper (github.com)

Show HN: A “Course” as an MCP Server (mastra.ai)

Show HN: I made an online Unicode Cuneiform digital clock (oisinmoran.com)

Show HN: Shelly, terminal assistant that translates natural language into shell (github.com)

Show HN: I built a Mac app to restore Dock-click minimize and avoid tiny buttons (idemfactor.gumroad.com)

Show HN: High End Color Quantizer (github.com)

Show HN: I made a 3D printed VTOL drone (tsungxu.com)

Show HN: Tail Lens – Tailwind editor in browser (taillens.io)

Show HN: S3mini – Tiny and fast S3-compatible client, no-deps, edge-ready (github.com)

Show HN: Ikuyo a Travel Planning Web Application (ikuyo.kenrick95.org)

Show HN: The Roman Industrial Revolution that could have been (thelydianstone.com)

Show HN: Tool-Assisted Speedrunning the Boring Parts of Animal Crossing (GCN) (github.com)

Show HN: RomM – An open-source, self-hosted ROM manager and player (github.com)

Show HN: Get nutrition labels for any recipe (YouTube, Website, Text, Image)

Show HN: OllaMan – Intuitive Desktop UI Manager for Ollama AI Models (ollaman.com)

Show HN: Chili3d – A open-source, browser-based 3D CAD application

Show HN: Tapmytab – an open-source, Kanban with rich text editor on Chrome tab (github.com)

Show HN: PyDoll – Async Python scraping engine with native CAPTCHA bypass (github.com)

Show HN: Munal OS: a graphical experimental OS with WASM sandboxing (github.com)

Show HN: Let’s Bend – Open-Source Harmonica Bending Trainer (letsbend.de)

Show HN: ZeroConfigDNLA – Easy to run media server in Python (github.com)

Show HN: MidWord – A Word-Guessing Game (midword.com)

Show HN: Glowstick – type level tensor shapes in stable rust (github.com)

Show HN: GetHooky – a language-agnostic Git hook manager (ezpieco.github.io)

Show HN: I built a cleaner YouTube – no ads, sponsors, or doomscrolling (skipcut.com)

Show HN: Somo – a human friendly alternative to netstat (github.com)

Show HN: Stop It – an iOS app using visual therapy to help break habits (stop-it.app)

Show HN: Min.js style compression of tech docs for LLM context

Comments (52)