Show HN: TXT OS – Open-Source AI Reasoning, One Plain-Text File at a Time
I'm excited to share TXT OS — an open-source AI reasoning engine that runs entirely inside a single `.txt` file.
- No installs, no signup, no hidden code — just copy-paste the file into any LLM chat window (GPT, Claude, Gemini, etc.). - +22.4% semantic accuracy, +42.1% reasoning success, and 3.6× more stability (benchmarked on GSM8K and Truthful-QA). - Features Semantic Tree Memory, Hallucination Shield, and fully exportable logic. - MIT Licensed, zero tracking, zero ads.
Why did I build this? I wanted to prove that advanced reasoning and memory could be made open, portable, and accessible to anyone — just with pure text, no software or setup.
A note: I'm from China, and English is not my first language. This post and the docs were partly assisted by AI, but I personally reviewed and approved every line of content. All ideas, design, and code are my own work. If anything is unclear or could be improved, I really welcome your feedback!
I'm the author, and happy to answer any questions or suggestions here!
1. How does TXT OS store its “Semantic Tree Memory” between sessions? 2. When `kbtest` detects a hallucination, what happens next? 3. Any idea of the speed impact on smaller models like LLaMA-2-13B?
Thanks for sharing—excited to try it out!
We actually serialize the tree as a compact JSON-like structure right in the TXT file—each node gets a header like #NODE:id and indented subtrees. When you reload, TXT OS parses those markers back into your LLM’s memory map. No external DB needed—just plain text you can copy-paste between sessions.
--- When kbtest Fires
Internally it tracks our ΔS metric (semantic tension). Once ΔS crosses a preset threshold, kbtest prints a warning and automatically rolls you back to the last “safe” tree checkpoint. That means you lose only the bad branch, not your entire session. Think of it like an undo button for hallucinations.
--- Performance on LLaMA-2-13B
Benchmarks were on GPT-4, but on a 13B model you’ll see roughly a 10–15% token-generation slow-down thanks to the extra parsing and boundary checks. In practice that’s about +2 ms per token, which most folks find an acceptable trade-off for the added stability.
Hope that clears things up—let me know if you hit any weird edge cases!
does TXT OS work equally well with open-source models, or is it optimized more for models like GPT-4 or Claude?
I've actually tested TXT OS with about 10 different AIs already—you can check out the full rundown on my repo. Generally, ChatGPT, Grok, Claude, and Perplexity gave the smoothest and best experience. The others still work fine, but some, like Gemini, have minor quirks (Gemini randomly adds a weird parameter during initial setup, but it sorts itself out after the first step).
So, long story short, if you want a hassle-free experience, go with ChatGPT, Grok, Claude, or Perplexity!
Each formula plays a role in making the LLM more stable, coherent, and logically self-aware:
• = I - G + mc² defines semantic residue — how far the current output strays from meaning. • BigBig(G) recombines context & error to steer output back toward intent. • BBCR detects collapse and triggers reset → rebirth (like fail-safe logic). • BBAM models attention decay — restoring continuity over multiple steps.
Together, this makes the LLM act less like autocomplete… and more like a self-guided reasoner.