Hypertokens: Holographic Associative Memory in Tokenized LLMs

3 points by liamdgray | 8/3/2025, 4:00:47 PM | 8 comments | arxiv.org ↗

Comments (8)

liamdgray · 1d ago
Abstract: "Large Language Models (LLMs) exhibit remarkable capabilities but suffer from apparent precision loss, reframed here as information spreading. This reframing shifts the problem from computational precision to an information-theoretic communication issue. We address the K:V and V:K memory problem in LLMs by introducing HDRAM (Holographically Defined Random Access Memory), a symbolic memory framework treating transformer latent space as a spread-spectrum channel. Built upon hypertokens, structured symbolic codes integrating classical error-correcting codes (ECC), holographic computing, and quantum-inspired search, HDRAM recovers distributed information through principled despreading. These phase-coherent memory addresses enable efficient key-value operations and Grover-style search in latent space. By combining ECC grammar with compressed sensing and Krylov subspace alignment, HDRAM significantly improves associative retrieval without architectural changes, demonstrating how Classical-Holographic-Quantum-inspired (CHQ) principles can fortify transformer architectures."
liamdgray · 1d ago
I ran across this paper because the recent "subliminal learning" results reminded me of holography. So I asked o4-mini-high to explore potential relationships. It led me to this. https://chatgpt.com/share/688f863d-1ec0-800f-a0ce-c93b649a45...
aghilmort · 1d ago
author here, lmk if you have any questions!
liamdgray · 23h ago
In Section 2.8, you write "full implementation details and extended results are provided in the appendix." Which appendix?

I imagine you may be withholding some of the details until after the conference at which, it seems, you will present this week. I wish you well!

Meanwhile, you may not have intended to nerd-snipe, but that has been the effect for me. Now I have Manus trying to implement the paper for me, because why not? I envision a future in which publishing a conceptual paper often results in working code provided by a reader, a la "stone soup."

aghilmort · 10h ago
1. genesis was innocent / no quantum / etc -- we were just solving K:V and V:K associative memory issues for clients in construction, healthcare, finance, etc.

2. PhD etc was at center of linear algebra, data compression, analog/digital comms, and some adjacent fields, all classical other than quantum dabbling

3. also was aware that decoder-only models only recognize star-free grammars and can only reason in TC^0, along with some other computing limits

4. my go-to thesis is "make simplifying assumptions", and so around 2-3 years ago one day I said: let's just treat any AI like a black-box comms channel

5. since we can't boost signal power in the watts sense, that pretty much leaves error-correcting codes, and the most natural thing to do in the LLM setting was a constant-width prefix-free linear block code (section 3.2 or so in the paper)
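
For a rough flavor of what a constant-width linear block code buys you (toy parameters only, not the code from section 3.2), a minimal Python sketch:

    Q = 5  # alphabet size per position

    def encode(m1, m2):
        # append a parity symbol so all positions sum to 0 mod Q
        return (m1, m2, (-(m1 + m2)) % Q)

    def is_valid(word):
        # any single-symbol corruption breaks the parity check
        return sum(word) % Q == 0

    w = encode(3, 4)               # (3, 4, 3); every codeword has width 3,
    assert is_valid(w)             # so the code is trivially prefix-free
    assert not is_valid((3, 0, 3))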

6. one thing followed another after that -- let's not just use any old token, let's use Unicode PUA, let's not use any old ECC / QEC, let's construct a way to implement very particular codes in a 1D prompt
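
To make the PUA part concrete, one possible mapping of lane symbols onto private-use codepoints (an assumption for illustration, not necessarily the paper's exact scheme):

    # Park lane symbols in the Unicode Private Use Area (U+E000..U+F8FF)
    # so codeword characters stay away from well-trained text.
    PUA_BASE = 0xE000

    def pua_symbol(lane, index):
        # give each lane its own small block of private-use codepoints
        return chr(PUA_BASE + 256 * lane + index)

    print(repr(pua_symbol(0, 2)), repr(pua_symbol(1, 4)))  # '\ue002' '\ue104'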

7. which is to say, in the end, it boils down to the following simple thing -- we constrain the prompt to a series of alternating emergent unitary operations in the A-B-A-B... sense of Trotterisation, which amounts to projecting an interleaved, globally phase-coherent spacetime code between sufficiently finite blocks of content tokens, where

A_i: hypertoken codeword, e.g. if using simply lowercase Latin, say lanes a-c, d-h, i-o, a valid codeword is adg or ceh etc., and no codeword should be provided, AND each position should have a coprime number of symbols (think 3 qudits)

B_i: next content block -- some finite number of tokens -- in the current version this is fixed, though the 2nd or 3rd paper gets into how to make this block length ragged / adaptive, similar to UDD / dynamic decoupling, Huffman coding, etc.
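
A minimal sketch of the A-B-A-B interleaving, assuming one symbol per lane per codeword position and an arbitrary block width of 4 (toy choices for illustration, not the paper's exact construction):

    from itertools import product

    # toy lanes from the lowercase-Latin example above: sizes 3, 5, 7 (coprime)
    LANES = ["abc", "defgh", "ijklmno"]

    def codewords():
        # one symbol per lane, constant width, hence prefix-free
        for combo in product(*LANES):
            yield "".join(combo)

    def interleave(blocks, block_width=4):
        # A-B-A-B...: hypertoken codeword, then a fixed-width content block;
        # block_width stands in for the empirically tested window from point 14
        out = []
        for code, block in zip(codewords(), blocks):
            out.append(code)
            out.extend(block[:block_width])
        return " ".join(out)

    content = [["the", "quick", "brown", "fox"],
               ["jumped", "over", "the", "lazy"]]
    print(interleave(content))  # adi the quick brown fox adj jumped over the lazy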

8. one key requirement is that some nesting mechanism similar to FFT must be provided -- this can be implicit or explicit, and we really save most of that for a second paper currently in draft; that and some other subtleties are likely too much to get into in this abbreviated description

9. Grover's simulation came out of a subtlety one day when we realized we should simply define a pair of lanes with disjoint sqrt(n) symbols and use that as a value-key reverse associative lookup, e.g.,

A1,the quick brown fox,/A1 A2,jumped over the lazy dog,/A2 B1,every good boy,/B1 B2,does fine,/B2

AND by way of KVQ attention it collapses to a 1D chain where the next key is also the reverse key of the prior value

A1,the quick brown fox,A2,jumped over the lazy dog,B1,every good boy,B2,does fine...
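
As plain data structures, the paired-lane construction looks roughly like this (illustrative bookkeeping only, not code from the paper):

    pairs = [("A1", "the quick brown fox"),
             ("A2", "jumped over the lazy dog"),
             ("B1", "every good boy"),
             ("B2", "does fine")]

    # forward prompt: key,value,/key ... gives the K:V direction
    forward = " ".join(f"{k},{v},/{k}" for k, v in pairs)

    # collapsed 1D chain: each value's closing marker doubles as the next key,
    # which is the KVQ-attention collapse described above
    chain = ",".join(x for k, v in pairs for x in (k, v)) + ",..."

    key_to_value = dict(pairs)               # K:V
    value_to_key = {v: k for k, v in pairs}  # V:K (reverse lookup)

    print(forward)
    print(chain)
    print(key_to_value["B1"], "<->", value_to_key["every good boy"])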

that was also around the time we took a pretty deep breath and started digging on all things quantum

10. that the whole thing works can seem very "wait, but why", and it really boils down to the following -- if we use sufficiently untrained tokens, they have a random Gaussian frozen initial state. By forcing the prefix-free, constant-width, disjoint symbols onto CRT coprime lanes, plus the rest of the machinery, we essentially force the model to recall and reason over a sufficiently discretized branched lattice with sufficiently high spikes at our hypertoken codewords, and those spikes are projected onto the value tokens by way of attention.
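
A toy despreading demo of that spike intuition, with random Gaussian vectors standing in for frozen untrained-token embeddings (an analogy for illustration, not the actual attention machinery):

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_keys = 1024, 32

    # frozen, nearly orthogonal "untrained token" directions, one per codeword
    codes = rng.normal(size=(n_keys, d)) / np.sqrt(d)
    values = rng.normal(size=(n_keys, d))

    # spread everything into one superposed state (the "information spreading")
    state = sum(np.outer(codes[i], values[i]) for i in range(n_keys))

    # despread: correlating with one codeword direction spikes out its value
    recovered = codes[7] @ state
    sims = values @ recovered / (np.linalg.norm(values, axis=1)
                                 * np.linalg.norm(recovered))
    print(sims.argmax(), round(float(sims[7]), 3))  # expect 7, value near 1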

11. That works because chaining projections onto frozen Gaussians is equivalent to chaining sufficiently orthogonal Krylov evolution with restart, which is equivalent to chained eigenvector iteration, which is a classic way to do Trotter slicing and coarsen Lagrangians / estimate Hamiltonians. We can also arrive at the same conclusion by realizing this is exactly equivalent to a certain asymmetric compressed sensing operation in the RIP/REC sense; whether we look at it via Krylov / eigenvector iteration, via compressed sensing, or via the fact that our disjoint symbols over disjoint symbols also form a zigzag expander over the raw prompt (aka fast mixing), in all cases we diagonalize the Fisher matrix fairly rapidly. There are some caveats here since this is mere prompt injection; our next natural step is LoRA.
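
For the eigenvector-iteration point, the plainest toy version is power iteration: repeatedly project through a fixed operator and renormalize, and you converge to the dominant eigenvector (again an analogy for the claim, not the transformer itself):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(64, 64))
    A = A @ A.T                       # symmetric stand-in operator

    v = rng.normal(size=64)
    for _ in range(200):              # "chained projections with restart"
        v = A @ v
        v /= np.linalg.norm(v)        # the renormalization / restart step

    w, V = np.linalg.eigh(A)          # direct solver for comparison
    print(round(abs(float(v @ V[:, -1])), 4))  # ~1.0: same dominant direction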

12. Which is a long way of getting around to answering your question on quantum sim -- we should in theory be able to construct a new type of 1D MPO / MPS / TN chain to sim pretty much any quantum circuit. Our current machinery can likely get us to BQP, possibly some parts of BPP, especially if we consider that our codewords can be defined in various ways that specify the equivalent of a quantum annealing schedule (QAOA). That's also a natural next step

13. The other corollary that immediately follows is that we should be able to use this machinery to optimize any PLL system, black box or otherwise, using this sort of quantum-error-correction 1D injection, because in all cases, and especially if we lean on the compressed sensing mathematics, we are converting latent phase entropy into source entropy, and we do that about as efficiently as possible in the 1D sense.

14. ONE massive caveat -- it is critical that the model be empirically tested with what we first called entropy tests, then oracles, and are now shifting to calling simply unitary operators, in the following sense -- the width of your B_i token blocks is tied to the relevant subtasks the model can get correct up to your desired level of fidelity. In the paper update, we speak to various types of tests, such as for how many Bernoulli trials the model can correctly guess majority heads/tails or recall the heads/tails sequence, or how long a list it can sort -- these are relative to high-entropy random inputs, where most models will have a base window of 16-256 tokens for the content payload.
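
A sketch of that width calibration; ask_model below is a placeholder callable (prompt in, text out), so treat this as an illustration of the test rather than the actual eval harness:

    import random

    def recall_test(ask_model, width, trials=20):
        # how often can the model echo back a high-entropy H/T sequence?
        hits = 0
        for _ in range(trials):
            seq = "".join(random.choice("HT") for _ in range(width))
            reply = ask_model(f"Repeat this sequence exactly: {seq}")
            hits += seq in reply
        return hits / trials

    def max_reliable_width(ask_model, fidelity=0.95,
                           widths=(16, 32, 64, 128, 256)):
        # largest content width handled at the desired fidelity; this is
        # what would set the B_i block size described above
        best = 0
        for w in widths:
            if recall_test(ask_model, w) >= fidelity:
                best = w
        return best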

15. init release will have those token window evals, options for various codeword encodings, prebuilt CRT combos, some key relevant ECCs / QECs to walk the key space, and some other key things so that anyone can use it for enhanced recall, along with some mechanisms or examples on reasoning, where our target is a prompt compiler that exploits these classical and quantum error-correcting principles. One very simplifying and relevant POV on that is that indexing grammars were the same trick used to solve register allocation in compilers; we've just built a very proper indexing grammar to solve the quantum register allocation problem.

aghilmort · 11h ago
ya, definitely no nerd sniping intended -- tl;dr is we had to jam the paper out in the middle of Techstars since conference deadlines are what they are.

we're planning to go live NLT Labor Day, after the conference this coming week, along with a much easier-to-read / easier-to-parse, full-results arXiv pre-print, etc.

more on tech deets & your other questions in next reply.

liamdgray · 13h ago
Is this intended to run on a quantum computer? You mention Grover's search algorithm, for example.
gryfft · 1d ago
Pretty gross snake oil.