Matrix-vector multiplication implemented in off-the-shelf DRAM for Low-Bit LLMs

49 points by cpldcpu on 5/4/2025, 11:35:30 PM (arxiv.org)

Comments (5)

robwwilliams · 1h ago
This is just mind-bendingly weird and wonderfully creative. It can pay to work in the weeds! Bravo.
Bolwin · 1h ago
They're doing matrix operations in the DRAM itself? That sounds insane and also fascinating.
nkurz · 1h ago
Yup, and incredibly they are able to do this on standard RAM by "intentionally violating the timing parameters":

Processing-Using-DRAM (PUD) leverages the inherent analog operational characteristics of DRAM to enable highly parallel bit-serial computations directly within memory arrays. Prior research has demonstrated that commercial off-the-shelf DRAM can achieve PUD functionality without hardware modifications by intentionally violating the timing parameters.

These studies have established two fundamental PUD operations: RowCopy and majority-of-X (MAJX) (Fig. 1). The RowCopy operation facilitates data movement between different rows within a subarray by issuing a PRE command followed immediately by an ACT command before bitline precharging completes, enabling data transfer through the bitlines. This operation affects all cells along a row simultaneously, making it approximately 100 times faster than processor-mediated data movement.

The MAJX operation performs a majority vote among X cells sharing the same bitline that are activated simultaneously, implemented in commercial DRAM by issuing ACT, PRE, and ACT commands in rapid succession without delays. This allows concurrent activation of 2∼32 rows. MAJX enables bit-serial computations that leverage the parallelism of subarrays with 65,536 columns, serving as the fundamental computational unit for PUD.
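To make those two command sequences concrete, here is a rough sketch in C that only models them (my own illustration, not code from the paper; actually issuing ACT/PRE with deliberately violated timing needs special hardware access, typically an FPGA-based memory controller):

    #include <stdio.h>

    /* Model only: prints the command stream instead of driving a real DIMM. */
    static void act(int row) { printf("ACT  row %d\n", row); }
    static void pre(void)    { printf("PRE\n"); }

    /* RowCopy: activate the source row, then issue PRE and immediately ACT
     * the destination row before the bitlines finish precharging, so the
     * source data still held on the bitlines is written into the
     * destination row's cells. */
    static void row_copy(int src, int dst) {
        act(src);
        pre();        /* violated timing: the next ACT arrives "too early" */
        act(dst);
    }

    /* MAJX: ACT, PRE, ACT back to back with no delays can leave several
     * rows open at once; each sense amplifier then settles to the majority
     * value of the cells sharing its bitline. Which 2~32 rows open together
     * depends on the chip's row decoder. */
    static void maj_x(int row_a, int row_b) {
        act(row_a);
        pre();
        act(row_b);
    }

    int main(void) {
        row_copy(3, 7);   /* copy row 3 -> row 7 */
        maj_x(1, 2);      /* majority vote across simultaneously opened rows */
        return 0;
    }

The reason MAJX is enough to compute with: if you can also copy constant all-0 and all-1 rows into place, then MAJ3(A, B, 0) = AND(A, B) and MAJ3(A, B, 1) = OR(A, B), so majority plus RowCopy gives you the building blocks for bit-serial arithmetic running across all 65,536 columns of a subarray at once.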

nayuki · 57m ago
This kind of low-level protocol manipulation of DRAM has some similarities to rowhammer attacks.
summarity · 1h ago
Getting LLM inference running on anything is going to be the next “it runs Doom”.