The primary reason for making chips this big at present is to run LLMs. So why put separate RAM on an LLM compute chip? No matter how wide you make the bus, it will always be a bottleneck and a source of huge inefficiency.
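To see why the bus dominates, here's a rough back-of-envelope sketch. The model size and hardware numbers are hypothetical, and it assumes single-stream decoding where every generated token has to stream all the weights across the bus once:

```python
# Rough arithmetic-intensity check for single-stream LLM decoding.
# Illustrative numbers only: a hypothetical 70B-parameter model with
# 8-bit weights on an accelerator with ~2 TB/s of off-chip bandwidth.

params          = 70e9        # model parameters (hypothetical)
bytes_per_param = 1           # int8 weights
flops_per_token = 2 * params  # ~2 FLOPs per parameter per generated token

dram_bandwidth  = 2e12        # bytes/s across the bus (hypothetical)
peak_compute    = 1e15        # FLOP/s peak of the chip (hypothetical)

# At batch size 1, every decoded token must pull all weights over the bus.
time_memory  = params * bytes_per_param / dram_bandwidth
time_compute = flops_per_token / peak_compute

print(f"memory-bound time per token : {time_memory * 1e3:.1f} ms")
print(f"compute-bound time per token: {time_compute * 1e3:.3f} ms")
print(f"the bus is the bottleneck by ~{time_memory / time_compute:.0f}x")
```

With those (made-up but plausible) numbers the bus is slower than the compute by a couple of orders of magnitude, which is why keeping the weights next to the logic instead of behind a bus is so attractive.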
The Von Neumann model of compute was great back when setting up ENIAC took days and the run itself was shorter than the setup, but that's not the case with silicon ASICs and FPGAs.
For example, when Von Neumann got ahold of the ENIAC, he slowed it down by more than 60%, because his changes destroyed the inherent parallelism of the original hardware design.
It's time to back out of this premature optimization rabbit hole.