The primary reason for making chips this big at present is to run LLMs. Why have separate RAM in an LLM compute chip? No matter how wide you make the bus, it will always be a bottleneck and a source of huge inefficiency.
For example, when von Neumann got hold of the ENIAC, he slowed it down by more than 60%. This is because his changes destroyed the inherent parallelism of the original hardware design.