It's a shame the article chose to compare solely against AMD CPUs, because AMD and Intel have very different L3 architectures. AMD CPUs have their cores organised into groups, called CCXs, each of which has its own small L3 cache. For example the Turin-based 9755 has 16 CCXs, each with 32MB of L3 cache. That's far less cache per core than the mainframe CPU being described. In contrast, Intel uses an approach that's a little closer to the Telum II CPU being described: a Granite Rapids AP chip such as the 6960P has 432 MB of L3 cache shared between 72 physical cores, each with its own 2MB L2 cache. This is still considerably less cache, but it's not quite as stark a difference as the picture painted by the article.
This doesn't really detract from the overall point - stacking a huge per-core L2 cache and using cross-chip reads to emulate L3 with clever saturation metrics and management is very different to what any x86 CPU I'm aware of has ever done, and I wouldn't be surprised if it works extremely well in practice. It's just that it'd have made a stronger article IMO if it had instead compared dedicated L2 + shared L2 (IBM) against dedicated L2 + shared L3 (intel), instead of dedicated L2 + sharded L3 (amd).
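A rough back-of-the-envelope sketch (Java) of the cache-per-core point, using the figures quoted above plus the roughly 36 MB-per-core L2 the article describes for Telum II; all numbers are SKU-dependent approximations:

    // Approximate cache-per-core comparison using the figures quoted above.
    public class CachePerCore {
        static void report(String name, int cores, double l2PerCoreMB,
                           double l3TotalMB, String l3Scope) {
            System.out.printf("%-18s L2/core=%5.1f MB  L3/core=%5.1f MB  (%s)%n",
                    name, l2PerCoreMB, l3TotalMB / cores, l3Scope);
        }
        public static void main(String[] args) {
            // AMD EPYC 9755 (Turin): 128 cores, 16 CCXs x 32 MB L3, 1 MB L2 per core
            report("AMD EPYC 9755", 128, 1.0, 16 * 32.0, "each 32 MB slice private to one CCX");
            // Intel Xeon 6960P (Granite Rapids AP): 72 cores, 432 MB shared L3, 2 MB L2 per core
            report("Intel Xeon 6960P", 72, 2.0, 432.0, "shared chip-wide");
            // IBM Telum II: 8 cores, ~36 MB L2 each; "L3" is neighbours' L2 used as victim space
            report("IBM Telum II", 8, 36.0, 8 * 36.0, "virtual L3 built from other cores' L2");
        }
    }

The interesting part isn't the totals so much as that IBM's "L3" capacity is the same SRAM as the L2s, repurposed on demand.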
elzbardico · 15m ago
It must be fun being a hardware engineer for IBM mainframes: Cost constraints for your designs can mostly be left aside, as there's no competition, and your remaining customers have been domesticated into paying you top dollar every upgrade cycle, and frankly, they don't care.
Cycle times are long enough so you can thoroughly refine your design.
Marketing pressure is probably unusually well informed, as anyone working on mainframe marketing is probably either an ex-engineer or almost an engineer by osmosis.
And the product is different enough from anything else that you can try novel ideas, but not so different that your design skills are useless elsewhere or that you can't leverage others' advances.
erk__ · 5m ago
There is also no other place that will just implement conversions between UTF formats, compression, and various hashes and other crypto in hardware the way they do.
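The application-side view of that is just the standard APIs; a minimal Java sketch of the three operations in question (whether a given call actually lands on the on-chip facilities, e.g. the UTF conversion instructions, DFLTCC for Deflate, or CPACF for hashing, depends on the JVM and machine level, so the hardware mapping here is an assumption, not a guarantee):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.zip.Deflater;

    // Plain Java against standard APIs; on z hardware these can be backed by
    // on-chip facilities instead of software loops, depending on JVM/machine level.
    public class HwAssistDemo {
        public static void main(String[] args) throws Exception {
            String text = "mainframes do this in silicon";

            // String -> UTF-8 bytes (UTF format conversion)
            byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

            // Deflate compression
            Deflater deflater = new Deflater();
            deflater.setInput(utf8);
            deflater.finish();
            byte[] compressed = new byte[256];
            int len = deflater.deflate(compressed);

            // SHA-256 hash
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(utf8);

            System.out.printf("utf8=%d bytes, deflate=%d bytes, sha256=%d bytes%n",
                    utf8.length, len, digest.length);
        }
    }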
jonathaneunice · 4h ago
Virtual L3 and L4 swinging gigabytes around to keep data at the hot end of the memory-storage hierarchy even post L2 or L3 eviction? Impressive! Exactly the kind of sophisticated optimizations you should build when you have billions of transistors at your disposal. Les Bélády's spirit smiles on.
zozbot234 · 40m ago
Virtual L3 and L4 looks like a bad deal today since SRAM cell scaling has stalled quite badly in recent fabrication nodes. It's quite possible that future chip designs will want to use eDRAM at least for L4 cache if not perhaps also L3, and have smaller low-level caches where "sharing" will not be as useful.
dragontamer · 26m ago
Does it?
Recycling SRAM when it becomes more precious seems like a more optimal strategy rather than having the precious SRAM sit idle on otherwise sleeping cores.
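A deliberately toy sketch of that idea (Java; the policy and names are invented for illustration and have nothing to do with IBM's real saturation heuristics): an evicted L2 line gets parked in the least-saturated neighbouring core's L2 instead of being dropped, which is roughly the "virtual L3" behaviour the article describes.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Toy illustration only -- not IBM's actual algorithm.
    public class VirtualL3Sketch {
        static final int L2_LINES = 4;  // tiny caches so the behaviour is visible

        static class Core {
            final int id;
            final Deque<String> l2 = new ArrayDeque<>();   // head = oldest line
            Core(int id) { this.id = id; }
            double saturation() { return l2.size() / (double) L2_LINES; }
        }

        static void install(Core[] cores, Core owner, String line) {
            if (owner.l2.size() == L2_LINES) {
                String victim = owner.l2.removeFirst();
                // Park the victim in the least-saturated other core, if any has room.
                Core best = null;
                for (Core c : cores)
                    if (c != owner && (best == null || c.saturation() < best.saturation()))
                        best = c;
                if (best != null && best.saturation() < 1.0) {
                    best.l2.addLast(victim);
                    System.out.printf("core %d evicts %s -> virtual L3 on core %d%n",
                            owner.id, victim, best.id);
                } else {
                    System.out.printf("core %d evicts %s -> memory%n", owner.id, victim);
                }
            }
            owner.l2.addLast(line);
        }

        public static void main(String[] args) {
            Core[] cores = { new Core(0), new Core(1) };   // core 1 is "sleeping"
            for (int i = 0; i < 8; i++) install(cores, cores[0], "line" + i);
        }
    }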
belter · 19m ago
The mainframe in 2025 is absolutely at the edge of technology. For some algorithms in ML where massive GPU parallelism is not a benefit, it could even make a strong comeback.
I got so jealous of some colleagues that I once even considered getting into mainframe work. CPU at 5.5 GHz continuously (not peak...), massive caches, really, really non stop...
Look at this tech porn: "IBM z17 Technical Introduction" - https://www.redbooks.ibm.com/redbooks/pdfs/sg248580.pdf
What languages are people still writing mainframe code in? In 2011 working for a prescription rx processor, COBOL was still the name of the game.
rbanffy · 4h ago
There's lots of Java as well, and IBM is making a big effort to port existing Unix utilities to z/OS (which is a certified UNIX). With Linux, the choices are the same as with other hardware platforms. I assume you'll find lots of Java and Python running on LinuxONE machines.
Running Linux, from a user's perspective, it feels just like a normal server with a fast CPU and extremely fast IO.
jiggawatts · 3h ago
> extremely fast IO.
I wonder how big a competitive edge that will remain in an era where ordinary cloud VMs can do 10 GB/s to zone-redundant remote storage.
Cthulhu_ · 3h ago
GB/s is one metric, but IOPS and latency are others that I'm assuming are Very Important for the applications that mainframes are being used for today.
FuriouslyAdrift · 2h ago
Latency is much more important than throughput...
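A quick Little's-law sketch (Java, with illustrative latencies, not measurements) of why: sustaining the same bandwidth at higher latency needs proportionally more I/O in flight, which is exactly where queueing and tail latency start to hurt.

    // Little's law: requests in flight ~= throughput * latency.
    // Latency figures below are illustrative, not measured.
    public class QueueDepth {
        static void show(String name, double gbPerSec, double latencyMicros, int ioSizeKb) {
            double iops = gbPerSec * 1e9 / (ioSizeKb * 1024.0);
            double inFlight = iops * latencyMicros / 1e6;
            System.out.printf("%-32s %,10.0f IOPS needs ~%,8.0f requests in flight%n",
                    name, iops, inFlight);
        }
        public static void main(String[] args) {
            // 10 GB/s of 8 KiB I/Os at two different access latencies
            show("local, low-latency storage (30us)", 10, 30, 8);
            show("zone-redundant network (1000us)", 10, 1000, 8);
        }
    }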
RetroTechie · 2h ago
On-site.
Speed is not the only reason why some org/business would have Big Iron in their closet.
inkyoto · 3h ago
Guaranteed sustained write throughput is a distinguishing feature of mainframe storage.
Whilst cloud platforms are the new mainframe (so to speak), and they have all made great strides in improving the SLA guarantees, storage is still accessed over the network (plus extra moving parts – coordination, consistency etc). They will get there, though.
bob1029 · 3h ago
You can do a lot of damage with some stored procedures. SQL/DB2 capabilities often go overlooked in favor of virtualizing a bunch of Java apps that accomplish effectively the same thing with 100x the resource use.
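For flavour, the application side of that pattern is little more than a single call; a hedged sketch where the schema, procedure name, and parameters are all made up, with the actual set-oriented work living inside Db2 rather than in a fleet of JVM loops:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;

    // Hypothetical stored-procedure call: all names here are invented.
    public class CallSettleBatch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:db2://host:446/PRODDB", "user", "pass");
                 CallableStatement cs = conn.prepareCall("CALL FINANCE.SETTLE_BATCH(?, ?)")) {
                cs.setString(1, "2025-06-30");              // business date (made up)
                cs.registerOutParameter(2, java.sql.Types.INTEGER);
                cs.execute();                               // batch logic runs inside Db2
                System.out.println("rows settled: " + cs.getInt(2));
            }
        }
    }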
exabrial · 3h ago
Hah, anecdote incoming, but 100x the resource usage is probably accurate. Granted, 100x a human hair is still just a minuscule grain of sand, but those are the margins mainframe operators work in.
As one greybeard put it to me: Java is loosely typed and dynamic compared to COBOL/DB2/PL-SQL. He was particularly annoyed that the smallest numerical type in Java, the 'byte', was, quote, "a waste of bits", and that Java was full of "useless bounds checking", both of which were causing "performance regressions".
The way mainframe programs are written is: the entire thing is statically typed.
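Both gripes are easy to demonstrate in plain Java (nothing mainframe-specific about this sketch): byte arithmetic is promoted to int, so the 8-bit type buys no 8-bit arithmetic, and every indexed array access carries an implicit range check.

    public class GreybeardGripes {
        public static void main(String[] args) {
            byte a = 100, b = 30;
            // byte + byte is promoted to int; storing it back needs a cast,
            // so the "small" type doesn't give you 8-bit arithmetic.
            byte sum = (byte) (a + b);          // wraps to -126
            System.out.println("sum = " + sum);

            // Every indexed access is bounds-checked (the JIT elides many,
            // but the semantics are always "check, then load").
            int[] amounts = new int[4];
            try {
                System.out.println(amounts[4]); // throws before reading memory
            } catch (ArrayIndexOutOfBoundsException e) {
                System.out.println("bounds check fired: " + e.getMessage());
            }
        }
    }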
thechao · 1h ago
When I was being taught assembly at Intel, one of the graybeards told me that the greatest waste of an integer was to use it for a "bare" add, when it was a perfectly acceptable 64-wide vector AND. To belabor the point: he used ADD for the "unusual set of XORs, ANDs, and other funky operations it provided across lanes". Odd dude.
dragontamer · 22m ago
Reverse engineering the mindset....
In the 90s, a cryptography paper was published that brute-forced DES (the standard encryption algorithm back then) more quickly by using SIMD-style bit operations across 64-bit registers on a DEC Alpha.
There is also the 80s Connection Machine which was a 1-bit SIMD x 4096-lane supercomputer.
---------------
It sounds like this guy read a few 80s or 90s papers and then got stuck in that unusual style of programming. But there were famous programs back then that worked off of 1-bit SIMD x 64 lanes or x4096 lanes.
By the 00s, computers had already moved on to new patterns (and this obscure methodology was never mainstream). Still, I can imagine that if a student read a specific set of papers in those decades, this kind of mindset would stick.
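For anyone who hasn't seen the trick (bitslicing / SIMD-within-a-register), a minimal illustration in Java, not taken from the DES paper: pack one bit from each of 64 independent problem instances into a long, and a single & or ^ then acts as 64 one-bit ALUs in parallel.

    // Bit i of each long holds one input bit of problem instance i,
    // so one & or ^ evaluates that gate for all 64 instances at once.
    public class BitsliceDemo {
        public static void main(String[] args) {
            long a = 0b1010L;   // input A across instances 0..3 (rest zero)
            long b = 0b0110L;   // input B across instances 0..3

            long and = a & b;   // 64 parallel AND gates
            long xor = a ^ b;   // 64 parallel XOR gates

            // Read individual lanes back out.
            for (int lane = 0; lane < 4; lane++) {
                System.out.printf("lane %d: a=%d b=%d a&b=%d a^b=%d%n",
                        lane, (a >>> lane) & 1, (b >>> lane) & 1,
                        (and >>> lane) & 1, (xor >>> lane) & 1);
            }
        }
    }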
uticus · 1h ago
> ...it was a perfectly acceptable 64-wide vector AND.
sounds like "don't try to out-optimize the compiler."
thechao · 41m ago
In 2025, for sure. In 2009 ... maybe? Of course, he had become set in his ways in the 80s and 90s.
PaulHoule · 2h ago
I knew mainframe programmers were writing a lot of assembly in the 1980s and they probably still are.
BugheadTorpeda6 · 59m ago
For applications or middlewares and systems and utilities?
For applications, COBOL is king, closely followed by Java for stuff that needs web interfaces. For middleware and systems and utilities etc, assembly, C, C++, REXX, Shell, and probably there is still some PL/X going on too but I'm not sure. You'd have to ask somebody working on the products (like Db2) that famously used PL/X. I'm pretty sure a public compiler was never released for PL/X so only IBM and possibly Broadcom have access to use it.
COBOL is best thought of as a domain-specific language. It's great at what it does, but the use cases are limited; you would be crazy to write an OS in it.
pjmlp · 3h ago
RPG, COBOL, PL/I, NEWP are the most used ones. Unisys also has their own Pascal dialect.
Other than that, there are Java, C, and C++ implementations for mainframes; for a while IBM even had a JVM implementation for IBM i (AS/400) that would translate JVM bytecodes into IBM i ones.
Additionally, all of them have POSIX environments, think WSL-like but for mainframes, which run anything that goes into AIX, or a few selected enterprise distros like Red Hat and SUSE.
BugheadTorpeda6 · 56m ago
It sounds like you are referring to AS/400 and successors (common mistake, no biggie) rather than the mainframes being referred to here that are successors of System 360 and use the Telum chips (as far as I am aware they have never been based on POWER, like IBM i and AIX and the rest of the AS/400 heritage). RPG was never a big thing on mainframes for instance. I've never come across it in about 10 years of working on them professionally. Same with NEWP, I've never heard of it. And Java is pretty important on the platform these days and not an attempt from the past. It's been pushed pretty hard for at least 20 years and is kept well up to date with newer Java standards.
Additionally, the Unix on z/OS is not like WSL. There is no virtualization. The POSIX APIs are implemented as privileged system services with Program Calls (kind of like supervisor calls/system calls). It's more akin to a "flavor", kind of like old-school Windows and OS/2, than the modern WSL. You can interface with the system in the old-school MVS flavor and use those APIs, or use the POSIX APIs, and they are meant to work together (for instance, the TCP/IP stack and web servers on the platform are implemented with the POSIX APIs, for obvious compatibility and porting reasons).
Of course, you can run Linux on mainframes and that is big too, but usually when people refer to mainframe Unix they are talking about how z/OS is technically a Unix, which I don't think would count in the same way if it were just running a Unix environment under a virtualization layer. Windows can do that and it's not a Unix.
specialist · 1h ago
Most impressive.
I would enjoy an ELI5 for the market differences between commodity chips and these mainframe grade CPUs. Stuff like design, process, and supply chain, anything of interest to a general (nerd) audience.
IBM sells 100s of Z mainframes per year, right? Each can have a bunch of CPUs, right? So Samsung is producing 1,000s of Telums per year? That seems incredible.
Given such low volumes, that's a lot more verification and validation, right?
Foundries have to keep running to be viable, right? So does Samsung bang out all the Telums for a year in one burst, then switch to something else? Or do they keep producing a steady trickle?
Not that this info would change my daily work or life in any way. I'm just curious.
TIA.
detaro · 41m ago
It's something they'll run a batch for occasionally, but that's normal. Fabs are not like one long conveyor belt where a wafer goes in at the front, passes through a long sequence of machines, and falls out finished at the end. They work in batches, and machines need reconfiguring for the next task all the time (if a chip needs 20 layers, they don't have every machine 20 times), so mixing different products is normal. Low-volume products are going to be more expensive, of course, since per-batch setup work is spread over fewer wafers.
In general scheduling of machine time and transportation of FOUPs (transport boxes for a number of wafers, the basic unit of processing) is a big topic in the industry, optimizing machine usage while keeping the number of unfinished wafers "in flight" and the overall cycle time low. It takes weeks for a wafer to flow through the fab.
bob1029 · 35m ago
It is non-trivial to swap between product designs in a fab. It can take many lots before you have statistical process controls dialed in to the point where yields begin to pick up. Prior performance is not indicative of future performance.
bell-cot · 1h ago
Interesting to compare this to ZFS's ARC / MFU-vs-MRU / ghost-list / L2ARC strategy for (disk) caching. IIRC, those were mostly IBM-developed technologies.