Reverse engineering the 386 processor's prefetch queue circuitry

79 todsacerdoti 28 5/10/2025, 4:23:06 PM righto.com ↗

Comments (28)

myself248 · 2h ago
I remember reading about naive circuits like ripple-carry, where a signal has to propagate across the whole width of a register before it's valid. These seem like they'd only work in systems with very slow clocks relative to the logic itself.

In this writeup, something that jumps out at me is the use of the equality bus, and Manchester carry chain, and I'm sure there are more similar tricks to do things quickly.

When did the transition happen? Or were the shortcuts always used, and the naive implementations exist only in textbooks?

kens · 2h ago
Well, the Manchester carry chain dates back to 1959. Even the 6502 uses carry skip to increment the PC. As word sizes became larger and transistors became cheaper, implementations became more complex and optimized. And mainframes have been using these tricks forever.
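
For readers following the ripple-carry versus carry-skip exchange above, here is a minimal Python sketch of the logical difference, using an incrementer as the example. It is not a model of the 6502 or 386 circuitry: the function names, the 16-bit width, and the 4-bit block size are all assumptions chosen for illustration. In hardware the benefit is that each block's "all ones" signal is computed in parallel, so the carry can bypass a whole block instead of passing through every bit; a software loop shows only the structure, not the speedup.

```python
def ripple_increment(value, width=16):
    """Increment one bit at a time; the carry visits every bit position in turn."""
    carry = 1  # adding 1 is the same as injecting a carry into bit 0
    result = 0
    for i in range(width):
        bit = (value >> i) & 1
        result |= (bit ^ carry) << i  # sum bit of a half adder
        carry = bit & carry           # carry continues only through a 1 bit
    return result, carry


def carry_skip_increment(value, width=16, block=4):
    """Increment block by block; a block of all ones forwards the carry unchanged.

    Assumes width is a multiple of block (hypothetical simplification).
    """
    carry = 1
    result = 0
    mask = (1 << block) - 1
    for base in range(0, width, block):
        chunk = (value >> base) & mask
        all_ones = int(chunk == mask)  # block-level propagate signal
        result |= ((chunk + carry) & mask) << base
        carry = carry & all_ones       # carry leaves the block only if every bit propagates
    return result, carry


if __name__ == "__main__":
    for v in (0x0000, 0x00FF, 0x1234, 0xFFFF):
        print(hex(v), ripple_increment(v), carry_skip_increment(v))
```

Both functions return the incremented value and the final carry out, so they can be compared directly against each other for any input.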
kens · 4h ago
Author here. I hope you're not tired of the 386... Let me know if you have any questions.
sitkack · 3h ago
I'll never tire of any analysis you do. But if you are taking requests, I'd love two chips.

The AMD 29000 series, a RISC chip with many architectural advances that eventually morphed into the K5.

And the Inmos Transputer, a Forth-like chip with built-in scheduling and networking, designed to be networked together into large systems.

https://en.wikipedia.org/wiki/AMD_Am29000

https://en.wikipedia.org/wiki/Transputer

kens · 3h ago
Those would be interesting chips to examine, if I ever get through my current projects :-)
Zeetah · 1h ago
If you are doing requests, I'd love to see the M68k series analyzed.
moosedev · 59m ago
Another vote for the 68000 series :)
sitkack · 3h ago
At what number of layers is it difficult to reverse engineer a processor from die photos? I would think at some point, functionality would be too obscured to be able to understand the internal operation.

Do they ever put a solid metal top layer?

kens · 3h ago
I've been able to handle the Pentium with 3 metal layers. The trick is that I can remove metal layers to see what is underneath, either chemically or with sanding. Shrinking feature size is a bigger problem since an optical microscope only goes down to about 800 nm.

I haven't seen any chips with a solid metal top layer, since that wouldn't be very useful. Some chips have thick power and ground distribution on the top layer, so the top is essentially solid. Secure chips often cover the top layer with a wire that goes back and forth, so the wire will break if you try to get underneath for probing.

anyfoo · 2h ago
Never, the 386 is way too important.
neuroelectron · 3h ago
Ok, now do 486.
kens · 3h ago
I'm not as interested in the 486; I went straight to the Pentium: https://www.righto.com/2025/03/pentium-multiplier-adder-reve...
guerrilla · 3h ago
I totally agree with your methodology. Stick to the classic leaps.
neuroelectron · 3h ago
Fair enough. But why?
kens · 3h ago
Because I saw a Navajo weaving of a Pentium and wanted to compare the weaving to the real chip: https://www.righto.com/2024/08/pentium-navajo-fairchild-ship...
neuroelectron · 3h ago
I was only joking but I'm glad you have decided to take it seriously.
yukIttEft · 1h ago
When are you going to implement the first electron-level 386 emulator?
siliconunit · 2h ago
Very nice analysis! Personally I'm a DEC Alpha fan.. but I guess that's too big an endeavor.. (or maybe a selected portion?)
kens · 2h ago
So many chips, so little time :-)
lysace · 2h ago
I miss those dramatic performance leaps in the 80s. 10x in 5 years, give or take.

Now we get like 2x in a decade (single core).

zozbot234 · 28m ago
Sure, but that's why single-core performance is just not very interesting these days. Look at GPUs, power-efficient CPU cores, RAM, mass storage - these have all seen very remarkable improvement in this time-frame, albeit not literally 10x.
rasz · 1h ago
There was no performance improvement clock for clock between 286 and 386 when running contemporary 16-bit code: https://www.vogons.org/viewtopic.php?t=46350
vnorilo · 57m ago
I wrote blitters in assembly back in those days for my teenager hobby games. When I could actually target the 386 with its dword moves, it felt blisteringly fast. Maybe the 386 didn't run 286 code much faster but I recall the chip being one of the most mind-blowing target machine upgrades I experienced. Much later I recall the FPU-supported quadword copy in 486dx and of course P6 meeting MMX in Pentium II. Good times.
to11mtm · 25m ago
You're 100% right that the 386 had a huge amount of changes that were pivotal in the future of x86 and the ability to write good/fast code.

I think a bigger challenge back then was the lack of software that could take advantage of it. Given the nascent state of the industry, lots of folks wrote for the 'lowest common denominator' and kept it at that (i.e. expense of hardware to test things like changing routines used based on CPU sniffing.)

And even then of course sometimes folks were lazy. One of my (least) favorite examples of this is the PC 'version' (it's not at all the original) of Mega Man 3. On a 486/33 you had the option of it being almost impossibly twitchy fast, or dog slow thanks to the turbo button. Or, the fun thing where Turbo Pascal compiled apps could start crapping out if the CPU was too fast...

Sorry, I digress. The 386 was a seemingly small step that was actually a leap forward. Folks just had to catch up.

lysace · 49m ago
As did I :).

Imagine how it felt going from an 8086 @ 8 MHz to an 80486SX (the cheapo version without FPU) @ 33 MHz. With blazingly fast REP MOVSD over some form of proto local bus Compaq implemented using Tseng Labs ET4000/W32i.

lysace · 1h ago
Ok.

I'm speaking of e.g. the leap between the IBM PC in 1981 and the Compaq 386 five years later.

Or between that and the 486 another five years later or so.

shihabkhanbd · 2h ago
The two extra segment registers could be LDTR and TR, both of which hold a 16-bit selector index from the GDT (technically bit 2 is always zero).
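
As a rough illustration of the selector layout the comment above refers to, here is a small Python sketch that splits a 16-bit x86 segment selector into its fields. In protected mode, bits 15..3 hold the descriptor table index, bit 2 is the table indicator (0 selects the GDT, 1 selects the LDT), and bits 1..0 are the requested privilege level. The function name and the dictionary keys are hypothetical, chosen only for this example.

```python
def decode_selector(selector):
    """Split a 16-bit x86 segment selector into index, table, and RPL fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    return {
        "index": selector >> 3,                        # bits 15..3: descriptor index
        "table": "LDT" if selector & 0x4 else "GDT",   # bit 2: table indicator
        "rpl": selector & 0x3,                         # bits 1..0: requested privilege level
    }


if __name__ == "__main__":
    # 0x0028 has bit 2 clear, so it refers to entry 5 of the GDT.
    print(decode_selector(0x0028))
```

A selector with bit 2 clear always decodes to the GDT, which matches the comment's remark about that bit being zero for LDTR and TR.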
kens · 6m ago
This appears to be a bot reposting comments from an older article on my blog.