This is an entirely uncontroversial take among experts in the space. x86 is an old CISC-y hot mess. RISC-V is a new-school hyper-academic hot mess. Recent ARM is actually pretty good. And none of it matters, because the uncore and the fabrication details (in particular, whether things have been tuned to run full speed demon or full power sipper) completely dominate the ISA.
In the past x86 didn't dominate in low power because Intel had the resources to care but never did, and AMD never had the resources to try. Other companies stepped in to fill that niche, and had to use other ISAs. (If they could have used x86 legally, they might well have done so. Oops?) That may well be changing. Or perhaps AMD will let x86 fade away.
https://web.archive.org/web/20210622080634/https://www.anand...
Basically the gist of it is that the difference between ARM/x86 mostly boils down to instruction decode, and:
- Most instructions end up being simple load/store/conditional branch etc. on both architectures, where there's literally no difference in encoding efficiency
- Variable length instruction decoding has pretty much been figured out on x86, to the point that it's no longer a bottleneck
Also my personal addendum is that today's Intel efficiency cores have more transistors and better perf than the big Intel cores of a decade ago
fanf2 · 24m ago
Apple’s ARM cores have wider decode than x86
M1 - 8 wide
M4 - 10 wide
Zen 4 - 4 wide
Zen 5 - 8 wide
mort96 · 3h ago
This matches my understanding as well, as someone who has a great deal of interest in the field but has never worked in it professionally. CPUs all have a microarchitecture that doesn't look like the ISA at all, and they have an instruction decoder that translates one or more ISA instructions into zero or more microarchitectural instructions. There are some advantages to having a more regular ISA, such as being able to decode multiple instructions in parallel more easily when they're all the same size, or having to spend fewer transistors on the instruction decoder, but for the big superscalar chips we all have in our desktops and laptops and phones, the drawbacks are tiny.
I imagine that the difference is much greater for the tiny in-order CPUs we find in MCUs though, just because an amd64 decoder would be a comparatively much larger fraction of the transistor budget
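To make the "decoder translates ISA instructions into micro-ops" idea concrete, here's a toy sketch in C. The uop struct and the decode routine are invented purely for illustration (no real microarchitecture exposes anything like this); the point is the one-to-many expansion of a single memory-operand instruction:

    #include <stdio.h>

    /* Hypothetical micro-op representation; real designs differ. */
    typedef enum { UOP_LOAD, UOP_ADD, UOP_STORE } uop_kind;
    typedef struct { uop_kind kind; int dst, src, addr; } uop;

    /* Toy "decoder": one CISC-style ADD [mem], reg becomes three micro-ops,
     * using an internal temporary register (here numbered 90). */
    static int decode_add_mem_reg(int addr_reg, int src_reg, uop *out) {
        out[0] = (uop){ UOP_LOAD,  90, 0,       addr_reg }; /* tmp <- [addr]    */
        out[1] = (uop){ UOP_ADD,   90, src_reg, 0        }; /* tmp <- tmp + src */
        out[2] = (uop){ UOP_STORE, 0,  90,      addr_reg }; /* [addr] <- tmp    */
        return 3;
    }

    int main(void) {
        uop uops[3];
        int n = decode_add_mem_reg(1, 2, uops);
        printf("1 ISA instruction -> %d micro-ops\n", n);
        return 0;
    }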
whynotminot · 3h ago
An annoying thing people have done since Apple Silicon is claim that its advantages were due to Arm.
No, not really. The advantage is Apple prioritizing efficiency, something Intel never cared enough about.
wlesieutre · 3h ago
VIA used to make low power x86 processors
yndoendo · 3h ago
Fun fact. The idea of strong national security is the reason why there are three companies with access to the x86 ISA.
DoD originally required all products to be sourced by at least three companies to prevent supply chain issues. This required Intel to allow AMD and VIA to produce products based on the ISA.
For me this is a good indicator of whether someone who talks about national security knows what they are talking about or is just spewing bullshit and playing national security theatre.
pavlov · 3h ago
And Transmeta…
mananaysiempre · 3h ago
> x86 didn't dominate in low power because Intel had the resources to care but never did
Remember Atom tablets (and how they sucked)?
wmf · 3h ago
That's the point. Early Atom wasn't designed with care but the newer E-cores are quite efficient because they put more effort in.
Findecanor · 3h ago
You mean Atom tablets running Android?
I have a ten-year old Lenovo Yoga Tab 2 8" Windows tablet, which I still use at least once every week. It is still useful. Who can say that they are still using a ten-year old Android tablet?
mmis1000 · 3h ago
I have tried one before. And surprisingly, it did not suck as much as most people claimed. I could even do light gaming (Warframe) on it with a reasonable frame rate. (This was around the 2015 ~ 2020 era.) So it probably depends on the manufacturer (or the use case, though)
(Also probably because it is a tablet, so it has reasonably fast storage instead of the HDDs notebooks had in that era)
YetAnotherNick · 3h ago
> how they sucked
Care to elaborate? I had the 9" mini laptop kind of device based on Atom and don't remember Atom being the issue.
mananaysiempre · 3h ago
I had an Atom-based netbook (in the early days when they were 32-bit-only and couldn’t run up-to-date Windows). It didn’t suck, as such, but it was definitely resource-starved.
However, what I meant is Atom-based Android tablets. At about the same time as the netbook craze (late 2000s to early 2010s) there was a non-negligible number of Android tablets, and a noticeable fraction of them was not ARM- but Atom-based. (The x86 target in the Android SDK wasn’t only there to support emulators, originally.) Yet that stopped pretty quickly, and my impression is that that happened because, while Intel would certainly have liked to hitch itself to the Android train, they just couldn’t get Atoms fast enough at equivalent power levels (either at all or quickly enough). Could have been something else, e.g. perhaps they didn’t have the expertise to build SoCs with radios?
Either way, it’s not that Intel didn’t want to get into consumer mobile devices, it’s that they tried and did not succeed.
toast0 · 1h ago
Android x86 devices suffer when developers include binary libraries and don't add x86. At the time of Intel's x86 for Android push, Google didn't have good apk thinning options, so app developers had to decide if they wanted to add x86 libraries for everyone so that a handful of tablets/phones would work properly... for the most part, many developers said no; even though many/most apps are tested on the android emulator that runs on x86 and probably have binary libraries available to work in that case.
IMHO, if Intel had done another year or two of trying, it probably would have worked, but they gave up. They also canceled x86 for phones like the day before the Windows Mobile Continuum demo, which would have been a potentially much more compelling product with x86, especially if Microsoft had allowed running Win32 apps (which they probably wouldn't have, but the potential would be interesting)
cptskippy · 3h ago
Atom used an in-order execution model, so its performance was always going to be lacking. Because it was in-order it had a much simpler decoder and much smaller die size, which meant you could cram the chipset and CPU onto a single die.
Atom wasn't about power efficiency or performance, it was about cost optimization.
saltcured · 3h ago
I had an Atom-based Android phone (Razr-i) and it was fine.
kccqzy · 3h ago
They sucked because Intel didn't care.
criticalfault · 3h ago
Were they running windows or android?
SOTGO · 3h ago
I'd be interested to hear someone with more experience talk about this or if there's more recent research, but in school I read this paper: <https://research.cs.wisc.edu/vertical/papers/2013/hpca13-isa...> that seems to agree that x86 and ARM as instruction sets do not differ greatly in power consumption. They also found that GCC picks RISC-like instructions when compiling for x86 which meant the number of micro-ops was similar between ARM and x86, and that the x86 chips were optimized well for those RISC-like instructions and so were similarly efficient to ARM chips. They have a quote that "The microarchitecture, not the ISA, is responsible for performance differences."
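A small illustration of the "compilers pick RISC-like instructions" finding: for ordinary C like the function below, gcc -O2 on x86-64 tends to emit the same simple load/compute/store shape a RISC compiler would, rather than exotic CISC forms. Exact output depends on compiler version and flags, so treat the comment as indicative, not authoritative.

    /* With gcc -O2 on x86-64 the loop body typically lowers to a load,
     * a multiply, an add, and a store per element (or the vectorized
     * equivalent); the same simple shape an ARM compiler emits. */
    void scale_add(int *dst, const int *src, int k, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = dst[i] + k * src[i];
    }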
gigatexal · 3h ago
You’ll pry the ARM M series chips of my Mac from my cold dead hands. They’re a game changer in the space and one of the best reasons to use a Mac.
I am not a chip expert; it’s just so night-and-day different using a Mac with an ARM chip compared to an Intel one, from thermals to performance and battery life and everything in between. Intel isn’t even in the same ballpark imo.
But competition is good, so let’s hope they both do well, Intel and AMD, because the consumer wins.
mort96 · 3h ago
I have absolutely no doubt in my mind that if Apple's CPU engineers got half a decade and a mandate from the higher ups, they could make an amazing amd64 chip too.
kccqzy · 3h ago
That's not mostly because of a better ISA. If Intel and Apple had a chummier relationship you could imagine Apple licensing the Intel x86 ISA and the M series chips would be just as good but running x86. However I suspect no matter how chummy that relationship was, business is business and it is highly unlikely that Intel would give Apple such a license.
FlyingAvatar · 2h ago
It's pretty difficult to imagine.
Apple did a ton of work on the power efficiency of iOS on their own ARM chips for iPhone for a decade before introducing the M1.
Since iOS and macOS share the same code base (even when they were on different architectures) it makes much more sense to simplify to a single chip architecture that they already had major expertise with and total control over.
There would be little to no upside for cutting Intel in on it.
x0x0 · 2h ago
> That's not mostly because of a better ISA
Genuinely asking -- what is it due to? Because like the person you're replying to, the m* processors are simply better: desktop-class perf on battery that hangs with chips with 250 watt TDP. I have to assume that amd and intel would like similar chips, so why don't they have them if not due to the instruction set? And AMD is using TSMC, so that can't be the difference.
toast0 · 58m ago
I think the fundamental difference between an Apple CPU and an Intel/AMD CPU is Apple does not play in the megahertz war. The Apple M1 chip, launched in 2020 clocks at 3.2GHz; Intel and AMD can't sell a flagship mobile processor that clocks that low. Zen+ mobile Ryzen 7s released Jan 2019 have a boost clock of 4 GHz (ex: 3750H, 3700U); mobile Zen2 from Mar 2020 clock even higher (ex: 4900H at 4.4, 4800H at 4.2). Intel Tiger Lake was hitting 4.7 Ghz in 2020 (ex: 1165G7).
If you don't care to clock that high, you can reduce space and power requirements at all clocks; AMD does that for the Zen4c and Zen5c cores, but they don't (currently) ship an all compact core mobile processor. Apple can sell a premium branded CPU where there's no option to burn a lot of power to get a little faster; but AMD and Intel just can't, people may say they want efficiency, but having higher clocks is what makes an x86 processor premium.
In addition to the basic efficiency improvements you get by having a clock limit, Apple also utilizes wider execution; they can run more things in parallel, this is enabled to some degree by the lower clock rates, but also by the commitment to higher memory bandwidth via on package memory; being able to count on higher bandwidth means you can expect to have more operations that are waiting on execution rather than waiting on memory, so wider execution has more benefits. IIRC, Intel released some chips with on package memory, but they can't easily just drop in a couple more integer units onto an existing core.
The weaker memory model of ARM does help as well. The M series chips have a much wider out of order window, because they don't need to spend as much effort on ordering constraints (except when running in the x86 support mode); this also helps justify wider execution, because they can keep those units busy.
I think these three things are listed in order of impact, but I'm just an armchair computer architecture philosopher.
fluoridation · 24m ago
Does anyone actually care at all about frequencies? I care if my task finishes quickly. If it can finish quickly at a low frequency, fine. If the clock runs fast but the task doesn't, how is that a benefit?
My understanding is that both Intel and AMD are pushing high clocks not because it's what consumers want, but because it's the only lever they have to pull to get more gains. If this year's CPU is 2% faster than your current CPU, why would you buy it? So after they have their design they cover the rest of the target performance gain by cranking the clock, and that's how you get 200 W desktop CPUs.
>the commitment to higher memory bandwidth via on package memory; being able to count on higher bandwidth means you can expect to have more operations that are waiting on execution rather than waiting on memory, so wider execution has more benefits.
I believe you could make a PC (compatible) with unified memory and a 256-bit memory bus, but then you'd have to make the whole thing. Soldered motherboard, CPU/GPU, and RAM. I think at the time the M1 came out there weren't any companies making hardware like that. Maybe now that x86 handhelds are starting to come out, we may see laptops like that.
exmadscientist · 2h ago
> I have to assume that amd and intel would like similar chips
They historically haven't. They've wanted the higher single-core performance and frequency and they've pulled out all the stops to get it. Everything had been optimized for this. (Also, they underinvested in their uncores, the nastiest part of a modern processor. Part of the reason AMD is beating Intel right now despite being overall very similar is their more recent and more reliable uncore design.)
They are now realizing that this was, perhaps, a mistake.
AMD is only now in a position to afford to invest otherwise (they chose quite well among the options actually available to them, in my opinion), but Intel has no such excuse.
x0x0 · 1h ago
Not arguing, but I would think there is (and always has been) very wide demand for fastest single core perf. From all the usual suspects?
Thank you.
bryanlarsen · 2h ago
What's it due to? At least this, probably more.
- more advanced silicon architecture. Apple spends billions to get access to the latest generation a couple of years before AMD.
- world class team, with ~25 years of experience building high speed low power chips. (Apple bought PA Semi to make these chips, which was originally the team that built the DEC StrongARM). And then paid & treated them properly, unlike Intel & AMD
- a die budget to spend transistors for performance: the M chips are generally quite large compared to the competition
- ARM's weak memory model also helps, but it's very minor IMO compared to the above 3.
gigatexal · 1h ago
How many of those engineers remain, didn't a lot go to Nuvia that was then bought by Qualcomm?
bryanlarsen · 1h ago
Sure, but they were there long enough to train and instill culture into the others. And of course, since the acquisition in 2008 they've had access to the top new grads and experienced engineers. If you're coming out top of your class at an Ivy or similar you're going to choose Apple over Intel or AMD both because of rep and the fact that your offer salary is much better.
P.S. hearsay and speculation, not direct experience. I haven't worked at Apple and anybody who has is pretty closed lip. You have to read between the lines.
P.P.S. It's sort of a circular argument. I say Apple has the best team because they have the best chip && they have the best chip because they have the best team.
But having worked (briefly) in the field, I'm very confident that their success is much more likely due to having the best team rather than anything else.
re: apple getting exclusive access to the best fab stuff: https://appleinsider.com/articles/23/08/07/apple-has-sweethe... . Interesting.
Your Intel mac was stuck in the past while everyone paying attention on PCs was already enjoying TSMC 7nm silicon in the form of AMD Zen processors.
Apple Silicon macs are far less impressive if you came from an 8c/16t Ryzen 7 laptop. Especially if you consider the Apple parts are consistently enjoying the next best TSMC node vs. AMD (e.g. 5nm (M1) vs. 7nm (Zen2))
What's _really_ impressive is how badly Intel fell behind and TSMC has been absolutely killing it.
gigatexal · 1h ago
that ryzen laptop chip perform it'll just do it at a higher perf/watt than the apple chip will... and on a laptop that's a key metric.
tracker1 · 1h ago
And 20% or so of that difference is purely the fab node difference, not anything to do with the chip design itself. Strix Halo is a much better comparison, though Apple's M4 models do very well against it, often besting it at the most expensive end.
On the flip side, if you look at servers... Compare a 128+core AMD server CPU vs a large core ARM option and AMD perf/watt is much better.
nottorp · 3h ago
Irrelevant.
There are two entities allowed to make x86_64 chips (and that only because AMD won the 64 bit ISA competition, otherwise there'd be only Intel). They get to choose.
The rest will use arm because that's all they have access to.
Oh, and x86_64 will be as power efficient as arm when one of the two entities stops competing on having larger numbers and actually worries about power management. Maybe provide a ?linux? optimized for power consumption.
Avi-D-coder · 3h ago
From what I have heard it's not the RISCy ISA per se, it's largely arm's weaker memory model.
I'd be happy to be corrected, but the empirical core counts seem to agree.
hydroreadsstuff · 3h ago
Indeed, the memory model has a decent impact.
Unfortunately it's difficult to isolate in measurement.
Only Apple has support for weak memory order and TSO in the same hardware.
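For anyone wanting to see where the memory model actually touches code, the release/acquire handoff below is the classic case. The C is identical on both architectures; the difference is that x86's TSO makes this ordering essentially free on ordinary loads and stores, while a weakly ordered ARM core only pays for ordering at the marked operations and is free to reorder more aggressively everywhere else. Minimal sketch using C11 atomics:

    #include <stdatomic.h>

    static int payload;          /* plain data             */
    static atomic_int ready;     /* flag that publishes it */

    void producer(void) {
        payload = 42;                                            /* write data */
        atomic_store_explicit(&ready, 1, memory_order_release);  /* publish    */
    }

    int consumer(void) {
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                    /* spin until the flag is set */
        return payload;          /* guaranteed to observe 42   */
    }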
variadix · 1h ago
Instruction decode for variable length ISAs is inherently going to be more complex, and thus require more transistors = more power, than fixed length instruction decode, especially parallel decode. AFAIK modern x86 cores have to speculatively decode instructions to achieve this, compared to RISC ISAs where you know where all the instruction boundaries are and decoding N in parallel is a matter of instantiating N decoders that work in parallel. How much this determines the x86 vs ARM power gap, I don't know; what's much more likely is that x86 designs have not been hyper-optimized for power as much as ARM designs have been over the last two decades. Memory order is another non-negligible factor, but again the difference is probably more attributable to the difference in goals between the two architectures for the vast majority of their lifespan, and the expertise and knowledge of the engineers working at each company.
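To make the decode asymmetry concrete, here's a toy sketch; the variable-length rule below is invented, not x86's real encoding. With fixed 4-byte instructions, decoder i can be handed offset i*4 immediately and N decoders run independently; with variable lengths, instruction i's start depends on every earlier instruction's length, so hardware either walks them serially or speculates on boundaries and throws away wrong guesses.

    #include <stddef.h>
    #include <stdint.h>

    /* Fixed width: the i-th decoder knows its start offset up front. */
    size_t fixed_start(size_t i) { return i * 4; }

    /* Toy variable-length rule (illustration only): length is 1..4 bytes,
     * taken from the low two bits of the first byte. Finding instruction
     * i's start requires walking all earlier instructions. */
    size_t variable_start(const uint8_t *code, size_t i) {
        size_t off = 0;
        for (size_t k = 0; k < i; k++)
            off += (code[off] & 0x3) + 1;   /* serial dependency */
        return off;
    }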
w4rh4wk5 · 3h ago
Ok. When will we get the laptop with AMD CPU that is on par with a Macbook regarding battery life?
azornathogron · 3h ago
How much of the Mac's impressive battery life is due purely to CPU efficiency, and how much is due to great vertical integration and the OS being tuned for power efficiency?
It's a genuine question; I'm sure both factors make a difference but I don't know their relative importance.
SushiHippie · 3h ago
I just searched for the asahi linux (Linux for M Series Macs) battery life, and found this blog post [0].
> During active development with virtual machines running, a few calls, and an external keyboard and mouse attached, my laptop running Asahi Linux lasts about 5 hours before the battery drops to 10%. Under the same usage, macOS lasts a little more than 6.5 hours. Asahi Linux reports my battery health at 94%.
[0] https://blog.thecurlybraces.com/2024/10/running-fedora-asahi...
The overwhelming majority is due to the power management software, yes. Other ARM laptops do not get anywhere close to the same battery life. The MNT Reform with 8x 18650s (24000mAh, 3x what you get an MBP) gets about 5h of battery life with light usage.
John23832 · 2h ago
Tight integration matters.
Look at the difference in energy usage between safari and chrome on M4s.
pablok2 · 3h ago
How much instruction perf analysis do they do to save 1% (compounded) on the most common instructions
mmis1000 · 3h ago
I think it would only be fair to compare it when running a more resource-efficient system.
A Steam Deck with Windows 11 vs. SteamOS is a whole different experience. When running SteamOS and doing web surfing, the fan doesn't even really spin at all. But when running Windows 11 and doing the exact same thing, it spins all the time and the device becomes kinda hot.
slimginz · 3h ago
IIRC There was a Jim Keller interview a few years ago where he said basically the same thing (I think it was from right around when he joined Tenstorrent?). The ISA itself doesn't matter, it's just instructions. The way the chip interprets those instructions is what makes the difference. ARM was designed from the beginning for low powered devices whereas x86 wasn't. If x86 is gonna compete with ARM (and RISC-V) then the chips are gonna need to also be optimized for low powered devices, but that can break decades of compatibility with older software.
sapiogram · 3h ago
It's probably from the Lex Fridman podcast he did. And to be fair, he didn't say "it doesn't matter", he said "it's not that important".
ZuLuuuuuu · 3h ago
There are a lot of theoretical articles which claim similar things but on the other hand we have a lot of empirical evidence that ARM CPUs are significantly more power efficient.
I used laptops with both Intel and AMD CPUs, and I read/watch a lot of reviews in the thin and light laptop space. Although AMD became more power efficient compared to Intel in the last few years, the AMD alternative is only marginally more efficient (like 5-10%). And AMD is using TSMC fabs.
On the other hand Qualcomm's recent Snapdragon X series CPUs are significantly more efficient than both Intel and AMD in most tests while providing the same performance or sometimes even better performance.
Some people mention the efficiency gains on Intel Lunar Lake as evidence that x86 is just as efficient, but Lunar Lake was still slightly behind in battery life and performance, while using a newer TSMC process node compared to Snapdragon X series.
So, even though I see theoretical articles like this, the empirical evidence says otherwise. Qualcomm will release their second generation Snapdragon X series CPUs this month. My guess is that the performance/efficiency gap with Intel and AMD will get even bigger.
WithinReason · 3h ago
Since newer CPUs have heterogeneous cores (high performance + low power), I'm wondering if it makes sense to drop legacy instructions from the low power cores, since legacy code can still be run on the other cores. Then e.g. an OS compiled the right way can take advantage of extra efficiency without the CPU losing backwards compatibility
toast0 · 3h ago
Like o11c says, that's setting everyone up for a bad time. If the heterogenous cores are similar, but don't all support all the instructions, it's too hard to use. You can build legacy instructions in a space optimized way though, but there's no reason not to do that for the high performance cores too --- if they're legacy instructions, one expects them not to run often and perf doesn't matter that much.
Intel dropped their x86-S proposal; but I guess something like that could work for low power cores. If you provide a way for a 64-bit OS to start application processors directly in 64-bit mode, you could setup low power cores so that they could only run in 64-bit mode. I'd be surprised if the juice is worth the squeeze, but it'd be reasonable --- it's pretty rare to be outside 64-bit mode, and systems that do run outside 64-bit mode probably don't need all the cores on a modern processor. If you're running in a 64-bit OS, it knows which processes are running in 32-bit mode, and could avoid scheduling them on reduced functionality cores; If you're running a 32-bit OS, somehow or another the OS needs to not use those cores... either the ACPI tables are different and they don't show up for 32-bit, init fails and the OS moves on, or the there is a firmware flag to hide them that must be set before running a 32-bit OS.
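On the scheduling side, the user-space building blocks for "don't run 32-bit processes on reduced-functionality cores" already exist on Linux in the form of CPU affinity masks. A rough sketch, assuming a made-up split where cores 0-3 are the full-featured ones; a real implementation would live in the kernel scheduler rather than in a helper like this:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <sys/types.h>

    /* Hypothetical split: cores 0-3 can still drop out of 64-bit mode,
     * everything above is 64-bit-only. A 64-bit OS that knows a process
     * is 32-bit could confine it like this. Returns 0 on success. */
    int pin_to_full_cores(pid_t pid) {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int cpu = 0; cpu < 4; cpu++)
            CPU_SET(cpu, &set);
        return sched_setaffinity(pid, sizeof(set), &set);
    }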
jdsully · 40m ago
I don't really understand why the OS can't just trap the invalid instruction exception and migrate it to the P-core. E.g. AVX-512 and similar. For very old and rare instructions they can emulate them. We used to do that with FPU instructions on non-FPU enabled CPUs way back in the 80s and 90s.
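The trap half of this is already visible from user space: an unimplemented instruction arrives as SIGILL, and a handler gets the faulting address. Migrating the thread and retrying on a P-core is the part that would need kernel support (or an emulation routine in the handler, like the old FPU emulators). A minimal sketch of the trap side only, assuming Linux/POSIX signals:

    #define _POSIX_C_SOURCE 200809L
    #include <signal.h>
    #include <unistd.h>

    /* Catch SIGILL. info->si_addr holds the faulting instruction's address;
     * a real handler would emulate it, or ask the kernel to move this
     * thread to a core that implements it and then retry. */
    static void on_sigill(int sig, siginfo_t *info, void *ctx) {
        (void)sig; (void)info; (void)ctx;
        static const char msg[] = "caught SIGILL\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);  /* async-signal-safe */
        _exit(1);
    }

    int main(void) {
        struct sigaction sa = {0};
        sa.sa_sigaction = on_sigill;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGILL, &sa, NULL);
        /* ... run code that may use instructions this core lacks ... */
        return 0;
    }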
devnullbrain · 3h ago
Interesting but it would be pretty rough to implement. If you take a binary now and run it on a core without the correct instructions, it will SIGILL and probably crash. So you have these options:
Create a new compilation target
- You'll probably just end up running a lot of current x86 code exclusively on performance cores to a net loss. This is how RISC-V deals with optional extensions.
Emulate
- This already happens for some instructions but, like above, could quickly negate the benefits
Ask for permission
- This is what AVX code does now, the onus is on the programmer to check if the optional instructions can be used (see the sketch at the end of this comment). But you can't have many dropped instructions and expect anybody to use it.
Ask for forgiveness
- Run the code anyway and catch illegal instruction exceptions/signals, then move to a performance core. This would take some deep kernel surgery for support. If this happens remotely often it will stall everything and make your system hate you.
The last one raises the question: which instructions are we considering 'legacy'? You won't get far in an x86 binary before running into an instruction operating on memory that, in a RISC ISA, would mean first a load instruction, then the operation, then a store. Surely we can't drop those.
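For the "ask for permission" option above, this is roughly the shape AVX-using code already takes today. Sketch only: __builtin_cpu_supports is a GCC/Clang builtin, and it reports what the package advertises as a whole, which is exactly why per-core feature differences are so painful.

    #include <stdio.h>

    /* Runtime dispatch: only take the AVX2 path if the CPU reports it.
     * The puts() calls stand in for real vectorized/scalar kernels. */
    void process(float *data, int n) {
        (void)data; (void)n;
        if (__builtin_cpu_supports("avx2"))
            puts("using AVX2 path");
        else
            puts("using scalar fallback");
    }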
wtallis · 3h ago
IIRC, there were several smartphone SoCs that dropped 32-bit ARM support from most but not all of their CPU cores. That was straightforward to handle because the OS knows which instruction set a binary wants to use. Doing anything more fine-grained would be a nightmare, as Intel found out with Alder Lake.
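A concrete example of "the OS knows which instruction set a binary wants": an ELF executable declares its class and machine type in the first few header bytes, which is what lets a kernel keep a 32-bit ARM binary on the cores that still have the 32-bit front end. Small sketch that just prints those two fields (offsets follow the standard ELF header layout; assumes a little-endian ELF, which is what these platforms use):

    #include <stdio.h>
    #include <stdint.h>

    /* Print an ELF binary's class (32/64-bit) and machine type, i.e. the
     * information an OS can use to pick compatible cores for it. */
    int main(int argc, char **argv) {
        if (argc < 2) { fprintf(stderr, "usage: %s <elf-file>\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        unsigned char hdr[20];
        if (fread(hdr, 1, sizeof hdr, f) != sizeof hdr) { fclose(f); return 1; }
        fclose(f);

        int is64 = (hdr[4] == 2);                    /* EI_CLASS: 1 = 32-bit, 2 = 64-bit */
        unsigned machine = hdr[18] | (hdr[19] << 8); /* e_machine, little-endian */
        printf("%s-bit, e_machine=%u (40=ARM, 183=AArch64, 3=x86, 62=x86-64)\n",
               is64 ? "64" : "32", machine);
        return 0;
    }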
o11c · 3h ago
We've seen CPU-capability differences by accident a few times, and it's always a chaotic mess leading to SIGILL.
The kernel would need to have a scheduler that knows it can't use those cores for certain tasks. Think about how hard you would have to work to even identify such a task ...
mmis1000 · 1h ago
Current Windows or Linux executable formats don't even list the instructions used, though. And even if they were listed, what about dynamic libraries? The program may decide to load a library at any time it wishes, and the OS is not going to know what instructions may be used this time.
Findecanor · 3h ago
I think it is not really the execution units for simple instructions that take up much chip area on application-class CPUs these days, but everything around them.
I think support in the OS/runtime environment* would be more interesting for chips where some cores have larger execution units such as those for vector and matmul units. Especially for embedded / low power systems.
Maybe x87/MMX could be dropped though.
*. BTW. If you want to find research papers on the topic, a good search term is "partial-ISA migration".
flembat · 2h ago
That is quite a confession from AMD.
It's not X86 at all, just every implementation.
It is not like the ARM processors in Macs are simple any more, that's for sure.
cptskippy · 1h ago
The ISA is the contract or boundary between software and hardware. While there is a hardware cost to decode instructions, the question is how much?
As all the fanbois in the thread have pointed out, Apple's M series is fast and efficient compared to x86 for desktop/server workloads. What no one seems to acknowledge is that Apple's A series is also fast and efficient compared to other ARM implementations in mobile workloads. Apple sees the need to maintain M and A series CPUs for different workloads, which indicates there's a benefit to both.
This tells me the ISA decode hardware either isn't the bottleneck, or isn't the only one.
bfrog · 3h ago
And yet... the world keeps proving Intel and AMD wrong on this premise with highly efficient Arm parts. Sure, there are bound to be improvements to make on x86, but ultimately it's a variable-length opcode encoding with a complex decoder path. If nothing else, this is likely a significant issue in comparison to the nicely word-aligned opcode encoding Arm has, and surely, given apples-to-apples core designs, the opcode decoding would be a deciding factor.
tinktank · 3h ago
PPA results comparing x86 to ARM say otherwise; take a look at Ryzen's embedded series and Intel's latest embedded cores.
bfrog · 3h ago
Have they reached apple m level of performance/watt after half a decade of the apple m parts being out yet? Do either AMD or Intel beat Apple in any metric in mobile?
tinktank · 1h ago
Yes and yes - again, go look at the published results.
ch_123 · 3h ago
> it's a variable-length opcode encoding with a complex decoder path
In practice, the performance impact of variable length encoding is largely kept in check using predictors. The extra complexity in terms of transistors is comparatively small in a large, high-performance design.
Related reading:
https://patents.google.com/patent/US6041405A/en
https://web.archive.org/web/20210709071934/https://www.anand...
Jim Keller has a storied career in the x86 world, it isn't surprising he speaks fondly of it. Regardless:
>So fixed-length instructions seem really nice when you're building little baby computers, but if you're building a really big computer, to predict or to figure out where all the instructions are, it isn't dominating the die. So it doesn't matter that much.
Well, efficiency advantages are the domain of little baby computers. Better predictors give you deeper pipelines without stalls, which give you higher clock speeds, and with them higher wattages.