Why did not numpy copy the J rank concept?

It's old, but very clear. I tried to read the ITTAGE paper but it assumes you know all that already. Also it doesn't actually fully specify a branch predictor because there are various hashes you need to calculate and it simply doesn't say what they use.

saagarjha · 9h ago

If the author is around, the final link points to http://localhost:1313/post/cpython-tail-call/#further-weirdn....

burnt-resistor · 9h ago

Some architectures have/had branch hint instructions.

https://arcb.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/a...

The impact of a branch miss is a particular pipeline stalls to flush the incorrect prediction. If there were resources available for the other branch to be speculatively executed concurrently and in parallel it might take less wall time.

pbsd · 8h ago

The Pentium 4 had branch hints in the form of taken/not taken prefixes. They were not found to be useful and basically ignored in every subsequent Intel microarchitecture, until Redwood Cove brought back the branch taken prefix in 2023.

Taniwha · 4h ago

Branch hint instructions essentially give you the initial value for your BTC entry, after that you want it to learn - in general though if you initially predict backwards branches and don't predict forwards ones it's almost as good.

Very few architectures have conditional indirect branches and they don't get used all that much:

- subroutine return: better predicted with a stack - virtual method dispatch: needs a predictor (for the destination, not the 'taken' - a different thing with multiple destinations chosen by the history than a normal branch destination which typically has a single destination and a history choosing whether taken or not) - dense case statements: similar to virtual method dispatch but maybe with a need for far more destinations

All these cases often involve a memory load prior to the branch, in essence what you are predicting is what is being loaded, and you want to keep feeding the pipe while you wait for the load to complete

nynx · 10h ago

I must be missing something here. How would this help predict interpreter dispatch? Those won’t be a function of previous branch history or pc, which may very well be independent of the next opcode. They’d be a function of state in memory or registers.

saagarjha · 9h ago

Interpreters are just like normal programs, but splatted out a bit. In particular, they have branches and loops just like normal programs. The challenge for processors is that these high level constructs are far apart and dispatched through an interpreter loop, which obfuscates them. Being able to reach further back in history lets you recover this kind of information "through" the intervening bits.

dzaima · 9h ago

If your interpreter is interpreting a program with unpredictable branches, of course no predictor will magically make your interpreter get branches better predicted than an equivalent compiled program will.

The question here is about all other branching the interpreter will do. i.e. even if you have a unpredictable `if (a+b < 0)`, there's still the dispatching to the "load-variable" and "add" and "load-constant" and "less-than" and "do-branch" opcodes, that still will benefit from being predicted, and they could very well if you have it repeated in a loop (despite still having a single unpredictable branch), or potentially even if you just have a common pattern in the language (e.g. comparison opcodes being followed by a branch opcode).

brigade · 10h ago

In a hot loop, the next opcode can be predicted quite well from the history of previous opcodes executed, especially once have a couple iterations available in your history. And the opcodes executed in an interpreter are generally equivalent to the dispatch branch target.

achierius · 10h ago

"very well may be" but oftentimes isn't. Branch history does in practice do a very good job of predicting what target you're going to take for an indirect branch.

nynx · 10h ago

Sure. I can easily see that often being the case for arbitrary code but not interpreter dispatch loops.

jonstewart · 9h ago

I learned about computed goto a dozen years ago, tried it out in my interpreter, and got worse performance in that Haswell era than with a trusty switch statement. Branch predictors have made computed goto less compelling for a good long time.

Tail call is a different matter…