I understand what a Branch Predictor is supposed to do fundamentally, but since I have never programmed software I don't know how much it affects IPC.
To understand why branch predictors are important you need to know a little more about CPUs, in particular pipelined, superscalar CPUs. Pretty much every modern core is pipelined: instructions are processed in a series of stages, and as soon as one instruction moves from one stage to the next, the instruction behind it moves in, so the next instruction starts getting processed even though the last one isn't finished and is still working its way through the pipeline. On top of that, most cores today are also superscalar, meaning they can feed more than one instruction into the pipeline per clock cycle.
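To make the overlap idea concrete, here's a hypothetical C sketch (not tied to any particular CPU): the first function is a chain where every step needs the previous result, so the stages can't overlap much, while the second is independent work that a pipelined, superscalar core can keep several pieces of in flight at once.

```c
#include <stdint.h>

/* Dependent chain: each step needs the previous result, so the core
 * can't start the next multiply until the current one finishes. */
uint64_t dependent_chain(uint64_t x) {
    x = x * 3 + 1;
    x = x * 3 + 1;
    x = x * 3 + 1;
    x = x * 3 + 1;
    return x;
}

/* Independent work: these four computations don't depend on each
 * other, so a pipelined (and superscalar) core can have several of
 * them in different stages at the same time. */
uint64_t independent_work(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
    a = a * 3 + 1;
    b = b * 5 + 1;
    c = c * 7 + 1;
    d = d * 9 + 1;
    return a + b + c + d;
}
```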
...but what in the world does this have to do with branch prediction?! ...a lot.
When you have a bunch of instructions lined up in the pipeline, it's really important to know exactly which instruction comes next. The branch predictor's job is to guess whether a branch will be taken or not. A branch is a conditional jump. For example, in HCS12 assembly the BNE instruction stands for "branch if not equal": after comparing accumulator A and accumulator B, if they are not equal, execution jumps to the memory location given in the instruction's operand; otherwise it falls through to the next instruction. So you constantly run into cases where whether you branch or not depends on data that may not even have been calculated yet.
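At the source level, a branch is just an `if` or a loop condition. A minimal C sketch (hypothetical, just for illustration):

```c
/* The if() below typically compiles to a compare followed by a
 * conditional branch, much like CBA followed by BNE on the HCS12.
 * Which way the branch goes depends on x, which the CPU may not
 * have finished computing when the branch enters the pipeline. */
int abs_value(int x) {
    if (x < 0)      /* conditional branch: taken or not, based on data */
        x = -x;
    return x;
}
```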
When a branch prediction misses, you get a pipeline flush. Basically the CPU has to toss away everything it speculatively put into the pipeline after the branch and start over from the mispredicted branch. As a result, you take a performance hit for every misprediction, and the deeper the pipeline (cough, NetBurst and Bulldozer, cough), the bigger the hit.
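You can see the cost from ordinary software. The classic demonstration (a rough sketch; exact timings vary by CPU, and you'd want to build with low optimization like -O1, since at higher levels the compiler may replace the branch with a branchless conditional move and hide the effect) runs the exact same data-dependent branch over random data and then over sorted data. The work is identical, but with sorted data the predictor is right almost every time:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 20)

static int data[N];

static int cmp_int(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

int main(void) {
    for (int i = 0; i < N; i++)
        data[i] = rand() % 256;

    for (int pass = 0; pass < 2; pass++) {
        /* Second pass: sort the data so the branch below becomes
         * predictable (a long run of "not taken" followed by a long
         * run of "taken"); the amount of work stays exactly the same. */
        if (pass == 1)
            qsort(data, N, sizeof data[0], cmp_int);

        clock_t start = clock();
        long long sum = 0;
        for (int rep = 0; rep < 100; rep++) {
            for (int i = 0; i < N; i++) {
                if (data[i] >= 128)   /* data-dependent branch */
                    sum += data[i];
            }
        }
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("%s data: %.2f s (sum=%lld)\n",
               pass == 0 ? "random" : "sorted", secs, sum);
    }
    return 0;
}
```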
Say I were to remove the Branch Predictor from a Haswell core. By what factor would IPC drop? What would be the effect on die size?
How badly you would hurt performance depends on the workload. Branch-heavy workloads would suffer enormously; straight-line, compute-heavy workloads much less so.
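A rough illustration of the difference (hypothetical C, just a sketch): the first loop takes a hard-to-predict, data-dependent branch on every element, so it leans heavily on the predictor; the second is straight-line arithmetic whose only branch is the loop-back branch, which is trivial to predict.

```c
#include <stddef.h>

/* Branch heavy: one hard-to-predict, data-dependent branch per element.
 * With no branch predictor the pipeline would stall on every one. */
long branchy_sum(const int *data, size_t n) {
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] & 1)    /* odd or even? the predictor has to guess */
            acc += data[i];
        else
            acc -= data[i];
    }
    return acc;
}

/* Compute heavy: only the loop-back branch, which is trivially
 * predictable (taken n-1 times in a row, then not taken once). */
long squares_sum(const int *data, size_t n) {
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (long)data[i] * data[i];
    return acc;
}
```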
Say I were to take the Branch Predictor in Haswell and magically make it perfect. By what factor would IPC increase?
It depends on the workload; the improvement is entirely conditional on the code that's running, so there is no single meaningful number that could be given here.
Do GPU cores use Branch Predictors as well?
GPU cores tend to be much more basic than CPU cores. As far as I know they don't do branch prediction at all: instead of speculating, a GPU hides latency by juggling a huge number of threads, so when one group of threads hits a branch it just works on other threads in the meantime, and divergent branches within a group are handled by running both paths with some lanes masked off. Nothing is fetched speculatively, so there's nothing to predict.
The branch predictor will never be perfect; it can only predict the outcome, because the data it needs to know may not even have been computed yet. The only way to get a "perfect" branch predictor is a pipeline of depth 1, or in other words, no pipeline. With no pipeline, every branch condition is fully resolved before the next instruction is fetched, so there is nothing to guess and every branch goes the right way.
Edit: Mispredictions also hurt more when the pipeline is really long. Every extra stage means more speculative work in flight that can turn out to be wrong, and the longer the pipeline, the more cycles it takes to flush and refill after a miss.
Edit 2: Branch prediction happens in the CPU itself. Even a driver developer doesn't have to think about it, because it's handled entirely in hardware without any software intervention.