Branch Prediction: From CPUs to GPUs and TPUs

How does branch prediction work on CPUs?

In an NVIDIA GPU, the smallest unit of execution is a warp: a group of 32 threads that are locked together, i.e. they execute the same instruction at the same time. Warps are behind the massive parallelism that GPUs are known for. Basically, in things like matrix multiplication, every operation looks identical: you multiply a row of the first matrix element by element with a column of the second matrix and add up the products. So this multiply-and-add operation can run on every thread, BUT on different data. And because the threads are locked together, they all step through the instructions in lockstep.
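To make the "same instruction, different data" idea concrete, here is a minimal Python sketch that mimics one warp computing one row of the output matrix. It is only a simulation under simplifying assumptions: real GPU kernels are written in something like CUDA, and `warp_matmul_row` is a hypothetical helper invented for this illustration.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_matmul_row(A, B, row):
    """Simulate one warp computing one output row: lane j owns C[row][j]."""
    n_cols = len(B[0])
    assert n_cols <= WARP_SIZE, "this sketch assumes one lane per output column"
    acc = [0] * n_cols  # one accumulator "register" per lane
    for k in range(len(B)):  # every lane runs this same loop in lockstep
        a = A[row][k]
        # One "instruction": every lane does a multiply-add on its own data.
        acc = [acc[lane] + a * B[k][lane] for lane in range(n_cols)]
    return acc


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [warp_matmul_row(A, B, r) for r in range(len(A))]
print(C)  # [[19, 22], [43, 50]]
```

Every lane executes exactly the same sequence of steps; only the column of `B` it reads differs.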

But what happens when the instructions are not identical? Say, for example, you have a conditional branch in your code, like an if-else statement. Some threads might take the if branch, while others take the else branch. Since all the threads in a warp must execute the same instruction at the same time, the hardware runs the two paths one after the other: while the if path executes, the threads that chose else sit idle, and vice versa. This is known as branch divergence, and it is a major cause of performance degradation on GPUs, because the two sides of the branch are now executed serially rather than in parallel. CPUs approach branches differently: a branch predictor guesses the outcome of the branch, and the core executes instructions along the predicted path speculatively, discarding that work if the guess turns out to be wrong.
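The cost of divergence can be sketched with a toy model. The Python below counts how many instruction slots a warp spends on an if-else and what fraction of its lanes do useful work. The function `run_warp` and the per-path instruction counts are assumptions made up for this illustration, not a model of any specific GPU.

```python
WARP_SIZE = 32


def run_warp(data, cond, then_cost, else_cost):
    """Toy model of one warp executing an if/else.

    then_cost / else_cost are hypothetical instruction counts per path.
    Returns (issue_slots, utilization).
    """
    n_then = sum(1 for x in data if cond(x))
    n_else = len(data) - n_then
    slots = 0   # instruction slots the warp spends on the branch
    useful = 0  # lane-instructions that do real work
    if n_then:  # the if path is issued when at least one lane takes it
        slots += then_cost
        useful += then_cost * n_then
    if n_else:  # the else path is issued separately, after the if path
        slots += else_cost
        useful += else_cost * n_else
    return slots, useful / (slots * len(data))


lanes = list(range(WARP_SIZE))
uniform = run_warp(lanes, lambda x: x >= 0, 10, 10)        # all lanes agree
divergent = run_warp(lanes, lambda x: x % 2 == 0, 10, 10)  # lanes split 16/16
print(uniform)    # (10, 1.0)
print(divergent)  # (20, 0.5)
```

In this model, a warp whose lanes all agree pays for one path only, while an evenly split warp pays for both paths and keeps half of its lanes idle at any given moment.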

TPUs take it a level further, dedicating an even larger portion of the chip to operations like matrix multiplication. Google’s TPUs use a systolic array architecture: a grid of processing elements connected so that data flows through the array in a highly parallel manner. Most of the fetching and writing of intermediate data is eliminated here. Instead of writing every intermediate value back to memory, the TPU passes the values involved in the matmul directly from one cell to the next, saving massive read/write overhead.
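Here is a small Python sketch of how data can flow through a systolic array. It assumes an output-stationary layout, where each cell keeps its own running sum while values of A move right and values of B move down. That layout is chosen for simplicity and is not necessarily the dataflow Google's TPUs use; `systolic_matmul` is a hypothetical name for this illustration.

```python
def systolic_matmul(A, B):
    """Simulate an n x m grid of cells computing A (n x k) times B (k x m)."""
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]    # running sum held inside each cell
    a_reg = [[0] * m for _ in range(n)]  # A value currently sitting in cell (i, j)
    b_reg = [[0] * m for _ in range(n)]  # B value currently sitting in cell (i, j)
    for t in range(n + m + k - 2):       # cycles until the last cell finishes
        # Shift: A values hop one cell right, B values hop one cell down.
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Feed the edges with a skew: row i of A is delayed by i cycles,
        # column j of B by j cycles, so matching operands meet in cell (i, j).
        for i in range(n):
            s = t - i
            a_reg[i][0] = A[i][s] if 0 <= s < k else 0
        for j in range(m):
            s = t - j
            b_reg[0][j] = B[s][j] if 0 <= s < k else 0
        # Every cell performs one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

Each input value is read from outside the grid once, at the edge, and is then handed from neighbour to neighbour, which is where the read/write savings come from.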



