GPU Architectures

24 Jul 2024

I think this is one of the most exciting times in computer architecture for many years. In CPUs we have x86, Arm and RISC-V: three increasingly well-supported instruction set architectures.

And GPU architectures are, of course, becoming much more important. GPUs are obviously a critical piece of machine learning infrastructure. They are also making massively parallel computing available to help solve a wider range of computational problems outside of machine learning.

There is one problem though. How GPU architectures work is much less widely understood than is the case for CPUs. CPUs are often complex, but much of that complexity (pipelines, instruction and data caches, branch prediction, out-of-order execution, register renaming, virtual memory and so on) is largely hidden from the user and can often be ignored. Doing things in parallel, though, is inherently harder, and GPUs necessarily expose much of that complexity to the programmer.
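To make that concrete, here is a minimal sketch (my own illustrative example in standard CUDA C++, with made-up names) of the classic vector-addition kernel. Even for this trivial task, the programmer has to think about thread and block indices, a launch configuration, bounds checks and device memory, details that a sequential CPU loop never surfaces.

#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread computes one element; the parallel structure is explicit.
__global__ void add(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    if (i < n)              // guard: the grid may have more threads than elements
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Managed (unified) allocations keep the sketch short; real code often
    // manages separate host and device buffers and copies data explicitly.
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    // The programmer chooses the block size and derives the grid size.
    const int blockSize = 256;
    const int gridSize = (n + blockSize - 1) / blockSize;
    add<<<gridSize, blockSize>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);   // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}

None of the names here come from any vendor's documentation; the point is only how much of the parallel machinery is visible even in a toy example.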

There is also a degree of mystique about GPUs. The recent (and excellent) Acquired podcast episode on Nvidia stated that for GPUs:

The magical unlock, of course, is to make a computer that is not a von Neumann architecture.

I know the Acquired hosts didn’t mean to imply that GPUs are ‘magical’, but the use of the word in connection with GPUs somehow seems appropriate. GPUs don’t use a ‘von Neumann’ architecture. Instead, they use an unspecified, unnamed architecture, and if we don’t know what it is then it’s necessarily mysterious.

To make matters worse, there are several barriers to learning how GPUs work:

Architectural Variations: Each of the major GPU designers has made somewhat different implementation choices, so details learned for one architecture don’t always carry over to another.

Inconsistent Terminology: The major GPU designers use terminology inconsistently, both with one another and with the terms commonly used for CPUs. So a CPU ‘core’ is very different from an Nvidia ‘CUDA core’.

Legacy Terminology: Some of the terms used are carried forward from the graphics origins of the GPU designs (e.g. ‘shaders’).

Software Abstraction: Some GPU complexity can be hidden from the user behind the abstractions provided by the software frameworks used to program GPUs. This helps when implementing programs, but it can prevent users from developing an adequate picture of how the architecture works.

CUDA’s Dominance: Because Nvidia’s CUDA has been the market leader in GPU computing for many years, CUDA terminology tends to dominate explanations, making it hard to read across to other architectures.

No wonder it all seems a bit mysterious.

But as GPUs become more important, this seems like an unsatisfactory state of affairs. Let’s try to cut through the mystery!