gpu cache

05 Mar 2023

On Pascal and Intel’s HD 4600 (Gen 7.5), global loads don’t get first-level caching. Only texture accesses do.

Haswell’s iGPU has pretty fast access to a 128 KB iGPU-private cache, which sort of mitigates the lack of a first-level cache. Pascal doesn’t cache regular loads at the first level unless you opt in via CUDA, which is no help when you’re testing through OpenCL.

And then there’s downright pain with older hardware. The HD 5850 starts swapping over the PCIe bus even though I check the max memory allocation limit and stay within it. GPU-Z shows “dynamic memory” going up even before “dedicated memory” fills.
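
For reference, a minimal sketch of that kind of allocation check. It assumes the limit has already been read with clGetDeviceInfo() and CL_DEVICE_MAX_MEM_ALLOC_SIZE. The helper name clamp_test_elems and its parameters are made up for illustration and are not taken from the actual test code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many elements fit in one test buffer without
 * exceeding the device's per-allocation limit.
 *
 * max_alloc_bytes is assumed to be the value reported for
 * CL_DEVICE_MAX_MEM_ALLOC_SIZE by clGetDeviceInfo(). */
size_t clamp_test_elems(uint64_t requested_bytes,
                        uint64_t max_alloc_bytes,
                        size_t elem_size)
{
    assert(elem_size > 0);

    /* Never ask for more than the device says a single buffer may hold. */
    uint64_t bytes = requested_bytes < max_alloc_bytes ? requested_bytes
                                                       : max_alloc_bytes;

    /* Round down to a whole number of elements. */
    return (size_t)(bytes / elem_size);
}
```

Passing this check only means the allocation is allowed. As the HD 5850 shows, it says nothing about whether the driver keeps the buffer in dedicated memory.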

Fermi doesn’t support read_imageui(), so there’s no testing through the texture path on that GPU. Thankfully it has a proper first-level cache setup for regular memory accesses.
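
To make the two paths concrete, here is a rough sketch of what a regular global load and a texture load can look like in OpenCL C, embedded as source strings the way host code usually carries kernels. The kernel names, the 2D image layout, and the pick_kernel_src helper are all assumptions for illustration. The texture_path_ok flag stands in for whatever check the host does, for example whether the texture kernel built on that device.

```c
#include <stddef.h>

/* Regular path: a plain load from a __global pointer. */
const char *global_load_src =
    "__kernel void load_global(__global const uint *in,\n"
    "                          __global uint *out) {\n"
    "    size_t gid = get_global_id(0);\n"
    "    out[gid] = in[gid];\n"
    "}\n";

/* Texture path: the same data read through an image with read_imageui().
 * The index is unpacked into (x, y) because this sketch uses a 2D image. */
const char *texture_load_src =
    "__constant sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |\n"
    "                           CLK_ADDRESS_CLAMP_TO_EDGE |\n"
    "                           CLK_FILTER_NEAREST;\n"
    "__kernel void load_texture(__read_only image2d_t img,\n"
    "                           __global uint *out) {\n"
    "    int gid = (int)get_global_id(0);\n"
    "    int w = get_image_width(img);\n"
    "    uint4 texel = read_imageui(img, smp, (int2)(gid % w, gid / w));\n"
    "    out[gid] = texel.x;\n"
    "}\n";

/* Hypothetical helper: fall back to the regular path when the texture
 * path is not usable on the device (nonzero means usable). */
const char *pick_kernel_src(int texture_path_ok)
{
    return texture_path_ok ? texture_load_src : global_load_src;
}
```

On a device like Fermi the host would end up with the load_global source, which is fine there since regular accesses already go through a first-level cache.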