PS4's Asynchronous Fine-Grain Compute GPU: What the Heck Can it Do?
Mark Cerny’s “The road to PS4” presentation at GameLab 2013, in Barcelona, was one of the best I’ve heard in sixteen years of writing on games. He managed to touch on some very specific points, decorating them with a lovely story of friendship and personal initiative to create what can easily be described as a feat of storytelling performed by a master narrator.
I’m not surprised to hear that he gave eight hour-long presentations before, because I’ve never heard anyone else talk about GDDR5, Cell and cold hard silicon components and make it sound like an epic bedtime story (if you don’t believe me just check the full video embedded at the bottom of this post). He also managed not to sound overly technical, allowing even laymen to understand what he was talking about.
Yet, like every passionate geek, he still slipped into techie talk at least once, leaving many of his listeners with questions. That happened when he was talking about the GPU that will power the graphics of the PS4, and the future-proof features grouped under the extremely technical-sounding umbrella of “Asynchronous Fine-Grain Compute.”
This is actually a very relevant topic, as it empowers that “rich feature set” that Cerny envisioned for the future of the console and that will give developers the tools to create better games over the years. But what does “Asynchronous Fine-Grain Compute” even mean?
Normally, a GPU is used to execute graphics commands, rendering what we see on our screen, while compute commands, which normally simulate the world around us, drive the artificial intelligence and prompt the software to react to our actions, are handled by the CPU.
The GPU of the PS4 has been optimized to break that barrier, thanks to the shared memory pool and to a secondary bus that allows it to read and write directly from and to the system memory. The number of compute commands that the architecture can queue has also been dramatically increased (to 64) in order to run a relatively large number of small programs simultaneously or almost (fine-grain).
In layman’s terms, this means that the GPU can directly support the CPU in performing compute tasks when there are free resources that would normally remain unused in a traditional system with separate CPU and GPU memory. If you have a PC and you like to monitor your GPU usage while you game, you’ve probably noticed that there are plenty of times when far less than 100% of the available resources are being used. This system makes sure that those resources are put to good use, while the GPU still handles the graphics commands at the same time (asynchronous).
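To make the idea of filling unused GPU time a little more concrete, here is a minimal toy model in Python. It is only a sketch under loud assumptions: the function name `fill_idle_capacity`, the "units of capacity" and the frame loads are all invented for illustration, and real hardware scheduling is far more sophisticated than this.

```python
from collections import deque


def fill_idle_capacity(graphics_load, compute_jobs, capacity=100):
    """Toy frame-by-frame model (hypothetical, not a real scheduler).

    graphics_load: units of GPU capacity used by graphics in each frame.
    compute_jobs:  cost, in the same units, of each small compute job.
    Graphics work is served first; whatever capacity is left over in a
    frame is handed to queued compute jobs.
    Returns the number of compute jobs completed in each frame.
    """
    pending = deque(compute_jobs)
    completed_per_frame = []
    for load in graphics_load:
        idle = max(0, capacity - load)
        done = 0
        # Small ("fine-grain") jobs fit into whatever gap this frame leaves.
        while pending and pending[0] <= idle:
            idle -= pending.popleft()
            done += 1
        completed_per_frame.append(done)
    return completed_per_frame


if __name__ == "__main__":
    # Three frames at 70%, 95% and 40% graphics load; ten jobs of 10 units.
    print(fill_idle_capacity([70, 95, 40], [10] * 10))  # prints [3, 0, 6]
```

The busy second frame completes no compute jobs at all, while the quiet third frame soaks up six of them. That is the whole point: the smaller the jobs, the easier it is to slot them into the gaps.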
One of the most interesting yet obscure parts of the presentation was the list of examples Cerny gave of what the GPU could be used to run. He mentioned ray casting for audio, decompression, physics simulation, collision detection and world simulation. Some of those are almost self-explanatory, but some definitely aren’t. Let’s examine each element in detail.
Ray Casting for Audio
Ray casting, or more commonly ray tracing, is normally a technique used in computer graphics to generate an image by tracing the path of light through the pixels on a plane (represented by the screen) and accurately simulating the effects the light would create when encountering the objects drawn. It normally renders an extremely high quality picture, but its hardware costs are prohibitive, and that’s why it’s not normally used in game design.
Applied to audio, the same principle works with rays cast from a sound source instead of light. When a ray meets an obstacle, the acoustic properties of that obstacle are applied to the ray, and a new, modified indirect ray is traced from the obstacle itself. The process is repeated over and over until all the objects on the way are cleared or the ray finally intersects the position of the listener. When that happens the sound components are combined by the sound engine, and the listener finally hears the sound realistically modified by the environment around him.
Of course the more resources are available, the more rays can be cast and the more complex and realistic the simulation can be. If you want to read about this technique in full detail, you can find a patent describing it here. Of course it’s quite complicated, but also extremely interesting.
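The bounce-and-attenuate loop described above can be sketched in a few lines of Python. This is a heavily simplified illustration, not the method from the patent: geometry is left out entirely, each obstacle is reduced to a single assumed "absorption" fraction, and the names `trace_sound_ray` and `mix_at_listener` are hypothetical.

```python
def trace_sound_ray(energy, absorptions, threshold=0.01):
    """Follow one sound ray through successive bounces.

    energy:      starting energy of the ray (arbitrary units).
    absorptions: fraction of energy absorbed by each obstacle hit, in order
                 (0.0 = perfect reflector, 1.0 = absorbs everything).
    Returns (energy reaching the listener, number of bounces). The energy
    is 0.0 if the ray became too quiet and was discarded on the way.
    """
    bounces = 0
    for absorbed in absorptions:
        energy *= (1.0 - absorbed)  # apply the obstacle's material property
        bounces += 1
        if energy < threshold:      # too quiet to matter: stop tracing
            return 0.0, bounces
    return energy, bounces


def mix_at_listener(rays):
    """Combine all rays by summing the energy that reaches the listener.

    rays is a list of (starting_energy, absorptions) pairs.
    """
    return sum(trace_sound_ray(e, a)[0] for e, a in rays)


if __name__ == "__main__":
    direct = (1.0, [])             # unobstructed path
    off_two_walls = (1.0, [0.5, 0.5])
    print(mix_at_listener([direct, off_two_walls]))  # prints 1.25
```

Every ray is traced independently of the others, which is why this kind of workload is a natural fit for the many parallel threads of a GPU.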
Decompression
Games use texture compression in order to save rendering bandwidth. In most cases if they used raw textures, bandwidth would run out way too fast and creating richly detailed worlds would basically be impossible.
Of course the use of compression is normally very lossy. For instance, the popular DXT5 format compresses every block of 16 pixels to a fixed size of 128 bits. This isn’t ideal, as some blocks are more complex than others, and having a fixed ratio means that the more complex blocks will generate artifacts, while the blocks that could be compressed more (as they are simpler) aren’t compressed as much as they could be.
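The fixed-ratio arithmetic is easy to check. The sketch below assumes the uncompressed texture uses 32 bits per pixel (8-bit RGBA), which is an assumption of this example rather than something stated above, and it ignores mipmaps. The function names are made up for illustration.

```python
def dxt5_size_bytes(width, height):
    """Size in bytes of one DXT5-compressed image (no mipmaps)."""
    blocks_x = (width + 3) // 4   # blocks are 4x4 pixels, rounded up
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * 128 // 8  # 128 bits per 16-pixel block


def raw_rgba8_size_bytes(width, height):
    """Size in bytes of the same image at an assumed 32 bits per pixel."""
    return width * height * 4


if __name__ == "__main__":
    compressed = dxt5_size_bytes(1024, 1024)
    raw = raw_rgba8_size_bytes(1024, 1024)
    print(compressed, raw, raw // compressed)  # prints 1048576 4194304 4
```

Under that assumption every block shrinks by exactly 4:1, whether it contains a flat patch of sky or a detailed piece of foliage.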
In addition to this, most games’ full texture sets are often too big to be used at the same time, and are switched in and out of memory during loading screens. When loading screens are not possible, textures need to be compressed very heavily to avoid or mitigate that annoying pop-in effect that you see in many open world games.
Using compute resources on the GPU for decompression opens interesting possibilities, like, for instance, the use of Variable Bit Rate compression/decompression algorithms that compress textures more efficiently (compressing the simpler blocks more heavily and the more complex blocks less). This has the two-pronged effect of increasing final quality and of removing or reducing the need for loading screens while mitigating pop-in.
Of course textures aren’t the only asset that is compressed and needs to be decompressed. Audio and animation data are other examples that already normally use Variable Bit Rate compression. By moving the decompression process to the GPU when available, the resources of the CPU (which on the PS4 isn’t exactly a monster) can be used more efficiently.
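As a rough illustration of the variable bit rate idea, the sketch below splits a fixed bit budget across blocks according to how complex each one is. Everything in it is an assumption made for the example: the complexity measure (brightest minus darkest pixel), the minimum of 16 bits per block and the function names do not come from any real codec.

```python
def block_complexity(block):
    """Crude complexity measure: spread between brightest and darkest pixel."""
    return max(block) - min(block)


def allocate_bits(blocks, total_bits, min_bits=16):
    """Split a bit budget across blocks in proportion to their complexity.

    Every block gets at least min_bits; the remainder is shared out by
    complexity, so flat blocks stay small and detailed blocks get more.
    Integer division may leave a few bits of the budget unused.
    """
    if not blocks:
        return []
    spare = total_bits - min_bits * len(blocks)
    if spare < 0:
        raise ValueError("budget too small for the minimum per-block size")
    scores = [block_complexity(b) for b in blocks]
    total_score = sum(scores)
    if total_score == 0:
        # All blocks are flat: share the spare bits evenly.
        return [min_bits + spare // len(blocks)] * len(blocks)
    return [min_bits + spare * s // total_score for s in scores]


if __name__ == "__main__":
    flat = [10] * 16          # 16 identical pixels
    busy = [0, 255] * 8       # 16 pixels alternating dark and bright
    # Same total budget as two fixed 128-bit blocks.
    print(allocate_bits([flat, busy], 256))  # prints [16, 240]
```

With a fixed ratio both blocks would get 128 bits each. Here the flat block gives up most of its share to the busy one, for the same total size.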
Physics Simulation
Simulating physics is normally done by the CPU, and it’s quite a resource-intensive activity, especially in games that feature destructible environments with a lot of fragments bouncing all over the place (assuming, of course, that those fragments are actually physically simulated and not just particles). The ability to simulate physics on the GPU already exists on PC. If you have an Nvidia video card you’ve probably heard of PhysX, which is a good example of how this works.
PhysX primarily offloads to the GPU the physical calculations that would normally burden the CPU, in order to free resources on the CPU itself. The most interesting element, though, is that it also allows developers to create effects that would normally be impractical to simulate on the CPU, increasing realism and allowing for more visual glitz in games.
Above you can see an example with Borderlands 2, but you can see a lot more here. Just click on the “i” near each title to check the related comparison video.
We don’t know if the GPU of the PS4 will actually run PhysX, as the engine is associated with Nvidia cards, and Sony has chosen rival AMD as its component vendor. Even so, the unified architecture should make it easier to implement similar effects regardless of the difference in brand.
As a final note on this topic, solid objects aren’t the only ones benefiting from this kind of GPU-based simulation. Fluids are another relevant example, alongside fire, smoke and so forth.
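To give an idea of what "simulating fragments" means in practice, here is a minimal Python sketch of one simulation step for debris falling and bouncing on flat ground. It has nothing to do with PhysX internals. The one-dimensional setup, the restitution value and the name `step_fragments` are assumptions chosen to keep the example short.

```python
GRAVITY = -9.81  # metres per second squared


def step_fragments(fragments, dt, restitution=0.5):
    """Advance every fragment by one time step of dt seconds.

    Each fragment is a (height, vertical_velocity) pair. No fragment's
    update depends on any other fragment, which is the property that lets
    this kind of work be spread across many GPU threads.
    """
    updated = []
    for y, vy in fragments:
        vy += GRAVITY * dt      # integrate velocity
        y += vy * dt            # integrate position
        if y < 0.0:             # hit the ground: bounce and lose energy
            y = 0.0
            vy = -vy * restitution
        updated.append((y, vy))
    return updated


if __name__ == "__main__":
    debris = [(10.0, 0.0), (1.0, -5.0)]
    for _ in range(3):
        debris = step_fragments(debris, 1.0 / 60.0)
    print(debris)
```

A destructible wall might produce thousands of such fragments, each needing the same few operations every frame. That is a poor use of a handful of CPU cores and a good use of a GPU.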
Collision Detection
Collision detection is linked to physical simulation, as it’s the ability to detect the intersection of two models and prevent it with the appropriate physical effect. It’s quite resource-intensive, and more so if the models (or their hitboxes) are very complex. That’s why many games belonging to the current generation have only environmental collision detection (preventing players and NPCs from clipping into elements of the environment) and no collision detection between mobile models.
Even when collision detection is enabled between all models, it’s often “a posteriori”, meaning that it’s calculated only after two bodies have collided. It’s less hardware-intensive but also less precise and stable, and the reaction normally lags by a frame or more.
Offloading collision detection to the GPU makes it possible to allocate more resources to it, so that it can be enabled between all models and possibly be “a priori”, or precisely calculated prior to the moment of collision by analyzing the trajectory of physical objects. This ensures better fluidity and fidelity and no delay in reaction.
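The difference between the two approaches can be shown with a small Python sketch. It is a deliberately minimal illustration: the bodies move along a single axis at constant speed, and the function names and the numbers in the usage example are assumptions, not taken from any engine.

```python
def overlaps(x1, r1, x2, r2):
    """A posteriori check: are the two bodies intersecting right now?"""
    return abs(x2 - x1) < r1 + r2


def time_of_impact(x1, v1, r1, x2, v2, r2):
    """A priori check: how long until the two bodies first touch?

    Assumes constant velocities, body 1 starting to the left of body 2,
    and no overlap at the start. Returns the time in seconds, or None if
    the bodies are not approaching each other.
    """
    gap = (x2 - x1) - (r1 + r2)
    closing_speed = v1 - v2
    if closing_speed <= 0:
        return None
    return gap / closing_speed


if __name__ == "__main__":
    dt = 1.0 / 30.0                       # one frame at 30 frames per second
    bullet_x, bullet_v, bullet_r = 0.0, 100.0, 0.1
    post_x, post_v, post_r = 1.0, 0.0, 0.1

    # A posteriori: test for overlap before and after moving the bullet.
    before = overlaps(bullet_x, bullet_r, post_x, post_r)
    after = overlaps(bullet_x + bullet_v * dt, bullet_r, post_x, post_r)
    print(before, after)                  # prints False False

    # A priori: predict the contact time and compare it with the frame time.
    toi = time_of_impact(bullet_x, bullet_v, bullet_r, post_x, post_v, post_r)
    print(toi is not None and toi <= dt)  # prints True
```

In this extreme case the fast object passes straight through the thin one between two frames, so the overlap test never sees the hit, while the trajectory-based test does. The price is extra computation for every pair of objects, which is where spare GPU resources come in.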
World Simulation
This is a much less specific topic, as compute resources can easily be applied to almost every element behind the simulation of the game’s world. This doesn’t just include elements governed by physics, but also those driven by the artificial intelligence and by other factors.
NPCs populating a city with complex action and reaction patterns, fish in the water (yes, even something like the allegedly super-advanced fish from Call of Duty: Ghosts), birds in the sky, weather simulation, traffic flow and quite a lot of other details that can make the world more realistic and immersive can be offloaded to the GPU in order to use its resources and those of the CPU more efficiently. You can find a slightly old but interesting (and not too complex) paper illustrating one example among many here.
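As a small taste of what such a world simulation might look like, here is a Python sketch of one update step for a school of fish, where each fish drifts toward the centre of its nearby neighbours. It is a bare-bones illustration of flocking-style behaviour, not the technique from the paper mentioned above. The rule, the parameter values and the name `step_school` are all assumptions.

```python
def step_school(fish, neighbor_radius=5.0, pull=0.1):
    """Move each fish a little toward the centre of its nearby neighbours.

    fish is a list of (x, y) positions. Every new position is computed
    from the old list only, so all fish could be updated in parallel.
    """
    updated = []
    radius_sq = neighbor_radius ** 2
    for i, (x, y) in enumerate(fish):
        near = [(ox, oy) for j, (ox, oy) in enumerate(fish)
                if j != i and (ox - x) ** 2 + (oy - y) ** 2 <= radius_sq]
        if not near:
            updated.append((x, y))  # no neighbours in range: stay put
            continue
        cx = sum(p[0] for p in near) / len(near)
        cy = sum(p[1] for p in near) / len(near)
        updated.append((x + pull * (cx - x), y + pull * (cy - y)))
    return updated


if __name__ == "__main__":
    school = [(0.0, 0.0), (4.0, 0.0), (100.0, 100.0)]
    print(step_school(school, pull=0.5))
    # prints [(2.0, 0.0), (2.0, 0.0), (100.0, 100.0)]
```

The same pattern, many simple agents each reading the current state and writing its own next state, applies just as well to birds, pedestrians or cars in traffic.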
The features mentioned above cover a large variety of aspects of game development, and we don’t even yet know if they’re all Cerny is planning, as they could have been brought just as examples of a larger picture. One thing is for sure, though: the ability to flexibly use the CPU or the GPU for compute actions might save developers a lot of headaches in allocating CPU resources that on consoles always tend to be rather limited.
They also have the potential to give us worlds that not only look better, but also sound better, act more naturally on their own and react in more realistic ways to our actions, feeling more alive and immersive. Obviously, at least for now, this is all theory, but it’s a theory that I can’t wait to see applied a few years from now.