During the tail end of an extremely interesting presentation at the Semana Informática in Lisbon, Naughty Dog Lead Programmer Jason Gregory described the PS4’s CPU, its memory, its GPU, its cache architecture and more in great detail. He also explained how the studio uses its knowledge of the hardware’s strengths and weaknesses to take full advantage of those resources and make code “run really fast.”
First of all, Gregory explained that while 8 gigabytes of RAM seem like a lot, only five are allocated to games, and they can be filled up quite easily:
“Even in the PlayStation 4 you have 5 gigs, which seems like a lot but you’ll be amazed by how quickly it fills up.”
Because of that, Naughty Dog is very careful about how memory is budgeted and allocated in order to use it efficiently.
Memory fragmentation is one of the worst enemies, because it causes a game to run out of memory a lot faster than it normally would. Naughty Dog solves that by custom-tailoring memory allocators to match the software’s allocation patterns.
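To give an idea of what a custom-tailored allocator can look like, here is a minimal sketch of a linear ("bump") allocator. It is not Naughty Dog’s code: the class name LinearAllocator and its methods are hypothetical, and the assumed allocation pattern is "many small allocations that are all released together," for example once per frame or per level. Because memory is only ever handed out from one block and freed all at once, this kind of allocator cannot fragment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical linear ("bump") allocator, for illustration only.
// It carves allocations out of one fixed-size block and releases
// them all at once, so no holes can form between allocations.
class LinearAllocator {
public:
    explicit LinearAllocator(std::size_t capacity)
        : buffer_(capacity), offset_(0) {}

    // Returns nullptr when the fixed budget is exhausted.
    // `alignment` is assumed to be a power of two.
    void* allocate(std::size_t size,
                   std::size_t alignment = alignof(std::max_align_t)) {
        const std::uintptr_t base =
            reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t current = base + offset_;
        // Round the current address up to the requested alignment.
        const std::uintptr_t mask =
            static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;
        const std::size_t new_offset =
            static_cast<std::size_t>(aligned - base) + size;
        if (new_offset > buffer_.size()) {
            return nullptr;  // over budget
        }
        offset_ = new_offset;
        return reinterpret_cast<void*>(aligned);
    }

    // Frees every allocation in one step.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }
    std::size_t capacity() const { return buffer_.size(); }

private:
    std::vector<unsigned char> buffer_;  // the fixed memory budget
    std::size_t offset_;                 // first free byte in buffer_
};
```

A game could create one such allocator per subsystem with a fixed budget, call allocate() during the frame and reset() at the end of it. The used() and capacity() values also give a very rough idea of how a budget can be tracked.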
The studio also uses explicit memory maps in its engine, which show at any given time where memory is being used and what kind of memory it is.
That’s quite important because many development kits, especially PS3 ones, have twice as much memory, which is used for debugging purposes. That means that game features should not use that memory.
Moving on to multi-core processor management, Gregory explained how the PS4, which he defined as a “highly parallel machine,” works.
There are 8 CPU cores, which are “higher quality more powerful processors than what you have on the main CPU of the PS3,” and they’re organized into two clusters.
Gregory also explained that the GPU is “more powerful than is necessary to render graphics at 1080p at 60 Hz.” The idea of the PS4’s designers was to give the console extra GPU resources because the GPU is “incredibly good at doing massive amounts of parallel processing,” and they envisioned that game developers would take advantage of that processing power to do physics, cloth simulation, fluids and more on the GPU.
On the PS3, Naughty Dog developed a “Job system” in conjunction with Sony’s ICE team to make efficient use of the multi-core CPU, and a similar one has been created for the PS4.
Only six cores are available for games, as two are used by the operating system. The Job system also uses the GPU on top of these six cores, since the GPU can run code as well. Each CPU core runs a worker thread, and while the first takes care of the main game loop, other jobs are distributed among the remaining five cores.
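The general idea of worker threads pulling jobs from a shared queue can be sketched in a few lines of standard C++. This is only an illustration under stated assumptions and not the Job system described in the talk: the JobSystem class and its submit() and wait_idle() methods are hypothetical names, the sketch uses ordinary operating system threads with a mutex-protected queue, and the GPU side is left out entirely.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal job system: a fixed set of worker threads
// that take jobs from one shared queue.
class JobSystem {
public:
    explicit JobSystem(std::size_t worker_count) {
        for (std::size_t i = 0; i < worker_count; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    // Lets the workers drain the queue, then joins them.
    ~JobSystem() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& worker : workers_) {
            worker.join();
        }
    }

    // Queues a job for whichever worker becomes free first.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
            ++pending_;
        }
        wake_.notify_one();
    }

    // Blocks the caller until every submitted job has finished.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock,
                           [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) {
                    return;  // stopping and nothing left to run
                }
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so workers do not serialize
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (--pending_ == 0) {
                    idle_.notify_all();
                }
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;  // signals workers: job or shutdown
    std::condition_variable idle_;  // signals waiters: all jobs done
    std::queue<std::function<void()>> jobs_;
    std::size_t pending_ = 0;       // queued plus currently running jobs
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

Following the layout described above, the thread running the main game loop would create a JobSystem with five workers, submit() the work for the frame and call wait_idle() before moving on to the next one.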
The GPU, for its part, takes care of rendering and of the GPGPU (general-purpose GPU programming) wavefronts, basically the physics, cloth and similar computing mentioned above.
Another very important concept is optimization, and a crucial element of it is the “80/20 rule,” meaning that your program spends 80% of its time running 20% of your code. The rest of that code is run very seldom.
So when you optimize your code, you don’t want to optimize the 80% of it that rarely runs, because you’d be wasting your time. Instead, Naughty Dog focuses on the 20% that really matters, and that allows them to get the most bang for their buck.
Knowing the hardware is very important as well, as there are optimizations that apply only to one specific piece of hardware and require a deep knowledge of its inner workings.
Memory caching is a very relevant part of optimization, as modern processors take a fairly high number of cycles to access data from main RAM, which is big. Then there’s a much smaller memory cache named L2, which is also much faster to access, the L1 cache that is even faster, and finally the registers on the chip itself, which are super small but basically instantaneous.
Keeping high-performance data small helps because it can then fit in the cache, which can be accessed extremely quickly. Keeping that data contiguous in memory as well is even more beneficial.
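One common way to keep performance-critical data small and contiguous is to separate the "hot" fields that are touched every frame from the "cold" ones that are rarely needed. The sketch below is an assumption about how this could look, not something shown in the talk: the ParticleHot, ParticleCold and ParticleSystem types and the integrate() function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "hot" data: only what the per-frame update reads and writes.
struct ParticleHot {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
};

// Small enough that at least two particles fit in one 64-byte cache line.
static_assert(sizeof(ParticleHot) <= 32, "hot data should stay small");

// Hypothetical "cold" data: rarely accessed, so it is stored separately
// and does not take up cache space during the update loop.
struct ParticleCold {
    char debug_name[64];
    int spawn_frame;
};

struct ParticleSystem {
    std::vector<ParticleHot> hot;    // contiguous array walked every frame
    std::vector<ParticleCold> cold;  // same indices, touched only on demand
};

// Walks the hot array front to back, so every cache line that is
// loaded contains nothing but data the loop actually uses.
void integrate(std::vector<ParticleHot>& particles, float dt) {
    for (ParticleHot& p : particles) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}
```

Had the debug name been stored inside the same struct as the position, every particle would span several cache lines and most of the bytes fetched by the loop would go unused.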
The PS4, specifically, has eight cores arranged in two clusters. The L2 cache is actually split in two, one for each cluster, and communicating from each cluster to its own L2 cache takes 26 cycles. Communication between a cluster and the L2 cache attached to the other cluster is much slower, taking 190 cycles.
In addition to that, reading even a single byte from main RAM loads a whole 64-byte cache line into the cache.
The knowledge of those PS4-specific quirks allows the studio to optimize the code so that it avoids having clusters that communicate with the “wrong” L2 cache and puts data on separate cache lines, removing the chance of conflicts and slowdowns.
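Putting data on separate cache lines is usually done with alignment and padding. The following sketch assumes that the conflicts mentioned are of the "false sharing" kind, where two cores keep writing to different variables that happen to share a cache line. The 64-byte figure comes from the talk, while PaddedCounter, PerCoreCounters and same_cache_line() are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Cache line size quoted in the talk.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical per-core counter, aligned and padded so that it
// occupies a full cache line by itself.
struct alignas(kCacheLineSize) PaddedCounter {
    std::uint64_t value = 0;
};

static_assert(sizeof(PaddedCounter) == kCacheLineSize,
              "each counter should fill exactly one cache line");
static_assert(alignof(PaddedCounter) == kCacheLineSize,
              "each counter should start on a cache line boundary");

// One counter per game-available core. Each core writes only to its
// own entry, and no two entries share a cache line.
struct PerCoreCounters {
    PaddedCounter counters[6];
};

// Returns true when two addresses fall on the same 64-byte line.
inline bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kCacheLineSize ==
           reinterpret_cast<std::uintptr_t>(b) / kCacheLineSize;
}
```

The price is memory: six 8-byte counters now take 384 bytes instead of 48, so this kind of padding is normally reserved for the few variables that several cores write to frequently.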
One last very interesting detail is that the PS3 had really terrible branch prediction hardware in the CPU, meaning that “if” branches in code often caused bad performance if they weren’t given “all sorts of hints,” simply because the CPU wasn’t good at predicting what the code would do.
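The talk does not spell out which hints were used, but one widely available form is a compiler hint that tells the optimizer which way a branch usually goes. The sketch below uses __builtin_expect, an extension provided by GCC and Clang. The macro HYPOTHETICAL_UNLIKELY and the function sum_clamped() are made up for this example and are not taken from any Naughty Dog code.

```cpp
#include <cassert>
#include <cstddef>

// __builtin_expect is a GCC/Clang extension. On other compilers the
// macro falls back to the plain condition, so behavior is unchanged.
#if defined(__GNUC__) || defined(__clang__)
#define HYPOTHETICAL_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#define HYPOTHETICAL_UNLIKELY(cond) (cond)
#endif

// Sums the values, clamping any that exceed `limit`. The hint states
// that clamping is the rare case, so the compiler can lay out the
// common path as straight-line code.
int sum_clamped(const int* values, std::size_t count, int limit) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        int value = values[i];
        if (HYPOTHETICAL_UNLIKELY(value > limit)) {
            value = limit;
        }
        total += value;
    }
    return total;
}
```

The hint only affects how the code is laid out and never changes the result, which is why it can be dropped without consequence on hardware with good branch prediction.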
The PS4, on the other hand, has “really really good” branch prediction hardware, which will “guess” what the code will do, removing the need for all the additional work that was necessary on the PS3.
One thing is for sure: after hearing this talk, it’s hard not to feel even more excited about seeing the first Naughty Dog PS4 title. We’ll have to wait and see if the studio’s technomancy will really make the platform sing, but my money is on “yes.”