inFAMOUS: Second Son is definitely one of the most visually impressive games available on the new generation of consoles, and at the Game Developers Conference, Lead Engine Programmer Adrian Bentley explained the ins and outs of its engine and how the PS4's resources were used for the game.
Interestingly enough, the first thing we learn is that the game uses only 4.5 GB of RAM and six of the eight cores in the PS4's CPU. The memory figure matches what was shared about The Order: 1886.
Here’s what we learned from the panel:
- The RAM of the PS4 allowed Sucker Punch to increase memory budgets by four to eight times.
- I/O (input/output) speed was a big problem, even when reading from the hard drive. I/O speed here is the rate at which data can be moved between the drive and memory.
- Measures were taken to reduce I/O pressure: seven additional streaming chunks were cached, with a memory budget of about 240 MiB, and more and larger media streaming pages were used, with a budget of 40 MiB. (A mebibyte, MiB, is 1,048,576 bytes, slightly more than a decimal megabyte of 1,000,000 bytes.)
- Texture atlases were used for many purposes, with a budget of over 200 MiB. A texture atlas is a large texture that includes many sub-textures that can be used for many objects, instead of having separate texture files for each object.
- Ambient index was cached per static vertex with a budget of about 30 MiB.
- The tiled light list was stored for use in the forward pass (4 MiB).
- Code was kept simple, using big linear buffers.
- Most of the 4.5 GB of RAM actually available was used. You can see the full allocation below.
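The benefit of keeping more streaming chunks resident can be illustrated with a small cache that evicts the least recently used chunk once a byte budget is exceeded. This is a minimal Python sketch, not Sucker Punch's code: the LRU eviction policy, the `ChunkCache` class and the `load_fn` callback are all hypothetical. Every cache hit is a disk read that never happens, which is exactly the I/O pressure the team was trying to relieve.

```python
from collections import OrderedDict


class ChunkCache:
    """Least-recently-used cache of streamed chunks, bounded by a byte budget.

    Hypothetical illustration only; the real engine's policy is not described.
    """

    def __init__(self, budget_bytes, load_fn):
        self.budget_bytes = budget_bytes
        self.load_fn = load_fn        # hypothetical: reads one chunk from disk
        self.chunks = OrderedDict()   # chunk_id -> data, least recent first
        self.used_bytes = 0
        self.disk_loads = 0           # number of reads actually sent to disk

    def get(self, chunk_id):
        if chunk_id in self.chunks:
            # Hit: mark as most recently used; no I/O needed.
            self.chunks.move_to_end(chunk_id)
            return self.chunks[chunk_id]
        data = self.load_fn(chunk_id)
        self.disk_loads += 1
        if len(data) > self.budget_bytes:
            return data  # larger than the whole budget: don't cache it
        # Evict least recently used chunks until the new one fits.
        while self.used_bytes + len(data) > self.budget_bytes:
            _, evicted = self.chunks.popitem(last=False)
            self.used_bytes -= len(evicted)
        self.chunks[chunk_id] = data
        self.used_bytes += len(data)
        return data


if __name__ == "__main__":
    KIB = 1024
    cache = ChunkCache(240 * KIB, lambda chunk_id: bytes(32 * KIB))
    for chunk_id in [0, 1, 2, 0, 1, 2]:
        cache.get(chunk_id)
    print(cache.disk_loads)  # 3: the repeat requests were served from memory
```

With a 240 KiB budget and 32 KiB chunks (scaled-down, made-up numbers chosen to keep the demo light), seven chunks fit; requesting an eighth evicts the oldest one.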
We also learn that the CPU does hundreds of different jobs while running inFAMOUS: Second Son. Each “job” is a single computing task.
Below we can see the difference between the tasks performed by the SPUs (Synergistic Processing Units) of the PS3's Cell processor when rendering inFAMOUS 2 and those performed by the six available cores of the PS4's CPU when rendering inFAMOUS: Second Son.
In the gallery below you can see how tasks are split between threads and jobs are queued while rendering a scene.
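To make the idea of queued jobs concrete, here is a minimal Python sketch of a job queue drained by a fixed pool of worker threads, with six workers to mirror the six cores available to the game. The `run_jobs` function is hypothetical and says nothing about how Sucker Punch's scheduler handles priorities or dependencies. Note also that CPython's global interpreter lock keeps pure-Python jobs from truly running in parallel, so this only shows the structure.

```python
import queue
import threading

NUM_WORKERS = 6  # mirrors the six CPU cores the game had available


def run_jobs(jobs, num_workers=NUM_WORKERS):
    """Run independent zero-argument jobs on a fixed pool of worker threads.

    Results are returned in submission order, regardless of which worker ran
    each job or when it finished.
    """
    job_queue = queue.Queue()
    results = [None] * len(jobs)

    def worker():
        while True:
            item = job_queue.get()
            if item is None:  # sentinel: no more work for this worker
                return
            index, job = item
            results[index] = job()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    # Queue every job, then one sentinel per worker so each thread exits.
    for index, job in enumerate(jobs):
        job_queue.put((index, job))
    for _ in workers:
        job_queue.put(None)
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    jobs = [lambda i=i: i * i for i in range(300)]
    print(run_jobs(jobs)[:5])  # [0, 1, 4, 9, 16]
```

Because the queue is first-in, first-out, every job is handed out before any worker sees a sentinel, so all jobs complete before the threads shut down.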
The PS4's CPU is described as "decent": it can handle 30,000 draws instanced into 10,000 actual draw calls, 100-400 asynchronous raycasts per frame, and 50-100 animated characters with 300+ bones, although prefetching is no replacement for the direct memory access (DMA) of the PS3's SPUs.
It was also hard to max out the CPU (whereas, as we learned in a previous interview, the GPU runs at its maximum capacity most of the time): 50-70% of it was used for main jobs and 5-16% for other threads.
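Instancing is what turns 30,000 draws into 10,000 draw calls: draws that share a mesh and material are merged into one call that carries per-instance data such as transforms. The grouping step might look like the Python sketch below. `DrawRequest` and `build_instanced_calls` are hypothetical names, and a real engine would also need to respect sort order and render-state changes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawRequest:
    mesh: str         # hypothetical mesh identifier
    material: str     # hypothetical material identifier
    transform: tuple  # per-instance data, e.g. a world position


def build_instanced_calls(requests):
    """Merge draw requests that share a mesh and material into instanced calls.

    Returns one (mesh, material, transforms) tuple per resulting draw call.
    """
    batches = defaultdict(list)
    for req in requests:
        batches[(req.mesh, req.material)].append(req.transform)
    return [(mesh, material, transforms)
            for (mesh, material), transforms in batches.items()]


if __name__ == "__main__":
    requests = [DrawRequest(f"mesh{i % 10000}", "stone", (i, 0.0, 0.0))
                for i in range(30000)]
    calls = build_instanced_calls(requests)
    print(len(requests), "draws ->", len(calls), "draw calls")
```

In this made-up example, 30,000 requests spread evenly over 10,000 mesh/material pairs collapse into 10,000 instanced calls of three instances each.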
Below we can see a breakdown of how the geometry buffers (G-buffers) are divided among the various effects used in the game.
Physically based materials and lighting were used. They are less intuitive for artists, but hold up better under lighting changes. Deferred shading was used in most cases; the exceptions were pre-integrated skin, anisotropic cloth and hair, and glass.
A variety of post-processing effects were implemented, broken down in the slide below:
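Because deferred shading writes material parameters into a shared G-buffer with a fixed layout, materials with unusual shading models (or transparency, as with glass) are typically rendered separately. The tiled light list kept for the forward pass suggests that is how these exceptions were handled here, though that is our inference rather than something stated in the talk. A minimal Python sketch of the routing step follows; the material type names and the `split_by_pass` helper are hypothetical.

```python
# Material types the article lists as exceptions to deferred shading
# (the identifiers themselves are hypothetical).
FORWARD_MATERIALS = {"preintegrated_skin", "anisotropic_cloth",
                     "anisotropic_hair", "glass"}


def split_by_pass(objects):
    """Partition (name, material_type) pairs into deferred and forward lists.

    Deferred objects write into the G-buffer and are lit later in one pass;
    forward objects are shaded directly, e.g. using a tiled light list.
    """
    deferred, forward = [], []
    for name, material_type in objects:
        if material_type in FORWARD_MATERIALS:
            forward.append(name)
        else:
            deferred.append(name)
    return deferred, forward


if __name__ == "__main__":
    scene = [("rock", "standard"), ("face", "preintegrated_skin"),
             ("window", "glass"), ("wall", "standard")]
    print(split_by_pass(scene))  # (['rock', 'wall'], ['face', 'window'])
```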
The following effects were also used:
- Indirect Diffuse Lighting, with a budget of 25 MiB for the whole world and a performance cost of 1-3 ms. Data was cached to avoid redundant computation.
- Indirect Specular Lighting with Local Specular Cubemaps and Screen Space Specular Reflection.
You can see a screenshot comparison showing the game with and without those effects here.
GPU compute was used for parts of the game's rendering and proved beneficial for caching data, tiled lighting and specular probe application, particle and mesh processing, and easier code generation. It also comes with some problems, though: while compute is fast, synchronization with the CPU can be an issue. This was solved by front-loading compute phases or by using compute queues that run alongside the graphics pipeline. The register count was also reduced; using fewer registers per thread lets the GPU keep more threads in flight, which hides latency better. Ultimately, GPU compute is described as "super awesome."
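Tiled lighting, one of the compute workloads listed above, is commonly done by splitting the screen into small tiles and building, for each tile, a list of the lights that can affect it, so shading only loops over nearby lights. The Python sketch below shows that culling step on the CPU using screen-space bounding circles; on the GPU it would typically run as a compute shader with one thread group per tile. The 16-pixel tile size, the circle representation of lights and the function names are assumptions, not details from the talk.

```python
TILE_SIZE = 16  # pixels per tile side (assumed)


def circle_overlaps_rect(cx, cy, r, x0, y0, x1, y1):
    """True if a circle touches the axis-aligned rectangle [x0,x1] x [y0,y1]."""
    # Clamp the circle centre to the rectangle to find its nearest point.
    nearest_x = min(max(cx, x0), x1)
    nearest_y = min(max(cy, y0), y1)
    dx, dy = cx - nearest_x, cy - nearest_y
    return dx * dx + dy * dy <= r * r


def build_tile_light_lists(width, height, lights, tile_size=TILE_SIZE):
    """Return grid[ty][tx]: indices of lights whose bounding circle hits the tile.

    `lights` holds (cx, cy, radius) tuples in screen-space pixels.
    """
    tiles_x = (width + tile_size - 1) // tile_size   # round up partial tiles
    tiles_y = (height + tile_size - 1) // tile_size
    grid = [[[] for _ in range(tiles_x)] for _ in range(tiles_y)]
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            x0, y0 = tx * tile_size, ty * tile_size
            x1, y1 = x0 + tile_size, y0 + tile_size
            for i, (cx, cy, r) in enumerate(lights):
                if circle_overlaps_rect(cx, cy, r, x0, y0, x1, y1):
                    grid[ty][tx].append(i)
    return grid


if __name__ == "__main__":
    grid = build_tile_light_lists(64, 32, [(8, 8, 4), (32, 16, 1)])
    print(grid[0][0], grid[0][1], grid[1][3])  # [0] [1] []
```

Tile edges are treated as inclusive, so a light sitting exactly on a boundary is listed in both neighbouring tiles; erring on the conservative side costs a little extra shading but never drops a light.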
The following slides show how compute was used for different effects:
To wrap up, Bentley mentioned a few improvements we could see in the future (or at least the developers among us could; gamers will just see the effects):
- Much more threading and compute.
- Lighter-weight instantiation.
- Improved Perforce sync times.
- Less manual ambient work and faster baking.
- Better distant environment LOD (level of detail).
- Overhauled pathing system.
- Easier scripting reference for parts of objects.
- Fewer heavyweight objects.
If you want to see all the slides used in the presentation, you can download them here. Below you can also see more work-in-progress shots.