Guerrilla Reveals Lighting Tech for Future Games; Talks Horizon Zero Dawn AA and 2160p Checkerboard
Guerrilla Games showcases the advanced techniques used in Horizon Zero Dawn, and reveals a lighting technique that didn’t make it into the game but is ready for future titles.
At SIGGRAPH 2017 in Los Angeles, Guerrilla Games Principal Tech Programmer Giliam de Carpentier and Kojima Productions’ Kohei Ishiyama hosted a presentation titled Decima Engine: Advances in Lighting and AA. They explained the anti-aliasing techniques used in Horizon Zero Dawn and its 2160p checkerboard rendering on PS4 Pro, and even showed a new lighting technique that Guerrilla will use in future games.
The developer’s Decima Engine was one of the first engines to support area lights, but the initial implementation did not support GGX (a microfacet distribution that better imitates the look of real reflections on rough surfaces). Recently the developer started updating the area light system, adding GGX support to its spherical area lights.
Unfortunately, integrating GGX with area lights isn’t straightforward and requires an approximation. The team picked the light-bending trick described by Brian Karis, which is cheap to compute and has been used successfully in other engines.
The results are fairly good, but the approximation can cause distortions, especially at grazing angles, so Guerrilla developed its own improvement to it.
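To make the "light bending" idea concrete, here is a minimal sketch of the representative-point approach for a spherical light as Karis has described it publicly: instead of using the direction to the light’s center, the light vector is bent toward the closest point on the sphere to the mirror reflection ray, and that direction is then used when evaluating GGX. This is not Guerrilla’s improved version (which the talk describes only at a high level), and it omits the energy normalization term that normally accompanies the trick. All function names are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalize(a: Vec3) -> Vec3:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def representative_light_dir(to_center: Vec3, reflect_dir: Vec3, radius: float) -> Vec3:
    """Bend the light vector toward the reflection ray (representative-point sketch).

    to_center:   vector from the shading point to the sphere light's center
    reflect_dir: unit-length mirror reflection direction at the shading point
    radius:      sphere light radius, in the same units as to_center
    """
    # Vector from the light's center to the closest point on the reflection ray.
    along = dot(to_center, reflect_dir)
    center_to_ray = sub(scale(reflect_dir, along), to_center)
    dist = math.sqrt(dot(center_to_ray, center_to_ray))
    if dist < 1e-8:
        # The reflection ray passes through the center: nothing to bend.
        return normalize(to_center)
    # Move toward the ray, but never past the sphere's surface.
    t = min(1.0, radius / dist)
    return normalize(add(to_center, scale(center_to_ray, t)))
```

A zero radius reduces to an ordinary point light, while a sphere large enough to contain the reflection ray returns the reflection direction itself. The distortions at grazing angles mentioned above come from the fact that this single bent direction only approximates integrating GGX over the whole sphere.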
On the left in the gallery below you can see Karis’ model, in the middle a reference model, and on the right Guerrilla’s result.
Below you can see the system applied to Horizon Zero Dawn. On the left is Karis’ model, and on the right is Guerrilla’s improved version. Keep in mind that this technique was finished only after Horizon’s release. Still, it has been retrofitted into the game for internal use and is ready to be applied to future games.
Moving on to anti-aliasing, the developer tried several techniques for Horizon Zero Dawn.
Initially they tried FXAA alone, and then SMAA, but testing made clear that no technique based purely on post-processing would cut it. Such techniques remove jaggies on clean, well-defined models, but they were insufficient for Horizon’s needs.
On top of that, post-processing techniques like SMAA can be expensive. SMAA takes up to 2 milliseconds per frame when looking at the grass in the game, three times as long as when looking up at the sky.
This led the team to temporal anti-aliasing, and ultimately to a combination of FXAA and Guerrilla’s own variant of TAA, which takes at most 1 ms per frame on PS4 at 1080p.
Below you can see a detail on Aloy’s bow using (in order) no AA, FXAA, SMAA and Guerrilla’s technique using FXAA+TAA.
In the table you can read the cost in milliseconds per frame of each technique when looking up at the sky and down at the grass.
Here’s another comparison on Aloy’s hair. From the first image to the fourth we see no AA (rendered in 0.31 ms), FXAA (0.55 ms), SMAA (1.26 ms) and FXAA + TAA (0.86 ms).
According to Guerrilla, TAA helps not only with sampling more details, but also with temporal stability. Below you can check out a video comparing the techniques in motion.
de Carpentier also talked about Guerrilla’s implementation of 2160p checkerboard rendering for the game on PS4 Pro.
The first image shows the same bow detail from before at 1080p with the FXAA+TAA solution applied, targeting 60 FPS. The second image is at 2160p with 16x supersampling. It runs at around 1 FPS, so it’s certainly not feasible, but it shows what Guerrilla would ideally have liked to achieve if performance were not an issue.
The last image below shows what could run natively at 30 FPS on a PS4 Pro: 1512p with FXAA+TAA applied. It’s sharper than 1080p (left), but it’s still far from the reference (middle).
Last, but not least, we see the checkerboard rendering used in the game on PS4 Pro (left). It runs at 30 FPS like native 1512p (right), but it shows a lot more detail, closer to the reference (center). Unfortunately, you can see a difference in sharpness between the checkerboard image and the reference.
This is an inherent property of how Guerrilla does checkerboard, valuing detail over sharpness. Sharpness is great when you look at an image from very close by, but checkerboard allows details to be visible (or at least hinted at) over a wider range of viewing distances. This system also contributes more to the final image quality when downscaled to 1080p for normal full HD TV sets.
The solution used by Guerrilla Games is also stable temporally, as you can see in the video below.
Checkerboard rendering works by shading only 50% of the pixels each frame and the other 50% in the next, so ideally every two frames you get a result similar to native resolution. Dynamic scenes complicate the process, so native-resolution hints such as depth are normally rendered every frame. Alpha-tested and alpha-blended objects may also need to be sampled and processed at native resolution every frame.
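The alternating pattern described above can be sketched in a few lines. This shows only the conceptual pixel pattern, assuming a simple parity rule; it does not model how the GPU actually produces it or how the native-resolution hints are used, and the function names are hypothetical.

```python
def shaded_this_frame(x: int, y: int, frame: int) -> bool:
    # Pixels whose (x + y) parity matches the frame parity are shaded now;
    # the other half is shaded on the next frame.
    return (x + y + frame) % 2 == 0


def checkerboard_mask(width: int, height: int, frame: int) -> list[list[bool]]:
    # One boolean per pixel: True if that pixel is shaded on this frame.
    return [[shaded_this_frame(x, y, frame) for x in range(width)] for y in range(height)]


if __name__ == "__main__":
    for frame in (0, 1):
        print(f"frame {frame}:")
        for row in checkerboard_mask(6, 3, frame):
            print("".join("#" if shaded else "." for shaded in row))
```

Two consecutive masks are complementary, which is why a static scene converges to a full-resolution image every two frames, while anything that moves needs extra information to fill the unshaded half.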
The team could have selected checkerboard 1800p instead, and that might have given them similar or even better sharpness, but images would have been less detailed and would have had a more filtered look. They decided to look for a way to still get the detail level of 2160p at the cost of sacrificing some of the potential sharpness.
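A rough way to compare the options discussed so far is to count how many samples each one shades per frame. The sketch below assumes 16:9 frames and counts only shaded pixels (halved for checkerboard, multiplied for supersampling); it ignores geometry work, the resolve pass and memory bandwidth, so it is a back-of-the-envelope comparison, not Guerrilla’s performance data. Names are hypothetical.

```python
def width_16x9(height: int) -> int:
    # 16:9 width for a given vertical resolution (1080 -> 1920, 2160 -> 3840).
    return height * 16 // 9


def shaded_samples(height: int, checkerboard: bool = False, ssaa: int = 1) -> int:
    pixels = width_16x9(height) * height
    if checkerboard:
        pixels //= 2  # only half of the pixels are shaded each frame
    return pixels * ssaa


BUDGETS = {
    "native 1080p": shaded_samples(1080),
    "native 1512p": shaded_samples(1512),
    "checkerboard 1800p": shaded_samples(1800, checkerboard=True),
    "checkerboard 2160p": shaded_samples(2160, checkerboard=True),
    "2160p with 16x supersampling": shaded_samples(2160, ssaa=16),
}

if __name__ == "__main__":
    base = BUDGETS["native 1080p"]
    for name, n in BUDGETS.items():
        print(f"{name:<30}{n:>13,} samples/frame  ({n / base:.2f}x native 1080p)")
```

By this measure, checkerboard 2160p (4,147,200 samples) shades about as much per frame as native 1512p (4,064,256), which fits both running at 30 FPS, while checkerboard 1800p shades noticeably less. The 16x supersampled reference shades 64 times as many samples as native 1080p.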
The way they found was to remove all the native resolution hints and use native resolution buffers only for the UI and the final backbuffer. Everything else ran at checkerboard resolution, including alpha-tested objects, alpha-blended objects and post effects.
That took care of the performance issues, but they still needed to find a way to make things work without the native resolution hints. Ultimately, they managed to do so by taking inspiration from how they implemented temporal anti-aliasing at 1080p.
Normally when doing checkerboard rendering you shade 50% of the pixel centers each frame, and the other 50% is inferred with the help of the native resolution hints. What Guerrilla did instead was to render 50% of the pixel corners each frame.
This provides two samples for each pixel instead of one sample for every two pixels. They can average the two closest corner samples to get the value at each pixel’s center. This even avoids the dither artefacts that often result from checkerboard rendering.
Every two frames this system provides four samples per pixel, resulting in the same AA stability provided by TAA at 1080p. The difference is that TAA works with samples on the edges of pixels, while this system works with samples on the corners.
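The corner-sampling geometry can be illustrated on the CPU. This sketch assumes that corners sit on an integer grid in pixel units, that a checkerboard parity rule selects half of them each frame, and that `shade` is a stand-in for the real renderer; it ignores motion, the tangram layout and FXAA. All names are hypothetical.

```python
from typing import Callable

Shader = Callable[[float, float], float]
Corner = tuple[int, int]


def corner_is_shaded(i: int, j: int, frame: int) -> bool:
    # Checkerboard over the corner grid: half of all corners each frame.
    return (i + j + frame) % 2 == 0


def shaded_corners(px: int, py: int, frame: int) -> list[Corner]:
    # Pixel (px, py) covers [px, px + 1] x [py, py + 1]. Exactly two of its four
    # corners (one diagonal) pass the parity test; the other diagonal is shaded next frame.
    corners = [(px, py), (px + 1, py), (px, py + 1), (px + 1, py + 1)]
    return [c for c in corners if corner_is_shaded(c[0], c[1], frame)]


def pixel_from_corners(shade: Shader, px: int, py: int, frame: int) -> float:
    # Two samples per pixel: average the diagonal pair to estimate the center.
    (ax, ay), (bx, by) = shaded_corners(px, py, frame)
    return 0.5 * (shade(ax, ay) + shade(bx, by))


def pixel_over_two_frames(shade: Shader, px: int, py: int, frame: int) -> float:
    # Static scene with accepted history: two frames cover all four corners,
    # i.e. four samples per pixel.
    current = pixel_from_corners(shade, px, py, frame)
    previous = pixel_from_corners(shade, px, py, frame + 1)
    return 0.5 * (current + previous)
```

For a signal that varies linearly across the pixel, either diagonal average reproduces the center value exactly. Because neighboring pixels share corners, shading half of the corner grid costs about as much as shading half of the pixels.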
Diagonal lines required extra work. This is solved by applying FXAA to the checkerboard pixels using a method called “Tangram”, named after the classic puzzle because it works in a similar way.
The whole system is summarized by de Carpentier as follows. Of course, it’s highly technical.
- We first render in checkerboard resolution without any native-res hints. We also apply all post effects at that point, do the tonemapping, encode the output in either sRGB or PQ, and store the result in a 10-bit format.
- We then transform that into a tangram, and we store that in YCoCg color space.
- The YCoCg tangram is then sampled by the FXAA pass, which outputs a new tangram.
- The reason we use YCoCg is because FXAA does most of its work on luminance data, and this allows us to sample 4 luminance values per texture gather.
- The output of the FXAA pass is stored in a ping-ponged buffer, giving us the current and previous frame to resolve from.
- As a final pass, we take the current and previous tangram, convert back to RGB, and reject or reproject based on standard criteria.
- If history is to be rejected, we only sample the current tangram, giving us effectively 2 samples per pixel.
- And if history is accepted, we blend the previous and current value 50/50, and we end up with effectively 4 samples per pixel.
- And conceptually, this last part is quite similar to our TAA solution in 1080p, but without the sharpening.
- And that’s how we resolve in checkerboard mode in less than 2 milliseconds.
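The resolve steps in the summary can be sketched per pixel. The sketch assumes that each tangram value is already the two-sample average for a pixel, that the previous value has already been reprojected, and that history is rejected with a per-channel min/max neighborhood test. That test is a common TAA criterion chosen here for illustration, not necessarily Guerrilla’s. The RGB/YCoCg transforms are the standard ones; all function and class names are hypothetical.

```python
RGB = tuple[float, float, float]


def rgb_to_ycocg(c: RGB) -> RGB:
    # Standard RGB -> YCoCg transform; Y carries the luminance that FXAA works on.
    r, g, b = c
    return (0.25 * r + 0.5 * g + 0.25 * b,   # Y
            0.5 * r - 0.5 * b,               # Co
            -0.25 * r + 0.5 * g - 0.25 * b)  # Cg


def ycocg_to_rgb(c: RGB) -> RGB:
    # Exact inverse of rgb_to_ycocg (up to floating-point rounding).
    y, co, cg = c
    tmp = y - cg
    return (tmp + co, y + cg, tmp - co)


def history_is_plausible(prev: RGB, neighborhood: list[RGB]) -> bool:
    # ASSUMPTION: accept history only if it lies within the per-channel min/max
    # of the current frame's local neighborhood.
    lo = [min(n[k] for n in neighborhood) for k in range(3)]
    hi = [max(n[k] for n in neighborhood) for k in range(3)]
    return all(lo[k] <= prev[k] <= hi[k] for k in range(3))


def resolve_pixel(current: RGB, previous: RGB, neighborhood: list[RGB]) -> RGB:
    # current/previous are YCoCg values from the current and previous tangram.
    cur = ycocg_to_rgb(current)
    prev = ycocg_to_rgb(previous)
    if history_is_plausible(prev, neighborhood):
        # History accepted: 50/50 blend, effectively 4 samples per pixel.
        return tuple(0.5 * (c + p) for c, p in zip(cur, prev))
    # History rejected: current tangram only, effectively 2 samples per pixel.
    return cur


class PingPong:
    """Two buffers that swap roles each frame: one is written, the other is last frame's."""

    def __init__(self, a, b):
        self._buffers = [a, b]
        self._current = 0

    @property
    def current(self):
        return self._buffers[self._current]

    @property
    def previous(self):
        return self._buffers[1 - self._current]

    def swap(self) -> None:
        self._current = 1 - self._current
```

Since the RGB/YCoCg transform is linear, blending before or after converting back to RGB gives the same result; the sketch converts first simply to follow the order given in the summary.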