DirectX 12 Requires Different Optimization on Nvidia and AMD Cards, Lots of Details Shared

At GDC 2016 in San Francisco, Nvidia and AMD hosted a joint panel on DirectX 12, which DualShockers attended. Developer Technology Engineers Gareth Thomas and Alex Dunn shared plenty of interesting details about working with Microsoft’s new API, and stressed that a few features require different optimization on each vendor’s hardware, prompting developers to include dedicated code for different brands of video cards.

We also heard about a few interesting newly enabled features like conservative rasterization, which allows for beautiful effects like the ray-traced shadows in Tom Clancy’s The Division.

Below you can check out a summary of the most interesting points, and the slides that were showcased during the presentation. Most of the data was obviously developer-facing, and very technical, but there are definitely some points even gamers like us can take away from the presentation.

  • DirectX 12 is for those who want to achieve maximum GPU and CPU performance, but it requires a significant investment in engineering time, as developers have to write driver-level code that DirectX 11 takes care of automatically. For that reason, it’s not for everyone.
  • Since it’s “closer to the metal” than DirectX 11, it requires different settings on certain things for Nvidia and AMD cards.
  • With DirectX 12 you’re not CPU-bound for rendering.
  • Command lists in DirectX 12 should keep the GPU busy as much as possible, without any delay at any point. There should be 15-30 of them per frame, batched into 5-10 “ExecuteCommandLists” calls, each of which should include at least 200 microseconds of GPU work, and preferably more, up to 500 microseconds.
  • Scheduling latency on the operating system’s side takes 60 microseconds, so developers should put more GPU work than that in each call; otherwise, what’s left of the 60 microseconds is wasted idling.
  • Bundles, which are the main new feature of DirectX 12, are great for sending work to the GPU very early in each frame, which is very advantageous for applications that require very low latency, like VR.
  • They’re not inherently faster on the GPU. The gain is all on the CPU side, so they need to be used wisely. Optimizing bundles differs between Nvidia and AMD cards, and each requires a different approach. In particular, on AMD cards bundles should be used only if the game is struggling on the CPU side.
  • Compute queues still haven’t been completely researched on DirectX 12. For the moment, they can offer 10% gains if done correctly, but there might be more gains coming as more research is done on the topic.
  • Since those gains don’t happen automatically unless things are set up correctly, developers should always verify that they actually materialize, as poorly scheduled compute tasks can result in the opposite outcome.
  • The use of root signature tables is where optimization between AMD and Nvidia diverges the most, and developers will need brand-specific settings in order to get the best results on both vendors’ cards.
  • When developers find themselves with not enough video memory, DirectX 12 allows them to create overflow heaps in system memory, moving resources out of video memory at their own discretion.
  • Using aliased memory on DirectX 12 lets developers save even more GPU memory.
  • DirectX 12 introduces Fences, which are basically GPU semaphores, making sure that the GPU has finished working on a resource before it moves on to the next.
  • Multi-GPU functionality is now embedded in the DirectX 12 API.
  • It’s important for developers to keep in mind the bandwidth limitations of different versions of PCI Express (the interface between motherboard and video card), as PCI Express 2.0 is still common, and grants half the bandwidth of PCI Express 3.0.
  • DirectX 12 includes a “Set Stable Power State” API, and some are using it. It’s only really useful for profiling, and even then only sometimes. It reduces performance and should not be used in a shipped game.
  • When deciding whether to use a pixel shader or a compute shader, there are “extreme” differences in pros and cons on Nvidia and AMD cards (as shown by the table in the gallery).
  • Conservative rasterization lets you draw all the pixels touched by a triangle of your 3D models. It was possible before using a geometry shader trick, but it was quite slow. Now it’s possible to enable neat effects like the ray-traced shadows in Tom Clancy’s The Division. In the picture in the gallery below you can see the detail of the shadow, with the bike’s spokes visible on the ground. That wasn’t possible without using a ray-traced technique, which is enabled only with conservative rasterization.
  • Tiled resources can now be used on 3D assets, and grant “extreme” performance and memory saving benefits.
  • DirectX 11 is still “very much alive” and will continue to exist alongside DirectX 12 for a while.
  • Developers can’t mix and match DirectX 11 and DirectX 12. Either they commit to DirectX 12 entirely, or they shouldn’t use it.
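
To make the command list advice above a bit more concrete, here is a minimal sketch of how an engine might group a frame’s command lists into submissions, so that each “ExecuteCommandLists” call carries at least 200 microseconds of GPU work. It is not DirectX code and was not shown in the presentation: the function name, the greedy grouping strategy, and the per-list GPU time estimates (which we assume come from a profiler) are all our own illustrative assumptions.

```python
from typing import List

# Minimum GPU work per submission, in microseconds (figure quoted in the talk summary).
MIN_BATCH_US = 200.0


def batch_command_lists(gpu_times_us: List[float],
                        min_batch_us: float = MIN_BATCH_US) -> List[List[int]]:
    """Greedily group command lists, in recording order, into submissions.

    gpu_times_us holds a hypothetical per-command-list GPU time estimate.
    Each returned batch is a list of command list indices that would go
    into a single submission call.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_us = 0.0

    for index, cost_us in enumerate(gpu_times_us):
        current.append(index)
        current_us += cost_us
        # Close the batch as soon as it holds enough GPU work.
        if current_us >= min_batch_us:
            batches.append(current)
            current, current_us = [], 0.0

    # Leftover lists are too small to submit alone, so fold them into the
    # previous batch (or submit them as-is if they are the whole frame).
    if current:
        if batches:
            batches[-1].extend(current)
        else:
            batches.append(current)
    return batches


if __name__ == "__main__":
    # Eight command lists with made-up GPU costs in microseconds.
    estimates = [50, 80, 90, 120, 100, 30, 250, 40]
    print(batch_command_lists(estimates))
    # prints [[0, 1, 2], [3, 4], [5, 6, 7]]
```

In a real renderer each inner list would presumably map to one submission call on the command queue. A greedy pass like this keeps the recording order intact, which matters when later command lists depend on earlier ones, but an engine could just as well tune the threshold per vendor.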

Incidentally, if you’re a developer and you’re interested in the full audio recording of the presentation, we’re happy to share. Just contact

[On-location reporting: Steven Santana]