Sam Glassenberg - Lead Program Manager for D3D10, Microsoft

Sam's an engaging public speaker, combining technical presenting with The Funny™ to keep his audience tuned in. His presentation focussed on D3D10 development and how the new API impacts graphics programming, and performance, on D3D10 hardware.

Driver model affects the API

Starting with the driver model, Sam quickly pointed out that the driver model shapes the API in subtle ways, and that WDDM does just that to D3D10. There's no more DEVICE_LOST to handle on a display mode change, for example, and because a large chunk of the display driver now runs in user process space, a crash in the driver just kills the process responsible, not the entire system.

Instead, you now handle DEVICE_HUNG on your 3D device inside your app to detect hung cases, and a new DEVICE_REMOVED enum lets your app know the GPU you were executing on is no longer available. Think multi-GPU, but not just SLI, Crossfire or Multichrome.
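A minimal sketch of that new error-handling path, using the DXGI error codes from the shipping D3D10 headers (the swap chain and device are assumed to exist already, and the recovery functions are hypothetical app code):

```cpp
// Present can now report that the device went away, rather than the
// old D3D9 DEVICE_LOST dance on every mode change.
HRESULT hr = pSwapChain->Present(0, 0);
if (hr == DXGI_ERROR_DEVICE_REMOVED)
{
    // Ask the device why it went away: hung, reset, or physically gone.
    switch (pDevice->GetDeviceRemovedReason())
    {
    case DXGI_ERROR_DEVICE_HUNG:
    case DXGI_ERROR_DEVICE_RESET:
        RecreateDeviceAndResources();  // hypothetical app function
        break;
    case DXGI_ERROR_DEVICE_REMOVED:
        // The GPU itself is no longer available.
        FallBackToAnotherAdapter();    // hypothetical app function
        break;
    }
}
```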

Sam made a point to note that the new VM lets you share surface data across processes, and not just app threads, letting you save resources when building systems with more than one device context and processing space.
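As a sketch of that cross-process sharing, a second process can open a resource another process created with the shared flag set. The API call is real; the shared handle is assumed to have been passed over from the creating process:

```cpp
// The creating process makes the texture with
// D3D10_RESOURCE_MISC_SHARED in its MiscFlags, then hands the shared
// handle to this process, which opens it on its own device.
ID3D10Texture2D* pSharedTex = NULL;
HRESULT hr = pDevice->OpenSharedResource(hSharedHandle,
                                         __uuidof(ID3D10Texture2D),
                                         (void**)&pSharedTex);
```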

He made the point that Microsoft have taken the opportunity to somewhat wipe the slate clean with D3D10 and refine the API and the runtime, by virtue of the new driver model on Vista. Big changes include the driver and hardware working on Set/Draw calls without the CPU, and Set calls mapping much more closely to actual register changes on the hardware.

That's part of the state-change overhead removal that's big news with D3D10, allowing Richard to throw out his Batch, Batch, Batch presentation after years of preaching. Cue developer cheers when Sam pointed that out.

No caps bits

The removal of device caps was Sam's other favourite topic. You no longer detect the GPU and then walk the caps path into code maintenance hell. All D3D10 hardware devices support the base features, so now your app should only care about performance scaling via the usual means, rather than feature scaling based on the underlying hardware, whatever that may be.

He let developers know that the API, especially with the new Geometry Shader (GS) stage, opens up new opportunities for moving more of your rendering algorithm to the GPU.

State change overhead

Moving back to state changes, he mentioned that state is now set atomically by the driver, and that multiple states can be stored and set on the hardware, caching state blocks there for performance. Previously with D3D9, Set calls would just hammer the driver, but with D3D10 (and also D3D9 on Vista, to a lesser extent) they're handled closer to the metal, with little CPU interaction.
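In practice that means immutable state objects: you describe a block of state once, the runtime hands back an object, and setting it per frame is a single atomic call rather than a pile of individual renderstate changes for the driver to validate. A sketch using the real blend-state API (error handling omitted, device assumed to exist):

```cpp
// Build the state block once, typically at load time.
D3D10_BLEND_DESC desc = {};
desc.BlendEnable[0] = TRUE;
desc.SrcBlend       = D3D10_BLEND_SRC_ALPHA;
desc.DestBlend      = D3D10_BLEND_INV_SRC_ALPHA;
desc.BlendOp        = D3D10_BLEND_OP_ADD;
desc.SrcBlendAlpha  = D3D10_BLEND_ONE;
desc.DestBlendAlpha = D3D10_BLEND_ZERO;
desc.BlendOpAlpha   = D3D10_BLEND_OP_ADD;
desc.RenderTargetWriteMask[0] = D3D10_COLOR_WRITE_ENABLE_ALL;

ID3D10BlendState* pAlphaBlend = NULL;
pDevice->CreateBlendState(&desc, &pAlphaBlend);

// Then per frame, one atomic set.
FLOAT blendFactor[4] = { 0, 0, 0, 0 };
pDevice->OMSetBlendState(pAlphaBlend, blendFactor, 0xffffffff);
```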

More ways to render to a surface

Sam let the assembled developers know that in addition to having double the per-pass rendertarget count compared to D3D9 (up to 8 now), you can also now bind texture arrays as rendertargets, and have masked, indexed writes into those arrays using D3D10 'views'.

The new flexibility now also maps more directly to the hardware, the surface headers fed to the chip pretty much telling the chip what can be done with those data blocks, rather than driver and CPU needing to do that particular bit of hand-holding.
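A sketch of binding a texture array as a rendertarget through a view, using the real view-description structures (pTexArray is assumed to be an existing ID3D10Texture2D created with ArraySize greater than 1):

```cpp
// Describe a view covering a slice range of the array.
D3D10_RENDER_TARGET_VIEW_DESC rtvDesc = {};
rtvDesc.Format        = DXGI_FORMAT_R8G8B8A8_UNORM;
rtvDesc.ViewDimension = D3D10_RTV_DIMENSION_TEXTURE2DARRAY;
rtvDesc.Texture2DArray.MipSlice        = 0;
rtvDesc.Texture2DArray.FirstArraySlice = 0;
rtvDesc.Texture2DArray.ArraySize       = 6;  // e.g. all six cube faces

ID3D10RenderTargetView* pRTV = NULL;
pDevice->CreateRenderTargetView(pTexArray, &rtvDesc, &pRTV);
pDevice->OMSetRenderTargets(1, &pRTV, pDepthView);

// A geometry shader can then steer each primitive to a slice by
// writing the SV_RenderTargetArrayIndex system value.
```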

Developers, be careful

And while the new API changes make it easier to do general programming on the GPU, with many overheads minimised or gone, you can still trample all over the runtime and GPU in code if you're not careful, wasting performance. It's harder to get it wrong, but it can still be done. The message was for developers to stay smart and exploit the API.

GS and DrawAuto()

Talking about the Geometry Shader, Sam was realistic about performance expectations concerning the GS, somewhat hinting that without massive on-chip intermediary space in silicon, building a perfect first-gen GS is largely out of the question for the IHVs. Instead, developers should use the new functionality but be mindful about what they'll be using it for, and consequently asking the hardware to do.

In essence, layering another render stage into the API -- and one that can data amplify geometry, before rasterisation and pixel shading takes place -- does add complexity and another place for a developer to get it wrong, or for the hardware to go slow. Developers should be mindful of the power of the GS, but realistic about its usage. Sam mentioned the 1K limit on the data a single GS invocation can output, as a means to think about that further.

He also mentioned the DrawAuto function call for use with GS programs that stream their primitive output to a buffer. Rather than the app having to read back how many primitives the GS actually produced, DrawAuto sends whatever landed in the buffer down the pipe, the GPU tracking the count itself, so you don't get backed up waiting on the CPU before the data can be rastered and shaded further down the pipe.
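The stream-output plus DrawAuto pattern looks roughly like this, using the real D3D10 calls (buffers, shaders and vertex layout assumed created elsewhere; MyVertex is a hypothetical stream-out vertex struct):

```cpp
// Pass 1: run the geometry shader and capture its output.
UINT offset = 0;
pDevice->SOSetTargets(1, &pStreamOutBuffer, &offset);
pDevice->Draw(vertexCount, 0);

// Unbind stream output, then bind the captured buffer as input.
ID3D10Buffer* pNull = NULL;
pDevice->SOSetTargets(1, &pNull, &offset);
UINT stride = sizeof(MyVertex), vbOffset = 0;
pDevice->IASetVertexBuffers(0, 1, &pStreamOutBuffer, &stride, &vbOffset);

// Pass 2: DrawAuto draws however many primitives the GS actually
// wrote -- the CPU never queries the count.
pDevice->DrawAuto();
```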