Richard Huddy - Worldwide Developer Relations Manager, ATI


Richard's talk concerned D3D10 performance on upcoming ATI hardware. Confirming it was a unified part in hardware, he asked the assembled crowd to think of it as a 17-shader-unit part, so that he wouldn't have to give the game away before the official announcement. Surely he understands we already know it's 19? Regardless, he forged on.

He started with a quick rundown of some basic techniques for making current D3D9 hardware happy, performance-wise. The old, oft-spoken mantras: keep the pre- and post-VS caches hot; compress vertices if need be, to match cache-line width when fetching; let D3DXOptimiseMesh do the hard work, since it knows how big the cache lines are; and mask off unused channels so the driver and hardware have auto-vectorisation opportunities aplenty.
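As a minimal sketch of the mesh optimisation part of that advice, one route through the shipping D3DX API is ID3DXMesh::OptimizeInplace with the vertex-cache flag; pMesh is assumed to be a mesh loaded elsewhere, and error checking is omitted:

    // pMesh is an ID3DXMesh loaded elsewhere (assumed); error checking omitted.
    DWORD* adjacency = new DWORD[pMesh->GetNumFaces() * 3];
    pMesh->GenerateAdjacency(0.0f, adjacency);

    // Reorder faces and vertices so triangles that share vertices sit close
    // together in the index stream, keeping the post-transform cache hot.
    pMesh->OptimizeInplace(D3DXMESHOPT_VERTEXCACHE | D3DXMESHOPT_COMPACT,
                           adjacency, NULL, NULL, NULL);

    delete [] adjacency;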

He said that those pre-raster caches will get bigger in upcoming hardware, that in general you should try not to dump on any of the chip's internal storage, and that you should profile and always think about access patterns for your data. Random access destroys your coherency, of course.

D3D10

Moving on to D3D10, Richard mentioned that the first D3D10 drivers under Vista won't be heavily optimised, and that using existing ATI D3D9 hardware on Vista, or the upcoming new part, will show modest improvements because of the new runtime. But given that developers currently code around a flawed runtime and driver model, performance differences won't be huge.

Upcoming hardware now does most state validation and setup itself, rather than via the driver (on the CPU), so you get a win there.

Multiple hardware states can be in flight on upcoming hardware, too, letting the chip quickly configure itself when changing thread types, or when swapping between threads of the same type that are executing with different states. States can be cached on chip, too, so developers should think about state block reuse in their code, for mild wins.
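In D3D10 terms, that maps naturally onto the new immutable state objects: build them once at load time, then simply bind the right one per pass instead of re-specifying state. A minimal sketch, assuming a valid ID3D10Device pointer named device:

    // Build a rasteriser state object once, up front.
    D3D10_RASTERIZER_DESC rsDesc;
    ZeroMemory(&rsDesc, sizeof(rsDesc));
    rsDesc.FillMode        = D3D10_FILL_SOLID;
    rsDesc.CullMode        = D3D10_CULL_BACK;
    rsDesc.DepthClipEnable = TRUE;

    ID3D10RasterizerState* solidCullBack = NULL;
    device->CreateRasterizerState(&rsDesc, &solidCullBack);

    // Per pass: just bind the cached object rather than rebuilding the state.
    device->RSSetState(solidCullBack);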

GS limitations

First-gen D3D10 designs that must implement the Geometry Shader stage will initially be performance-limited in some places, he warned. Developers should be aware of geometry coherency in the GS, and should certainly watch out when using flow control to generate new primitives.

He also cautioned the assembled masses that on upcoming hardware, output space for primitive data from the GS is limited. That's not to say developers shouldn't write GS programs that emit lots of data, but they should be mindful of the buckets assigned to hold it all before it has to be streamed out to card memory.

He urged developers to think up new uses for the GS, and to tell ATI about them so they can check usage patterns and GS program algorithms for optimisation opportunities and future hardware direction. Developers were reminded that as well as amplification, the GS can kill polys before they're sent to rasterisation, so you can do geometry culling in GS programs if you wish, using the GS to de-amplify.
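A minimal HLSL sketch of both points, assuming a simple pass-through vertex format; the cull test is purely a hypothetical illustration, not something Richard showed. The [maxvertexcount] declaration bounds the per-invocation output space, and returning early kills the triangle outright:

    struct GSVert
    {
        float4 pos : SV_POSITION;
        float2 uv  : TEXCOORD0;
    };

    // Output budget is maxvertexcount * size of GSVert, so keep it small.
    [maxvertexcount(3)]
    void CullGS(triangle GSVert tri[3], inout TriangleStream<GSVert> stream)
    {
        // Hypothetical cull test: if every vertex sits on the near side of the
        // near plane (clip-space z < 0 in D3D), the triangle will be clipped anyway.
        if (tri[0].pos.z < 0 && tri[1].pos.z < 0 && tri[2].pos.z < 0)
            return;                     // de-amplify: emit nothing at all

        // Otherwise pass the triangle through unchanged.
        for (int i = 0; i < 3; i++)
            stream.Append(tri[i]);
        stream.RestartStrip();
    }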

New D3D9 hardware yet to come

As a quick aside, Richard confirmed that ATI had some more D3D9 parts on the way, pre-Vista. With R580+ launching in late August and RV560 and RV570 a wee while later, developers were given a quick reminder that D3D9 development should be far from dead.

Don't be lax

Huddy then made a point of telling developers to turn off bits of the chip they're not using for a particular part of their rendering. If the hardware's in the right state when it's rendering, transparent speedup opportunities are everywhere on the chip. Upcoming hardware is designed to do well regardless (as all GPUs are, really), but there are still wins to be had by being observant and on top of your code, just as there are with D3D9.
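One purely illustrative D3D10 example of that idea (not something spelled out in the talk): if a pass only lays down depth, mask off the colour writes through the blend state so the back end isn't doing work you'll never see. Again, device is an assumed ID3D10Device pointer:

    // Hypothetical depth-only pass: disable colour writes via the blend state.
    D3D10_BLEND_DESC blendDesc;
    ZeroMemory(&blendDesc, sizeof(blendDesc));
    blendDesc.SrcBlend                 = D3D10_BLEND_ONE;
    blendDesc.DestBlend                = D3D10_BLEND_ZERO;
    blendDesc.BlendOp                  = D3D10_BLEND_OP_ADD;
    blendDesc.SrcBlendAlpha            = D3D10_BLEND_ONE;
    blendDesc.DestBlendAlpha           = D3D10_BLEND_ZERO;
    blendDesc.BlendOpAlpha             = D3D10_BLEND_OP_ADD;
    blendDesc.BlendEnable[0]           = FALSE;
    blendDesc.RenderTargetWriteMask[0] = 0;    // write no colour channels at all

    ID3D10BlendState* depthOnlyBlend = NULL;
    device->CreateBlendState(&blendDesc, &depthOnlyBlend);

    FLOAT blendFactor[4] = { 0, 0, 0, 0 };
    device->OMSetBlendState(depthOnlyBlend, blendFactor, 0xFFFFFFFF);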

The new Batch, Batch, Batch

Richard made a joke that he's been giving essentially the same presentation for 10 years to developers, and that with D3D10 he finally gets to throw it away. Batch, Batch, Batch doesn't apply anymore! What standard advice to replace it with, then?

He started by telling the developers that the smart play with upcoming ATI D3D10 hardware is to stick close to ATI devrel and let them see your shaders, to make sure they're doing the right thing. Developers also shouldn't assume that D3D9 algorithms won't work well on the new hardware. Far from it: expect the new parts to be very good at D3D9, so don't throw out existing work and techniques if you know they're good.

Lastly, he urged developers not to use the SDK samples as performance indicators on the new hardware, when it arrives. It's common for developers to use SDKs almost as cut-and-paste repos, for certain effects, but they shouldn't do that this time around (at least with the current SDK). Use the samples as inspiration, not as the solution.

Summary

Richard's message was clear: D3D10 (and all that implies in terms of the driver model and a new runtime) offers you, as a developer, many more opportunities for programmer productivity and creativity, simply because you have less to worry about as you engineer your code. But at the same time you should still be wary of what that new power affords you, since you can still end up making the hardware slow if you're not careful.

And, being Worldwide Developer Relations Manager, his message that ATI devrel is there to be used as a resource for getting good work done came across loud and clear.