Bruce Dawson

Bruce Dawson is one of Microsoft's DirectX ISV guys, and his presentation focused on application development in terms of performance profiling. Bruce's early message was: don't make performance profiling a last-minute task you perform before going to master. It needs to be part of the software development cycle from the very start, with Bruce urging Windows game developers to make use of a well-understood profiling infrastructure, especially to get the most out of the GPU and multi-core CPUs.
Bruce talked about having clearly defined performance goals that account for Draw* call costs from the beginning, then using those goals and profiling to focus development and make sure the performance targets are hit. Having the user adjust the performance experience only gets you so far, and having the user throw more hardware at what could be your problem as a developer isn't really what you want.
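To make a draw call budget like that enforceable, even something as small as a per-frame counter can flag overruns during development. The sketch below is hypothetical; the budget figure and the hook points are our assumptions, not something Bruce prescribed:

```cpp
#include <cstdio>

// Hypothetical per-frame draw call budget check. The budget value and the
// idea of wrapping your Draw* calls are illustrative assumptions.
class DrawCallCounter {
public:
    static const unsigned kBudget = 500;   // assumed per-frame budget

    void OnDrawCall() { ++m_calls; }       // call from your Draw* wrappers

    void EndFrame() {
        if (m_calls > kBudget)
            std::fprintf(stderr, "Over budget: %u draw calls (budget %u)\n",
                         m_calls, kBudget);
        m_calls = 0;
    }

private:
    unsigned m_calls = 0;
};
```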
Bruce urged developers to create representative benchmarks or test levels to use for performance profiles and public demos, to help collect lots of data from the kinds of users who'll eventually run the final game. Collecting that data as often as the development process allows is key. A couple of laughs from the crowd ensued when Bruce suggested developers should use lower-spec machines than they're used to, to make sure the game runs well on those (and thus automatically well on anything with a better spec).
Then came the slightly dubious advice to expect around 40-60% of a modern CPU core to be consumed by driver and OS overhead on a Windows system. Whether that was meant to urge developers to make the most efficient use of the remaining CPU time, when in reality more should reasonably be available, or whether Bruce was speaking from experience, wasn't clear. Automated performance profiling can also be one of the biggest boons to developers, allowing nightly performance testing of builds without developer interaction; Bruce told the audience to make sure it's part of their current and upcoming games, to catch performance issues early and often.
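As a flavour of what that automation might look like, here's a hypothetical frame-timing capture (the class, the CSV layout and the append-per-build scheme are our assumptions, not Bruce's): a nightly job runs the benchmark level, appends one summary row per build, and flags regressions by diffing the numbers.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>

// Sketch of a nightly-run frame timer: time each frame, then dump
// min/average/max frame times (in ms) to a CSV for later comparison.
class FrameTimer {
public:
    void BeginFrame() { m_start = Clock::now(); }

    void EndFrame() {
        double ms = std::chrono::duration<double, std::milli>(
                        Clock::now() - m_start).count();
        m_samples.push_back(ms);
    }

    void WriteReport(const char* path) const {
        if (m_samples.empty()) return;
        double sum = 0.0;
        for (double s : m_samples) sum += s;
        std::FILE* f = std::fopen(path, "a");   // append: one row per build
        if (f) {
            std::fprintf(f, "%f,%f,%f\n",
                         *std::min_element(m_samples.begin(), m_samples.end()),
                         sum / m_samples.size(),
                         *std::max_element(m_samples.begin(), m_samples.end()));
            std::fclose(f);
        }
    }

private:
    using Clock = std::chrono::steady_clock;
    Clock::time_point m_start;
    std::vector<double> m_samples;
};
```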
The next round of slides concentrated on making sure developers were using the best available performance testing tools. Intel VTune, AMD CodeAnalyst, Microsoft's CLR Profiler for managed code, the ANTS profiler, Microsoft PIX, NVIDIA PerfKit and AMD GPU PerfStudio were all mentioned for Windows performance analysis, along with Event Tracing for Windows (ETW). Developers should use those, and others, to continually profile their applications, but not to the point of obsession or to the exclusion of other important development that needs to take place.
Regular performance testing should also happen on release builds, with asserts, logging and debug code stripped out if possible, and on machines not used for development. Care should be taken to make sure the application runs as well as possible in that environment too, with Bruce recommending turning off the DWM on Vista (although it's disabled in full-screen exclusive mode anyway), making sure vsync is off and disabling things like the Windows Sidebar, to give the game the best chance. In places that's the inverse of the environment an end user will experience the game in, but it removes more chances of external software interfering with your application's runtime performance.
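For the vsync part of that checklist, a profiling build might fill its present parameters along these lines, assuming a D3D9 title; the resolution and format here are illustrative values, not settings from the talk:

```cpp
#include <d3d9.h>

// Minimal sketch: present parameters for a full-screen exclusive profiling
// build with vsync disabled via D3DPRESENT_INTERVAL_IMMEDIATE.
void FillPresentParams(D3DPRESENT_PARAMETERS& pp, HWND hwnd)
{
    ZeroMemory(&pp, sizeof(pp));
    pp.Windowed = FALSE;                      // full-screen exclusive
    pp.BackBufferWidth = 1280;                // illustrative mode
    pp.BackBufferHeight = 720;
    pp.BackBufferFormat = D3DFMT_X8R8G8B8;
    pp.SwapEffect = D3DSWAPEFFECT_DISCARD;
    pp.hDeviceWindow = hwnd;
    pp.PresentationInterval = D3DPRESENT_INTERVAL_IMMEDIATE; // no vsync wait
}
```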
For graphics debugging, Bruce pushed PIX as the means to get the big picture, using its default frame tracing to get an idea of what's going on. PIX's per-pixel debugging, especially its pixel history (showing every operation that shaded a pixel), can be a key tool for figuring out why a pixel looks the way it does if it's not what you were expecting. PIX will also capture draw calls per frame, so you can check your call budget, and it'll show you how often you update constants and where you change state.
File I/O Bottlenecks
Dawson moved on to talking about file I/O next, saying that bottlenecks here simply don't get enough attention. His advice was simple:
- Don't compile your shaders from HLSL stored on disk in the middle of a frame, because the disk will slow you down (he was quite serious; presumably that's happened)
- Use asynchronous I/O where you can to load resources, so you don't block the CPU waiting for the return (see the sketch after this list)
- Use I/O worker threads to control that asynchronous loading scheme
- Fully memory map large files if you have the virtual address space, which can be a huge win on 64-bit systems
- Remember to use the right file access flags to trigger disk I/O fast paths (FILE_FLAG_SEQUENTIAL_SCAN and FILE_FLAG_RANDOM_ACCESS are hints that let Windows do the right thing)
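Pulling those points together, here's a rough sketch of an overlapped read using the fast-path flags; the file name and buffer size are placeholders, and error handling is trimmed for brevity:

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    // FILE_FLAG_OVERLAPPED enables async reads; FILE_FLAG_SEQUENTIAL_SCAN
    // hints the cache manager that we'll stream the file front to back.
    HANDLE file = CreateFileA("level0.pak", GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING,
                              FILE_FLAG_OVERLAPPED | FILE_FLAG_SEQUENTIAL_SCAN,
                              NULL);
    if (file == INVALID_HANDLE_VALUE) return 1;

    static char buffer[64 * 1024];
    OVERLAPPED ov = {0};
    ov.hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);

    // Issue the read and carry on; ReadFile returns immediately with
    // ERROR_IO_PENDING when the request has been queued.
    if (!ReadFile(file, buffer, sizeof(buffer), NULL, &ov) &&
        GetLastError() != ERROR_IO_PENDING)
        return 1;

    // ... do other frame work here instead of blocking on the disk ...

    DWORD bytesRead = 0;
    GetOverlappedResult(file, &ov, &bytesRead, TRUE); // wait only when needed
    std::printf("read %lu bytes\n", bytesRead);

    CloseHandle(ov.hEvent);
    CloseHandle(file);
    return 0;
}
```

For the memory-mapping route, CreateFileMapping() and MapViewOfFile() are the Win32 pair to reach for, and the win on 64-bit systems comes from the vastly larger virtual address space available for mapping whole files.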
Helping Windows Do The Right Thing
Next up was helping Windows do the right thing. Only run one heavyweight thread per CPU core so that the thread scheduler can do the right thing, and don't try to outsmart it by forcing threads to run on certain cores, as you might on Xbox 360. The PC isn't a console, and you need to respect other apps that might be running alongside yours, so let the Windows scheduler manage your processor usage for you.
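A minimal sketch of that advice: size the worker pool from the core count and leave thread placement entirely to the scheduler (the pool shape and work function signature are our assumptions):

```cpp
#include <thread>
#include <vector>

// One heavyweight thread per core, no affinity: query the core count and
// let Windows decide where each worker runs.
void RunWorkers(void (*work)(unsigned))
{
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 1;                 // the query can fail

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < cores; ++i)
        pool.emplace_back(work, i);            // no SetThreadAffinityMask here

    for (std::thread& t : pool)
        t.join();
}
```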
If you know the rough size of your application's working set, use SetProcessWorkingSetSize() to let Windows know you're going to ask for roughly that amount of memory, so it can anticipate the allocation and move things out of the way if need be. Dawson spent a good amount of time on those points for developers writing games for Windows, especially those whose engines might do things a little differently on Xbox 360 or other target platforms.
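As a concrete illustration of that hint, a minimal sketch (the 256-512 MB range is an illustrative assumption, not a figure from the talk):

```cpp
#include <windows.h>

// Tell Windows the game expects roughly a 256-512 MB working set.
// The sizes here are made up for illustration.
void HintWorkingSet()
{
    const SIZE_T minBytes = 256u * 1024 * 1024;
    const SIZE_T maxBytes = 512u * 1024 * 1024;
    if (!SetProcessWorkingSetSize(GetCurrentProcess(), minBytes, maxBytes))
    {
        // Non-fatal: the hint was rejected (e.g. insufficient privilege),
        // so just carry on with default working-set management.
    }
}
```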