Other bits and pieces
PerfHUD also contains some additional features, such as a (fairly useless) debug window, whose saving grace is an app filter, and a frame profiler. The frame profiler runs a quick set of experiments on your frame to show you bottlenecks in the GPU and driver, then groups the collected results by state into what it calls buckets.
So for any given frame profile, you get what you see in the real-time Performance Dashboard, but as a single-frame snapshot and in a bit more detail. The profiler lists all the draw calls by bucket, so you can spot the most expensive one, and tells you how many pixels each draw call affected. That lets you mentally sort optimisation opportunities quickly, so you can, for example, go after the expensive draw call that's taking a large chunk of frame time but isn't contributing much to the on-screen result.
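As a rough illustration of that mental sort, here is a minimal Python sketch that ranks draw calls by time spent per pixel affected. Everything in it is an assumption for illustration: the `DrawCall` fields, the bucket labels and the numbers are hypothetical, standing in for figures read off the profiler's results list by hand, and none of it is a PerfHUD interface.

```python
from dataclasses import dataclass


@dataclass
class DrawCall:
    index: int          # position of the call within the frame
    bucket: str         # hypothetical state-bucket label
    gpu_time_us: float  # time attributed to the call, in microseconds
    pixels: int         # number of pixels the call affected


def rank_by_cost_per_pixel(calls):
    """Return calls sorted from most to least expensive per pixel affected.

    Calls that affected no pixels are treated as infinitely expensive,
    so they sort to the front of the list.
    """
    def cost(call):
        if call.pixels == 0:
            return float("inf")
        return call.gpu_time_us / call.pixels

    return sorted(calls, key=cost, reverse=True)


# Made-up numbers, purely for illustration.
calls = [
    DrawCall(0, "terrain", 900.0, 600_000),
    DrawCall(1, "particles", 1200.0, 4_000),
    DrawCall(2, "ui", 50.0, 80_000),
    DrawCall(3, "occluded_prop", 300.0, 0),
]

ranked = rank_by_cost_per_pixel(calls)
for call in ranked:
    print(call.index, call.bucket, call.gpu_time_us, call.pixels)
```

With these made-up numbers, the prop that costs time but touches no pixels comes out on top, followed by the particle call, which is the kind of ordering you'd otherwise be working out in your head.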
The best thing about this feature is that clicking on any draw call in the results list will scrub the frame to that point so you can see exactly which call it is and what it's drawing, related to the state bucket you're interested in.
You'll see unit utilisation as you'd expect, displayed by unit or as a graph that maps out the utilisation per draw call. That graph can be hard to read with large numbers of draw calls, but the overall coloured picture can be a useful one, letting you see where the frame stage bottlenecks lie. NVIDIA's double-Z rate gets its own graph too, so you can see where in the frame that feature of the GPU is enabled.
There's also an advanced view of the frame profiler, where you can see draw calls grouped by state, see unit utilisation at a glance, and see the resources relevant to the context of each draw call. Unfortunately, the advanced view was the part most likely to crash PerfHUD for us, be it in Andy's demo or otherwise.
Our biggest complaint with the frame profiler isn't what it shows you, or even how it goes about it. Rather, there's no way to save the information out for further analysis, and there's no way to script the system to make you more productive. As your draw call count goes up, the frame profiler and frame debugger get harder to use efficiently, and we always found ourselves yearning for ways to dump data and program the thing to make it more worth our while to use.
The frame analysis tools are likely where the real work in a debugging or analysis session is going to be done in PerfHUD, by virtue of the per-frame detail they expose, so we'd look for more improvement there in future versions.
Update
NVIDIA mention that they're looking at export of data from the frame profiler, and it's something they might be able to fold into PerfHUD 5.1, which is set for release in the near future. The data export format is likely to be good old easy-to-parse CSV, which will make any possible integration into other tools nice and simple.
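To give a feel for how simple that integration could be, here is a minimal Python sketch that totals GPU time per state bucket from a CSV dump. The export format hasn't been published, so the column names (`draw_call`, `bucket`, `gpu_time_us`, `pixels`) and the sample rows are purely our assumptions, not anything NVIDIA has confirmed.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: these columns and values are invented for illustration.
SAMPLE = """draw_call,bucket,gpu_time_us,pixels
0,terrain,900.0,600000
1,particles,1200.0,4000
2,terrain,150.0,20000
"""


def time_per_bucket(csv_text):
    """Sum the gpu_time_us column for each bucket in the CSV text."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["bucket"]] += float(row["gpu_time_us"])
    return dict(totals)


totals = time_per_bucket(SAMPLE)
for bucket, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{bucket}: {total:.1f} us")
```

If the real export looks anything like this, a dozen lines of script would cover the sort of per-bucket summary we were wishing for above.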