The 512-bit bus had to be a significant investment. Is it here to stay at the top, or is it a bridge to something else?

It's a great technological accomplishment. It allowed us to achieve the best cost per bandwidth on our products. Now that we've proven it can be done, we can certainly decide to use this weapon again, as required. But I won't comment on future products ;-)

Does a 512-bit bus require a die size that's going to be in the neighbourhood (or bigger) of R600 going forward?

No. Through multiple layers of pads, distributed pads, or even stacked dies, large memory bus widths are certainly possible. A certain die size and a minimum number of “consumers” are required to enable this technology, but it doesn't require a large die.

If you look at the three ultra-threaded dispatch processors of Xenos, R520/R580, and R600... how much do they have in common, and how different is the one in R600 from its predecessors?

Another loaded question. In fact, there's too much to discuss here. I gave a presentation at Stanford that spent a couple of pages on the R6xx's thread processor, which gives you an idea of the changes. Basically, we needed to virtualize the main resources, which are the shaders, constants, and GPRs. As well, we have many more resources and many more types of resources compared to previous designs. Finally, we also have many different types of threads, all running simultaneously in the same shader core.

So while it's an evolution of previous designs, it's an order of magnitude increase in complexity. Imagine keeping multiple types of threads in flight, potentially running many different shaders, all sharing resources, and having to orchestrate all of that. I believe it was the single hardest design in the chip. It ended up not reusing any old code, but it was certainly inspired by the old designs. It only took about 10 design engineers, for 2.5 years ;-)
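To make that orchestration concrete, here is a toy sketch in C of the kind of bookkeeping such a dispatcher has to do: several thread types in flight at once, all drawing registers from one shared pool, with a thread only eligible to launch once its resources fit. Everything here (the thread types, pool size, and first-fit policy) is invented for illustration and is far simpler than the real hardware.

```c
#include <stdio.h>

/* Toy model of a multi-type thread arbiter sharing one GPR pool.
 * Purely illustrative -- names, sizes, and policy are hypothetical,
 * not R600's actual design. */
enum thread_type { VERTEX, GEOMETRY, PIXEL, N_TYPES };
static const char *type_name[N_TYPES] = { "vertex", "geometry", "pixel" };

struct thread {
    enum thread_type type;
    int gprs_needed;   /* registers this thread must reserve   */
    int ready;         /* inputs available, eligible to launch */
};

#define GPR_POOL 256   /* shared register file (hypothetical size) */

int main(void) {
    struct thread pending[] = {
        { PIXEL,    64, 1 },
        { VERTEX,   32, 1 },
        { GEOMETRY, 96, 0 },  /* not ready yet */
        { PIXEL,   128, 1 },
    };
    int n = sizeof pending / sizeof pending[0];
    int free_gprs = GPR_POOL;

    /* One arbitration pass: launch any ready thread whose GPR
     * demand fits in what's left of the shared pool. */
    for (int i = 0; i < n; i++) {
        if (!pending[i].ready || pending[i].gprs_needed > free_gprs)
            continue;
        free_gprs -= pending[i].gprs_needed;
        printf("launch %-8s thread (%3d GPRs, %3d left)\n",
               type_name[pending[i].type],
               pending[i].gprs_needed, free_gprs);
    }
    return 0;
}
```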


Is there any specific hardware support for CFAA that didn't exist in previous Radeon families? Should we have any expectation of further developments in CFAA with more filter types?

Yep. The shader is capable of accessing the compression information for pixels, for example. That wasn't possible on R5xx, so we would have had to decompress before running shader code. As well, direct access to the fragment data as a texture wasn't possible; I believe it could be done in a somewhat convoluted way, but it wasn't easy. All of that has changed in R600. As for new CFAA types, it's really a driver/development issue now. I know that modifications of the adaptive AA filter are going on, and I believe a few other items are on the agenda. It's really a simple question of SW resources (what a loaded statement). I think the latest types are quite cool and offer a lot of variation and the ability for people to pick custom settings.
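As a rough illustration of what a programmable resolve enables, here is a minimal sketch in C of a tent-style resolve that weights samples by distance from the pixel centre, including samples pulled in from neighbouring pixels. The sample positions, colours, and filter radius are invented for illustration; the real CFAA kernels are chosen by the driver.

```c
#include <math.h>
#include <stdio.h>

/* Minimal tent-filter resolve sketch (hypothetical sample layout). */
struct sample {
    float x, y;   /* offset from pixel centre */
    float color;
};

/* Tent weight: falls off linearly with distance, zero at `radius`.
 * A radius > 0.5 pulls in samples beyond the pixel's own footprint. */
static float tent(float x, float y, float radius) {
    float d = sqrtf(x * x + y * y);
    return d >= radius ? 0.0f : 1.0f - d / radius;
}

int main(void) {
    /* Four samples inside the pixel plus two from neighbours. */
    struct sample s[] = {
        { -0.25f, -0.25f, 0.9f }, { 0.25f, -0.25f, 0.8f },
        { -0.25f,  0.25f, 0.7f }, { 0.25f,  0.25f, 0.6f },
        { -0.75f,  0.00f, 0.2f }, { 0.75f,  0.00f, 0.3f },
    };
    float radius = 1.0f, sum = 0.0f, wsum = 0.0f;

    for (int i = 0; i < 6; i++) {
        float w = tent(s[i].x, s[i].y, radius);
        sum  += w * s[i].color;
        wsum += w;
    }
    printf("resolved colour: %f\n", sum / wsum);
    return 0;
}
```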

Does CFAA have any possible applicability to the "shader aliasing" issue that ISVs face?

Partially. It can sample outside the pixel, which reduces aliasing, though at some cost to high-frequency information; the option is there for users to balance things to their liking. But fundamentally, to address the problem more thoroughly, you would need to run shaders with high-frequency content at a higher rate than MSAA does (in MSAA, a fully covered pixel gets only one fragment).
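To spell out that rate difference, here is a short C sketch contrasting the two approaches. The `shade()` function is a hypothetical stand-in for a shader with high-frequency output: MSAA evaluates it once per pixel and shares the result across all covered samples, so the stripes are never filtered, whereas per-sample (supersampled) evaluation actually low-pass filters the shader output.

```c
#include <stdio.h>

/* Hypothetical stand-in for a shader with fine stripes in its output. */
static float shade(float x) {
    return (int)(x * 20.0f) % 2 ? 1.0f : 0.0f;
}

#define SAMPLES 4

int main(void) {
    float pixel_center = 0.5f;
    float offsets[SAMPLES] = { -0.375f, -0.125f, 0.125f, 0.375f };

    /* MSAA: shade once at the centre and replicate the result to
     * every covered sample -- the stripes are not filtered at all. */
    float msaa = shade(pixel_center);

    /* Supersampling: shade every sample position and average,
     * which actually filters the high-frequency shader output. */
    float ssaa = 0.0f;
    for (int i = 0; i < SAMPLES; i++)
        ssaa += shade(pixel_center + offsets[i]);
    ssaa /= SAMPLES;

    printf("MSAA result: %f  SSAA result: %f\n", msaa, ssaa);
    return 0;
}
```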