With the R5xx series of desktop processors ATI took the decision to build a design in which the ALU to texture ratio could be changed from 1:1 to 3:1; NVIDIA, however, are still scaling their texture units with the number of ALU pipelines. While we suspect that NVIDIA's decision is largely based on the current development path of the G7x series and may change with their next generation parts, it does currently mark a clear differentiation between the two companies' development paths. It also appears that ATI have settled on 16 texture units across the completely separate R520/R580 and Xenos (XBOX 360) developments. Can you give us an understanding of why you have settled on this number of texture units when your competition is still scaling upwards?

[Eric Demers] I really can't comment on NV's architecture, since I don't know how scalable a design it is. However, for us, we've designed the shader core to have linear ALU scalability: we can do 1, 2, 3, 4, etc... ALUs. We decided upon 3 based on the profiles of the latest applications (we used FarCry, HL2, D3, 3DMark05 and a few others as a basis check), to give the best "bang for the buck". More would have cost more and not given as much; less would not have maximized the performance per area. As for the number of texture units, it's more based on available BW. The amount of memory BW per engine clock hasn't drastically changed since our 9700 products (a ratio close to 3:4 then, 7:8 now), but textures have become more and more BW consuming (64b texels and larger 2kx2k or even 4kx4k textures are becoming more common). Consequently, if we had scaled up the number of texture units, with textures consuming ever more BW, the design would have been unbalanced compared to previous products.
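As a rough illustration of the bandwidth argument above, here is a back-of-envelope sketch. The reading of the 7:8 figure as memory clocks per engine clock, the 256-bit DDR interface, the one-sample-per-clock texture units and the cache hit rate are all our assumptions for illustration, not ATI-supplied numbers.

```python
# Back-of-envelope sketch of the texture bandwidth argument.
# All figures are illustrative assumptions, not ATI-published specifications.

BUS_BYTES_PER_TRANSFER  = 256 // 8   # assumed 256-bit memory interface
TRANSFERS_PER_MEM_CLK   = 2          # DDR: two transfers per memory clock
MEM_CLKS_PER_ENGINE_CLK = 7 / 8      # our reading of the "7:8" ratio quoted above
TEX_UNITS               = 16         # texture units, one bilinear sample per clock (assumed)
TEXEL_BYTES             = 8          # 64-bit texels
TEXELS_PER_BILINEAR     = 4          # a bilinear sample touches a 2x2 footprint
CACHE_HIT_RATE          = 0.90       # assumed texture cache hit rate

# Memory bytes available per engine clock.
supply = BUS_BYTES_PER_TRANSFER * TRANSFERS_PER_MEM_CLK * MEM_CLKS_PER_ENGINE_CLK

# Texel bytes demanded per engine clock, before and after the cache absorbs reuse.
raw_demand    = TEX_UNITS * TEXELS_PER_BILINEAR * TEXEL_BYTES
cached_demand = raw_demand * (1 - CACHE_HIT_RATE)

print(f"supply: ~{supply:.0f} bytes/clk")                              # ~56
print(f"demand: {raw_demand} raw, ~{cached_demand:.0f} after cache")   # 512 raw, ~51
```

Even with a very effective texture cache, sixteen 64-bit bilinear samples per clock already sit near the memory interface's per-clock budget, so scaling the texture unit count further would simply move the bottleneck to bandwidth, which is the imbalance Demers describes.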

In fact, I think that adding more texture units is really somewhat backwards looking, where you want to shoot for high fillrates, especially in older games; newer games call for much more ALU instead. It's really a question of whether you're designing for last year's games or next year's. We chose next year's.

Having said that, I would not mind more texture power (assuming more BW), but I would not want to reduce the ratio of ALU : TEX.

[Richard Huddy] We do the same kind of fundamental research as NVIDIA into what kinds of shaders are being used now, but we also go out to the developers who we think set the technical agenda. We look not only for their straight answers, which is not enough on its own, but we get their shaders, suggest changes to those shaders which might give better performance in different cases, and see which we like. So we talk to the likes of Epic, so UE3 is all taken into account, and we talk to John Carmack. For instance, I'll be out on an architecture tour soon and will be visiting Crytek; I will ask them what balance they are looking at in their shaders, and actually ask for the shaders themselves, because the balance they see at the software level is not the same as it is at the hardware level, since the driver has quite a different understanding of what that balance is.

The flip side of that, though, is that with the performance of high end chips such as R580, end users are going to be more inclined towards using higher quality modes such as Anisotropic Filtering, which eats up more texture sampling capacity (especially with the new angle invariant mode added on the X1000 series). Given that the spatial sampling pattern of Anisotropic Filtering should be fairly friendly to cache hits, presumably the operation is going to be bound more by sample rate than by bandwidth? Do you think we'll see cases where R580's performance relative to R520 is hurt more when AF is enabled, or do you just expect the increased ALU performance of R580 to offset that in most cases?

[Eric Demers] I haven't checked completely, but roughly, X1900's AF seems to have about the same performance hit as X1800's AF. Having said that, it should, logically, have a slightly higher performance hit on X1900 than X1800, due to the higher texture bottleneck. However, in all these cases the R580 can never be slower, per cycle, than an X1800 – it's always going to be at least the same or faster (or much faster). As well, Fetch4 support will actually allow the X1900 to improve performance with single component textures (such as shadow maps), so, effectively, the X1900 will be even faster, in general, on texturing. Finally, unless you're running a very simple app that is just texture limited, the R580's extra shading firepower will kick in, more than offsetting any texture penalty to give effectively higher performance per cycle than an R520.
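For readers unfamiliar with Fetch4: it returns the four single-channel texels of a 2x2 footprint in one fetch, which is exactly what a percentage-closer shadow-map filter consumes. The sketch below is a purely illustrative CPU-side simulation of that difference in fetch count; the function names are hypothetical and do not correspond to any real API.

```python
import numpy as np

# Illustrative simulation of why Fetch4 helps shadow-map filtering: a 2x2 PCF
# kernel needs four single-channel depth texels, and Fetch4 returns all four
# in a single texture fetch instead of four separate point fetches.
# Function names are hypothetical, not real driver or API entry points.

shadow_map = np.random.rand(1024, 1024).astype(np.float32)   # single-channel depth texture

def point_fetch(tex, x, y):
    """A conventional fetch: one texel per request."""
    return tex[y, x]

def fetch4(tex, x, y):
    """A Fetch4-style fetch: the whole 2x2 footprint in one request."""
    return tex[y:y + 2, x:x + 2].ravel()

def pcf_2x2(tex, x, y, ref_depth, use_fetch4):
    """Fraction of the 2x2 footprint that passes the shadow depth test."""
    if use_fetch4:
        texels = fetch4(tex, x, y)                        # 1 fetch
    else:
        texels = [point_fetch(tex, x + dx, y + dy)        # 4 fetches
                  for dy in (0, 1) for dx in (0, 1)]
    return sum(1.0 for d in texels if ref_depth <= d) / 4.0

print(pcf_2x2(shadow_map, 100, 200, ref_depth=0.5, use_fetch4=True))
```

With the fetch count for a shadow-map tap cut to a quarter, the same number of texture units goes further on exactly the single-component workloads Demers mentions.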

What do you see as being the balance of bandwidth utilisation between textures and ROPs at the moment?

[Eric Demers] Right now they seem to balance at 1:1 (TEX:ROP), but the trend, in general, is towards lowering ROPs. The reality is that shading per pixel is increasing, which usually means many ALUs and many textures per pixel, as well as many cycles per pixel. Since each pixel only needs the ROP for a single cycle, effectively the ROP throughput requirement is going down in new apps. RV530 is a prime example: it doesn't have more ROPs than RV515, but having triple the shading and double the Z, it's around 2x the speed of RV515 in a lot of cases. Finally, with HDR becoming more popular, the BW requirements of these pixels are high, so ROP throughput is possibly going down even though each operation is 2x wider.
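To make the throughput argument concrete, here is a small sketch of the arithmetic. The shader lengths and unit counts are illustrative assumptions rather than measured figures for any particular chip.

```python
# Illustrative arithmetic for "longer shaders lower the required ROP rate".
# Unit counts and shader lengths below are assumptions, not measured figures.

def pixels_completed_per_clock(shader_throughput_pixels, shader_cycles_per_pixel):
    """Pixels finishing (and thus needing a ROP write) each clock."""
    return shader_throughput_pixels / shader_cycles_per_pixel

# A short, older-style shader: ~4 cycles of work per pixel.
print(pixels_completed_per_clock(16, 4))    # 4.0 pixels/clk -> 4 ROPs keep pace

# A longer, modern shader: ~16 cycles of work per pixel, same shading capacity.
print(pixels_completed_per_clock(16, 16))   # 1.0 pixel/clk -> 1 ROP keeps pace

# HDR render targets double the bytes per ROP write, e.g. FP16 RGBA vs 8-bit RGBA,
# so each remaining write costs twice the bandwidth even as the write rate falls.
FP16_RGBA_BYTES, RGBA8_BYTES = 8, 4
print(FP16_RGBA_BYTES / RGBA8_BYTES)        # 2.0x wider per write
```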