One of the things we asked Andy was that you can maybe see the number of fixed function units on the GPU just decreasing, and the chip becoming more and more generally programmable over time. Do you see that too?
Well, yes and no. I wouldn't quite say that, and one reason is that certain of the things that the hardware does, like rasterisation and culling, will be high speed and fixed function, but they do a particular thing with a particular amount of precision. So for instance if you do antialiasing with a general purpose unit, it'd be about 10x more expensive to reach the same performance as a dedicated ROP. Fixed function is always more efficient in terms of cost for the capabilities. Take something like texture filtering, where you want trilinear or anisotropic and the unit is the way it is, and we know how to build it well, so we'd almost always look to use that rather than a general purpose unit. But that being said, as much as we can we'll generalise. Whenever there's something where we won't take too big a penalty, we'll usually generalise there if we're able. And if available area is growing fast, you might decide at some point, going back to the AA example, that it's not that hard and you can burn some of your compute power to do that, relative to everything else that's being done.
So do you see that being the case even 10 or 15 years down the line, where the chip is still a decent mix of fixed and general?
Let's assume that Moore's law continues for about 10 years. I can probably calculate that right in front of you here. 10 years, given roughly double the number of transistors every 2 years, that's 5 doublings, so 32x. And this year it's pretty reasonable to build a 1B transistor chip, so a 32B transistor chip... think about that. If that's all ALUs, that's a whole lot of ALUs. I certainly won't say you can't make use of that, but I don't know if that's what you do with all that silicon. You might do some other special thing. There was some talk yesterday from Evolved Machines*, and that really works for face recognition and voice recognition and things like that, so maybe you'll implement their algorithm as a fixed function block in hardware, along with your ton of ALUs. Who knows what will happen. But yeah, I think over time the trend is definitely just more and more compute units and for us, we're looking to double the performance now roughly every year.
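[Editor's note: the back-of-envelope arithmetic in the answer above can be checked with a short sketch. The function name `projected_transistors` and its parameters are illustrative assumptions, not anything from the interview; the inputs (1B transistors today, a doubling every 2 years, a 10-year horizon) are the figures quoted in the answer.]

```python
def projected_transistors(start_count, years, doubling_period_years):
    """Project a transistor count forward, assuming steady exponential growth.

    Hypothetical helper for illustration: it simply applies
    start_count * 2 ** (years / doubling_period_years).
    """
    doublings = years / doubling_period_years
    return start_count * 2 ** doublings


# Figures quoted in the interview: 1B transistors now, doubling every 2 years.
doublings = 10 / 2                                 # 5 doublings in 10 years
growth = 2 ** doublings                            # 32x overall
projected = projected_transistors(1e9, 10, 2)      # 32B transistors

print(f"{doublings:.0f} doublings, {growth:.0f}x, {projected / 1e9:.0f}B transistors")
```

[This is only the idealised projection the answer describes; it assumes the doubling period holds constant for the whole decade.]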
Is that an accelerated performance curve compared to what's happening now and what happened previously?
It's about what we've been doing for some time. If you go all the way back to 1997 and you plot the curve, we started out doubling every 6 to 9 months, and we've fallen back to doubling every 9 to 12 now.
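[Editor's note: to compare the doubling periods mentioned in the answer above, each can be converted into an equivalent per-year growth factor. This is a minimal sketch; the helper name `annual_growth_factor` is an assumption made for illustration, and the periods (6, 9 and 12 months) are the ones quoted in the answer.]

```python
def annual_growth_factor(doubling_period_months):
    """Return the yearly performance multiplier for a given doubling period.

    Hypothetical helper for illustration: a doubling every N months
    compounds to 2 ** (12 / N) over one year.
    """
    return 2 ** (12 / doubling_period_months)


# Doubling periods quoted in the interview, in months.
for months in (6, 9, 12):
    factor = annual_growth_factor(months)
    print(f"doubling every {months} months is about {factor:.2f}x per year")
```

[Under this idealised compounding, a 6-month doubling is 4x per year and a 12-month doubling is 2x per year, so the slowdown described roughly halves the yearly multiplier at the extremes.]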
So if we just class doubling as going twice as wide, is there a point where you just don't want to go any further?
I think we still have a hell of a ways to go before we run out of gas on wider! We still have plenty of problems with a lot of parallelism. Think about the guys who came here to talk about CUDA. We could probably go 100x or 1000x wider and they'd still make use of the hardware. So there's a certain class of problem where we wouldn't run into limits any time soon with an approach like that. And there are other things we know we have to add, regardless of the approach, like double precision.
Am I right in saying that for that kind of doubling, your control logic at the front of the chip for scheduling and thread control and setup scales in a similar fashion, or maybe not so much?
It's pretty straightforward, and in fact that's why we can build the different flavours at different widths, so it's roughly as you'd expect.








