4. According to the same developers, loading objects locally would guarantee maximum speed, but it isn't very flexible, so they told me it's possible to morph these objects by sending over changes to their vertices, or to upload new objects on the fly (just like new textures). Now this will result in a speed drop... is this similar to the problems we see when texture thrashing occurs?

I'm not sure, but I'm going to guess that game developers will have to tune their vertex buffer managers in the same way they have to tune their texture managers today. An application will always need to do some amount of tuning to ensure the best performance.
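To make the texture-manager analogy concrete, here is a minimal sketch (purely hypothetical — no relation to any actual driver or API of the era) of the kind of LRU residency policy such a manager might use: keep hot objects resident in a fixed local-memory budget, and evict the least recently used ones when a new upload doesn't fit.

```python
from collections import OrderedDict

class ResidencyManager:
    """Hypothetical LRU manager for objects resident in local (on-card) memory.

    The same policy could back a texture manager or a vertex-buffer manager:
    frequently used objects stay local (fast path), cold ones get evicted and
    must be re-uploaded over the bus (slow path) the next time they're touched.
    """

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self.resident = OrderedDict()  # object id -> size in bytes
        self.uploads = 0               # bus transfers: the cost to tune away

    def touch(self, obj_id, size):
        if obj_id in self.resident:
            self.resident.move_to_end(obj_id)  # mark as most recently used
            return
        # Evict least recently used objects until the new one fits.
        while self.used + size > self.budget and self.resident:
            _, evicted_size = self.resident.popitem(last=False)
            self.used -= evicted_size
        self.resident[obj_id] = size
        self.used += size
        self.uploads += 1  # re-upload over the bus: this is the thrashing cost
```

If the per-frame working set exceeds the local-memory budget, `uploads` climbs every frame — the vertex-buffer analogue of the texture thrashing the question alludes to.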

5. The bandwidth to local memory is used today for accessing the framebuffer (read and write), the z- and stencil-buffer (read and write), texture memory (read and some uploading), and the RAMDAC. Now all of this is done today over a bus 128 bits wide at 200 MHz (and even a bit higher on more expensive products; usually it's lower, however). Now if we look at a GeForce256 and we assume that a local vertex cache is used, then this means an additional data stream (reading, and some writing for dynamic objects) to local memory. IMHO this memory is already saturated... it's already at its limits, and now you add even more data streams... can you keep up speed?

Well, first of all, much of the rendering does not require the bandwidth you talk about. When an application is not blending (like, for example, the first (possibly multitextured) pass of a game like Quake3), you do not need RGB reads. When an app renders triangles that are behind triangles already rendered into Z, there are no Z writes. And, well, there are lots of neat tricks that the GeForce does to maximize its efficiency on reads/writes to that memory. CPUs face the same problem and solve it with an L1 and L2 cache. But in addition to that, there have already been some customers who have talked about their DDR (double data rate) memory GeForce 256 products, and that will provide up to double the current bandwidth.
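For a sense of scale, the raw numbers in the question work out as follows (back-of-the-envelope peak figures only; real effective bandwidth depends on the efficiency tricks described above):

```python
def peak_bandwidth_gb_s(bus_bits, clock_mhz, transfers_per_clock=1):
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9

sdr = peak_bandwidth_gb_s(128, 200)                         # single data rate
ddr = peak_bandwidth_gb_s(128, 200, transfers_per_clock=2)  # double data rate

print(sdr)  # 3.2 GB/s for a 128-bit bus at 200 MHz
print(ddr)  # 6.4 GB/s with DDR: "up to double the current bandwidth"
```

Every extra data stream (vertex fetches included) competes for that same 3.2 GB/s, which is why skipping unneeded RGB reads and Z writes matters.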

6. The press release mentions: "32 texture samples per clock". Now, is this sustained, or is it an average number relying on a texture cache on the chip?

I'm not really sure to be honest.

7. The press release also mentions: "8 hardware lights". However, the two NVIDIA developer support guys I talked to told me you only have 2 true hardware lights. They said that "if" you are really geometry limited, you would see a slowdown for adding more than 2 lights. However, they quickly added that this would not happen in real-world situations, since the board will be fill-rate limited 99% of the time. Because of this "almost forced" fill-rate limit, you have so many cycles left that allowing the software people to use 8 lights does not cause a hit. Is this correct? Doesn't this mean that your product is very unbalanced (too much T&L and not enough fill-rate)?

I think you're mistaken. Obviously there's some misinformation out there. For the record, the GeForce 256 has 8 (EIGHT, between 7 and 9, 4 + 4) lights in hardware. Infinite lights come at very little cost to the pipeline's performance, and local lights (point / spot lights) come at a bit more cost, but still beat the hell out of anything I've ever seen a CPU do as far as performance goes.
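The fill-rate-versus-geometry reasoning in the question can be made concrete with a toy model (all numbers below are illustrative, not measured GeForce figures): a frame is fill-limited when the pixel work takes more cycles than the transform-and-lighting work, so extra per-vertex lighting cost hides inside the pixel time.

```python
def bottleneck(tris_per_frame, cycles_per_vertex, pixels_per_frame,
               pixels_per_clock):
    """Toy model of which pipeline stage dominates a frame."""
    geometry_cycles = tris_per_frame * 3 * cycles_per_vertex
    fill_cycles = pixels_per_frame / pixels_per_clock
    return "geometry" if geometry_cycles > fill_cycles else "fill"

# A 640x480 scene with an overdraw of 3 and a modest triangle count:
frame = bottleneck(tris_per_frame=10_000, cycles_per_vertex=4,
                   pixels_per_frame=640 * 480 * 3, pixels_per_clock=4)
print(frame)  # "fill": extra lighting cycles per vertex would be hidden
```

Push the triangle count up by an order of magnitude in this model and the answer flips to "geometry" — which is exactly the (rare, per the answer above) case where adding lights would show up as a slowdown.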