Well, the results again seem odd at first. Personally, I would have liked to see some benchmarks at 640x480; after all, you don't want too much influence from texture thrashing when testing T&L, so a lower resolution is always better if you want to single out T&L performance from other factors.

For now I can imagine at least one explanation. When you use hardware T&L you create vertex buffers in the local RAM of the card. These buffers contain static geometry and can reside either in video RAM or in AGP RAM. Assume for a moment that they are in local video RAM (which should be the fastest). If these buffers are resident and use up memory, it might very well be that the card is running into either bandwidth problems (accessing the vertex buffers consumes bandwidth that could otherwise have been used for rendering) or texture thrashing (textures swapping in and out of memory) because the vertex buffers take up room. One way to figure this out would be to also run the test at 800x600 and 640x480 and see whether anything changes.
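To get a feel for the "vertex buffers take up room" argument, here is a minimal back-of-envelope sketch. All the figures (a 32 MB card, 1024x768 in 32-bit colour, 100,000 resident vertices at roughly 32 bytes each for position, normal and texture coordinates) are illustrative assumptions, not measurements from the benchmark:

```python
# Back-of-envelope: how much local video memory is left for textures
# once the frame buffers and a static vertex buffer are resident.
# All figures are assumptions for illustration, not measured values.

def bytes_for_buffers(width, height, bytes_per_pixel=4,
                      num_color_buffers=2, z_bytes=4):
    """Front + back colour buffers plus a Z-buffer at the given resolution."""
    pixels = width * height
    return pixels * (num_color_buffers * bytes_per_pixel + z_bytes)

def vertex_buffer_size(num_vertices, bytes_per_vertex=32):
    """Static geometry: position + normal + texcoord is roughly 32 bytes/vertex."""
    return num_vertices * bytes_per_vertex

def texture_space_left(card_mb, width, height, num_vertices):
    """Memory not claimed by frame buffers or vertex buffers."""
    total = card_mb * 1024 * 1024
    used = bytes_for_buffers(width, height) + vertex_buffer_size(num_vertices)
    return total - used

# A 32 MB card at 1024x768x32 with 100,000 static vertices resident:
left = texture_space_left(32, 1024, 768, 100_000)
print(f"{left / (1024 * 1024):.1f} MB left for textures")  # roughly 19.9 MB
```

The vertex buffer itself is small next to the frame buffers, but every megabyte it claims is a megabyte of textures that may now have to swap over AGP, which is one plausible route to the thrashing described above.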

I am surprised to see that in both tests the highest peak and the highest minimum frame rates are obtained using the software T&L setting. This leads me to believe that the issue at hand is indeed bandwidth. My educated guess is thus that hardware T&L uses vertex buffers (for the static geometry) in local video memory. The T&L core has to access these buffers, and this consumes bandwidth. That takes bandwidth away from the rendering core, which accesses the same memory for textures and various buffers (Z, front and back buffers). Because hardware T&L introduces an extra load on local video memory, rendering is slowed down. In software mode the T&L core does not touch local video memory, so relatively more bandwidth is available for rendering, which results in higher scores (if bandwidth is the limiting factor). The conclusion is that this game is bandwidth limited. People often call this fill-rate limited, which is partly correct, since the limited bandwidth caps the achievable fill-rate. So it seems to me that 3dfx might be right in the end: fill-rate and bandwidth are the main issues today, and as the results from HardOCP show, even DDR isn't fast enough.
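The size of this extra load can be sketched with another rough calculation. Assume every rendered pixel costs a Z read, a Z write, a colour write and a couple of texel fetches, with an overdraw factor of 2, and that hardware T&L adds a vertex-buffer read of about 32 bytes per vertex on top. All numbers here are illustrative assumptions:

```python
# Rough per-frame bandwidth budget. Assumed figures for illustration:
# per rendered pixel: Z read + Z write + colour write + texel fetches.
# Hardware T&L adds vertex-buffer reads on top of the fill traffic.

def fill_bandwidth(width, height, overdraw=2.0, z_bytes=4,
                   color_bytes=4, texel_bytes=4, texels_per_pixel=2):
    """Bytes of local-memory traffic per frame from rendering alone."""
    pixels = width * height * overdraw
    per_pixel = 2 * z_bytes + color_bytes + texels_per_pixel * texel_bytes
    return int(pixels * per_pixel)

def vertex_bandwidth(num_vertices, bytes_per_vertex=32):
    """Extra bytes per frame the T&L core reads from vertex buffers."""
    return num_vertices * bytes_per_vertex

fill = fill_bandwidth(1024, 768)       # ~31.5 MB/frame of fill traffic
verts = vertex_bandwidth(100_000)      # ~3.2 MB/frame of vertex traffic
print(f"fill: {fill / 1e6:.1f} MB/frame, "
      f"vertex fetch: {verts / 1e6:.1f} MB/frame "
      f"({100 * verts / fill:.1f}% extra)")
```

Under these assumptions the vertex traffic adds on the order of ten percent to the per-frame load. That is not huge, but on a bus that is already saturated it is exactly the kind of overhead that would let the software path edge ahead, as the benchmark numbers suggest.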

Note that all of this is based on logical deduction; there is no guarantee that this explanation is correct. It is one possible explanation, but many others are possible, since we don't know the exact internal workings of the benchmark. I do want to point out that the effect described above was one of the issues we raised in our critical T&L articles, which pointed out weak points and possible issues with this new technology.

OK, that should cover about everything for now, we hope. Of course, more issues and more questions will probably arise as time progresses; as they come up, we hope to be able to answer them. We can't answer every question in our articles, but we do address the bigger ones. The smaller ones can usually be found answered in our forums, where there has been a lot of activity about T&L (and other things) lately. We encourage you to stop in, take a look, and raise any questions of your own...

Go to the forum... and remember these specific T&L benchmarking threads:

"Questioning the T&L article" (started by Newb)

"Do we really need T&L ?" (started by rookie)

"How fast can software T&L be ?" (started by MTM)

"This forums seems Biased" (started by Pat)

"Taking the load of the CPU..." (started by Evil_Bob)