Below you will find the questions and comments from Milton (in italics) followed by some quick comments from me. I do advise reading Milton’s whole article before reading this one; otherwise some of the “quotes” might be confusing.

Milton’s question - Does Mad Onion consider this test a synthetic test?  If so, why use a customized software engine that will be used in actual games?  Why not simply use a brute-force approach to see how many triangles a T&L engine (CPU or GPU) can process?  If this is just a synthetic test, which produces an arbitrary number, isn't NVIDIA right to be concerned?  After all, the benchmark seems to indicate that the T&L engine on the GeForce is a sham!

Mad Onion definitely sees this as a synthetic test, simply because they also have in-game tests in the benchmark. The first two scenes are in-game tests. Everything that follows consists mainly of synthetic tests that can give you “good” information. BUT, that is only if you know what they are testing and what the results mean!

So, why do they use different versions of the engine? Well, let me summarize what I explained in the main article. See, you want to measure throughput. Now, with hardware, there is only one way: you use Microsoft’s Direct3D hardware T&L interface. There is no right, no wrong… well, actually there is, since there is more than one way to pass scene information to a hardware device. Since NVIDIA was involved with this project, they must have been able to explain the optimal way. Actually, Microsoft already describes the optimal way in its DX SDK documentation, so I am pretty sure Mad Onion used that (the whole thing has to do with vertex buffers, but I won’t bother you with the details… at least not right now). So, for hardware, there is just one single option. The reason for using different software engines is simply the various levels of optimization (see the original article for more details).
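
For the curious, here is a rough sketch of what that “one single option” looks like in DirectX 7 terms: untransformed vertices placed once into a write-only vertex buffer and drawn through a T&L HAL device, so the card’s T&L unit does the transform and lighting instead of the CPU. This follows the general pattern in Microsoft’s DX SDK docs; it is not Mad Onion’s actual code, and the creation of the pD3D and pDevice objects (via the usual DirectDraw7/Direct3D7 setup with the hardware T&L HAL) is assumed and omitted.

```cpp
// Hedged sketch, not 3DMark's code: the DX7 SDK pattern for feeding
// untransformed geometry to a hardware T&L device. Assumes pDevice was
// created as a T&L HAL device (IID_IDirect3DTnLHalDevice).
#include <windows.h>
#include <d3d.h>
#include <string.h>

IDirect3DVertexBuffer7* CreateStaticMesh(IDirect3D7* pD3D,
                                         const D3DVERTEX* src,
                                         DWORD numVerts)
{
    D3DVERTEXBUFFERDESC desc;
    ZeroMemory(&desc, sizeof(desc));
    desc.dwSize        = sizeof(desc);
    desc.dwCaps        = D3DVBCAPS_WRITEONLY;   // hint: fill once, draw many times
    desc.dwFVF         = D3DFVF_VERTEX;         // position + normal + one UV, untransformed
    desc.dwNumVertices = numVerts;

    IDirect3DVertexBuffer7* pVB = NULL;
    if (FAILED(pD3D->CreateVertexBuffer(&desc, &pVB, 0)))
        return NULL;

    void* data = NULL;
    if (SUCCEEDED(pVB->Lock(DDLOCK_WAIT | DDLOCK_WRITEONLY, &data, NULL)))
    {
        memcpy(data, src, numVerts * sizeof(D3DVERTEX));  // copy the geometry once
        pVB->Unlock();
    }
    return pVB;
}

void DrawMesh(IDirect3DDevice7* pDevice, IDirect3DVertexBuffer7* pVB, DWORD numVerts)
{
    // Untransformed vertices + a T&L HAL device = the card performs the
    // transform and lighting for every vertex in the buffer.
    pDevice->DrawPrimitiveVB(D3DPT_TRIANGLELIST, pVB, 0, numVerts, 0);
}
```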

Really good game engines come with their own optimized software T&L engines, so why not use something similar for the throughput test? The test would then show you the maximum throughput for a “more or less” average game engine doing just T&L. 
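
To make “doing just T&L” a bit more concrete, here is a deliberately naive sketch of the per-vertex work such a software engine performs. The structs, names, and conventions are mine, not from any real engine; real software T&L code does exactly this work but with heavy optimization (hand-tuned assembly, SSE/3DNow!, batching, cache-friendly layouts).

```cpp
// Naive per-vertex transform and lighting: one matrix multiply plus one
// directional light per vertex. Purely illustrative.
#include <cstddef>

struct Vec3      { float x, y, z; };
struct Mat4      { float m[4][4]; };              // row-major, row-vector convention
struct Vertex    { Vec3 pos, normal; };
struct LitVertex { float x, y, z, w; float r, g, b; };

static float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

void TransformAndLight(const Vertex* in, LitVertex* out, std::size_t count,
                       const Mat4& wvp,           // world * view * projection
                       const Vec3& lightDir,      // unit vector toward the light
                       const Vec3& lightColor)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        const Vec3& p = in[i].pos;

        // Transform: one 4x4 matrix multiply per vertex (v * M).
        out[i].x = p.x * wvp.m[0][0] + p.y * wvp.m[1][0] + p.z * wvp.m[2][0] + wvp.m[3][0];
        out[i].y = p.x * wvp.m[0][1] + p.y * wvp.m[1][1] + p.z * wvp.m[2][1] + wvp.m[3][1];
        out[i].z = p.x * wvp.m[0][2] + p.y * wvp.m[1][2] + p.z * wvp.m[2][2] + wvp.m[3][2];
        out[i].w = p.x * wvp.m[0][3] + p.y * wvp.m[1][3] + p.z * wvp.m[2][3] + wvp.m[3][3];

        // Lighting: one dot product per directional light, clamped at zero.
        float ndotl = Dot(in[i].normal, lightDir);
        if (ndotl < 0.0f) ndotl = 0.0f;
        out[i].r = lightColor.x * ndotl;
        out[i].g = lightColor.y * ndotl;
        out[i].b = lightColor.z * ndotl;
        // Local (point/spot) lights would add a per-vertex direction,
        // distance and attenuation calculation here, which is why every
        // extra local light costs so much.
    }
}
```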

So, is GeForce a sham? Well, yes and no (confusing, eh?). It all depends on what you were expecting from the hardware. These numbers don’t lie: they show what is possible in this specific test case, and they show that the more lights you use, the more trouble the GeForce has.

Now, should we be comparing hardware T&L with the CPU T&L implementation? Probably not, since in the real world it does not have that much meaning (the CPU will never be in this situation; it will never spend 100% of its time doing T&L). The results as they stand do teach us two things, though. One is that “GeForce is not very good at handling many local light sources” (there is a severe impact on peak throughput). The second conclusion is much trickier: “A high-end P3 (or Athlon) can beat the GeForce hardware T&L implementation when it spends 100% of its time executing an optimized software T&L engine that is more or less representative of real in-game engines.”

Does this mean GeForce is a sham? Well, maybe it’s just not as dominant as some people expected. It’s not the mighty ruler that NVIDIA’s PR department would have loved to promote, but it still isn’t bad at all. The main point is that the situation tested here is just a test; it’s synthetic, and it is always very risky to draw conclusions from such tests. The main thing to remember is that in a game your CPU might only have 25 to 50% of its total time available for T&L, so to be even slightly representative of true in-game situations you should probably compare software numbers reduced by 50 to 75% with the full numbers delivered by the hardware.
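
To make that scaling argument concrete, here is a tiny back-of-the-envelope calculation. The throughput figures are invented purely to show the arithmetic; they are not measured results from 3DMark 2000 or any card.

```cpp
// Illustrative only: scale a hypothetical peak software T&L figure by the
// fraction of CPU time a real game could actually spend on T&L, and compare
// it with a hypothetical hardware T&L peak.
#include <cstdio>

int main()
{
    const double swPeak = 4.0e6;   // hypothetical software T&L peak, triangles/s at 100% CPU
    const double hwPeak = 3.5e6;   // hypothetical hardware T&L peak, triangles/s

    // In a game the CPU might only spend 25-50% of its time on T&L, while
    // the hardware T&L unit keeps running in parallel with everything else.
    for (double cpuShare = 0.25; cpuShare <= 0.50; cpuShare += 0.25)
    {
        double swEffective = swPeak * cpuShare;
        std::printf("CPU share %.0f%%: software ~%.1fM tris/s vs hardware ~%.1fM tris/s\n",
                    cpuShare * 100.0, swEffective / 1e6, hwPeak / 1e6);
    }
    return 0;
}
```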

But the risk here is that we have normal consumers running benchmarks, people who only look at one simple thing: higher is better. Unfortunately, most synthetic tests do not allow you to make simple statements like “higher is better.” Synthetic tests usually require careful thinking and analysis before you can draw a solid conclusion. This is why tests like Quake III Arena timedemos are so easy for people; higher really is better there (usually, that is; we could talk about the differences between maximum, minimum, and sustained rates). In 3DMark 2000, however, higher is not always better, and simple comparisons are not always possible.

Milton’s question - He also agrees that games in the near future will contain as many as 150,000 triangles per scene, which is the same number that is used in the 3DMark T&L test.  Will any software engine really be able to handle that many, or more, triangles?  If so, don't we expect software companies to follow the lead of id Software, and offer a "higher poly-count" mode explicitly for hardware T&L-enabled cards?  The number of companies which have pledged support for the GeForce is fairly high, and many developers have responded enthusiastically to the notion of support for more polygons per scene and more detailed models and worlds. 

The big point here is: what and when is this future? I suspect this future is mainly the moment when we all have T&L. NVIDIA has started the move towards T&L, and one year from now T&L might be a must-have feature for most games. Will software be able to handle this? Well, since we know that in a game you only have 25-50% of the time available for T&L, you know that this will not happen! LOD (Level of Detail) is thus a big issue, but I personally expect that we will see a very quick drop-off. Software T&L will start to die; new games starting development right now (or a couple of months ago) will ignore software T&L completely and rely on hardware. The transition period will see some bastard solutions where you can select the T&L level of detail or turn off some options.
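
As an illustration of what such a selectable-detail option could look like, here is a hedged sketch: the same object stored at several detail levels, with the triangle budget driven by whether hardware T&L is available and by a user setting. All names and numbers are hypothetical and not taken from any shipping engine.

```cpp
// Hypothetical transition-period LOD selection: a bigger triangle budget
// when hardware T&L is present, scaled down by a user detail option.
#include <vector>
#include <cstddef>

struct MeshLOD
{
    int triangleCount;
    // ... vertex/index data would live here
};

// Pick the most detailed LOD that fits the per-object triangle budget.
// Assumes 'lods' is non-empty and sorted from low to high detail.
const MeshLOD* SelectLOD(const std::vector<MeshLOD>& lods,
                         bool hardwareTnL,
                         int userDetailSetting)   // e.g. 0 = low, 2 = high
{
    int budget = hardwareTnL ? 20000 : 4000;      // hypothetical budgets
    budget = budget >> (2 - userDetailSetting);   // user option scales it down

    const MeshLOD* best = &lods.front();
    for (std::size_t i = 0; i < lods.size(); ++i)
        if (lods[i].triangleCount <= budget)
            best = &lods[i];
    return best;
}
```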

Actually, from dealing with companies, I’ve found that most are more excited about getting better AI and physics out of the CPU offload than about more detailed environments.