Milton’s question - Again, isn't NVIDIA right to be concerned?  What good is a synthetic test in this situation (I'll have more on that below)?  Doesn't the deck seem unfairly stacked against the GeForce, to some degree?  Isn't real-world performance the part that we're interested in?  Why should I care that a graphics engine, software or otherwise, can spin a few shapes around at 30fps with various simple light sources and shadows?  If that engine cannot handle specular highlights, multiple shadows, reflections, etc., then what good is it to me as an indicator of in-game performance?  If it isn't an indicator of in-game performance, then what does it indicate?  And how (or why) is that useful to me?

This test is useful if you know what to look at. It will be very useful for comparing various hardware T&L implementations, and probably for comparing CPUs as well, but you need to know what you are comparing. The hardware-versus-software comparison is probably the least useful, as explained above, but once other companies show their T&L hardware solutions it will be very interesting to see where NVIDIA stands. The benchmark also shows us the impact of extra light sources: more lights have a cost, this test shows you exactly what happens, and that puts some of the overblown claims from PR departments in perspective. Peak throughput is always worth checking for possible negative influences. But, as explained, synthetic tests should be analyzed much more carefully than in-game tests. The conclusions you can draw from them tell you what a game might do under certain circumstances, running on certain hardware. For example, say company X brings out a new hardware T&L engine with 6 hardware lighting pipelines (where the GeForce has only one). Such a part would show no drop in the 4-lights test and only a minimal drop in the 8-lights test. Now, imagine for a moment that NV15 is only a higher-clocked NV10 (or a higher fill-rate solution with the same T&L); we would then know that, for lighting, this new T&L from company X will be much better in games that use more light sources. Synthetic tests have meaning, just not in the way a normal consumer understands it. Their output is not simple and easy to read.
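To make that company X example concrete, here is a deliberately simplified throughput model. It is purely my own assumption for illustration (real chips do not schedule their lighting units this simply): lighting work is shared across however many parallel pipelines the chip has, and any lights beyond that number cost extra serial work per vertex.

```python
def relative_throughput(lights, pipelines):
    """Toy model: per-vertex lighting cost grows once the number of
    light sources exceeds the number of parallel lighting pipelines.
    Returns throughput relative to the single-light case."""
    cost = lights / min(lights, pipelines)  # effective serial lighting work
    return 1.0 / cost

# One hardware lighting pipeline (GeForce-style) versus a hypothetical
# six-pipeline part (the "company X" of the text):
for lights in (1, 4, 8):
    one = relative_throughput(lights, 1)
    six = relative_throughput(lights, 6)
    print(f"{lights} lights: 1 pipeline -> {one:.3f}, 6 pipelines -> {six:.3f}")
```

Even this toy model reproduces the pattern described above: the six-pipeline part shows no drop at 4 lights and only a mild one at 8, while the single-pipeline part falls off steeply.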

Milton’s Comment - Have we forgotten that 3D Winbench 98 showed the Intel i740 and ATI Rage Pro chipsets to be much better performers than a Voodoo2 card?  Tom's Hardware had a fascinating article on the subject, showing how some benchmarks were easily fooled by video card drivers into posting results that were far short of reality.  Let's not forget that a benchmark that doesn't reflect real-world performance is often useless except as a marketing tool for products that offer lackluster performance in actual games and applications.  Consumers are not helped by lousy or misleading benchmarks.

This is why we need to know exactly what a benchmark measures and does. In the case of 3DMark 2000 we know the scene, since it is described in the read-me file that no normal consumer reads. The problem with 3D Winbench was that companies introduced illegal driver tricks and hacks: the drivers did not do everything they should have done. Actually, testing full real-world performance is close to impossible. Let me explain with an example: a real game responds to keyboard (or other) input, but a timedemo plays back a file. Commands that were logged are read and executed, and this is not equal to what happens in a real game, where you press keys and the CPU handles those key presses in real time. So even a Quake III Arena timedemo is not perfect.
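A minimal sketch of the difference, using a made-up engine loop (none of these names come from any real engine): in live play the input path is exercised every frame, while a timedemo just walks a recorded list, so the cost and timing of real input handling never show up in the benchmark.

```python
class GameState:
    """Tiny stand-in for an engine's simulation state."""
    def __init__(self):
        self.commands = []

    def apply(self, events):
        self.commands.extend(events)

def play_live(state, sample_input, frames):
    # Live play: input is sampled fresh each frame, in real time,
    # with all the OS and latency overhead that implies.
    for _ in range(frames):
        state.apply(sample_input())

def play_demo(state, demo_log):
    # Timedemo: events come from a recorded log, not the input path,
    # so every run replays the exact same commands.
    for frame_events in demo_log:
        state.apply(frame_events)

demo = [["+forward"], ["+forward", "+attack"], []]
state = GameState()
play_demo(state, demo)
print(state.commands)
```

The replay is perfectly repeatable, which is exactly what makes it a good benchmark and a slightly unrealistic one at the same time.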

I personally believe that synthetic tests can be very revealing. They can show you how good or bad a driver is, whether a feature is implemented and implemented correctly, and they often allow you to punch holes in marketing talk. If marketing claims 8 million and you measure 4 million in an optimal test case, you know something is wrong and someone needs to give a good, solid answer. Unfortunately, for a consumer such tests can be confusing and misleading. This is why websites that publish reviews should explain what every number means; conclusions and comments should be supported by explanations. "Higher is better" is often just too simplistic for a benchmark. Again, take Quake III: performance drops at higher resolutions and higher color depths…great. But how much does it drop, and why does it take a bigger dive after 1024x768? All of this is in the benchmark results, but simple fps numbers don't show it. This is why we, at Beyond3D, plot MPixel graphs; they show the impact of things like texture memory and texture swapping. There is no such thing as simple, foolproof benchmarking. Analyzing numbers takes more than saying "X is faster than Y, so X must be the better product."
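The MPixel figures mentioned above are just fps rescaled by how many pixels each frame contains. A quick sketch of the conversion, with hypothetical fps numbers invented for illustration (and ignoring overdraw, which real fill-rate analysis would have to account for):

```python
def mpixels_per_second(width, height, fps):
    """Screen-pixel throughput in megapixels per second."""
    return width * height * fps / 1e6

# Hypothetical results for one card at three resolutions. Raw fps only
# falls, but the MPixel/s view shows throughput rising and then dropping
# after 1024x768, hinting at a bottleneck such as texture swapping.
for (w, h, fps) in [(640, 480, 90.0), (1024, 768, 45.0), (1280, 1024, 20.0)]:
    print(f"{w}x{h}: {fps:5.1f} fps = {mpixels_per_second(w, h, fps):6.1f} MPixel/s")
```

This is the point of plotting MPixel graphs: two resolutions with similar fps can hide very different throughput stories.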