Mad Onion describes this test scene as follows:

The High Polygon Count test has 30 toruses with 5000 triangles each = 150 000 triangles simultaneously on screen, an amount we think will be typical in games in one year's time.

I pretty much agree that these kinds of polygon counts will be realistic one year from the release of this benchmark.

Who is Doing T&L, and How?

T&L is a mathematically intensive operation that works on the vertices of the triangles that form an object. Essentially, every vertex of the object is transformed in space, meaning its coordinates undergo a translation, a rotation, and a scaling (relative movement). Lighting is equally math-heavy: it takes into account the light's position and orientation, the viewer's position and orientation, the surface's position and orientation, and the material properties. Simply said, T&L is nothing more than mathematics. Now the big question is: who is doing these mathematical calculations?
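To make the "nothing more than mathematics" point concrete, here is a rough sketch of what happens to a single vertex. This is plain Python, not any real graphics API; the scale factor, angle, translation, and material value are made-up illustration numbers, and real T&L uses 4x4 matrices and far more elaborate lighting.

```python
import math

def transform(vertex, scale, angle_deg, translation):
    """Scale, rotate (here about the Z axis), then translate one vertex --
    the 'T' in T&L, done to every vertex of every triangle."""
    x, y, z = vertex
    # Scaling
    x, y, z = x * scale, y * scale, z * scale
    # Rotation about the Z axis
    a = math.radians(angle_deg)
    x, y = x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)
    # Translation (relative movement)
    tx, ty, tz = translation
    return (x + tx, y + ty, z + tz)

def diffuse_light(normal, light_dir, material_diffuse):
    """A simple Lambertian diffuse term -- one small piece of the 'L' in T&L,
    combining surface orientation, light direction, and material properties."""
    n_dot_l = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, n_dot_l) * material_diffuse

# One vertex: scale by 2, rotate 90 degrees, then move 5 units along X.
v = transform((1.0, 0.0, 0.0), scale=2.0, angle_deg=90.0, translation=(5.0, 0.0, 0.0))
# A surface facing straight at the light, with a made-up diffuse value of 0.8.
brightness = diffuse_light((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), material_diffuse=0.8)
```

Multiply this handful of multiply-adds by 150 000 triangles per frame, dozens of frames per second, and it becomes clear why the question of *who* does this math matters.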

In the case of NVIDIA's GeForce, the "GPU" does this math. The GPU has special pipelines optimized to execute the T&L mathematics; basically, it carries a piece of hardware on-board that is specialized and optimized for exactly this work. Naturally, this hardware has limitations: since it is fixed hardware running at a certain clock speed, it has a certain maximum throughput.

Now, what if you don't have such a wonderful, slightly expensive, NVIDIA GeForce-powered 3D card? You still have mathematical operations that need to be executed. Who can do them? The answer: your main system CPU. Since you have no special custom hardware for T&L, you have to use your general-purpose CPU, your Intel or AMD processor. These general-purpose CPUs are not optimized for any one task; they are general purpose, meaning they have to be able to handle any kind of mathematics. This means your main CPU can do much more than the optimized, specialized GPU of the GeForce can. But this general-purpose design also means that, clock for clock, the general-purpose processor will be slower than the specialized one. So your main CPU, using software, will do the T&L. In the end, someone has to write a program that tells your general-purpose CPU how to perform the mathematics that produces the end result T&L requires. Now, for the next question: who is going to write that program?

Special Optimized Implementations

As I have already explained, T&L is a mathematically intensive process, and both Intel and AMD knew this when they designed their CPUs. Both companies also knew that 3D graphics are very popular and are the way of the future. So both implemented special instructions (a program contains "instructions" that tell the general-purpose CPU what to do) optimized for 3D work, including T&L operations: SSE on the Pentium III and 3DNow! on AMD's processors. Basically, the P-III and Athlon are not "so" general-purpose; they have some specialized circuitry on board. The main point is this: software is not a static thing, and there is no single way to do something. Software lets you reach the same result through various routes, and this is where special custom optimizations come into play.

As stated before, Intel and AMD have both implemented special instructions for 3D purposes. These instruction sets are different; more than that, an Athlon is not at all the same chip internally as a Pentium III. While both processors can execute the same programs, they may execute the same program differently, yet arrive at the same end result. Because of this, 3D software can be optimized for a particular CPU.

What Different Optimizations are There?

First of all, we have Microsoft. Microsoft designed DirectX, and DirectX contains T&L support (starting with v7). DirectX is mainly designed to give coders access to special hardware: hardware optimized for certain (mostly graphical) tasks. From the start, Microsoft has also supported what is known as "software emulation" for people who don't have that special hardware. For example, an old Matrox Millennium has no real 3D hardware on board, yet it can display Direct3D games. How? Through software emulation: software does what the hardware could have done if you had it.

The same is now available for T&L. DirectX gives coders access to hardware that can do it, but if your hardware can't, there is a software fallback mode known as Direct3D Software T&L. Basically, Microsoft has written a program that performs T&L on the main host CPU. Now, the problem with Microsoft is that they have never been really good at optimizing stuff (hey, look at the optimized state of Windows, *grin*), and from past experience we know their software implementations are not all that great. The software renderer is rather slow compared to some of the software render modes that commercial games have. So are we stuck with Microsoft's T&L implementation? Nope.
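The hardware-or-fallback decision can be sketched like this. The class and function names below are invented for illustration; the real mechanism is Direct3D 7's device types and capability bits, not a Python API.

```python
class Device:
    """Toy stand-in for a Direct3D device and its capability flags."""
    def __init__(self, has_hw_tnl):
        self.has_hw_tnl = has_hw_tnl

def hardware_tnl(vertices):
    # Stand-in for the GPU's dedicated T&L pipelines (e.g. a GeForce).
    return [("hw", v) for v in vertices]

def software_tnl(vertices):
    # Stand-in for the CPU-side fallback (Direct3D Software T&L).
    return [("sw", v) for v in vertices]

def process_vertices(device, vertices):
    """Use dedicated hardware when the caps say it exists; otherwise fall
    back to the software path. Same end result either way -- only the
    speed differs."""
    if device.has_hw_tnl:
        return hardware_tnl(vertices)
    return software_tnl(vertices)

triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
on_geforce = process_vertices(Device(has_hw_tnl=True), triangle)
on_millennium = process_vertices(Device(has_hw_tnl=False), triangle)
```

The key design point is that the game asks for "transformed vertices" and does not care who delivers them; that indirection is what lets a Millennium run the same Direct3D title as a GeForce.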

As I explained before, AMD and Intel chips may speak the same language, but they have their differences, and those differences can be exploited for better overall performance. Both Intel and AMD realized that T&L is very important, so both designed their own optimized routines to do it. The AMD software development kit (freeware) contains sample programs that show coders how to do T&L on the Athlon processor, and I assume Intel has similar sample code available for developers. Mad Onion, who made the benchmark, realized that Microsoft's implementation isn't the best out there, so they wrote their own software T&L based on optimization hints and tips from AMD and Intel. The result is that 3DMark 2000 has various settings for Software T&L: one for Microsoft's implementation, one for the Intel Pentium III, one for the AMD Athlon, and one for AMD K6-x processors.
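The selection among those settings amounts to a dispatch table keyed on the CPU. The sketch below is a toy model of that idea, not 3DMark 2000's actual code; the function names, the family strings, and the detection scheme are all invented for illustration (a real implementation would probe CPUID feature flags).

```python
def tnl_generic(verts):
    """Microsoft's generic Direct3D Software T&L path."""
    return ("d3d-generic", len(verts))

def tnl_sse(verts):
    """Pentium III path, written around SSE."""
    return ("sse", len(verts))

def tnl_3dnow(verts):
    """Athlon path, written around Enhanced 3DNow!."""
    return ("3dnow", len(verts))

def tnl_k6(verts):
    """K6-x path, written around the original 3DNow!."""
    return ("k6-3dnow", len(verts))

# Hypothetical mapping from detected CPU family to the fastest routine it runs.
IMPLEMENTATIONS = {
    "pentium3": tnl_sse,
    "athlon": tnl_3dnow,
    "k6": tnl_k6,
}

def pick_tnl(cpu_family):
    """Select the best implementation the CPU supports; anything
    unrecognized falls back to the generic DirectX path."""
    return IMPLEMENTATIONS.get(cpu_family, tnl_generic)

verts = [(0.0, 0.0, 0.0)] * 8
engine, count = pick_tnl("athlon")(verts)
```

All four routines must produce the same transformed and lit vertices; they differ only in which instructions they use to get there, which is exactly why the benchmark exposes them as interchangeable settings.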

After all this, the T&L still has to be done. Who does it? If you have T&L hardware, that optimized hardware should do it. If you don't, software must do all the work on your main general-purpose CPU. Which software implementation should you use? The fastest, of course!

We all know there is software and then there is software: some games rock and some games suck. The same is true for software T&L implementations: some will be great, some mediocre, and some pathetic. Which one you use depends on which is available. Most games out there have specially optimized software T&L in their 3D engines. Why? Simply because hardware T&L is so new: companies started designing their games well before the hardware existed, doing the best they could with limited resources. Many of those companies spent months, even years, creating those software implementations, of course with some help from the guys who know their stuff, specifically Intel and AMD. Mad Onion, being a good software company, did its optimization work, and thus you can select from various implementations.