For now I can imagine at least one thing that could explain this. When you use hardware T&L you create vertex buffers in the card's local RAM. These buffers contain static geometry and can live in either Video RAM or AGP RAM. Now, assume for a moment that they sit in local Video RAM (which should be the fastest). If these buffers are there and use up memory, it might very well be that the card runs into bandwidth problems (accessing the vertex buffers uses up bandwidth which could otherwise have been used for the rendering), or that texture thrashing occurs (textures swapping in and out of memory) because the vertex buffers take up room. One way to figure this out would be to also run the test at 800x600 and 640x480 and see if that changes anything.
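To illustrate why the resolution test is informative, here is a rough back-of-the-envelope sketch. The bytes-per-pixel figure, the target frame rate and the neglect of texture traffic and overdraw are all assumptions of ours, not numbers from the benchmark. The point is simply that pixel-related traffic scales with resolution while vertex-buffer traffic does not, so if the gap between hardware and software T&L shrinks at lower resolutions, bandwidth is the likely culprit.

    // Rough per-frame bandwidth budget for colour and Z traffic only.
    // All figures are illustrative assumptions, not measurements.
    #include <cstdio>

    int main()
    {
        const struct { int w, h; } modes[] = { {640, 480}, {800, 600}, {1024, 768} };
        const double bytesPerPixel = 4.0 + 4.0 + 4.0; // assumed: 32-bit colour write + Z read + Z write
        const double targetFps     = 60.0;            // assumed target frame rate

        for (const auto& m : modes)
        {
            double pixels   = static_cast<double>(m.w) * m.h;
            double perFrame = pixels * bytesPerPixel;                   // bytes per frame
            double perSec   = perFrame * targetFps / (1024.0 * 1024.0); // MB/s
            std::printf("%4dx%-4d : %5.1f MB/frame -> %7.1f MB/s at %.0f fps\n",
                        m.w, m.h, perFrame / (1024.0 * 1024.0), perSec, targetFps);
        }
        return 0;
    }

Vertex-buffer reads, by contrast, depend only on the amount of geometry, so their share of the total memory traffic grows as the resolution drops.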
I am surprised to see that in “both” tests the highest peak and the highest minimum frame-rate are obtained with the Software T&L setting. This leads me to believe that the issue at hand is indeed bandwidth. My educated guess of what is happening is thus that hardware T&L uses vertex buffers (for the static geometry) in local Video Memory. The T&L core has to access these buffers, and that access consumes bandwidth which is then no longer available to the rendering core, which reads the same memory for textures and various buffers (Z, front and back buffers). Because hardware T&L introduces an extra load on the local Video Memory, rendering is slowed down. In software mode the T&L core does not touch the local Video Memory, and as a result more bandwidth is “relatively” available for the rendering, which results in higher frame-rates (if bandwidth is the limiting factor). The conclusion is that this game is bandwidth limited. People often call this fill-rate limited (which is partly correct, since the limited bandwidth caps the achievable fill-rate). So it seems to me that 3dfx might be right in the end: fill-rate and bandwidth are the main issues today, and as the results from HardOCP show, even DDR isn’t fast enough.
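For reference, this is roughly how a DirectX 7 title decides where a static vertex buffer ends up. The helper function, flags and vertex format below are purely illustrative, since we don’t know how the benchmark actually allocates its geometry.

    // Sketch only: placing static geometry in a DirectX 7 vertex buffer.
    // None of this is taken from the benchmark in question.
    #include <windows.h>
    #include <d3d.h>

    // Create a static vertex buffer. If forceSystemMemory is true the buffer
    // stays in system RAM (software T&L path); otherwise the driver is free
    // to place it in local video or AGP memory for the hardware T&L pipeline.
    LPDIRECT3DVERTEXBUFFER7 CreateStaticVB(LPDIRECT3D7 pD3D, DWORD numVertices,
                                           bool forceSystemMemory)
    {
        D3DVERTEXBUFFERDESC desc;
        ZeroMemory(&desc, sizeof(desc));
        desc.dwSize        = sizeof(desc);
        desc.dwFVF         = D3DFVF_XYZ | D3DFVF_NORMAL | D3DFVF_TEX1; // assumed vertex format
        desc.dwNumVertices = numVertices;
        desc.dwCaps        = D3DVBCAPS_WRITEONLY;       // static data, written once
        if (forceSystemMemory)
            desc.dwCaps |= D3DVBCAPS_SYSTEMMEMORY;      // keep it out of local Video RAM

        LPDIRECT3DVERTEXBUFFER7 pVB = NULL;
        if (FAILED(pD3D->CreateVertexBuffer(&desc, &pVB, 0)))
            return NULL;
        return pVB;
    }

With D3DVBCAPS_SYSTEMMEMORY set, the vertices stay in system RAM and are transformed on the CPU, so the T&L engine never touches local Video Memory; without it the driver will usually place the buffer in local or AGP memory, where the hardware T&L reads compete with texture and frame-buffer traffic, which is exactly the contention described above.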
Note that all of this is based on logical deduction; there is no guarantee that this explanation is correct. It is one possible explanation, but many others are possible since we don’t know the exact internal workings of the benchmark. I do want to point out that the effect I describe above was one of the issues we mentioned in our critical T&L articles, which pointed out weak points and possible issues with this new technology.
Ok, that should cover about everything for now, we hope. Of course, more issues and more questions will probably arise as time progresses. As they come up, we hope to be able to answer them. We can’t answer every question in our articles, but we do address the bigger ones; the smaller ones can usually be found answered in our forums. There has been a lot of activity there about T&L (and other things) lately. We encourage you to stop in and take a look, as well as raise any questions you yourself might have...
Go to the forum... And remember these specific T&L Benchmarking threads:
Questioning the T&L article Started by Newb
Do we really need T&L ? Started by rookie
How fast can software T&L be ? Started by MTM
This forums seems Biased Started by Pat
Taking the load of the CPU... Started by Evil_Bob