Further Tests

 

Overdraw Reduction

We'll use the VillageMark test to see how GeForce FX performs under heavy overdraw.

GeForce 4 Ti4600 250 176 119 76 53
GeForce FX 392 291 198 130 93
GeForce FX (400/400) 341 246 164 107 76
 
GeForce FX 57% 65% 66% 71% 75%
GeForce FX (400/400) 36% 40% 38% 41% 43%

Looking at the performance of the boards under VillageMark, which has high levels of overdraw, we can see that the 5800 Ultra has an advantage of between 57% and 75% over the GeForce 4, which is more or less in line with its fill-rate and bandwidth difference.
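The percentage rows in the table above are consistent with a simple relative gain over the GeForce 4 Ti4600 frame rates. A minimal Python sketch of that arithmetic, using the VillageMark figures from the table; the helper name percent_advantage is ours and purely illustrative:

```python
def percent_advantage(new_fps, base_fps):
    """Relative gain of new_fps over base_fps, as a whole-number percentage."""
    return round((new_fps / base_fps - 1.0) * 100)

# VillageMark frame rates from the table above, one value per table column.
ti4600 = [250, 176, 119, 76, 53]
fx = [392, 291, 198, 130, 93]
fx_400 = [341, 246, 164, 107, 76]

# Gain of each GeForce FX configuration over the GeForce 4 Ti4600.
fx_gain = [percent_advantage(n, b) for n, b in zip(fx, ti4600)]
fx_400_gain = [percent_advantage(n, b) for n, b in zip(fx_400, ti4600)]

print(fx_gain)      # [57, 65, 66, 71, 75]
print(fx_400_gain)  # [36, 40, 38, 41, 43]
```

Both lists match the percentage rows printed in the table.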

Now, let's take a look at the effect render order has on GeForce FX to see if we can gauge the effectiveness of its HSR (hidden surface removal) routines. To do this we'll use Humus's GL_EXT_reme benchmark.

Overdraw factor 3
GeForce 4 Ti4600 394.58 813.88 564.73
GeForce FX 726.88 1396.16 1006.03
GeForce FX (400/400) 535.3 1064.22 751.27
Overdraw factor 8
GeForce 4 Ti4600 150.92 474.07 311.61
GeForce FX 279.37 788.02 546.01
GeForce FX (400/400) 205.43 617.9 414.98
 
Overdraw factor 3
GeForce 4 Ti4600 106% 43%
GeForce FX 92% 38%
GeForce FX (400/400) 99% 40%
Overdraw factor 8
GeForce 4 Ti4600 214% 106%
GeForce FX 182% 95%
GeForce FX (400/400) 201% 102%

Looking at this render order tester, the percentage differences (the second and third result columns, each relative to the first) show that the GeForce4 is actually more efficient at rejecting pixels than the 5800 Ultra is. When we take the 5800 running at 400/400 into account, we can see that the effective efficiency of the pixel rejection scheme falls as the clock speeds rise. Again, could this be a memory latency issue showing itself?
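The comparison above can be reproduced from the frame rates in the render order table. The following is a rough Python sketch, not the benchmark's own code: it computes the gain of the second and third result columns over the first for each board, then ranks the boards by the gain in the second column. The function names are hypothetical, and the columns are referred to by position only.

```python
# Frame rates from the render order table, keyed by overdraw factor and board.
# Each tuple holds the three result columns in the order they are printed.
results = {
    3: {
        "GeForce 4 Ti4600": (394.58, 813.88, 564.73),
        "GeForce FX": (726.88, 1396.16, 1006.03),
        "GeForce FX (400/400)": (535.3, 1064.22, 751.27),
    },
    8: {
        "GeForce 4 Ti4600": (150.92, 474.07, 311.61),
        "GeForce FX": (279.37, 788.02, 546.01),
        "GeForce FX (400/400)": (205.43, 617.9, 414.98),
    },
}

def gains_over_first(row):
    """Percentage gain of each later column over the first column."""
    first = row[0]
    return tuple(round((value / first - 1.0) * 100) for value in row[1:])

def rank_by_gain(boards):
    """Board names ordered from largest to smallest gain in the second column."""
    return sorted(boards, key=lambda name: gains_over_first(boards[name])[0],
                  reverse=True)

for factor, boards in results.items():
    for name in rank_by_gain(boards):
        print(factor, name, gains_over_first(boards[name]))
```

At both overdraw factors the ranking comes out as GeForce 4 Ti4600, then GeForce FX at 400/400, then GeForce FX at its default clocks, which is the ordering the paragraph above describes.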

Given the render order test results and the VillageMark results, it would seem that the occluded pixel rejection routines on GeForce FX are no more efficient than those of GeForce4.

Stencil Operations

The importance of Stencil Buffer rendering speed is going to increase once DoomIII benchmarks become available. PowerVR's 'FableMark' demo, while in no way indicative of DoomIII's performance, makes extensive use of Stencil operations and can be used to gauge overall performance when Stencils are in operation.

GeForce 4 Ti4600 76.1 49.7 30.8 18.6 12.8
GeForce FX 159.1 105.5 65.8 40.8 28.9
GeForce FX (400/400) 132.4 86.3 54.0 33.3 23.6
 
GeForce FX 109% 112% 114% 119% 126%
GeForce FX (400/400) 74% 74% 75% 79% 84%

With FableMark we can see that the 5800 has a much bigger performance advantage over GeForce4: it is more than double the speed throughout, with the lead rising from 109% to 126% at the high-resolution end of the table, which is beyond the straight fill-rate and bandwidth difference between the two. As mentioned earlier in the preview, GeForce FX has an optimised rendering path for stencils, such that the 5800 can output 8 per clock, and given the performance here it would certainly seem that this is working.

One thing to remember with FableMark is that it is not a DirectX9 application, so it will not utilise the double-sided stencil optimisation found on DirectX9 boards such as GeForce FX. Hence there could be further performance advantages over DX8-class chips in stencil rendering if this feature is enabled.

Software Vs Hardware Geometry Performance

GeForce FX utilises a new array-based Vertex Shading unit, which is a departure from the NV2X line. With the core running at 500MHz, NVIDIA claim a potential triangle throughput of 350 million tris/sec.

With this type of throughput we'd expect hardware geometry processing to show large performance gains over software geometry processing in 3DMark2001SE and a few gaming titles. First, let's take a look at 3DMark's Dragothic (High Detail), Nature, Vertex Shader and Advanced Shader tests.

Software 37.2 37.3 37.2 37.2 37.3
Hardware 155.6 152.4 150.5 139.9 114.8
%Diff 318% 309% 305% 276% 208%
 
Software 61.1 60.3 58.5 45.4 34.1
Hardware 158.5 118.0 98.4 77.4 54.6
%Diff 159% 96% 68% 70% 60%
 
Software 76.0 74.1 71.3 74.4 74.8
Hardware 202.4 193.7 174.9 147.3 123.5
%Diff 166% 161% 145% 98% 65%
 
Software 162.1 125.5 108.8 89.6 67.2
Hardware 173.6 129.7 111.1 91.5 68.1
%Diff 7% 3% 2% 2% 1%

Looking at the synthetic geometry performance tests from 3DMark2001SE, we can see that in all circumstances the hardware geometry processing of the 5800 Ultra provides a performance increase over the geometry abilities of the Pentium 4 3.06GHz CPU. These increases range from quite small in the Advanced Shader test, indicating that this test is already limited by Pixel Shader performance, to very large in the Nature and pure Vertex Shader tests.

3DMark2001SE's tests are only demos and hence not representative of an actual game, which has many more calculations running on the CPU than just geometry processing. Let's take a look at the performance of Dungeon Siege and Max Payne using software and hardware geometry processing.

Software 61.8 62.6 62.3 60.8 60.3
Hardware 68.8 69.0 67.7 67.2 67.0
%Diff 11% 10% 9% 11% 11%
 
Software 106.1 105.1 106.0 104.8 90.8
Hardware 150.6 150.9 150.8 136.5 104.2
%Diff 42% 44% 42% 30% 15%

We can see in Dungeon Siege that hardware geometry provides a modest gain that is more or less uniform at all resolutions, which suggests that other parts of the game engine are still heavily limited by the CPU. Max Payne has a reasonable performance increase at the lower resolutions, but this tapers off at higher resolutions, where the game becomes more fill-rate limited.
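One rough way to see the difference between the two titles is to measure how much frame rate the hardware geometry rows lose from the first column to the last. The Python sketch below does this under two assumptions of ours: the columns run from the lowest to the highest tested resolution, and a 10% drop is used as an arbitrary dividing line. It is an illustration of the reasoning, not a method used in this preview.

```python
def drop_across_resolutions(fps):
    """Percentage of frame rate lost between the first and last column."""
    return round((1.0 - fps[-1] / fps[0]) * 100)

def likely_limit(fps, threshold=10):
    """Crude label; the 10% threshold is an arbitrary illustrative choice."""
    return "fill rate" if drop_across_resolutions(fps) >= threshold else "CPU"

# Hardware geometry rows from the two game tables above. Columns are assumed
# to run from the lowest to the highest tested resolution.
dungeon_siege_hw = [68.8, 69.0, 67.7, 67.2, 67.0]
max_payne_hw = [150.6, 150.9, 150.8, 136.5, 104.2]

for title, fps in (("Dungeon Siege", dungeon_siege_hw),
                   ("Max Payne", max_payne_hw)):
    print(title, drop_across_resolutions(fps), likely_limit(fps))
```

Dungeon Siege loses about 3% across the table while Max Payne loses about 31%, which fits the reading that the former stays CPU limited and the latter becomes fill-rate limited as resolution rises.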