Pixel Shaders

To take a look at the Pixel Shader performance of GeForce FX we'll use 3DMark2001SE's three pixel shader tests: the Pixel Shader test, the Nature game test and the Advanced Pixel Shader test.

Pixel Shader test (frames per second)

                        640x480   800x600   1024x768   1280x1024   1600x1200
GeForce 4 Ti4600          192.1     182.5      123.2        78.2        54.6
GeForce FX                201.2     193.4      190.9       184.9       143.3
GeForce FX (400/400)      202.4     198.8      192.8       168.1       122.9

Advantage over GeForce 4 Ti4600:
GeForce FX                   5%        6%        55%        136%        162%
GeForce FX (400/400)         5%        9%        56%        115%        125%
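For reference, the percentage rows here and in the later tables simply express the FX frame rates relative to the Ti4600 at the same resolution; at 1280x1024, for example, the 500/500 board's 184.9 fps against the Ti4600's 78.2 fps works out to 184.9 / 78.2 ≈ 2.36, i.e. a 136% advantage.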

The straight Pixel Shader test in 3DMark2001SE shows a very large performance difference between the GeForce4 and GeForce FX, with the FX holding a 162% advantage at the highest resolution. This certainly indicates that performance on DX8 shaders has increased significantly between the two architectures. The GeForce4 becomes fill-rate (shader) limited very early, while the 5800 at both speeds scales much higher, remaining largely CPU/geometry limited, with a fill-rate limitation only creeping in a little at 1600x1200. The 400/400 5800 shows a small fill-rate limitation at 1280x1024, where it drops back slightly from the 500/500 5800 because of its clock speed deficit.
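One quick way to see these limitations in the numbers is to convert each result into pixels drawn per second (ignoring overdraw): when a board is fill-rate limited the frame rate falls roughly in proportion to pixel count, so the pixels-per-second figure flattens out. A minimal sketch using the Ti4600 results from the table above:

```cpp
#include <cstdio>

int main() {
    // GeForce4 Ti4600 Pixel Shader test results from the table above.
    const struct { int w, h; double fps; } runs[] = {
        { 640,  480, 192.1}, { 800,  600, 182.5}, {1024,  768, 123.2},
        {1280, 1024,  78.2}, {1600, 1200,  54.6},
    };
    for (const auto& r : runs) {
        // Displayed pixels per second, ignoring overdraw.
        double mpix = r.w * r.h * r.fps / 1e6;
        printf("%4dx%-4d  %6.1f fps  %6.1f Mpixels/s\n", r.w, r.h, r.fps, mpix);
    }
    // The figure plateaus near ~100 Mpixels/s from 1024x768 upward: the card
    // is shader/fill-rate bound there, while the lower resolutions are held
    // back by the CPU and geometry setup instead.
    return 0;
}
```

Run the same conversion on the 500/500 FX numbers and its pixels-per-second figure is still climbing at 1600x1200, consistent with the fill-rate limit only just starting to bite there.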

Nature test (frames per second)

                        640x480   800x600   1024x768   1280x1024   1600x1200
GeForce 4 Ti4600          141.0     115.2       77.4        48.3        33.8
GeForce FX                158.5     118.0       98.4        77.4        54.6
GeForce FX (400/400)      170.2     138.9      100.7        64.3        45.2

Advantage over GeForce 4 Ti4600:
GeForce FX                  12%        2%        27%         60%         62%
GeForce FX (400/400)        21%       21%        30%         33%         34%

As we've discussed before, the Nature test, alongside the Pixel and Vertex Shaders, uses plenty of alpha textures, which require a lot of bandwidth. At the higher resolutions the results show quite an even spread across the three boards, roughly in line with their relative memory bandwidths. Again, the GeForce4 becomes fill-rate/bandwidth limited quite early in the resolution range, while the 500/500 5800 scales much higher, with its fill-rate or bandwidth limitation only really taking effect at 1280x1024.
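As a rough sanity check on the bandwidth point, peak memory bandwidth is just bus width multiplied by effective transfer rate. All three boards use a 128-bit (16-byte) bus; taking the published memory clocks (325MHz DDR on the Ti4600, 500MHz and 400MHz DDR-II on the two FX configurations):

```cpp
#include <cstdio>

int main() {
    // Peak bandwidth = bus width in bytes x effective (double data rate) clock.
    const struct { const char* board; double memMHz; } boards[] = {
        {"GeForce4 Ti4600     ", 325.0},
        {"GeForce FX (500/500)", 500.0},
        {"GeForce FX (400/400)", 400.0},
    };
    for (const auto& b : boards) {
        double gb = 16.0 * (b.memMHz * 2.0) / 1000.0;  // GB/s on a 128-bit bus
        printf("%s  %.1f GB/s\n", b.board, gb);
    }
    // 10.4, 16.0 and 12.8 GB/s -- bandwidth ratios of roughly 1 : 1.54 : 1.23,
    // not far off the 1600x1200 Nature frame rate ratios of 1 : 1.62 : 1.34.
    return 0;
}
```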

What is also quite interesting is that the 5800 running at 400/400 is slightly faster at the lower resolutions than the 5800 running at 500/500. This could be down to memory latencies with the DDR-II RAM: absolute latencies would be lower running at 400MHz, and while the 500MHz configuration carries a latency penalty, its extra bandwidth tells at the higher, more bandwidth-limited resolutions.

Advanced Pixel Shader test (frames per second)

                        640x480   800x600   1024x768   1280x1024   1600x1200
GeForce 4 Ti4600          174.4     139.2      102.8        71.1        52.6
GeForce FX                173.6     129.7      111.1        91.5        68.1
GeForce FX (400/400)      182.9     145.9      107.8        75.0        55.5

Advantage over GeForce 4 Ti4600:
GeForce FX                   0%       -7%         8%         29%         29%
GeForce FX (400/400)         5%        5%         5%          5%          6%

The Advanced Shader test displays some interesting results, in that the 5800 does not have anywhere near the performance advantage over the GeForce4 here that it showed in the other shader tests. This test has two rendering paths: a PS1.1 path, which the GeForce4 will take, and a PS1.4 path, which runs in fewer passes. NVIDIA have previously had no native support for the PS1.4 specification, but GeForce FX should be able to execute it, since PS2.0 compliance requires backwards compatibility with earlier shader revisions; in theory, then, the FX should run the PS1.4 shaders via its PS2.0 support.
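As a sketch of what that path selection looks like in practice, an application can query the Direct3D 8 device caps and compare the reported pixel shader version; a PS2.0 part reports a version of 2.0 or higher, which necessarily satisfies a 1.4 check. The helper name here is ours, not 3DMark's:

```cpp
#include <d3d8.h>

// Returns true if the device reports pixel shader 1.4 support, which is
// how a two-path application can decide between its PS1.1 and PS1.4 paths.
// A PS2.0-compliant part such as GeForce FX passes this check because the
// version value built by the D3DPS_VERSION macro orders 2.0 above 1.4.
bool SupportsPS14(IDirect3DDevice8* device) {
    D3DCAPS8 caps;
    if (FAILED(device->GetDeviceCaps(&caps)))
        return false;
    return caps.PixelShaderVersion >= D3DPS_VERSION(1, 4);
}
```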

GeForce FX has a rather complicated architecture, though. With DirectX9 / PS2.0 compliance calling for floating point precision, the FX's PS2.0 shaders will have to execute on its floating point hardware, and alongside the floating point processors NVIDIA have also supplied a number of legacy integer units. At present it's thought that these integer units have the same level of functionality as GeForce4's, and hence would not be able to run PS1.4 shaders, so it's possible that the FX is running the PS1.4 code on the floating point hardware. If that is the case then it would seem that, given clock speed parity, the 5800 is slower executing PS1.4 shaders over the floating point pipeline than the GeForce4 is running PS1.1 at integer precision (the GeForce4 having no float support).