Pixel Shader Performance

With X1900's pixel shader pipelines tripled in comparison to X1800, so that 48 pixels are operated on in parallel in the shader pipelines, pixel shading should be X1900's forte in relation to X1800.

ATI R580 Pixel Shader Performance

  PS1.1 Procedural  PS1.4 Procedural  PS2.0 Procedural  PS2.0 1 Light (FP)  PS2.0 1 Light (PP)  PS2.0 3 Lights (FP)  PS2.0 3 Lights (PP)  PS2.0a 3 Lights (FP)  PS2.0a 3 Lights (PP) 
X1900 XTX  685.5  441.2  453.5  613.2  615.4  387.1  388.1  170.7  171.3 
X1800 XT  653.9  422.2  311.5  249.2  249.4  140.6  140.5  58.0  58.1 
X850 XT PE  513.0  285.1  259.1  213.4  213.8  120.6  120.8  50.3  50.2 

frames per second

ATI R580 Pixel Shader Performance Diff

  PS1.1 Procedural  PS1.4 Procedural  PS2.0 Procedural  PS2.0 1 Light (FP)  PS2.0 1 Light (PP)  PS2.0 3 Lights (FP)  PS2.0 3 Lights (PP)  PS2.0a 3 Lights (FP)  PS2.0a 3 Lights (PP) 
X1900 XTX to X1800 XT  4.8%  4.5%  45.6%  146.1%  146.7%  175.3%  176.2%  194.3%  194.7% 
X1900 XTX to X850 XT PE  33.6%  54.8%  75.0%  187.4%  187.9%  220.9%  221.2%  239.3%  241.0% 

percentage difference
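The percentage figures are consistent with a simple relative gain over the slower board's frame rate. A minimal sketch, assuming that is how they were derived (the function name `percent_gain` is ours, and only the first three tests are copied in), reproduces the first three X1900 XTX to X1800 XT entries:

```python
# Frame rates (fps) for the first three tests, copied from the table above.
fps = {
    "X1900 XTX": [685.5, 441.2, 453.5],
    "X1800 XT": [653.9, 422.2, 311.5],
}

def percent_gain(new_fps, old_fps):
    """Percentage by which new_fps exceeds old_fps, rounded to one decimal."""
    return round((new_fps / old_fps - 1.0) * 100.0, 1)

# Assumption: the published differences are plain relative gains.
gains = [percent_gain(n, o) for n, o in zip(fps["X1900 XTX"], fps["X1800 XT"])]
print(gains)
```

Some of the later published entries differ from this calculation by 0.1 to 0.2 points, which would fit them having been computed from unrounded frame rates.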

The performance here heralds few surprises and behaves as we would expect. In the earlier tests the pixel shaders are fairly short, with a high ratio of texture to math instructions, so the X1900's performance is roughly level with the X1800's as it is constrained by its texture rate and bandwidth. As we move down the tests, though, the ratio shifts towards math instructions, and the performance difference between the two grows due to the increased shader rate of the X1900. As we've discussed, math instructions are fairly bandwidth free, at least in relation to many other operations such as texturing or colour sampling, so when the workload calls for it the X1900 can achieve very close to its 3x shader rate, as it does here in the PS2.0a tests (roughly 2.9x the X1800). Whether the workload calls for it in current game titles is what we'll be testing later.
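The texture-bound versus math-bound argument can be illustrated with a toy bottleneck model. This is only a sketch under loud assumptions: both boards are given the same 16 texture units, the X1800 is given 16 shader ALUs against the X1900's 48, and clocks, bandwidth and latency hiding are ignored entirely. The names `TEX_UNITS`, `ALUS`, `clocks_per_pixel` and `speedup` are hypothetical, and this is not a description of how the hardware actually schedules work.

```python
# Toy bottleneck model: every figure here is an illustrative assumption.
TEX_UNITS = 16                      # assumed identical on both boards
ALUS = {"X1800": 16, "X1900": 48}   # 48 from the text; 16 is a third of that

def clocks_per_pixel(tex_ops, alu_ops, alus):
    """Clocks per pixel when the slower of the two stages sets the pace."""
    return max(tex_ops / TEX_UNITS, alu_ops / alus)

def speedup(tex_ops, alu_ops):
    """Modelled X1900 throughput relative to X1800 for one shader mix."""
    slow = clocks_per_pixel(tex_ops, alu_ops, ALUS["X1800"])
    fast = clocks_per_pixel(tex_ops, alu_ops, ALUS["X1900"])
    return slow / fast

# Shader mixes as (texture instructions, math instructions).
for tex_ops, alu_ops in [(4, 4), (4, 8), (1, 12)]:
    print(f"{tex_ops} tex : {alu_ops} math -> {speedup(tex_ops, alu_ops):.2f}x")
```

In this model a shader with as many texture as math instructions sees no gain at all, while a math-heavy one approaches the full 3x, which is the same shape as the results above.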

ATI R580 ShaderMark 2.1 Performance

  s2  s3  s4  s5  s6  s7  s8  s9  s10  s11  s12  s13  s14  s15  s16  s17  s18  s19  s20  s21  s22  s23  s24  s25  s26 
X1900 XTX  2010  1818  1888  1598  1813  1640  1059  2146  1672  1555  806  759  990  810  976  1244  117  392  155  178  360  409  267  246  244 
X1800 XT  1312  905  971  746  905  875  661  1815  1385  896  635  653  798  443  560  661  70  262  80  119  290  323  225  187  196 
X850 XT PE  1090  768  822  636  768  701  428  1341  866  728  281  373  421  301  354  450  48  147  48  n/a  217  n/a  151  128  131 

frames per second

ATI R580 ShaderMark 2.1 Performance Diff

  s2  s3  s4  s5  s6  s7  s8  s9  s10  s11  s12  s13  s14  s15  s16  s17  s18  s19  s20  s21  s22  s23  s24  s25  s26 
X1900 XTX to X1800 XT  53.2  100.9  94.4  114.2  100.3  87.4  60.2  18.2  20.7  73.5  26.9  16.2  24.1  82.8  74.3  88.2  67.1  49.6  93.8  49.6  24.1  26.6  18.7  31.6  24.5 
X1900 XTX to X850 XT PE  84.4  136.7  129.7  151.3  136.1  134  147.4  60  93.1  113.6  186.8  103.5  135.2  169.1  175.7  176.4  143.8  166.7  222.9  n/a  65.9  n/a  76.8  92.2  86.3 

percentage difference


The ShaderMark v2.1 tests don't show gains for the X1900 over the X1800 quite as exaggerated as RightMark, but there are still some fairly large gains, with the performance differences ranging from roughly 16% to 114%. Tests 20 and 21 are fairly interesting, as they represent the move from a static branching test case (20) to a dynamic branching one (21). Although X1900 is faster in both cases, its own gain from dynamic branching is only about 15% (155 to 178 fps), whereas the X1800's gain is 49% (80 to 119 fps) because it is far more shader limited in the first place.
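Both the spread of gains and the static to dynamic branching gains can be recomputed from the frame rate table. A minimal sketch, with the fps values copied from the ShaderMark table and the helper name `gain` being ours:

```python
# ShaderMark 2.1 frame rates (fps) for tests s2..s26, copied from the table.
tests = list(range(2, 27))
x1900 = dict(zip(tests, [2010, 1818, 1888, 1598, 1813, 1640, 1059, 2146, 1672,
                         1555, 806, 759, 990, 810, 976, 1244, 117, 392, 155,
                         178, 360, 409, 267, 246, 244]))
x1800 = dict(zip(tests, [1312, 905, 971, 746, 905, 875, 661, 1815, 1385, 896,
                         635, 653, 798, 443, 560, 661, 70, 262, 80, 119, 290,
                         323, 225, 187, 196]))

def gain(new_fps, old_fps):
    """Relative gain of new_fps over old_fps, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0

# X1900 XTX over X1800 XT, per test.
per_test = {t: gain(x1900[t], x1800[t]) for t in tests}
smallest = min(per_test, key=per_test.get)
largest = max(per_test, key=per_test.get)
print(f"smallest: s{smallest} {per_test[smallest]:.1f}%")
print(f"largest:  s{largest} {per_test[largest]:.1f}%")

# Static (s20) to dynamic (s21) branching, per board.
branch_x1900 = gain(x1900[21], x1900[20])
branch_x1800 = gain(x1800[21], x1800[20])
print(f"X1900 s20->s21: {branch_x1900:.1f}%")
print(f"X1800 s20->s21: {branch_x1800:.1f}%")
```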

It should be noted that ShaderMark's precision tests always show a precision of s23e8 for X1900/X1800 and s16e7 for X850, which correspond to FP32 and FP24 respectively. All of the additional shader pipelines of X1900 are fixed at full FP32 precision, and no instructions are executed in partial precision.
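The mapping from these format strings to FP32 and FP24 is simple bit counting. The sketch below assumes the notation reads as one sign bit, then the mantissa width after "s" and the exponent width after "e"; the function name `total_bits` is hypothetical and not part of ShaderMark.

```python
import re

def total_bits(fmt):
    """Total bits for an 's<mantissa>e<exponent>' precision string."""
    match = re.fullmatch(r"s(\d+)e(\d+)", fmt)
    if match is None:
        raise ValueError(f"unrecognised precision format: {fmt!r}")
    mantissa, exponent = (int(group) for group in match.groups())
    # Assumption: one sign bit precedes the mantissa and exponent fields.
    return 1 + mantissa + exponent

for fmt in ("s23e8", "s16e7"):
    print(fmt, "->", f"FP{total_bits(fmt)}")
```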