Pixel Shader Performance

As we've noted in the architectural overview, the underlying ALU structure and the number of pipelines for the R520 chips stay the same as R420/R423/R480. However, the shader capabilities have been boosted from Shader 2.x to 3.0, the instruction scheduler has been changed, and many other elements surrounding the memory handling are different. Here we'll see whether these changes have improved the utilisation of the ALUs within R520's Pixel Shader core.




PS1.1 Procedural         664.0   522.7   510.8   467.8   30.0%   11.7%
PS1.4 Procedural         428.8   328.4   284.9   254.9   50.5%   28.8%
PS2.0 Procedural         316.8   251.9   244.6   240.8   29.5%    4.6%
PS2.0 1 Light (FP)       254.6   202.6   212.8   198.3   19.7%    2.2%
PS2.0 1 Light (PP)       254.8   202.8   212.5   197.9   19.9%    2.5%
PS2.0 3 Lights (FP)      143.0   113.9   120.5   112.2   18.6%    1.5%
PS2.0 3 Lights (PP)      143.1   114.0   120.5   112.1   18.7%    1.7%
PS2.0a 3 Lights (FP)      59.2    47.2    50.2    46.6   17.9%    1.4%
PS2.0a 3 Lights (PP)      59.2    47.3    50.1    46.6   18.1%    1.7%

Looking at the performance of the X1800 XL in relation to the X800 XT, we see that in many of the Rightmark tests the newer architecture isn't actually performing that much higher than the old. These particular tests rely more or less solely on the Pixel Shaders, so there probably isn't much room for the scheduler to optimise the shader processing, as the Pixel Shader ALUs are probably at fairly high utilisation anyway; this would explain why there is little difference between the two generations of architecture on the longer shaders. Perversely, where we do see the greater benefit is on the shorter shaders. The likely reason is that these are a little more texture bound, so the improvements we've already witnessed in texture handling come into effect, and the latencies involved in texture fetching give the shader dispatch processor greater leeway to schedule the batches more effectively between the available processing units.

shader 2     1339   1063.0   1074   981.0    24.7%    8.4%
shader 3      918    732.0    763   706.0    20.3%    3.7%
shader 4      985    783.0    818   754.0    20.4%    3.8%
shader 5      752    603.0    630   583.0    19.4%    3.4%
shader 6      917    732.0    763   702.0    20.2%    4.3%
shader 7      887    707.0    696   631.0    27.4%   12.0%
shader 8      671    534.0    426   376.0    57.5%   42.0%
shader 9     1812   1342.0   1308  1193.0    38.5%   12.5%
shader 10    1374   1022.0    846   753.0    62.4%   35.7%
shader 11     909    724.0    717   650.0    26.8%   11.4%
shader 12     619    426.0    277   245.0   123.5%   73.9%
shader 13     638    453.0    372   327.0    71.5%   38.5%
shader 14     808    599.0    419   370.0    92.8%   61.9%
shader 15     427    342.0    299   275.0    42.8%   24.4%
shader 16     569    454.0    350   309.0    62.6%   46.9%
shader 17     643    503.0    448   397.0    43.5%   26.7%
shader 18      68     54.0     48    43.0    41.7%   25.6%
shader 19     256    196.0    147   132.0    74.1%   48.5%
shader 20      82     66.0     48    45.0    70.8%   46.7%
shader 21     151    120.0      -       -        -       -
shader 22     285    233.0    214   199.0    33.2%   17.1%
shader 23     316    259.0      -       -        -       -
shader 24     221    181.0    149   139.0    48.3%   30.2%
shader 25     180    141.0    127   118.0    41.7%   19.5%
shader 26     190    149.0    130   120.0    46.2%   24.2%

The ShaderMark v2.1 results show higher gains between the two generations of boards, with the X1800 XL showing as much as a 74% performance increase over the X800 XT. Bear in mind, however, that along with the architectural differences the shader profile differences have to be factored in as well, as these tests are able to run under the Shader Model 3.0 profile on the X1800s but not on the older Radeons.
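
For reference, the percentage figures in the tables appear to be simple ratios of the scores in each row: the last percentage matches the second score relative to the fourth, and the 74% quoted above matches shader 12 (426.0 against 245.0). A minimal Python sketch of that arithmetic follows; the function name `percent_gain` and the choice of rows are ours, purely for illustration.

```python
def percent_gain(new_score, old_score):
    """Return how much higher new_score is than old_score, in percent."""
    return (new_score / old_score - 1.0) * 100.0


# (second score, fourth score) pairs copied from the ShaderMark rows above.
# Treating these as the X1800 XL and X800 XT results is an assumption
# based on the 74% figure quoted in the text.
pairs = {
    "shader 3": (732.0, 706.0),
    "shader 12": (426.0, 245.0),
    "shader 14": (599.0, 370.0),
}

for name, (new, old) in pairs.items():
    print(f"{name}: {percent_gain(new, old):.1f}%")
```

Run against these three rows, the sketch reproduces the 3.7%, 73.9% and 61.9% figures listed in the table, which suggests the percentages are plain score ratios and not some weighted measure.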