Further Pixel Shader Tests

Let's take a look at a few more Pixel Shader tests from RightMark 3D, but first the Pixel Shader tests from Marko Dolenc's Fill-rate Tester.

Fill-rate Tester results by board:

Test                              X800 XT PE    X800 PRO     9800 XT    9800 PRO    9700 PRO
PS 1.1 - Simple                       4048.1      2766.1      1622.0      1489.9      1278.1
PS 1.4 - Simple                       4048.0      2766.1      1622.0      1489.9      1278.1
PS 2.0 - Simple                       4048.0      2766.1      1622.0      1489.9      1278.1
PS 2.0 PP - Simple                    4048.0      2766.1      1622.0      1489.9      1278.1
PS 2.0 - Longer                       2048.3      1398.6       816.4       749.8       642.8
PS 2.0 PP - Longer                    2048.3      1398.5       816.4       749.8       642.8
PS 2.0 - Longer 4 Registers           2048.3      1398.6       816.4       749.8       642.8
PS 2.0 PP - Longer 4 Registers        2049.2      1398.6       816.4       749.8       642.8
PS 2.0 - Per Pixel Lighting            588.7       401.5       182.1       214.8       184.1
PS 2.0 PP - Per Pixel Lighting         588.7       401.5       182.1       214.8       184.1

X800 XT PE advantage over:

Test                                X800 PRO     9800 XT    9800 PRO    9700 PRO
PS 1.1 - Simple                        46.3%      149.6%      171.7%      216.7%
PS 1.4 - Simple                        46.3%      149.6%      171.7%      216.7%
PS 2.0 - Simple                        46.3%      149.6%      171.7%      216.7%
PS 2.0 PP - Simple                     46.3%      149.6%      171.7%      216.7%
PS 2.0 - Longer                        46.5%      150.9%      173.2%      218.6%
PS 2.0 PP - Longer                     46.5%      150.9%      173.2%      218.6%
PS 2.0 - Longer 4 Registers            46.5%      150.9%      173.2%      218.6%
PS 2.0 PP - Longer 4 Registers         46.5%      151.0%      173.3%      218.8%
PS 2.0 - Per Pixel Lighting            46.6%      223.3%      174.1%      219.7%
PS 2.0 PP - Per Pixel Lighting         46.6%      223.3%      174.1%      219.7%

X800 PRO advantage over:

Test                                 9800 XT    9800 PRO    9700 PRO
PS 1.1 - Simple                        70.5%       85.7%      116.4%
PS 1.4 - Simple                        70.5%       85.7%      116.4%
PS 2.0 - Simple                        70.5%       85.7%      116.4%
PS 2.0 PP - Simple                     70.5%       85.7%      116.4%
PS 2.0 - Longer                        71.3%       86.5%      117.6%
PS 2.0 PP - Longer                     71.3%       86.5%      117.6%
PS 2.0 - Longer 4 Registers            71.3%       86.5%      117.6%
PS 2.0 PP - Longer 4 Registers         71.3%       86.5%      117.6%
PS 2.0 - Per Pixel Lighting           120.5%       86.9%      118.1%
PS 2.0 PP - Per Pixel Lighting        120.5%       87.0%      118.1%

One thing you notice with the tests here is that the Radeons appear to be very predictable according to instruction length, regardless of the shader profile being used. The relative performance in most of the shaders also more or less correlates with the theoretical differences between the boards.
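For readers who want to check the percentage rows, here is a minimal Python sketch of the arithmetic. It assumes the score columns run X800 XT PE, X800 PRO, 9800 XT, 9800 PRO, 9700 PRO (an order inferred from the percentages rather than stated with the figures), and the function names are made up for illustration.

```python
# Scores copied from the Fill-rate Tester figures above. The board order is
# an assumption inferred from the percentage rows, not stated in the table.
BOARDS = ["X800 XT PE", "X800 PRO", "9800 XT", "9800 PRO", "9700 PRO"]
PS20_SIMPLE = [4048.0, 2766.1, 1622.0, 1489.9, 1278.1]
PS20_LONGER = [2048.3, 1398.6, 816.4, 749.8, 642.8]


def advantage_pct(score, baseline):
    """Percentage by which `score` exceeds `baseline`."""
    return (score / baseline - 1.0) * 100.0


def advantages_over_rest(scores, lead_index=0):
    """Advantage of the board at `lead_index` over every board listed after it."""
    lead = scores[lead_index]
    return [round(advantage_pct(lead, s), 1) for s in scores[lead_index + 1:]]


def simple_to_longer_ratios():
    """How many times faster each board runs the Simple shader than the Longer one."""
    return [s / l for s, l in zip(PS20_SIMPLE, PS20_LONGER)]


if __name__ == "__main__":
    print("X800 XT PE advantage:", advantages_over_rest(PS20_SIMPLE, 0))
    print("X800 PRO advantage:  ", advantages_over_rest(PS20_SIMPLE, 1))
    for board, ratio in zip(BOARDS, simple_to_longer_ratios()):
        print(f"{board}: Simple/Longer = {ratio:.2f}")
```

The Simple-to-Longer ratio comes out at just under 2 on every board, which is the predictability the paragraph above describes: the longer shader costs nearly the same proportion of throughput whichever Radeon runs it.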

RightMark 3D results, one column per test:

X800 XT PE      383.1    594.5    595.1    495.2    493.7    277.9    277.9
X800 PRO        263.2    375.7    396.5    341.7    341.2    190.5    190.5
9800 XT         184.5    208.2    218.2    190.0    189.9    104.5    104.5
9800 PRO        170.9    177.1    182.7    191.9    191.8    103.9    103.9
9700 PRO        152.8    167.6    177.4    165.2    165.2     89.3     89.3

X800 XT PE advantage over:

X800 PRO        45.6%    58.2%    50.1%    44.9%    44.7%    45.8%    45.9%
9800 XT        107.6%   185.6%   172.8%   160.6%   160.0%   165.8%   165.8%
9800 PRO       124.1%   235.7%   225.6%   158.0%   157.4%   167.3%   167.4%
9700 PRO       150.8%   254.8%   235.4%   199.7%   198.8%   211.3%   211.3%

X800 PRO advantage over:

9800 XT         42.6%    80.5%    81.7%    79.8%    79.7%    82.3%    82.2%
9800 PRO        54.0%   112.1%   117.0%    78.0%    77.9%    83.3%    83.3%
9700 PRO        72.3%   124.2%   123.5%   106.8%   106.5%   113.4%   113.4%

With the various RightMark tests we can see that, again, the minimum performance advantage the X800 XT has over the 9800 XT is twice the performance, getting close to three times in one case. All the PS2.0 tests show a slightly greater difference between the X800s and the 9800 XT than the theoretical rates would suggest, perhaps indicating that the shader pipeline has been tuned a little more for PS2.0 operation.

 

RightMark Phong Lighting results (FPS), one column per resolution; the last column is 1600x1200:

X800 XT PE      680.9    442.5    277.9    168.0    113.5
X800 PRO        474.1    304.4    190.4    115.2     79.0
9800 XT         262.6    167.8    104.6     63.1     43.3
9800 PRO        260.7    166.5    103.8     62.8     43.0
9700 PRO        224.6    143.5     89.3     53.9     36.9

X800 XT PE advantage over:

X800 PRO        43.6%    45.4%    45.9%    45.8%    43.8%
9800 XT        159.3%   163.7%   165.6%   166.2%   162.4%
9800 PRO       161.2%   165.7%   167.6%   167.4%   163.8%
9700 PRO       203.2%   208.5%   211.3%   211.5%   207.3%

X800 PRO advantage over:

9800 XT         80.6%    81.4%    82.0%    82.6%    82.5%
9800 PRO        81.9%    82.8%    83.4%    83.5%    83.5%
9700 PRO       111.1%   112.2%   113.3%   113.7%   113.8%

Looking at the RightMark Phong Lighting test, we see that the performance differences stay fairly consistent across the range of resolutions, and that the boards are spaced pretty much as their theoretical performance would suggest. Both X800s appear to show a very slight CPU limitation at low resolution, but for the most part they are still fill-rate (shader) bound. However, even at 1600x1200 the X800 XT PE's performance is in excess of 100 FPS and the X800 PRO's is above 70 FPS.

One thing that cropped up last year was the "Shader Compiler Optimiser": a driver routine that inspects the assembly shader instructions as they are passed to the graphics card and attempts to reorganise them into lower-level code that fits the underlying hardware more closely, so that the shader runs as efficiently as possible on the particular graphics chip executing it.
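To give a feel for the kind of rewriting such a routine might do, here is a toy peephole pass in Python. It is only an illustrative sketch and not ATI's compiler, whose internals are not described here. The `Instr` type, the helper names and the rule that only `r`-prefixed registers count as temporaries are all assumptions made for the example. The one thing taken from real shader assembly is that `mul`, `add` and `mad` (multiply-add, a*b + c) exist as instructions.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """A hypothetical, simplified shader instruction: op dest, src0, src1, ..."""
    op: str
    dest: str
    srcs: Tuple[str, ...]


def _live_after(program: List[Instr], start: int, reg: str) -> bool:
    """True if `reg` is read at or after `start` before being overwritten."""
    for instr in program[start:]:
        if reg in instr.srcs:
            return True
        if instr.dest == reg:
            return False
    return False


def fuse_mul_add(program: List[Instr]) -> List[Instr]:
    """Fuse `mul t, a, b` followed by `add d, t, c` into `mad d, a, b, c`.

    The pair is fused only when `t` is a temporary (name starts with "r")
    whose value is not needed afterwards, so the results are unchanged.
    """
    out: List[Instr] = []
    i = 0
    while i < len(program):
        cur = program[i]
        nxt = program[i + 1] if i + 1 < len(program) else None
        fusable = (
            nxt is not None
            and cur.op == "mul"
            and nxt.op == "add"
            and len(nxt.srcs) == 2
            and cur.dest.startswith("r")
            and nxt.srcs.count(cur.dest) == 1
            and (nxt.dest == cur.dest or not _live_after(program, i + 2, cur.dest))
        )
        if fusable:
            other = next(s for s in nxt.srcs if s != cur.dest)
            out.append(Instr("mad", nxt.dest, cur.srcs + (other,)))
            i += 2  # both original instructions are consumed
        else:
            out.append(cur)
            i += 1
    return out


if __name__ == "__main__":
    shader = [
        Instr("mul", "r0", ("t0", "c0")),
        Instr("add", "r1", ("r0", "c1")),
        Instr("mov", "oC0", ("r1",)),
    ]
    for instr in fuse_mul_add(shader):
        print(instr.op, instr.dest, ", ".join(instr.srcs))
```

Here three instructions become two. A real driver would also have to deal with register allocation, instruction scheduling and hardware-specific pairing rules, none of which this sketch attempts.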

ATI have had a shader compiler in their drivers since the initial release of R300, and it has been through several revisions over the past 20 months. The current compiler has mostly been born of the efforts of their driver developers; however, these are not necessarily experts on compilers. Back when the East Coast (Marlborough) team started the initial R400 project, ATI realised how important shaders, and therefore shader compilers, would be; however, their thinking was that compilers are pretty much a solved problem and there wasn't much point in ATI attempting to reinvent the wheel. So, during the initial phases of the R400 design, a team of compiler developers, formerly from DEC, was hired, and they have been busy coding up shader compilers for ATI. Presumably they have been focusing on the projects the Marlborough team have been working on; however, they have also been working on a version for R300/R420, and the first fruits of their labours are scheduled to be dropped into the CATALYST drivers in a couple of releases' time.

Of course, it will be critical to see where performance gains appear. For short PS1.x and even most PS2.0 shaders the gains are unlikely to be that great as, in many cases, these are dictated more by texture performance than by instruction execution performance. For longer shaders there is more room for an effective compiler to make a difference, but longer shaders do not really appear that frequently in games just yet. Hopefully we'll be able to get hold of a beta build of these drivers to test where the gains are, if there are any.