Benchmarks - Fill-Rates

Let's take a closer look at the pixel pipelines of the X700 to see how they are organised and what their peak operational rates are.

Fill-rates (Mpixels/s):

X700 XT     3440.2   3689.3   2871.4   1713.8   1814.5
X700 PRO    3017.4   3243.9   2277.5   1399.2   1603.1
X800 XT     5637.1   7564.8   5000.4   2969.6   3354.6
X600 XT     1477.2   1985.6   1464.7   1117.4    973.9
9800 PRO    2665.1   2556.8   2644.9   1822.0   1462.1

X700 XT difference relative to:

X700 PRO     14.0%    13.7%    26.1%    22.5%    13.2%
X800 XT     -39.0%   -51.2%   -42.6%   -42.3%   -45.9%
X600 XT     132.9%    85.8%    96.0%    53.4%    86.3%
9800 PRO     29.1%    44.3%     8.6%    -5.9%    24.1%

X700 PRO difference relative to:

X800 XT     -46.5%   -57.1%   -54.5%   -52.9%   -52.2%
X600 XT     104.3%    63.4%    55.5%    25.2%    64.6%
9800 PRO     13.2%    26.9%   -13.9%   -23.2%     9.6%

Taking into account the bandwidth demands of the various tests used here, the relative performance differences don't throw up too many surprises. In comparison to the 9800 PRO, the X700 XT is faster in most fill operations, except the Single Texture Alpha Blend, which is the most reliant on bandwidth. The X700 PRO follows the same trend, but also loses out in Single Texturing due to its larger bandwidth deficit.
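The percentage rows in the table above appear to be simple relative differences between boards. The following is a minimal sketch of that arithmetic, assuming the figures are copied in table order; the names `fill_rates` and `relative_difference` are illustrative and not part of any benchmark tool.

```python
# Fill-rate figures copied from the table above (five tests per board, in table order).
fill_rates = {
    "X700 XT":  [3440.2, 3689.3, 2871.4, 1713.8, 1814.5],
    "X700 PRO": [3017.4, 3243.9, 2277.5, 1399.2, 1603.1],
    "X800 XT":  [5637.1, 7564.8, 5000.4, 2969.6, 3354.6],
    "X600 XT":  [1477.2, 1985.6, 1464.7, 1117.4, 973.9],
    "9800 PRO": [2665.1, 2556.8, 2644.9, 1822.0, 1462.1],
}


def relative_difference(subject, baseline):
    """Percentage by which `subject` is faster (+) or slower (-) than `baseline`, per test."""
    return [
        round((s / b - 1.0) * 100.0, 1)
        for s, b in zip(fill_rates[subject], fill_rates[baseline])
    ]


if __name__ == "__main__":
    for board in ("X700 XT", "X700 PRO"):
        print(board, "vs 9800 PRO:", relative_difference(board, "9800 PRO"))
```

A negative entry marks a test where the baseline board is ahead, which is how the Single Texture Alpha Blend deficit against the 9800 PRO shows up.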

What we do notice is that in many cases the X700 falls below its theoretical maximum fill-rates. To see whether this is hardware or bandwidth related, we'll run the tests again at 16 bits per pixel to alleviate the bandwidth demands a little.

X700 XT 32bpp   3440.2   3689.3   2871.4   1713.8   1814.5
X700 XT 16bpp   3701.9   3701.9   3573.5   2441.1   1814.5

Running in 16bpp we can see that more of the operations come closer to the theoretical fill-rate of 3800Mp/s. The Single Texture Alpha Blend test is still some way off that, but greater than half of the colour write fill-rate, indicating that RV410 is still capable of 8 blend operations per clock if required. The Floating Point texture fill-rate is less than half the colour fill-rate; however, this isn't unexpected since fewer components can be sampled per clock with float textures. If this is a 16-bit per component float texture then X700 would require two cycles to sample it.
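To make the comparison against the theoretical figure concrete, here is a small sketch that expresses each measured rate as a percentage of the 3800Mp/s peak and checks which 16bpp results clear half of the measured colour fill-rate. It assumes the first column of the table is the colour fill test; the names `x700_xt`, `percent_of_peak` and `above_half_colour_fill` are illustrative.

```python
THEORETICAL_MPIXELS = 3800.0  # theoretical fill-rate quoted in the text (Mp/s)

# X700 XT results copied from the table above, in table order.
x700_xt = {
    "32bpp": [3440.2, 3689.3, 2871.4, 1713.8, 1814.5],
    "16bpp": [3701.9, 3701.9, 3573.5, 2441.1, 1814.5],
}


def percent_of_peak(rates, peak=THEORETICAL_MPIXELS):
    """Each measured rate as a percentage of the theoretical peak."""
    return [round(100.0 * r / peak, 1) for r in rates]


def above_half_colour_fill(rates):
    """True where a result exceeds half of the colour fill-rate (assumed to be column 1)."""
    half = rates[0] / 2.0
    return [r > half for r in rates]


if __name__ == "__main__":
    for mode, rates in x700_xt.items():
        print(mode, percent_of_peak(rates))
    print("16bpp above half colour fill:", above_half_colour_fill(x700_xt["16bpp"]))
```

Only the final column, read here as the floating point texture test, stays below the halfway mark at 16bpp, which matches the two-cycle sampling explanation.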

In all, though, it would seem that the X700 pipeline is fairly straightforward, with 8 texture/fragment processor pipes and 8 ROPs, giving a peak of 8 texture samples, 8 colour writes, 8 colour blends and 8 Z/Stencil samples per clock. With Multi-Sampling FSAA enabled the X700 is capable of two Z/Stencil samples per clock, meaning the fill-rate doesn't drop for 2x FSAA.
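The per-clock figures can be tied back to the 3800Mp/s peak with a simplified model. This is only a sketch: the core clock is derived from the two quoted numbers rather than stated in this section, the doubled Z/Stencil rate is assumed to apply to each of the 8 ROPs, and `peak_fill_rate` is a hypothetical helper. Sample counts above 2x are not covered by the text and are left out.

```python
PIPES = 8              # texture/fragment pipes and ROPs, from the text
PEAK_MPIXELS = 3800.0  # theoretical fill-rate quoted in the text (Mp/s)

# Core clock (MHz) implied by the two figures above; derived, not quoted.
CORE_CLOCK_MHZ = PEAK_MPIXELS / PIPES


def z_samples_per_clock(msaa_enabled):
    """Z/Stencil samples per clock across the chip.

    Assumption: with multi-sampling on, each of the 8 ROPs handles two samples.
    """
    return PIPES * (2 if msaa_enabled else 1)


def peak_fill_rate(samples_per_pixel):
    """Hypothetical peak fill-rate (Mp/s) for 1 or 2 samples per pixel."""
    msaa = samples_per_pixel > 1
    # A pixel is complete once all of its Z/Stencil samples are processed,
    # and no more than one pixel per pipe can be written each clock.
    pixels_per_clock = min(PIPES, z_samples_per_clock(msaa) / samples_per_pixel)
    return pixels_per_clock * CORE_CLOCK_MHZ


if __name__ == "__main__":
    for samples in (1, 2):
        print(f"{samples}x: {peak_fill_rate(samples):.1f} Mp/s")
```

Under these assumptions the peak is the same with and without 2x FSAA, since the doubled Z/Stencil rate exactly covers the second sample.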

 

Texture layers      1        2        3        4        5        6        7        8

X700 XT        1507.9   1313.8   1058.6    785.8    599.8    483.7    403.6    344.4
X700 PRO       1193.1   1016.0    885.1    675.6    523.7    426.7    360.0    306.7
X800 XT        2776.4   2343.2   1959.7   1547.5   1210.4    996.5    835.1    708.3
X600 XT        1130.1    843.6    624.6    432.2    327.8    236.8    215.3    182.9
9800 PRO       1469.8   1107.6    751.4    622.7    479.1    387.5    321.0    274.0

X700 XT difference relative to:

X700 PRO        26.4%    29.3%    19.6%    16.3%    14.5%    13.4%    12.1%    12.3%
X800 XT        -45.7%   -43.9%   -46.0%   -49.2%   -50.4%   -51.5%   -51.7%   -51.4%
X600 XT         33.4%    55.7%    69.5%    81.8%    83.0%   104.2%    87.5%    88.3%
9800 PRO         2.6%    18.6%    40.9%    26.2%    25.2%    24.8%    25.7%    25.7%

X700 PRO difference relative to:

X800 XT        -57.0%   -56.6%   -54.8%   -56.3%   -56.7%   -57.2%   -56.9%   -56.7%
X600 XT          5.6%    20.4%    41.7%    56.3%    59.8%    80.1%    67.2%    67.7%
9800 PRO       -18.8%    -8.3%    17.8%     8.5%     9.3%    10.1%    12.1%    12.0%

Using RightmarkD3D to test texturing performance as extra texture layers are added, we see that the behaviour very much matches what you would expect for a chip with one texture unit per pipeline, as all the boards show a smooth reduction in performance. About the oddest facet we see here is that the 9800 PRO starts off with performance between the X700 XT and PRO, but drops behind the PRO at 3 layers.
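Both observations can be checked directly against the table. The sketch below assumes the eight columns correspond to 1 through 8 texture layers; `layer_rates`, `falls_smoothly` and `first_layer_behind` are illustrative names, not RightMark functions.

```python
# Multi-texturing results copied from the table above; index 0 is one texture layer.
layer_rates = {
    "X700 XT":  [1507.9, 1313.8, 1058.6, 785.8, 599.8, 483.7, 403.6, 344.4],
    "X700 PRO": [1193.1, 1016.0, 885.1, 675.6, 523.7, 426.7, 360.0, 306.7],
    "X800 XT":  [2776.4, 2343.2, 1959.7, 1547.5, 1210.4, 996.5, 835.1, 708.3],
    "X600 XT":  [1130.1, 843.6, 624.6, 432.2, 327.8, 236.8, 215.3, 182.9],
    "9800 PRO": [1469.8, 1107.6, 751.4, 622.7, 479.1, 387.5, 321.0, 274.0],
}


def falls_smoothly(board):
    """True if the rate drops with every additional texture layer."""
    rates = layer_rates[board]
    return all(prev > nxt for prev, nxt in zip(rates, rates[1:]))


def first_layer_behind(board, rival):
    """Lowest layer count at which `board` is slower than `rival`, or None."""
    pairs = zip(layer_rates[board], layer_rates[rival])
    for layers, (mine, theirs) in enumerate(pairs, start=1):
        if mine < theirs:
            return layers
    return None


if __name__ == "__main__":
    print({board: falls_smoothly(board) for board in layer_rates})
    print("9800 PRO falls behind X700 PRO at layer:",
          first_layer_behind("9800 PRO", "X700 PRO"))
```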