Benchmarks



Theoretical Rates

Before looking at any actual benchmark scores, we'll take a look at the theoretical metrics of the X1800s in relation to the other boards we're comparing them against.

                 Core    Pixel fill   Texel fill   Geometry   Memory   Bandwidth
                 (MHz)   (Mpixels/s)  (Mtexels/s)  rate       (MHz)    (GB/s)
X1800 XT         625     10000        10000        1250       750      48.0
X1800 XL         500     8000         8000         1000       500      32.0
X850 XT PE       540     8640         8640         810        590      37.8
X800 XT          500     8000         8000         750        500      32.0

X1800 XT % faster than:
X1800 XL         25.0%   25.0%        25.0%        25.0%      50.0%    50.0%
X850 XT PE       15.7%   15.7%        15.7%        54.3%      27.1%    27.1%
X800 XT          25.0%   25.0%        25.0%        66.7%      50.0%    50.0%

X1800 XL % faster than:
X850 XT PE       -7.4%   -7.4%        -7.4%        23.5%      -15.3%   -15.3%
X800 XT          0.0%    0.0%         0.0%         33.3%      0.0%     0.0%

By the theoretical specifications the X1800 XT has a 25% core clock advantage over the X1800 XL. As both use the same core in the same configuration, that equates to 25% higher geometry and pixel throughput, and the XT also has 50% more bandwidth by virtue of its GDDR3 RAM running at 750MHz. In comparison to ATI's previous high end board, the X850 XT PE, the X1800 XT has a 16% pixel processing advantage, which is purely down to the clock speed difference as the pixel pipeline counts and configurations are ostensibly similar; a 54% vertex shader advantage, thanks to the extra clock speed and two more vertex shader units; and 27% more bandwidth.
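The percentages quoted above can be reproduced from the raw table figures. The following is a minimal sketch, not ATI's own methodology: the helper names are hypothetical, and the bandwidth formula assumes a 256-bit memory bus with two transfers per clock, an assumption that happens to be consistent with the 48.0, 32.0 and 37.8 GB/s figures in the table.

```python
def pct_faster(a: float, b: float) -> float:
    """Percentage by which a exceeds b (negative if a is lower)."""
    return (a / b - 1.0) * 100.0


def bandwidth_gb_s(mem_clock_mhz: float, bus_bits: int = 256,
                   transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 1000 MB).

    bus_bits and transfers_per_clock are assumptions, not values
    stated in the article.
    """
    bytes_per_transfer = bus_bits / 8
    return mem_clock_mhz * transfers_per_clock * bytes_per_transfer / 1000.0


# Core clock (MHz), vertex rate and memory clock (MHz) from the table.
boards = {
    "X1800 XT":   {"core": 625, "vertex": 1250, "mem": 750},
    "X1800 XL":   {"core": 500, "vertex": 1000, "mem": 500},
    "X850 XT PE": {"core": 540, "vertex": 810,  "mem": 590},
    "X800 XT":    {"core": 500, "vertex": 750,  "mem": 500},
}

xt = boards["X1800 XT"]
for name, other in boards.items():
    if name == "X1800 XT":
        continue
    print(f"X1800 XT vs {name}: "
          f"core {pct_faster(xt['core'], other['core']):+.1f}%, "
          f"vertex {pct_faster(xt['vertex'], other['vertex']):+.1f}%, "
          f"memory {pct_faster(xt['mem'], other['mem']):+.1f}%")

for name, spec in boards.items():
    print(f"{name}: {bandwidth_gb_s(spec['mem']):.1f} GB/s")
```

Because the pixel rate scales directly with core clock when the pipeline configuration is unchanged, the core clock ratio alone accounts for the pixel processing differences; the vertex column differs only where the number of vertex shader units changes.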

Comparisons between the X1800 XL and the X800 XT are going to be key to see how the underlying changes in the architecture have improved performance as these parts both have the same core and memory clockspeeds and the same number of pixel processing pipelines, with only the two extra vertex shaders on the X1800 XL immediately differentiating them from these top level theoretical specifications.

Fill-Rates

Here are a few key fill-rate characteristics of the X1800 XL in relation to the X800 XT.

                          Colour fill   Z fill    Blending   FP16 texture
X1800 XL                  7059.0        8000.2    2999.8     3810.1
X800 XT                   5644.7        7567.4    2936.9     3319.4
X1800 XL % faster than:   25.1%         5.7%      2.1%       14.8%

Despite these two generations of boards having what would appear to be the same pixel processing specifications, we can immediately see that the X1800 XL is attaining colour fill-rates much closer to its theoretical maximum than the X800 XT is. In fact there is a 25% performance difference between the two boards in this metric, indicating that some of the changes in the R520 architecture are having a positive effect.

Although the Z fill-rates of the two boards are fairly close, we can see that the X1800 XL has actually reached its peak theoretical throughput in this case. This suggests that there would have been room to benefit from doubling the Z fill-rate on R520, as ATI did with RV530, since its bandwidth saving mechanisms mean that this is not a bandwidth limited operation, presumably even more so on the relatively bandwidth abundant X1800 XT.

The blending rates between the two generations of architecture are fairly similar. While the R520 appears to operate a sample rate of two FP16 texture samples per clock, the architectural changes appear to increase the efficiency, giving the X1800 XL a 15% performance advantage over the X800 XT when sampling an FP16 texture.
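To put "closer to its theoretical maximum" into numbers, the measured fill-rates can be expressed as a fraction of the theoretical peak. This is a rough sketch under one assumption: that the theoretical figure of 8000 listed for both boards in the Theoretical Rates table is in the same units as the measured results and applies to both colour and Z fill. The names used are illustrative only.

```python
# Theoretical peak fill-rate shared by both boards (from the
# Theoretical Rates table); assumed to use the same units as
# the measured figures below.
PEAK_FILL = 8000.0

# Measured colour and Z fill-rates from the fill-rate table.
measured = {
    "X1800 XL": {"colour": 7059.0, "z": 8000.2},
    "X800 XT":  {"colour": 5644.7, "z": 7567.4},
}


def efficiency(measured_rate: float, peak_rate: float = PEAK_FILL) -> float:
    """Measured rate as a percentage of the theoretical peak."""
    return measured_rate / peak_rate * 100.0


for board, rates in measured.items():
    print(f"{board}: colour {efficiency(rates['colour']):.1f}% of peak, "
          f"Z {efficiency(rates['z']):.1f}% of peak")
```

On these figures the X1800 XL's colour fill sits in the high 80s as a percentage of peak against roughly 70% for the X800 XT, while its Z fill is effectively at the theoretical limit.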

X1800 XL         Colour fill   Z fill
Normal (No AA)   7059.0        8000.2
2x               5759.4        7479.3
4x               5690.0        2999.8

The sample rates of R520 with Multi-Sample Anti-Aliasing enabled stay the same as the previous generations of ATI DirectX9 products, with the ROPs capable of producing two samples per clock cycle with MSAA enabled.



Texture layers            1        2        3        4       5       6       7       8
X1800 XL                  1634.0   1434.0   1110.2   888.4   729.5   617.6   534.7   470.4
X800 XT                   1545.1   1224.6   962.8    710.0   534.8   427.6   354.0   302.0
X1800 XL % faster than:   5.8%     17.1%    15.3%    25.1%   36.4%   44.4%   51.1%   55.7%

Again, despite the appearance of having the same fill-rates and texture sampling capabilities, we see that the R520 based X1800 XL has a small performance advantage of about 6% over the X800 XT when sampling a single texture layer, but the difference gradually increases to nearly 56% as more layers are applied to a single pixel. It's difficult to pinpoint where these improvements are coming from. It is unlikely to be through the shader scheduler, as this test isn't really making use of much in the way of shader functionality. More likely it is due to increased efficiency of the memory: better use of cache via the fully associative caches, better memory management from the new memory controller, better memory efficiency from the increased granularity of the memory access, or a combination of any of these.
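One way to look at the multi-texturing results is to convert each pixel fill-rate into an effective texel rate by multiplying by the number of layers. The sketch below assumes the eight columns correspond to one through eight texture layers and that the figures are pixel fill-rates; the function names are hypothetical. Percentages recomputed from the rounded table values can differ from the printed ones by 0.1.

```python
# Measured fill-rates for 1 to 8 texture layers (from the table above).
x1800_xl = [1634.0, 1434.0, 1110.2, 888.4, 729.5, 617.6, 534.7, 470.4]
x800_xt = [1545.1, 1224.6, 962.8, 710.0, 534.8, 427.6, 354.0, 302.0]


def texel_rates(fill_rates):
    """Effective texel rate per layer count: pixels/s times layers."""
    return [rate * layers for layers, rate in enumerate(fill_rates, start=1)]


def advantages(a, b):
    """Percentage advantage of board a over board b at each layer count."""
    return [(x / y - 1.0) * 100.0 for x, y in zip(a, b)]


xl_texels = texel_rates(x1800_xl)
xt_texels = texel_rates(x800_xt)
adv = advantages(x1800_xl, x800_xt)

for layers, (xl, xt, pct) in enumerate(zip(xl_texels, xt_texels, adv), start=1):
    print(f"{layers} layer(s): X1800 XL {xl:7.1f}, X800 XT {xt:7.1f}, "
          f"advantage {pct:+.1f}%")
```

Viewed this way, the X1800 XL's effective texel rate keeps climbing as layers are added, while the X800 XT's levels off much earlier, which is consistent with the suggestion that the gain comes from the memory and cache side rather than from raw sampling capability.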