Benchmarks

Theoretical Rates

Before looking at any actual benchmark scores, we'll take a look at the theoretical metrics of the boards used in this review. Because the X1900 has the same number of texture units and ROPs as the X1800 but internally processes three times the number of pixels, we also include the fragment rate of each board here.

ATI R580 Theoretical Rates

Board                    Core Clock (MHz)  Fillrate (Mp/s)  Texture fillrate (Mt/s)  Fragment rate (Mfrag/s)  Triangle rate (Mtris/s)  Memory Clock (MHz)  Memory Bandwidth (GB/s)
X1900 XTX                650               10400            10400                    31200                    1300                     775                 49.6
X1900 XT                 625               10000            10000                    30000                    1250                     725                 46.4
X1800 XT                 625               10000            10000                    10000                    1250                     750                 48.0
X850 XT PE               540               8640             8640                     8640                     810                      590                 37.8

% Diff                   Core Clock        Fillrate         Texture fillrate         Fragment rate            Triangle rate            Memory clock        Memory bandwidth
X1900 XTX to X1900 XT    4.0%              4.0%             4.0%                     4.0%                     4.0%                     6.9%                6.9%
X1900 XTX to X1800 XT    4.0%              4.0%             4.0%                     212.0%                   4.0%                     3.3%                3.3%
X1900 XTX to X850 XT PE  20.4%             20.4%            20.4%                    261.1%                   60.5%                    31.4%               31.4%

Comparing the clock rates of the X1900 XTX and X1800 XT shows only a 4% difference on the core and 3% on the memory. These clock differences correspond to just a 4% texture and colour fillrate increase and an extra 3% of bandwidth, as the number of texture units and ROPs and the memory bus width are the same on the two chips. It is, of course, in the fragment pipeline where the major difference between the two boards occurs, with the X1900 XTX having a 212% pixel shader rate advantage over the X1800 XT.
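As a quick check of the percentages quoted above, the following minimal sketch recomputes them from the figures in the theoretical rates table. The helper name `pct_diff` is our own and purely illustrative.

```python
def pct_diff(a: float, b: float) -> float:
    """Percentage by which a exceeds b."""
    return (a / b - 1.0) * 100.0


# Figures from the theoretical rates table (X1900 XTX vs X1800 XT).
core = pct_diff(650, 625)          # core clock, MHz
memory = pct_diff(775, 750)        # memory clock, MHz
fragment = pct_diff(31200, 10000)  # fragment rate

print(f"core {core:.1f}%, memory {memory:.1f}%, fragment {fragment:.1f}%")
```

Note that a 212% advantage means the X1900 XTX's fragment rate is a little over three times that of the X1800 XT, not a little over twice.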

In comparison to the previous-generation X850 XT PE, the X1900 XTX has a 20% core clock advantage, which translates to the same difference in colour and texture fillrates, as the X850 XT PE also has 16 texture and ROP units. The X850 XT PE has only one pixel shader processor per pipeline, however, so the X1900 XTX has a 261% fragment rate advantage. The X1900 also has two more vertex shaders than the X850, which, coupled with the clocks, equates to a 60% triangle rate advantage for the X1900 XTX. The memory bus width is the same on the two boards, at 256 bits, so the 31% bandwidth difference corresponds to the difference in their memory speeds. Of course, what these fillrate specifications hide are the differences between the underlying architectures, and hence how efficient they are relative to one another at actually utilising their capabilities.
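The way the table figures follow from clocks and unit counts can be sketched as below. This is an illustration under stated assumptions rather than anything taken from ATI documentation: the vertex shader counts (8 and 6) are assumed, since the text only says the X1900 has two more than the X850; the factor of one triangle per vertex shader every 4 clocks is inferred from the table values; and the memory is assumed to transfer data twice per clock (DDR). The `Board` class and `theoretical_rates` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Board:
    name: str
    core_mhz: float
    mem_mhz: float
    rops: int            # also the texture unit count on these boards
    pixel_shaders: int   # fragment processors
    vertex_shaders: int  # assumed counts, not stated in the text
    bus_bits: int = 256


def theoretical_rates(b: Board) -> dict:
    return {
        "fill_mps": b.core_mhz * b.rops,
        "texture_mts": b.core_mhz * b.rops,
        "fragment": b.core_mhz * b.pixel_shaders,
        # Assumption inferred from the table: one triangle per
        # vertex shader every 4 clocks.
        "triangle_mtris": b.core_mhz * b.vertex_shaders / 4,
        # Assumption: DDR memory, two transfers per clock.
        "bandwidth_gbs": b.mem_mhz * 2 * (b.bus_bits / 8) / 1000,
    }


BOARDS = [
    Board("X1900 XTX", 650, 775, rops=16, pixel_shaders=48, vertex_shaders=8),
    Board("X1800 XT", 625, 750, rops=16, pixel_shaders=16, vertex_shaders=8),
    Board("X850 XT PE", 540, 590, rops=16, pixel_shaders=16, vertex_shaders=6),
]

for board in BOARDS:
    print(board.name, theoretical_rates(board))
```

Under these assumptions the output reproduces the table, for example 31200 for the X1900 XTX fragment rate and 810 Mtris/s for the X850 XT PE.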

Although we won't be looking at the X1900 Crossfire board in lieu of the standard X1900 XT board in these theoretical tests, we've included the X1900 XT's theoretical numbers just to highlight the differences between it and the X1900 XTX. The small clock rate differences translate into a 4% core difference (hence fill, fragment and vertex shader rates) and a 7% bandwidth difference.

ATI R580 Measured Theoretical Rates

Board                    Color Fill  Z Fill   Single Texture  Single Texture Alpha Blend  1 Floating Point Texture
X1900 XTX                9719.0      10209.8  7303.1          4391.4                      4859.5
X1800 XT                 9447.3      9812.2   7116.9          4285.7                      4670.8
X850 XT PE               6195.8      8138.6   5617.0          3309.3                      3628.9

% Diff from Theoretical  Color Fill  Z Fill   Single Texture  Single Texture Alpha Blend  1 Floating Point Texture
X1900 XTX                -6.5%       -1.8%    -29.8%          -57.8%                      -53.3%
X1800 XT                 -5.5%       -1.9%    -28.8%          -57.1%                      -53.3%
X850 XT PE               -28.3%      -5.8%    -35.0%          -61.7%                      -58.0%

% Diff                   Color Fill  Z Fill   Single Texture  Single Texture Alpha Blend  1 Floating Point Texture
X1900 XTX to X1800 XT    2.9%        4.1%     2.6%            2.5%                        4.0%
X1900 XTX to X850 XT PE  56.9%       25.4%    30.0%           32.7%                       33.9%

When putting the theoretical rates to the test, we can see that, generally speaking, the newer X1000 class of boards is able to get closer to those peaks than the X850 XT PE. Beyond that, the X1900 behaves very much as we would expect, with the Z fillrate confirming that only one Z sample/test is performed per cycle on R580 (unlike RV530), and with FP16 texture sampling taking two cycles, which probably means FP32 sampling takes four. The test highlights that applying a single texture and outputting a colour value is bandwidth limited with 16 texture units and ROPs and nearly 50GB/s of bandwidth. This is exacerbated further when the pixels are blended, as blending also has to read the value already in the frame buffer and hence requires more bandwidth.
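One rough way to read the measured figures is to divide the theoretical peak by each measured rate, giving an average number of clocks spent per pixel. The sketch below does this for the X1900 XTX using the numbers from the tables above. It assumes that the "1 Floating Point Texture" column is the FP16 test referred to in the text, and the function name `clocks_per_op` is our own.

```python
# Theoretical peak (Mp/s) and measured rates for the X1900 XTX,
# taken from the tables above.
PEAK = 10400.0
MEASURED = {
    "Z fill": 10209.8,
    "Single texture": 7303.1,
    "Single texture alpha blend": 4391.4,
    # Assumed to correspond to the "1 Floating Point Texture" column.
    "FP16 texture": 4859.5,
}


def clocks_per_op(peak: float, measured: float) -> float:
    """Average clocks per pixel, relative to a one-per-clock peak."""
    return peak / measured


ratios = {test: clocks_per_op(PEAK, rate) for test, rate in MEASURED.items()}
for test, ratio in ratios.items():
    print(f"{test}: {ratio:.2f} clocks per pixel")
```

The Z fill result comes out at roughly 1.02 clocks per pixel, in line with one Z sample/test per cycle, while the floating point texture result is roughly 2.14, in line with two-cycle FP16 sampling. The single texture and alpha blend ratios land in between and above respectively, which fits the bandwidth-limited explanation rather than a fixed cycle cost.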