Part Three: Summary



Specification Comparison

Let's first look at how ATI and NVIDIA describe their flagship products in their own whitepapers.

ATI: The RADEON 9700 is the most advanced graphics processor ever created. With 107 million transistors, it is a completely new architecture designed around the concepts of high bandwidth, parallelism, efficiency, precision, and programmability. The performance of this new architecture is staggering, more than doubling anything on the market today in every category.

NVIDIA: The “CineFX” architecture, in combination with the high-level Cg programming language, enables the paradigm shift— the convergence of real-time and cinematic quality rendering. The key factors contributing to this milestone in real-time rendering include: advanced programmability, high-precision color, high-level shading language, highly efficient architecture and high bandwidth to system memory and CPU.

Now let's compare R300 with NV30 using the words of their parent companies.

| Feature | R300 (ATI) | NV30 (NVIDIA) | Comment |
| --- | --- | --- | --- |
| Programmability | High | Advanced | Advanced > High |
| Shading language | N/A | High-level | Cg is a highlight of NV30 |
| Bandwidth | High | High (but only to system memory and CPU) | NV30 only targets AGP 8x, but R300 targets both AGP and video memory. |
| Parallelism | High | N/A | ATI emphasizes its 4 VEs and 8 PPs design. |
| Efficiency | High | Highly efficient architecture | NV30 emphasizes its architecture |
| Precision | High (96bit) | High-precision color | NV30 > R300 |
| Scalability* | 256 | ? | NVIDIA seems to dislike multi-chip design. |

* Not the companies' words, just mine.

We find that both ATI and NVIDIA highlight programmability, bandwidth, efficiency and precision. That is four of the five points each company lists, an overlap of about 80%. Because NVIDIA does not emphasize the memory bandwidth of NV30, a 128-bit memory interface for NV30 looks more likely.

Of course, R300 supports DX9 HLSL and will support OpenGL2 HLSL, as does NV30.

| Category | Item | Radeon 9700 | NV30 | Radeon 8500 | GeForce 4 Ti |
| --- | --- | --- | --- | --- | --- |
| General Specification | Model | Pro | - | | 4600 |
| | Manufacturing Process | 0.15um | 0.13um | 0.15um | 0.15um |
| | Transistor Count | 107M | 100-120M | 63M | 68M |
| | AGP | 8x | 8x | 4x | 4x |
| | Core Clock Rate | 325 MHz | 400 ~ 450 MHz | 275 MHz | 300 MHz |
| Tessellation Unit | N-Patches | Yes | Yes | Yes | - |
| | Adaptive Tessellation | Yes | Yes | - | - |
| | Continuous Tessellation | Yes | Yes | - | - |
| | Displacement Mapping | Yes | Yes | - | - |
| Vertex Processing Unit | Shader Version | 2.0 | 2.0+ | 1.1 | 1.1 |
| | Number of Units | 4 | 3 | 2 | 2 |
| | Hardware T&L Unit | -* | ? | Yes | Yes |
| | Transform Rate | 325M triangles/s | 272 ~ 306M triangles/s | 69M triangles/s | 136M triangles/s |
| Pixel Processing Unit | Shader Version | 2.0 | 2.0+ | 1.4 | 1.3 |
| | Number of Units | 8 | 8 | 4 | 4 |
| | Texture Units/Pipeline | 1 | 2 | 2 | 2 |
| | Textures/Pass | 16 | 16 | 6 | 4 |
| | Pixel Fill Rate | 2.6G pixels/s | 3.2 ~ 3.6G pixels/s | 1.1G pixels/s | 1.2G pixels/s |
| | Texel Fill Rate | 2.6G texels/s | 6.4 ~ 7.2G texels/s | 2.2G texels/s | 2.4G texels/s |
| Memory Architecture | Bus Width and Memory Type | 256bit/DDR (R300 also supports DDR-II) | 128bit/DDR-II | 128bit/DDR | 128bit/DDR |
| | Crossbar Controller | 4 x 64bit | 4 x 32bit | - | 4 x 32bit |
| | Capacity | 128MB (256MB max for R300) | 128MB/256MB | 128MB | 128MB |
| | Vertex Cache | Yes | Yes | Yes | Yes |
| | Primitive Cache | Yes | Yes | ? | Yes |
| | Color/Pixel Cache | Yes | Yes | - | Yes |
| | Texture Cache | Yes | Yes | Yes | Yes |
| | Z Cache | Yes | ? | - | - |
| | Frequency | 620 MHz | 800 MHz ~ 1 GHz | 550 MHz | 650 MHz |
| | Bandwidth | 19.8 GB/s | 12.8 ~ 16 GB/s | 8.8 GB/s | 10.4 GB/s |
| Bandwidth Optimization | Color Compression | Yes (12:1) | Yes | - | - |
| | Fast Color-Clear | Yes | ? | - | - |
| | Z-Compression | Yes (24:1) | Yes | Yes | Yes |
| | Fast Z-Clear | Yes | Yes | Yes | Yes |
| | Early Z-Culling | Yes | Yes | - | Yes |
| | Hierarchical Z | Yes (64 pixels/cycle max, 3 levels) | ? | Yes | - |
| | Compressed Textures | Yes | Yes | Yes | Yes |
| Image Quality Enhancement | Programmable Multi-Sampling | Yes (non-grid) | Yes | - | - |
| | Per Sample Gamma Correction | Yes | ? | - | - |
| | Program Controlled Filtering | ? | Yes (TXD) | - | - |

* Unlike R200 and NV25, R300 has no separate T&L unit alongside the vertex shaders for executing legacy T&L code. All T&L operations are still carried out in hardware, via the vertex shader units. Although currently unconfirmed, it is likely that NV30 will do the same.
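The pixel and texel fill rates in the table can be reproduced from the clock and pipeline figures. The sketch below assumes the usual peak-rate approximation (core clock times pixel pipelines, times texture units per pipeline for texels); the function names are ours, and the NV30 entries use the estimated clock range from the table rather than confirmed figures.

```python
# Peak fill rates derived from the table's clock and pipeline figures.
# Assumption: peak rate = core clock x pixel pipelines (x texture units
# per pipeline for texels). Real-world throughput will be lower.

CHIPS = {
    # name: (core clock in MHz, pixel pipelines, texture units per pipeline)
    "Radeon 9700 Pro": (325, 8, 1),
    "NV30 (low estimate)": (400, 8, 2),
    "NV30 (high estimate)": (450, 8, 2),
    "Radeon 8500": (275, 4, 2),
    "GeForce 4 Ti 4600": (300, 4, 2),
}


def pixel_fill_gpixels(clock_mhz, pipelines):
    """Peak pixel fill rate in gigapixels per second."""
    return clock_mhz * pipelines / 1000.0


def texel_fill_gtexels(clock_mhz, pipelines, tmus_per_pipeline):
    """Peak texel fill rate in gigatexels per second."""
    return pixel_fill_gpixels(clock_mhz, pipelines) * tmus_per_pipeline


for name, (clock, pipes, tmus) in CHIPS.items():
    pixels = pixel_fill_gpixels(clock, pipes)
    texels = texel_fill_gtexels(clock, pipes, tmus)
    print(f"{name}: {pixels:.1f} GPixels/s, {texels:.1f} GTexels/s")
```

With one texture unit per pipeline, R300's texel rate equals its pixel rate, which is why NV30's estimated texel rate is well over double R300's even though the pixel rates are much closer.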

Whether NV30 uses a 256-bit memory interface is not yet known. In my opinion, the hints so far make a 128-bit interface more likely than a 256-bit one. According to Samsung, its 1 GHz DDR-II memory will enter volume production in the third quarter of 2002.
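To see why the bus width matters so much, here is a minimal sketch of the raw bandwidth arithmetic behind the table: bus width in bytes times the effective memory frequency. The function name is ours, and the 256-bit NV30 line is purely hypothetical, included only to show what such an interface would mean at the same memory speed.

```python
# Raw memory bandwidth = (bus width in bits / 8) x effective frequency.
# Frequencies are the effective (DDR) rates in MHz taken from the table.

def bandwidth_gb_s(bus_bits, effective_mhz):
    """Raw memory bandwidth in decimal GB/s."""
    return bus_bits / 8 * effective_mhz / 1000.0


print(f"R300, 256-bit @ 620 MHz:  {bandwidth_gb_s(256, 620):.1f} GB/s")
print(f"NV30, 128-bit @ 800 MHz:  {bandwidth_gb_s(128, 800):.1f} GB/s")
print(f"NV30, 128-bit @ 1000 MHz: {bandwidth_gb_s(128, 1000):.1f} GB/s")
# Hypothetical: NV30 is not confirmed to have a 256-bit interface.
print(f"NV30, 256-bit @ 1000 MHz: {bandwidth_gb_s(256, 1000):.1f} GB/s")
```

Under these assumptions, a 128-bit NV30 would trail R300 in raw bandwidth even with 1 GHz DDR-II, while a 256-bit NV30 at the same memory speed would be well ahead.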

R300's raw memory bandwidth, advanced memory compression (12:1 on color, 24:1 on Z) and hierarchical Z (reaching 64 pixels per cycle) are impressive, and they are the basis of R300's excellent FSAA performance.

From this table we can estimate the performance gaps among NV30, R300, NV25 and R200, ignoring CPU power and driver quality.

  1. Compared with R200, NV25 has a clear advantage in games that use many complex geometric objects or are sensitive to memory bandwidth.
  2. R300 is better than NV25. As with NV25 versus R200, R300 has a clear advantage in games that use many complex geometric objects or are sensitive to memory bandwidth. R300 also gains further ground over NV25 in games that use complex shader programs. Currently, the distinct gap between NV25 and R300 with 4x FSAA and/or 16x AF comes mainly from the difference in memory bandwidth, the slower AF implementation in NV25, and R300's use of color buffer compression under FSAA.
  3. We can predict that NV30 will be more powerful than R300, by a margin that depends on the 3D application. NV30 may stand out most in future 3D games with complex shading effects (Note: it's highly likely that we will be several architectural generations down the line before this level of complexity actually appears in games. Look at Doom III as an example, which is only just making use of the majority of the features introduced in NV15/R100 - Ed.). We can also see that Samsung's 1 GHz DDR-II plays a key role in the real performance of NV30.
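As a rough illustration of the gaps listed above, the sketch below normalizes the table's theoretical figures against a baseline chip. It only compares paper specifications: it ignores CPU power, driver quality, compression and culling efficiency, and the NV30 rows are estimates. The helper name `relative_to` is ours.

```python
# Theoretical figures copied from the specification table.
SPECS = {
    # name: (pixel fill GPixels/s, texel fill GTexels/s, bandwidth GB/s)
    "Radeon 8500": (1.1, 2.2, 8.8),
    "GeForce 4 Ti 4600": (1.2, 2.4, 10.4),
    "Radeon 9700 Pro": (2.6, 2.6, 19.8),
    "NV30 (low estimate)": (3.2, 6.4, 12.8),
    "NV30 (high estimate)": (3.6, 7.2, 16.0),
}


def relative_to(baseline, specs=SPECS):
    """Return each chip's figures as ratios of the baseline chip's figures."""
    base = specs[baseline]
    return {
        name: tuple(round(value / ref, 2) for value, ref in zip(values, base))
        for name, values in specs.items()
    }


ratios = relative_to("GeForce 4 Ti 4600")
for name, (pixel, texel, bandwidth) in ratios.items():
    print(f"{name}: pixel x{pixel}, texel x{texel}, bandwidth x{bandwidth}")
```

Measured against the GeForce 4 Ti 4600, R300 roughly doubles pixel fill rate and nearly doubles bandwidth but barely moves texel fill rate, while the NV30 estimates show the opposite emphasis: a large texel-rate lead with a comparatively modest bandwidth gain.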