Part Three: Summary
Specification Comparison
First, let's look at how ATI and NVIDIA describe their flagship products in their own whitepapers.
ATI: The RADEON 9700 is the most advanced graphics processor ever created. With 107 million transistors, it is a completely new architecture designed around the concepts of high bandwidth, parallelism, efficiency, precision, and programmability. The performance of this new architecture is staggering, more than doubling anything on the market today in every category.
NVIDIA: The “CineFX” architecture, in combination with the high-level Cg programming language, enables the paradigm shift: the convergence of real-time and cinematic quality rendering. The key factors contributing to this milestone in real-time rendering include: advanced programmability, high-precision color, high-level shading language, highly efficient architecture and high bandwidth to system memory and CPU.
OK, let's compare R300 with NV30 using their parent companies' own words.
Feature | R300 | NV30 | Comments |
Programmability | High | Advanced | Advanced > High |
Shading language | N/A | High-level | Cg is a highlight of NV30 |
Bandwidth | High | High (but only to system memory and CPU) | NV30 only targets AGP 8x, but R300 targets both AGP and Video memory. |
Parallelism | High | N/A | ATI emphasizes its 4VEs and 8 PPs design. |
Efficiency | High | Highly efficient architecture | NV30 emphasizes its architecture |
Precision | High (96bit) | High-precision color | NV30 > R300 |
Scalability* | 256 | ? | NVIDIA seems to dislike multi-chip design. |
* Not the companies' words, just mine.
We find that both ATI and NVIDIA highlight programmability, bandwidth, efficiency and precision, so their claims overlap by about 80%. Because NVIDIA does not emphasize NV30's memory bandwidth, a 128-bit memory interface on NV30 looks more likely.
Of course, R300 supports DX9 HLSL and will support OpenGL2 HLSL, as does NV30.
Category | Vendor | ATI | NVIDIA | ATI | NVIDIA |
General Specification | Name | Radeon 9700 | NV30 | Radeon 8500 | GeForce 4 Ti |
General Specification | Model | Pro | - |  | 4600 |
General Specification | Manufacturing Process | 0.15 µm | 0.13 µm | 0.15 µm | 0.15 µm |
General Specification | Transistor Count | 107M | 100-120M | 63M | 68M |
General Specification | AGP | 8x | 8x | 4x | 4x |
General Specification | Core Clock Rate | 325 MHz | 400 ~ 450 MHz | 275 MHz | 300 MHz |
Tessellation Unit | N-Patches | Yes | Yes | Yes | - |
Tessellation Unit | Adaptive Tessellation | Yes | Yes | - | - |
Tessellation Unit | Continuous Tessellation | Yes | Yes | - | - |
Tessellation Unit | Displacement Mapping | Yes | Yes | - | - |
Vertex Processing Unit | Shader Version | 2.0 | 2.0+ | 1.1 | 1.1 |
Vertex Processing Unit | Number of Units | 4 | 3 | 2 | 2 |
Vertex Processing Unit | Hardware T&L Unit | -* | ? | Yes | Yes |
Vertex Processing Unit | Transform Rate | 325M triangles/s | 272 ~ 306M triangles/s | 69M triangles/s | 136M triangles/s |
Pixel Processing Unit | Shader Version | 2.0 | 2.0+ | 1.4 | 1.3 |
Pixel Processing Unit | Number of Units | 8 | 8 | 4 | 4 |
Pixel Processing Unit | Texture Units/Pipeline | 1 | 2 | 2 | 2 |
Pixel Processing Unit | Textures/Pass | 16 | 16 | 6 | 4 |
Pixel Processing Unit | Pixel Fill Rate | 2.6G pixels/s | 3.2 ~ 3.6G pixels/s | 1.1G pixels/s | 1.2G pixels/s |
Pixel Processing Unit | Texel Fill Rate | 2.6G texels/s | 6.4 ~ 7.2G texels/s | 2.2G texels/s | 2.4G texels/s |
Memory Architecture | Bus Width and Memory Type | 256-bit/DDR (R300 also supports DDR-II) | 128-bit/DDR-II | 128-bit/DDR | 128-bit/DDR |
Memory Architecture | Crossbar Controller | 4 x 64-bit | 4 x 32-bit | - | 4 x 32-bit |
Memory Architecture | Capacity | 128 MB (256 MB max for R300) | 128 MB/256 MB | 128 MB | 128 MB |
Memory Architecture | Vertex Cache | Yes | Yes | Yes | Yes |
Memory Architecture | Primitive Cache | Yes | Yes | ? | Yes |
Memory Architecture | Color/Pixel Cache | Yes | Yes | - | Yes |
Memory Architecture | Texture Cache | Yes | Yes | Yes | Yes |
Memory Architecture | Z Cache | Yes | ? | - | - |
Memory Architecture | Frequency | 620 MHz | 800 MHz ~ 1 GHz | 550 MHz | 650 MHz |
Memory Architecture | Bandwidth | 19.8 GB/s | 12.8 ~ 16 GB/s | 8.8 GB/s | 10.4 GB/s |
Bandwidth Optimization | Color Compression | Yes (12:1) | Yes | - | - |
Bandwidth Optimization | Fast Color-Clear | Yes | ? | - | - |
Bandwidth Optimization | Z-Compression | Yes (24:1) | Yes | Yes | Yes |
Bandwidth Optimization | Fast Z-Clear | Yes | Yes | Yes | Yes |
Bandwidth Optimization | Early Z-Culling | Yes | Yes | - | Yes |
Bandwidth Optimization | Hierarchical Z | Yes (64 pixels/cycle max, 3 levels) | ? | Yes | - |
Bandwidth Optimization | Compressed Textures | Yes | Yes | Yes | Yes |
Image Quality Enhancement | Programmable Multi-Sampling | Yes (non-grid) | Yes | - | - |
Image Quality Enhancement | Per Sample Gamma Correction | Yes | ? | - | - |
Image Quality Enhancement | Program Controlled Filtering | ? | Yes (TXD) | - | - |
* Unlike R200 and NV25, R300 has no separate T&L unit alongside the vertex shaders for executing legacy T&L code; all T&L operations are still carried out in hardware by the vertex shader units. Although currently unconfirmed, it is likely that NV30 will do the same.
Whether NV30 uses a 256-bit memory interface is still unknown. In my opinion, many hints suggest that NV30 is more likely to have a 128-bit memory interface than a 256-bit one. According to Samsung, its 1 GHz DDR-II memory will enter volume production in the third quarter of 2002.
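The Bandwidth row of the table can be checked against the Bus Width and Frequency rows with a small calculation. This is only a sketch under two assumptions of mine: that the Frequency row gives the effective (double data rate) transfer rate, and that 1 GB means 10^9 bytes. The 128-bit NV30 entries are the speculative figures from the table, not confirmed specifications, and the function name is just illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10^9 bytes, an assumption).

    bus_width_bits: width of the memory interface in bits.
    effective_mhz:  effective data rate in MHz (assumed to be what the
                    table's Frequency row lists).
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9


# (bus width in bits, effective frequency in MHz), taken from the table.
# The NV30 rows are the article's speculation, not confirmed figures.
configs = {
    "Radeon 9700 Pro": (256, 620),
    "NV30 (800 MHz DDR-II)": (128, 800),
    "NV30 (1 GHz DDR-II)": (128, 1000),
    "Radeon 8500": (128, 550),
    "GeForce 4 Ti 4600": (128, 650),
}

for name, (bits, mhz) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(bits, mhz):.1f} GB/s")
```

Under these assumptions, a 128-bit NV30 would need the full 1 GHz DDR-II just to reach 16 GB/s, which is still below the 256-bit R300 at 620 MHz.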
R300's raw memory bandwidth, advanced memory compression (12:1 on color, 24:1 on Z) and hierarchical Z (up to 64 pixels per cycle of raw fill) are impressive here, and they are the basis of R300's excellent FSAA performance.
From this table, we can estimate the performance gaps among NV30, R300, NV25 and R200, ignoring CPU power and driver quality.
- Compared with R200, NV25 has a clear advantage in games that have many complex geometric objects or are sensitive to memory bandwidth.
- R300 is better than NV25. As with NV25 versus R200, R300 has a clear advantage in games that have many complex geometric objects or are sensitive to memory bandwidth. In addition, R300 gains a further advantage over NV25 in games that use complex shader programs. Currently, the distinct gap between NV25 and R300 with 4x FSAA and/or 16x AF comes mainly from the memory bandwidth difference, NV25's slower AF implementation, and R300's use of color buffer compression under FSAA.
- We can predict that NV30 will be more powerful than R300, by a margin that depends on the 3D application. NV30 may stand out more in future 3D games with complex shading effects (Note: it's highly likely that we will be several architectural generations down the line before games actually reach this level of complexity. Look at Doom III as an example: it is only just making use of the majority of the features introduced in NV15/R100. - Ed.). We can also see that 1 GHz Samsung DDR-II memory plays a key role in NV30's real performance.
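The fill-rate rows that feed these estimates follow from the clock and pipeline rows of the table. The sketch below assumes the simple theoretical model of one pixel per pipeline per clock and one texel per texture unit per clock; the `Chip` class and its method names are my own illustration, and the NV30 entries use the speculative clock range from the table.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    core_mhz: float          # core clock from the table
    pixel_pipes: int         # "Number of Units" under Pixel Processing Unit
    tex_units_per_pipe: int  # "Texture Units/Pipeline"

    def pixel_fill_gpix(self) -> float:
        # Theoretical peak: one pixel per pipeline per clock (assumption).
        return self.pixel_pipes * self.core_mhz / 1000

    def texel_fill_gtex(self) -> float:
        # Theoretical peak: one texel per texture unit per clock (assumption).
        return self.pixel_fill_gpix() * self.tex_units_per_pipe


chips = [
    Chip("Radeon 9700 Pro", 325, 8, 1),
    Chip("NV30 (400 MHz, speculative)", 400, 8, 2),
    Chip("NV30 (450 MHz, speculative)", 450, 8, 2),
    Chip("Radeon 8500", 275, 4, 2),
    Chip("GeForce 4 Ti 4600", 300, 4, 2),
]

for chip in chips:
    print(f"{chip.name}: {chip.pixel_fill_gpix():.1f} Gpixels/s, "
          f"{chip.texel_fill_gtex():.1f} Gtexels/s")
```

These are peak numbers only; as noted above, real results also depend on memory bandwidth, CPU power and driver quality.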