NVIDIA apparently believes that shader execution efficiency, rather than memory bandwidth, will become the main bottleneck for real-time rendering performance in future 3D games and applications. The NV30 architecture appears to confirm this.
Because NV30's architecture is more advanced than R300's, an effect that needs several rendering passes or shading programs on R300 may need only one pass and one shading program on NV30. This not only improves program performance and reduces the required bandwidth, but also makes programmers more productive.
The Stanford paper "Efficient Partitioning of Fragment Shaders for Multipass Rendering on Programmable Graphics Hardware" by Eric Chan et al., published at Graphics Hardware 2002, contains some interesting demos that compare R200, R300 and NV30 (via software driver emulation). These demos mainly focus on pixel shader power.
Shader | R200 | R300 | NV30 | Note |
Procedural wood surface (credit: Larry Gritz) | 50 passes | 7 passes | 1 pass | Limited by instructions |
Procedural flame shader (credit: Bill Mark) | 20 passes | 3 passes | 1 pass | Limited by instructions |
RenderMan bowling pin + projected textured lights | 7 passes | 5 passes | 5 passes | Limited by interpolants |
Images: Procedural Wood | Procedural Flame | Projected texture lights (courtesy of Eric Chan, Stanford University)
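The "limited by" notes in the comparison can be read as a resource-budget argument: a shader must be split whenever it exceeds some per-pass limit, and the scarcest resource decides the pass count. Here is a rough sketch of that reasoning. The resource names, limits and shader costs are hypothetical illustration values, not figures from the paper or from any hardware specification, and the paper's actual partitioning algorithm is more involved than a ceiling division.

```python
import math

def min_passes(shader_cost, pass_limit):
    """Rough lower bound on rendering passes under per-pass resource limits.

    Both arguments map a resource name (e.g. "instructions") to a count.
    Returns (passes, limiting_resource); the resource that needs the most
    passes decides the result.
    """
    def passes_for(resource):
        return math.ceil(shader_cost[resource] / pass_limit[resource])

    worst = max(shader_cost, key=passes_for)
    return passes_for(worst), worst

# Hypothetical shader costs and per-pass limits, for illustration only.
long_shader = {"instructions": 200, "interpolants": 6}
wide_shader = {"instructions": 60, "interpolants": 40}
short_program_gpu = {"instructions": 8, "interpolants": 8}
long_program_gpu = {"instructions": 1024, "interpolants": 8}

print(min_passes(long_shader, short_program_gpu))  # instruction-bound
print(min_passes(wide_shader, long_program_gpu))   # interpolant-bound
```

Under these assumptions, raising the instruction limit collapses the first shader to a single pass, while the second stays at the same pass count because it is bound by interpolants rather than program length.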
However, if the fragment processor in NV30 lacks parallel processing ability comparable to that of the pixel engine in R300, the performance gap when running pixel shaders will shrink. Besides, the 128-bit memory interface means NV30 must be equipped with expensive high-speed DDR-II memory, whereas R300 can raise its bandwidth more easily. I would not be surprised if ATI released a >350MHz R300 (an Ultra version? :-)) with 400MHz DDR at the same time as NV30 debuts.
As we can see, NV30's architecture is well suited to professional render farms. With the help of Cg and OpenGL 1.4/1.5, NV30GL or its successor can be expected to succeed.
Generally speaking, both NV30 and R300 are well-designed, balanced products that address the requirements of not only current games but also future ones.
The relationship between NV30 and R300 resembles that between R100 and NV15. At that time, the popular view was that NV15 had strong raw power while R100 had a smart design. This time the roles of ATI and NVIDIA seem to have switched: R300 has a muscular brute-force architecture (4 VS units and a 256-bit interface) while NV30 has a sophisticated design. Very funny!
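The bandwidth argument is simple arithmetic: peak bandwidth is bus width times transfer rate. A minimal sketch, assuming double-data-rate memory (two transfers per clock) and using the speculated 400MHz DDR on a 256-bit bus as the reference point; the function names are made up for this illustration, and no actual NV30 memory clock is assumed.

```python
def peak_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * memory_clock_mhz * 1e6 * transfers_per_clock / 1e9

def clock_needed_mhz(bus_width_bits, target_gb_s, transfers_per_clock=2):
    """Memory clock (MHz) a given bus width needs to reach a target bandwidth."""
    bytes_per_transfer = bus_width_bits / 8
    return target_gb_s * 1e9 / (bytes_per_transfer * transfers_per_clock) / 1e6

wide_bus = peak_bandwidth_gb_s(256, 400)   # 256-bit bus, 400MHz DDR
narrow_clock = clock_needed_mhz(128, wide_bus)

print(f"256-bit @ 400MHz DDR: {wide_bus:.1f} GB/s")
print(f"128-bit bus needs {narrow_clock:.0f}MHz to match")
```

Halving the bus width means the memory clock has to double to keep the same peak figure, which is why a 128-bit design depends on much faster memory chips.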
Some Interesting and/or Unconfirmed Information
Pixel shader operations in R300 execute in linear color space, not gamma space, which leads to much higher quality shading and nearly perfect AA colors (patent pending).
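Why the color space matters for AA can be shown with a small sketch. It assumes a pure power-law display gamma of 2.2, which is a simplification, and it makes no claim about how R300 implements this internally. It resolves an edge pixel that is half covered by white and half by black.

```python
GAMMA = 2.2  # assumed display gamma (simplified power-law model)

def decode(value):
    """Gamma-encoded value in [0, 1] -> linear light intensity."""
    return value ** GAMMA

def encode(intensity):
    """Linear light intensity in [0, 1] -> gamma-encoded value."""
    return intensity ** (1.0 / GAMMA)

def resolve_gamma_space(samples):
    """Average the encoded sample values directly."""
    return sum(samples) / len(samples)

def resolve_linear_space(samples):
    """Average in linear light, then re-encode for display."""
    linear_mean = sum(decode(s) for s in samples) / len(samples)
    return encode(linear_mean)

edge_samples = [0.0, 1.0, 0.0, 1.0]  # half black, half white coverage
naive = resolve_gamma_space(edge_samples)
correct = resolve_linear_space(edge_samples)

print(f"gamma-space resolve: {naive:.3f} (displays as {decode(naive):.3f} of full intensity)")
print(f"linear-space resolve: {correct:.3f} (displays as {decode(correct):.3f} of full intensity)")
```

Averaging the encoded values produces an edge pixel that displays noticeably darker than 50% intensity, while averaging in linear light gives the expected half-intensity blend.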
NV30 can automatically compute mipmapping level-of-detail in hardware, even for dependent & computed texture coordinates.
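A common way to obtain a mipmap level for arbitrary (including computed) texture coordinates is to difference the coordinates across neighbouring pixels and take the base-2 logarithm of the larger footprint. The sketch below shows that textbook approach; it is an assumption about the general technique, not a description of NV30's actual circuitry, and `quad_lod` and its parameters are hypothetical names.

```python
import math

def quad_lod(uv, x, y, tex_size):
    """Estimate the mipmap level at pixel (x, y) by finite differences.

    uv       -- function mapping pixel (x, y) to normalized (u, v) coordinates
    tex_size -- texture width/height in texels (square texture assumed)
    """
    u00, v00 = uv(x, y)
    u10, v10 = uv(x + 1, y)   # right neighbour
    u01, v01 = uv(x, y + 1)   # lower neighbour

    # Footprint of one pixel step in texel units, along screen x and y.
    step_x = math.hypot((u10 - u00) * tex_size, (v10 - v00) * tex_size)
    step_y = math.hypot((u01 - u00) * tex_size, (v01 - v00) * tex_size)

    rho = max(step_x, step_y)
    # Clamp to the base level when the texture is magnified.
    return max(0.0, math.log2(rho)) if rho > 0 else 0.0

# Plain interpolated coordinates: 4 texels per pixel on a 256-texel texture.
linear_uv = lambda x, y: (x / 64, y / 64)
print(quad_lod(linear_uv, 10, 10, 256))

# "Computed" coordinates, as a dependent texture read might produce.
warped_uv = lambda x, y: (math.sin(x * 0.1), y / 64)
print(quad_lod(warped_uv, 10, 10, 256))
```

Because the estimate only needs the final coordinates of neighbouring pixels, it works the same way whether those coordinates were interpolated from vertices or calculated inside the shader.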
LMA III introduces a complex algorithm for very effective early Z-culling and a revolutionary memory controller that significantly improves the efficiency of data access.
Besides the powerful fragment processor, NV30 also introduces an advanced pixel processor that supports many programmable FSAA sampling patterns, including:
- 4X Mixedsampling (skewed grid, 8-tap filter) Quincunx with 1x2 OGSS
- 4X Mixedsampling (skewed grid) *2x RGMS with 1x2 OGSS = 4xS
- 4X Multisampling (Gaussian) *2x2 OGMS with 9tap filter
- 4X Multisampling *2x2 OGMS
- 4X Supersampling (skewed grid, 8-tap filter) 2x RGSS with 5tap filter with 1x2 OGSS
- 4X Supersampling (skewed grid) 2x RGSS with 1x2 OGSS
- 4X Supersampling (Gaussian) 2x2 OGSS with 9tap filter
- 4X Supersampling **2x2 OGSS
- 4X Supersampling (LOD bias) **2x2 OGSS with LOD adjustment
- 2X Supersampling (Quincunx) 2x ?GSS with 5tap filter
- 2X Quincunx *2x RGMS with 5tap filter
- 2X Multisampling *2x RGMS
- 2X Supersampling (vertical) 1x2 OGSS
- 2X Supersampling (horizontal) **2x1 OGSS
* known modes for GF3/4
** known modes for GF1/2
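The difference between ordered-grid (OG) and rotated-grid (RG) patterns in the list above can be sketched by counting how many distinct horizontal and vertical sample offsets a pattern has inside one pixel. More distinct offsets along an axis means more intermediate coverage steps for edges nearly perpendicular to that axis. The rotated 2x positions below are assumed illustration values, not NV30's actual sample locations.

```python
def ordered_grid(cols, rows):
    """Sample positions of a cols x rows ordered grid inside a unit pixel."""
    return [((i + 0.5) / cols, (j + 0.5) / rows)
            for j in range(rows) for i in range(cols)]

# Assumed 2-sample rotated-grid pattern: samples placed on the diagonal.
ROTATED_2X = [(0.25, 0.25), (0.75, 0.75)]

def edge_levels(samples):
    """Return (distinct x offsets, distinct y offsets) of a sample pattern."""
    xs = {x for x, _ in samples}
    ys = {y for _, y in samples}
    return len(xs), len(ys)

patterns = {
    "1x2 OG (vertical)": ordered_grid(1, 2),
    "2x1 OG (horizontal)": ordered_grid(2, 1),
    "2x RG": ROTATED_2X,
    "2x2 OG": ordered_grid(2, 2),
}
for name, samples in patterns.items():
    print(name, len(samples), "samples ->", edge_levels(samples))
```

With only two samples, the diagonal pattern covers both axes as well as the four-sample ordered grid does in this simple measure, which is the usual argument for rotated or skewed grids.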
According to Tech-Report, NV30 also supports features similar to the video shader in R300.
According to Bit-Tech, NV30 handles FP16-per-component color with a smarter design than R300, one that can provide double the processing power.
Given the small gap in transistor count between R300 and NV30, the 16 texture units of NV30, and the many more powerful instructions and advanced features NV30 provides, I think it is possible that the vertex processor and fragment processor, or units inside them, share some processing resources, such as the math function unit and the lighting unit. If so, we may be overestimating the power of NV30. Even so, we must admit that NV30 has an elegant architecture. Let the facts prove whether I am right.
Microsoft intends to introduce the even more sophisticated VS/PS 3.0, which both NVIDIA and ATI oppose, in order to extend the life of DX9 until the arrival of Longhorn (DirectX 10).