Pixel Processing Unit

R300 has an outstanding pixel processing engine, more details of which can be found in Beyond3D's Radeon 9700 PRO Review.



Simplified R300 Pixel Shader

However, from the details NVIDIA has released so far, we can surmise that NV30 has an excellent fragment processor too. NVIDIA has not disclosed full details of NV30's pixel processing unit, but with no shipping product to contend with R300's current lead, it has had to release specifications, documents, presentations and whitepapers to maintain interest in NV30. That information provides the basis for this analysis of NV30's pixel processing unit, which NVIDIA calls the fragment processor.

The following diagram comes from NVIDIA's NV30 OpenGL Extensions presentation.



NV30 Fragment (Pixel Shader) Pipeline


Here we can see that the register combiner unit is still available, even in fragment program mode, because it is commonly used and provides a powerful blending model. For example, it allows four operands, fast 1-x operations, separate operations on the color and alpha components, and more. These operations could be performed by fragment programs, but they would require multiple instructions and program parameter constants. Supporting both methods simultaneously allows a programmer to write a program that obtains the texture colors and then use the combiners to produce the final fragment color.
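To make the blending model above more concrete, here is a minimal Python sketch of a four-operand combine with a 1-x input mapping and separate color and alpha operations. It is an illustration only, not NVIDIA's implementation: the function names are hypothetical, and range handling is simplified to a [0, 1] clamp.

```python
def invert(x):
    """Input mapping for the "fast 1-x" operation."""
    return 1.0 - x


def clamp01(x):
    """Simplified range handling (assumption): clamp to [0, 1]."""
    return max(0.0, min(1.0, x))


def general_combiner(a, b, c, d):
    """Four-operand combine: A*B + C*D."""
    return clamp01(a * b + c * d)


def combine_rgba(tex0, tex1, factor):
    """Blend two RGBA texels with separate color and alpha operations."""
    # Color portion: interpolate using A*B + C*D with C = 1 - A.
    rgb = tuple(
        general_combiner(factor, c0, invert(factor), c1)
        for c0, c1 in zip(tex0[:3], tex1[:3])
    )
    # Alpha portion: an independent operation, here a plain product.
    alpha = general_combiner(tex0[3], tex1[3], 0.0, 0.0)
    return rgb + (alpha,)
```

A single combine stage covers the whole blend here, whereas a fragment program would need a subtraction for 1-x, two multiplies and an add, plus a constant for the blend factor.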

As such, there are two different types of fragment programs: one "normal" and one for combiners. For combiner programs, the texture colors 0 through 3 are taken from texture output registers 0 through 3, respectively. The other combiner registers are not modified in fragment program mode.

Information from NV30 presentations clearly indicates that NV30 provides 16 texture units, and the figure of 8 pipelines has also been indirectly confirmed. Combining this with the diagram, the natural conclusion is that NV30 has 8 pipelines, each similar to that of NV2x, plus a fragment program processing unit. Clock for clock, NV30 should therefore have the same pixel fillrate as R300 and a much higher texel fillrate.

Whether NV30's fragment program processing unit can process texture addressing instructions and arithmetic instructions simultaneously is not yet known.

According to DX9 Beta 2.1, the internal precision requirements for PS2.0 are as follows:
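The clock-for-clock comparison is simple arithmetic, sketched below in Python. The NV30 figures are the ones surmised in this article; the R300 figure of one texture unit per pipeline (8 in total) is not stated in this section and is an assumption here. The `Chip` class and the clock value in the usage note are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    pipelines: int       # pixels written per clock
    texture_units: int   # texels sampled per clock

    def fillrates(self, clock_mhz):
        """Return (Mpixels/s, Mtexels/s) at the given core clock."""
        return (self.pipelines * clock_mhz, self.texture_units * clock_mhz)


# NV30: 8 pipelines and 16 texture units, as surmised in the text.
NV30 = Chip("NV30", pipelines=8, texture_units=16)
# R300: 8 pipelines; 8 texture units is an assumption, not from this section.
R300 = Chip("R300", pipelines=8, texture_units=8)


def texel_ratio(a, b):
    """Texel fillrate of chip a relative to chip b at equal clocks."""
    return a.texture_units / b.texture_units
```

At any equal clock, for example a hypothetical 300 MHz, `NV30.fillrates(300)` and `R300.fillrates(300)` give the same pixel rate, while `texel_ratio(NV30, R300)` shows the texel rate doubling under these assumptions.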

  • Implementations vary precision automatically based on precision of inputs to a given op for optimal performance.
  • The minimum level of internal precision for temporary registers is s10e5 (FP16).
  • The minimum internal precision level for constants is s10e5 (FP16).
  • The minimum internal precision level for input texture coordinates is s16e7 (FP24).
  • Diffuse and specular registers are only required to support the [0, 1] range, and high precision is not required.

So we can see that R300 is a true DX9 card despite the 24bit internal float precision of its pixel shader pipeline. Of course, NV30, which supports true IEEE-32 (s23e8) FP precision, is also a true DX9 card.

Note that only parts of the R300 pipeline run at 24bit precision; the chip is a mixture of 32bit and 24bit floating point precision. The core pixel shader operations are carried out at FP24 precision; however, the texture address operations (and the entire Vertex Shader pipeline) use IEEE-32 (s23e8) FP precision. The output of shaders can be converted to lower precision, such as 32bit or 64bit per pixel, or converted up to 128bit per pixel.

It has been confirmed that both R300 and NV30 support FP16 per component and FP32 per component textures. In addition, R300 supports 16bit fixed point textures, whereas NV30 supports 12bit fixed point textures. Both NV30 and R300 also support 64bit and 128bit "float" frame buffers.

R300's pixel pipeline also supports 1D/2D/3D/cubemap floating point textures, whereas NV30 is limited to texture_rectangle. Floating point textures in R300 are limited to nearest filtering. R300 also supports multiple 128bit and 64bit texture formats, including a c4_16 format in which each component is a 16bit fixed point value and for which full filtering, 3D textures and projected textures are supported.
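The sXeY notation gives the mantissa and exponent widths, with one more bit for the sign. The following Python sketch shows how the three formats compare by rounding a value to a given mantissa width. It is a simplified model: exponent range limits and denormals are ignored, and the helper names are hypothetical.

```python
import math

# (mantissa bits, exponent bits) for each format named in the text.
FORMATS = {
    "FP16 (s10e5)": (10, 5),
    "FP24 (s16e7)": (16, 7),
    "FP32 (s23e8)": (23, 8),
}


def total_bits(mantissa_bits, exponent_bits):
    """Sign bit + mantissa + exponent."""
    return 1 + mantissa_bits + exponent_bits


def quantize(x, mantissa_bits):
    """Round x to the nearest value with the given stored mantissa width.

    Assumption: exponent range and denormals are not modelled.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)  # +1 for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)


def rounding_error(x, mantissa_bits):
    """Absolute error introduced by storing x at this precision."""
    return abs(x - quantize(x, mantissa_bits))
```

Calling `rounding_error(1 / 3, bits)` for 10, 16 and 23 mantissa bits shows the error shrinking at each step, which is why FP24 sits comfortably above the FP16 minimum for temporaries while still falling short of full IEEE-32 precision.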
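The 64bit and 128bit figures follow from four components (RGBA) at FP16 or FP32 each. A short Python sketch of the arithmetic, including the memory cost of such a frame buffer, is given below; the resolution in the usage note is only an illustrative choice, and the function names are hypothetical.

```python
def bits_per_pixel(bits_per_component, components=4):
    """Pixel size for a format with the given per-component width."""
    return bits_per_component * components


def framebuffer_bytes(width, height, bits_per_component, components=4):
    """Memory needed for one buffer of the given size and format."""
    return width * height * bits_per_pixel(bits_per_component, components) // 8
```

At 1024x768, `framebuffer_bytes(1024, 768, 8)` is 3 MiB for a conventional 32bit buffer, while the FP16 and FP32 formats need 6 MiB and 12 MiB respectively, so the float formats double and quadruple the bandwidth and storage cost per buffer.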

Just like R300, NV30 also supports two-sided stencil, a "required" feature for running DOOM3 and for shadow volumes.
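For readers unfamiliar with why two-sided stencil matters for shadow volumes, here is a minimal Python sketch of the per-pixel counting involved. It models the simple depth-pass variant and assumes the eye is outside every shadow volume; stencil wrap behaviour is omitted, and the names are hypothetical.

```python
def stencil_count(volume_faces, scene_depth):
    """Count shadow volume faces in front of the visible surface.

    volume_faces: iterable of (depth, front_facing) pairs along one pixel's ray.
    Front faces increment and back faces decrement, both in a single pass.
    """
    count = 0
    for depth, front_facing in volume_faces:
        if depth < scene_depth:  # face passes the depth test
            count += 1 if front_facing else -1
    return count


def in_shadow(volume_faces, scene_depth):
    """A non-zero count means the surface lies inside a shadow volume."""
    return stencil_count(volume_faces, scene_depth) != 0
```

Without two-sided stencil, the volume geometry has to be drawn twice, once per facing, each with its own stencil operation; with it, both operations apply in one pass with culling disabled, which halves the geometry work for the shadow volumes.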