Pixel Engines (ROP)

Following on from the Texture and Shader engines are the ROPs (Render Outputs), and below is a high-level diagram of the ROP:
The ROPs are responsible for basic operations such as Z checking (to decide whether the pixel should actually be written, if it hasn’t already been rejected by an earlier compare) and either writing or blending pixels to the frame buffer. The NV40 pipeline has both a Z ROP, which does the Z writing, and a C ROP, which is a combined Z and Colour ROP. The C ROP is what enables the optimised Z / Stencil rendering path of NV2A, NV3x, and now NV4x: in non-colour rendering situations the C ROP can be used to write a second Z/Stencil value per clock cycle, but it is used for colour writes when colour values need to be written to the frame buffer.

This also means that NV4x is only capable of 2 FSAA Multi-Sample samples per clock cycle, and David Kirk confirmed this to be the case, as it has in fact been since NV20. To achieve 4X Multi-Sampling FSAA, a second loop must be completed through the ROP, so the samples are output over two cycles; memory bandwidth makes it prohibitive to output more samples in a cycle anyway. Unlike ATI's approach, only one such loop is allowed through the ROP, which continues to restrict NV4x to a native Multi-Sample AA of 4X; modes greater than 4X still require combined Super-Sampling and Multi-Sampling.
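The throughput figures above can be put into a small back-of-the-envelope model. This is only a sketch of the arithmetic described in the text, not a description of the hardware; the function and constant names are hypothetical, and the one-Z-value-per-clock figure for colour rendering is inferred from the text rather than stated in it.

```python
import math

# Figures taken from the text above; names are illustrative only.
SAMPLES_PER_CLOCK = 2  # multi-sample samples a ROP can output per clock
MAX_NATIVE_MSAA = 4    # one extra loop through the ROP, so 4X is the native limit


def rop_cycles(msaa_samples: int) -> int:
    """Clocks one ROP needs to output every multi-sample sample of a pixel."""
    if not 1 <= msaa_samples <= MAX_NATIVE_MSAA:
        raise ValueError("sample count outside the native multi-sample range")
    return math.ceil(msaa_samples / SAMPLES_PER_CLOCK)


def z_values_per_clock(colour_writes_enabled: bool) -> int:
    """Z/Stencil values written per pipeline per clock.

    Assumption: with colour writes on, only the Z ROP writes Z; with colour
    writes off, the C ROP contributes a second Z/Stencil value.
    """
    return 1 if colour_writes_enabled else 2
```

Under these assumptions `rop_cycles(2)` is one clock and `rop_cycles(4)` is two, which matches the second loop described above, while a Z-only pass doubles the Z/Stencil rate compared with a colour pass.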
NVIDIA have finally moved away from a 4X ordered grid for their sampling patterns, in favour of the rotated grid pattern that 3dfx introduced and ATI later adopted. Like other Multi-Sampling solutions, the pattern is derived from a sparse sampled grid; according to David Kirk there is a potential 8x8 grid of sub-sample positions, although later NVIDIA documents suggest a 4x4 grid. At present, though, we believe that this is not programmable, which may indicate that the sample positions are fixed. The shader core also supports Centroid Sampling, which should address one case that doesn't work well with Multi-Sampling FSAA.

Note: For a list of currently supported DirectX caps on NV40 and the 60.72 drivers see here, and see here for a list of supported OpenGL extensions. The 60.72 drivers currently do not expose DirectX shader model 3.0 so relevant caps for this will not be present.

Edit: It has come to our attention that the 60.72 drivers do already expose the Shader 3.0 capability bits; it is purely the lack of DirectX 9.0c (and the 9.0b caps viewer being used here) that prevents us from showing the full Shader Model 3.0 caps of NV40.
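To illustrate why the move from an ordered grid to a rotated grid matters, the sketch below compares two 4X patterns placed on a 4x4 grid of sub-sample cells. The positions are hypothetical examples chosen for illustration, not NV40's actual sample locations, and the helper name is made up. The idea is that the number of distinct horizontal (or vertical) offsets roughly bounds how many coverage steps a near-vertical (or near-horizontal) edge can produce.

```python
# Illustrative 4X patterns as (x, y) cell indices on a 4x4 sub-sample grid.
# These positions are assumptions for the example, not NV40's real pattern.
ORDERED_4X = [(0, 0), (2, 0), (0, 2), (2, 2)]
ROTATED_4X = [(1, 0), (3, 1), (0, 2), (2, 3)]


def distinct_offsets(pattern):
    """Return (distinct x offsets, distinct y offsets) for a sample pattern."""
    xs = {x for x, _ in pattern}
    ys = {y for _, y in pattern}
    return len(xs), len(ys)


if __name__ == "__main__":
    for name, pattern in (("ordered", ORDERED_4X), ("rotated", ROTATED_4X)):
        print(name, distinct_offsets(pattern))
```

With these example positions the ordered grid has only two distinct offsets on each axis, while the rotated grid has four, so near-vertical and near-horizontal edges can pass through more intermediate coverage levels for the same sample count.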