Architecture Updates

Although NVIDIA state that G70 is a brand new architecture, in reality, with no API changes and with NV40 already supporting Shader Model 3.0, there is little need to significantly alter an architecture that is already very effective. As such, the pipelines and the functional blocks all bear the same basic organisation as NV40. Even so, the year between the two products has enabled NVIDIA to make some fairly important changes that should significantly increase the performance of each pipeline.

Ostensibly there appear to be no changes to the vertex shader units. The 8 vertex shaders are organised as a MIMD (Multiple Instruction, Multiple Data) array. Like NV40, each unit contains a Vector and a Scalar ALU, such that two instructions (one with four data elements and another with one) can be co-issued in a single cycle, and each unit has penalty-free branching, according to NVIDIA. Each vertex shader unit also has a vertex fetch processor that can fetch up to four point-sampled textures, and the units are threaded in order to hide texture latencies. NVIDIA state that the vertex shader has single-cycle MADD capability (suggesting that NV40 didn't) and that the scalar performance has been improved. NVIDIA have also worked to improve the performance of vertex culling and setup, probably to accommodate the throughput of the increased number of vertex shader units.

The fragment shader pipelines are where some of the larger changes have occurred for G70. Again, the basic layout from NV40 remains the same, with two primary ALUs capable of dual-issuing instructions and the first handling the texture address processing, but changes to the ALUs should considerably increase the per-clock performance. NVIDIA say they have analysed 1300 shaders utilised in current and upcoming games and noticed that the most frequently used instruction is MADD. Although NV40 has two ALUs, they are not in fact each fully featured with the same instructions; instead one is a MADD unit and the other is a MUL unit. For G70, NVIDIA say they have added a MADD and a MUL into each of the units that didn't previously contain them, and in fact we are led to believe they are now complete instruction duplicates of each other (although, obviously, the second unit doesn't have texture address processing instructions). The net result is that G70 features 48 fragment shader ALUs of the same capabilities, with one in each pair also having to handle the texture processing instructions. The pipelines are arranged as a single SIMD array and they will all be operating on data of the same state.
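To illustrate why duplicating MADD across both ALUs matters for MADD-heavy shaders, here is a minimal toy model in Python. It is only a sketch: the names `NV40_ALUS`, `G70_ALUS` and `cycles_needed` are hypothetical, the unit capabilities are taken literally from the description above, and the model assumes every instruction is independent and issues in order, which real shader code and NVIDIA's actual scheduler will not match.

```python
# Toy issue model; all names here are hypothetical, not NVIDIA terminology.
# Capabilities are taken literally from the text: NV40 pairs a MADD unit with
# a MUL unit, while both G70 units are said to offer MADD and MUL.
NV40_ALUS = [{"MADD"}, {"MUL"}]
G70_ALUS = [{"MADD", "MUL"}, {"MADD", "MUL"}]


def cycles_needed(ops, alus):
    """Count cycles to issue `ops` in order, at most one op per ALU per cycle.

    Assumes every op is independent of the others (no data hazards).
    """
    cycles = 0
    i = 0
    while i < len(ops):
        free = list(alus)  # ALUs still unused in this cycle
        issued = 0
        while i < len(ops):
            # Pick the first free ALU that supports the next op, if any.
            slot = next((alu for alu in free if ops[i] in alu), None)
            if slot is None:
                break  # next op must wait for the following cycle
            free.remove(slot)
            i += 1
            issued += 1
        if issued == 0:
            raise ValueError(f"no ALU supports op {ops[i]!r}")
        cycles += 1
    return cycles


madd_heavy = ["MADD"] * 4
print(cycles_needed(madd_heavy, NV40_ALUS), cycles_needed(madd_heavy, G70_ALUS))
```

Under these assumptions a run of four independent MADDs takes four cycles on the NV40-style pair and two on the G70-style pair, while an alternating MADD/MUL stream takes the same time on both. That is a best case for G70; dependent instructions would narrow the gap.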

Again, like NV40, the ALUs are FP32 precision, with a free FP16 normalise on the first ALU. Each unit is a single vector unit, but can execute two instructions that together fit in 4 components or fewer (e.g. 3+1 components, 2+2, 2+1, 1+2, 1+1). The texture processing units can perform a four-sample bilinear filter in a single cycle, operating over multiple cycles for higher filtering operations, which can go up to 16x Anisotropic with either Bilinear or Trilinear filtering. The texture units also have FP16 filtering capabilities, which NVIDIA say are single cycle as well, and there have been performance improvements for handling larger textures. Although NVIDIA haven't spoken of it directly, we have noticed from Far Cry's logs that 3Dc normal map compression was enabled by the engine, suggesting that G70's drivers expose this FOURCC texture format*.
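The co-issue rule described above can be written down as a one-line check. This is a sketch of the rule as stated in the text only; the function name `can_co_issue` is hypothetical, and the hardware may impose further restrictions that NVIDIA have not described.

```python
def can_co_issue(first_width, second_width, vector_width=4):
    """Return True if two instructions fit together in one vector ALU.

    Widths are the number of components each instruction operates on.
    The rule is the one stated in the text: both instructions together
    must fit in 4 components or fewer.
    """
    if first_width < 1 or second_width < 1:
        return False
    return first_width + second_width <= vector_width


# The splits listed in the text all pass the check.
listed = [(3, 1), (2, 2), (2, 1), (1, 2), (1, 1)]
print(all(can_co_issue(a, b) for a, b in listed))
```

Note that this simple rule also admits a 1+3 split, which is not among the combinations listed; whether the hardware supports it is not stated.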

*Update: NVIDIA have confirmed that 3Dc isn't supported by G70 hardware; however, they do support a V8U8 format which can be used for 2:1 compression of two-component normals. When an application calls for 3Dc, NVIDIA's driver will convert the relevant textures to this format at load time and use these during the application's operation.
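For readers unfamiliar with two-component normal storage, the sketch below shows the general idea in Python: keep only X and Y as signed 8-bit values and rebuild Z from the unit-length constraint. It is an illustration only, not NVIDIA's driver conversion. The function names, the byte order and the scaling by 127 are assumptions, as is the reading that the 2:1 figure compares these 16 bits per texel with a 32-bit uncompressed texel.

```python
import math
import struct


def pack_v8u8(nx, ny):
    """Pack the X and Y of a unit normal into two signed bytes.

    Scaling by 127 and storing X first are assumptions for illustration.
    """
    u = max(-127, min(127, round(nx * 127)))
    v = max(-127, min(127, round(ny * 127)))
    return struct.pack("bb", u, v)


def unpack_v8u8(data):
    """Recover (x, y, z), deriving Z from x^2 + y^2 + z^2 = 1.

    Assumes a tangent-space normal, so Z is taken as non-negative.
    """
    u, v = struct.unpack("bb", data)
    x, y = u / 127.0, v / 127.0
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return x, y, z


texel = pack_v8u8(0.6, 0.0)  # original normal is (0.6, 0.0, 0.8)
print(len(texel), unpack_v8u8(texel))
```

In a real renderer the Z reconstruction would be done in the fragment shader, at the cost of a few extra ALU instructions per normal map lookup.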

Overall the ROP units appear not to have changed significantly, again having the capability of a single colour and Z/Stencil or two Z/Stencil samples per cycle and the capability to blend on FP16 render targets. The FSAA mechanism also stays the same as NV40, with a maximum of 4 multi-samples (2 per clock) and rotated grid sampling; multisampling still doesn't operate on floating point blending targets though. NVIDIA state that they have improved the performance of rotated grid AA, and a new method of adding increased sampling to Alpha textures has been included (we'll take a closer look at this in the image quality section of the article); however, we suspect that this is largely a shader operation rather than requiring changes to the ROPs. The ROPs do support Gamma Adjusted MSAA; this is not enabled by default in the current drivers, but can be selected via the control panel.
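As a rough illustration of what gamma adjustment changes in a multisample resolve, the Python sketch below averages four samples of a pixel both directly and after converting them to linear light. It is not a description of G70's ROP logic: the function names are hypothetical and a simple power-law curve with an exponent of 2.2 is assumed in place of whatever transfer function the hardware uses.

```python
GAMMA = 2.2  # assumed display gamma; the hardware's actual curve is not stated


def resolve_naive(samples):
    """Average gamma-encoded sample values directly."""
    return sum(samples) / len(samples)


def resolve_gamma_adjusted(samples):
    """Average in linear light, then re-encode for display."""
    linear = [s ** GAMMA for s in samples]
    return (sum(linear) / len(linear)) ** (1.0 / GAMMA)


# An edge pixel half covered by white geometry over a black background,
# with 4 multi-samples per pixel.
edge = [0.0, 0.0, 1.0, 1.0]
print(resolve_naive(edge), round(resolve_gamma_adjusted(edge), 3))
```

With these assumptions the gamma-adjusted result for the half-covered edge pixel is brighter than the naive 0.5, which is why the two modes produce visibly different edge gradients on high-contrast edges.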

Since the introduction of floating point blending and filtering, more developers have adopted these capabilities; however, NV40 displayed performance issues with them under some circumstances, especially highlighted by the Far Cry implementation. We understand that the overall pipeline of G70 has been optimised to improve the efficiency of handling floating point input and output.

The memory interface on G70 stays the same as NV40, with a 256-bit interface divided into four 64-bit crossbars