Texture & Shader Core

As the programmability of the graphics core increases, as demanded by API specifications such as DirectX9 Shader Model 3.0, the mathematical capabilities of the Pixel Shader need to increase, while the emphasis on fixed function rendering and texturing reduces. The NV3x pipeline's organisation was still geared heavily towards fixed function and DirectX8 rendering; NV4x's shader pipeline, however, has been overhauled to shift the emphasis away from texture performance and towards mathematical functionality and capabilities, in line with the demands of Pixel Shader 2.0 and 3.0 rendering.

Texture Capabilities


As the diagram above highlights, each pixel pipeline features one Texture Processor and two Shader units. We can also see "L1" and "L2" texture caches: the L1 texture cache is per quad, so there is one for each of the four quads in NV40, while the L2 cache is shared between all of the quads.

Each of the texture samplers in the NV40 pipeline is capable of 4 samples per cycle, or single cycle bilinear. In our discussions, David Kirk suggested that the texture units on NV40 will operate not at the pixel level but at the quad level. If the texture sampling requirements of the entire quad are determined to be less than the combined sampling ability of the quad's four texture units, then trilinear sampling may potentially be achievable in a single cycle over the entire quad, where two cycles would be required with each texture sampler fixed to a pipeline. Such a texturing mechanism sounded similar to 3dfx's fabled "Rampage" chip, and it didn't come as much of a surprise to find that Rampage's chief architect was in charge of NV40's texture and shader engine. We asked Emmett if this was the case with NV40 and he replied that "some in the NV4x range would feature this as it has benefits and drawbacks".

We are trying to clarify whether "per quad texturing" will actually be utilised in NV40, though at the recent NVIDIA Editors Day they made it fairly clear that trilinear filtering will require two cycles in NV40 (with Anisotropic filtering potentially requiring more, dependent on the base filtering used and the position of the surface being textured).
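The difference between per-pipeline and per-quad texturing can be illustrated with a little arithmetic. The sketch below is only a simplified model, not a description of how the hardware schedules work: it takes the 4 samples per cycle per texture unit and four units per quad from the text, while the function names, the per-pixel sample counts and the assumption that a quad-level mode can pool its sample budget freely are all hypothetical.

```python
import math

SAMPLES_PER_UNIT_PER_CYCLE = 4  # from the text: 4 samples per cycle, i.e. single cycle bilinear
UNITS_PER_QUAD = 4              # one texture unit per pixel pipeline, four pipelines per quad


def cycles_fixed(samples_per_pixel):
    """Each texture unit serves only its own pixel.

    The quad is finished when its slowest pixel is finished.
    """
    return max(math.ceil(s / SAMPLES_PER_UNIT_PER_CYCLE) for s in samples_per_pixel)


def cycles_pooled(samples_per_pixel):
    """Hypothetical per-quad mode: the four units share one sample budget.

    Assumes any unit can fetch samples for any pixel in the quad, which
    ignores whatever scheduling limits real hardware would have.
    """
    budget = SAMPLES_PER_UNIT_PER_CYCLE * UNITS_PER_QUAD  # 16 samples per cycle
    return max(1, math.ceil(sum(samples_per_pixel) / budget))


# Four pixels that each need a full 8-sample trilinear fetch:
# no saving, both models need two cycles.
all_trilinear = [8, 8, 8, 8]

# Illustrative case where the quad as a whole needs fewer samples than
# the four units can deliver in one cycle.
light_quad = [8, 8, 0, 0]

for quad in (all_trilinear, light_quad):
    print(quad, "fixed:", cycles_fixed(quad), "pooled:", cycles_pooled(quad))
```

Under this model a quad only benefits when its total sample demand fits inside the pooled 16-sample budget; a quad of four full trilinear fetches still takes two cycles either way, which is consistent with the two-cycle trilinear figure quoted above.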

The texture units are capable of Bilinear, Trilinear and Anisotropic filtering up to 16:1 (16X). Floating point textures are also supported, with filtering capabilities on FP16 textures.
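For readers less familiar with the filtering modes named here, the following is a minimal software sketch of textbook bilinear and trilinear filtering. It is not NVIDIA's implementation: the function names, the list-of-lists texture layout and the simplified texel addressing (clamped coordinates, no texel-centre offset) are assumptions made for illustration. It does show why trilinear costs two bilinear lookups, that is 8 texel reads against a 4 samples per cycle texture unit.

```python
def bilinear(tex, u, v):
    """Blend the 4 texels surrounding (u, v), given in texel coordinates."""
    h, w = len(tex), len(tex[0])
    # Clamp to the texture edges (a simplification of real addressing modes).
    u = min(max(u, 0.0), w - 1)
    v = min(max(v, 0.0), h - 1)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = u - x0, v - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bottom = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def trilinear(mips, u, v, lod):
    """Blend bilinear samples from the two mip levels that bracket lod.

    mips[0] is the full-size texture and each later level is assumed to be
    half the size of the one before it. (u, v) are level 0 texel coordinates.
    """
    lod = min(max(lod, 0.0), len(mips) - 1)
    lo = int(lod)
    hi = min(lo + 1, len(mips) - 1)
    f = lod - lo
    a = bilinear(mips[lo], u / 2 ** lo, v / 2 ** lo)  # first 4 texel reads
    b = bilinear(mips[hi], u / 2 ** hi, v / 2 ** hi)  # second 4 texel reads
    return a * (1 - f) + b * f


# A 2x2 greyscale texture and its 1x1 mip level.
mips = [
    [[0.0, 1.0],
     [0.0, 1.0]],
    [[0.5]],
]
print(bilinear(mips[0], 0.5, 0.0))      # halfway between texel 0.0 and texel 1.0
print(trilinear(mips, 0.0, 0.0, 0.5))   # halfway between mip level 0 and mip level 1
```

Anisotropic filtering builds on the same idea by taking several such samples along the direction in which the surface is most compressed on screen, which is why its cost depends on the base filter and the orientation of the surface.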

With NV3x, NVIDIA's performance suffered in direct comparisons with ATI's R300-based hardware for various reasons, and NVIDIA believes that two of these were the LOD filtering ATI uses and the Anisotropic filtering method it has in place. NVIDIA has sought to minimise these differences in NV4x, and we're told that alongside the various levels of Bilinear / Trilinear mix introduced in NV3x, NV4x will also feature various levels of LOD in order to match filtering. New Anisotropic filtering modes have also been added to bring its filtering in line with ATI's.