Vertex Shader

As the previous full pipeline diagram indicated, NV40 features 6 Vertex Shader engines. The Vertex Shader for NV4x is described as MIMD (Multiple Instruction - Multiple Data) because each of the engines can execute different instructions from one another, meaning the vertices being operated on across the units do not have to be related.

As the pipeline above shows, each vertex engine has a full vector ALU with a scalar ALU in parallel, giving a total of 5 component operations per cycle, similar to R300. Elementary flow control, in the form of constant branching, was available in Vertex Shader 2.0; however, NVIDIA supported dynamic branching in NV3x, which was exposed via the Vertex Shader 2.0 Extended model. With VS3.0 branching is a requirement, and NVIDIA are trying to ensure high speed dynamic branching with the Vertex Shader Branch Unit.
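To make the per-cycle figure concrete, the arithmetic can be sketched as below. This is only a back-of-envelope peak count, assuming the vector ALU is 4 components wide (implied by the total of 5) and that every engine issues both ALUs every cycle; the constant and function names are ours, not NVIDIA's.

```python
# Peak component operations per clock for the vertex engines.
# Assumptions: 4-wide vector ALU + 1 scalar ALU per engine, all issuing each cycle.
VERTEX_ENGINES = 6       # stated in the article
VECTOR_COMPONENTS = 4    # assumed vec4 ALU width
SCALAR_COMPONENTS = 1    # scalar ALU running in parallel


def component_ops_per_cycle(engines=VERTEX_ENGINES):
    """Return (ops per engine, ops across all engines) per clock cycle."""
    per_engine = VECTOR_COMPONENTS + SCALAR_COMPONENTS
    return per_engine, per_engine * engines


if __name__ == "__main__":
    per_engine, total = component_ops_per_cycle()
    print(f"{per_engine} per engine, {total} across {VERTEX_ENGINES} engines")
```

Real throughput will depend on how well a given shader fills both ALUs, so this is an upper bound rather than a measured rate.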

Other than instruction lengths, the primary change, and probably the most interesting in the whole of Shader 3.0, is that the vertex shaders can now perform texture look-ups, giving rise to numerous possibilities such as physics calculations in the Vertex Shader and displacement mapping. NV4x can handle up to 4 textures in the vertex shaders.
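As an illustration of what a vertex texture look-up makes possible, here is a minimal CPU-side sketch of displacement mapping: each vertex reads a height from a texture and is pushed along its normal. It is written in Python rather than shader code, and `point_sample`, `displace_vertex` and the `scale` parameter are hypothetical names chosen for the example; texture coordinates are assumed to lie in [0, 1].

```python
def point_sample(texture, u, v):
    """Nearest-texel lookup. `texture` is a list of rows; u, v are in [0, 1]."""
    height = len(texture)
    width = len(texture[0])
    # Scale to texel space and clamp so that u == 1.0 or v == 1.0 stays in range.
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def displace_vertex(position, normal, uv, height_map, scale):
    """Move a vertex along its normal by the point-sampled height times `scale`."""
    h = point_sample(height_map, uv[0], uv[1])
    return tuple(p + n * h * scale for p, n in zip(position, normal))


if __name__ == "__main__":
    height_map = [[0.0, 1.0],
                  [0.5, 0.25]]
    print(displace_vertex((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.75, 0.25), height_map, 2.0))
```

In a real VS3.0 shader the same steps would run per vertex on the GPU; the sketch only shows the data flow.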

The texture samplers available to the vertex shader are point sampling only; no filtering is available, though they are able to cope with mip maps. Whether the mip maps are automatically generated is not clear. We asked David Kirk whether the lack of filtering meant that Displacement Mapping with LOD would still be possible, and David suggested that developers could implement their own LOD approach with a vertex program, so it seems that this may incur a performance penalty. Bilinear filtering could also be achieved by virtue of the 4 texture samples available, but the bilinear math would need to be written as a vertex program, which would also incur a performance penalty (as well as requiring 4 passes in the vertex shader, one for each of the 4 texture samples).
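The manual bilinear filter described above amounts to four point fetches followed by three linear interpolations. The sketch below shows that math on the CPU in Python; it assumes texel centres at half-integer positions and clamp-to-edge addressing, neither of which is specified in the article, and the function names are hypothetical.

```python
import math


def fetch_texel(texture, x, y):
    """Point fetch with clamp-to-edge addressing (an assumption of this sketch)."""
    height = len(texture)
    width = len(texture[0])
    x = max(0, min(x, width - 1))
    y = max(0, min(y, height - 1))
    return texture[y][x]


def bilinear_sample(texture, u, v):
    """Bilinear filter built from four point samples; u, v are in [0, 1]."""
    height = len(texture)
    width = len(texture[0])
    # Move to texel space, with texel centres at (i + 0.5) / size.
    fx = u * width - 0.5
    fy = v * height - 0.5
    x0 = math.floor(fx)
    y0 = math.floor(fy)
    tx = fx - x0  # horizontal blend weight
    ty = fy - y0  # vertical blend weight
    # The four point samples.
    t00 = fetch_texel(texture, x0, y0)
    t10 = fetch_texel(texture, x0 + 1, y0)
    t01 = fetch_texel(texture, x0, y0 + 1)
    t11 = fetch_texel(texture, x0 + 1, y0 + 1)
    # Two horizontal lerps, then one vertical lerp.
    top = t00 + (t10 - t00) * tx
    bottom = t01 + (t11 - t01) * tx
    return top + (bottom - top) * ty


if __name__ == "__main__":
    tex = [[0.0, 1.0],
           [2.0, 3.0]]
    print(bilinear_sample(tex, 0.5, 0.5))
```

The extra arithmetic per vertex, on top of the four fetches, is where the performance penalty mentioned above would come from.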

The NV4x vertex shader has support for VS3.0's Geometry instancing. Also note that the NV4x hardware imposes no vertex shader instruction length limits on the shader programs - the limitations are in place due to the API.