Before we go on to look at the performance of NV40 in the form of the GeForce 6800 Ultra, we'll take a look at an overview of the pipeline and the features it brings, the most important being the first hardware implementation of Microsoft's DirectX9 Shader Model 3.0.

Shader Model 3.0

As mentioned before, when Microsoft introduced DirectX9 with Vertex and Pixel Shader 2.0, it also shipped Shader Model 3.0 embedded in the API, although no hardware supported it at that point in time. The NV4x architecture has been designed to fully adopt the Shader Model 3.0 specification. Here we'll take a quick look at some of the key differences between the requirements of Shader Model 2.0 and 3.0.

Shader 2.0 & 3.0 Differences

Here are some key differences between the Vertex Shader models. Note that these are the HLSL targets, including VS_2_a, which is the vertex shader HLSL target for NV3x's VS2.0 Extended support. VS_2_a is not necessarily the definitive VS2.0 Extended target, though, as the extended model has many optional elements with minimum requirements that can be exceeded:

                                 VS_2_0    VS_2_a    VS_3_0
# of instruction slots           256       256       >=512
Max # of instructions executed   65535     65535     >=65535
Instruction Predication          -         Yes       Yes
Temp Registers                   12        13        32
# of constant registers          >=256     >=256     >=256
Static Flow Control              Yes       Yes       Yes
Dynamic Flow Control             -         Yes       Yes
Dynamic Flow Control Depth       -         24        24
Vertex Texture Fetch             -         -         Yes
# of texture samplers            -         -         4
Geometry Instancing Support      -         -         Yes

As we can see from the above table, many of the items are quite similar to VS2.0, and many more are similar to NV3x's VS2.0 Extended. Apart from the increased instruction slots, VS3.0 now requires dynamic flow control, which was only an optional element in VS2.0 Extended. One of the main differentiators, though, is that texture fetches are now available in the vertex shader, which opens up some of the more visible possibilities of the Shader 3.0 model.
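To make the comparison a little more concrete, here is a minimal Python sketch that encodes the vertex shader figures from the table above and picks the lowest target a given shader would fit into. The mapping of the three columns to vs_2_0, vs_2_a and vs_3_0 is our reading of the table, and the `VertexProfile` class, the `lowest_profile` helper and its parameters are hypothetical names used purely for illustration. Only a few rows are modelled, and the VS3.0 figures are minimums that real hardware may exceed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VertexProfile:
    name: str
    instruction_slots: int      # guaranteed minimum number of slots
    temp_registers: int
    dynamic_flow_control: bool
    texture_samplers: int       # 0 means no vertex texture fetch


# Figures taken from the vertex shader table; ordered lowest to highest.
PROFILES = [
    VertexProfile("vs_2_0", 256, 12, False, 0),
    VertexProfile("vs_2_a", 256, 13, True, 0),
    VertexProfile("vs_3_0", 512, 32, True, 4),
]


def lowest_profile(slots: int, temps: int,
                   needs_dynamic_branch: bool = False,
                   samplers: int = 0) -> Optional[str]:
    """Return the first profile whose guaranteed limits cover the shader."""
    for p in PROFILES:
        fits = (
            slots <= p.instruction_slots
            and temps <= p.temp_registers
            and (p.dynamic_flow_control or not needs_dynamic_branch)
            and samplers <= p.texture_samplers
        )
        if fits:
            return p.name
    return None  # exceeds the guaranteed minimums of every profile listed


if __name__ == "__main__":
    # A displacement-mapping style shader that samples one texture per vertex.
    print(lowest_profile(slots=120, temps=10, samplers=1))
```

Under these assumptions, any shader that needs even a single vertex texture fetch lands on vs_3_0, regardless of how short it is, which is the differentiator the paragraph above points to.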

Here is the rundown for the pixel shaders. Again we'll show the HLSL targets, along with a new one, PS_2_b, recently discussed by ATI at GDC, which falls under the Pixel Shader 2.0 Extended model:

                                   PS_2_0    PS_2_a      PS_2_b      PS_3_0
Dependent Texture Limit            4         No Limit    4           No Limit
Texture Instruction Limit          32        Unlimited   Unlimited   Unlimited
Position Register                  -         -           -           Yes
Instruction Slots                  32 + 64   512         512         >=512
Executed Instructions              32 + 64   512         512         >=65535
Interpolated Registers             2 + 8     2 + 8       2 + 8       10
Instruction Predication            -         Yes         -           Yes
Index Input Registers              -         -           -           Yes
Temp Registers                     12        22          32          32
Constant Registers                 32        32          32          224
Arbitrary Swizzling                -         Yes         -           Yes
Gradient Instructions              -         Yes         -           Yes
Loop Count Register                -         -           -           Yes
Face Register (2-sided Lighting)   -         -           -           Yes
Dynamic Flow Control               -         -           -           24

Again, we can see that PS3.0 is a fairly natural evolution of the pixel shader model, with more instructions, more registers and a more generalised model. As with VS3.0, dynamic branching now becomes a requirement of the PS3.0 model.

Another element to note is that the high precision mode for Shader 2.0 was set at FP24, whereas Pixel Shader 3.0’s high precision now comes in at FP32; FP24 and below now become partial precision in the Pixel Shader 3.0 model.
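As a rough illustration of what the step from FP24 to FP32 means numerically, the Python sketch below rounds a value to a given number of stored mantissa bits. It assumes the commonly cited layouts of 10 mantissa bits for FP16, 16 for FP24 and 23 for FP32, and it ignores exponent range, denormals and any hardware-specific rounding behaviour. The `quantize` helper is a hypothetical name, not part of any API.

```python
import math

# Assumed stored mantissa widths (excluding the implicit leading bit).
MANTISSA_BITS = {"FP16": 10, "FP24": 16, "FP32": 23}


def quantize(value: float, mantissa_bits: int) -> float:
    """Round value to the nearest number with the given mantissa width.

    Only the mantissa is modelled; the exponent range is left unlimited.
    """
    if value == 0.0:
        return 0.0
    m, e = math.frexp(value)            # value = m * 2**e, 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)    # +1 for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)


if __name__ == "__main__":
    # A small offset added to 1.0, e.g. a fine step in a texture coordinate.
    value = 1.0 + 2.0 ** -20
    for name, bits in MANTISSA_BITS.items():
        print(name, quantize(value, bits) - 1.0)
```

In this simplified model the small offset survives at FP32 but is rounded away entirely at FP24 and FP16, which gives a feel for why precision matters for things like addressing into large textures or long chains of arithmetic.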