Registers

Some people may think register analysis is pointless. In truth, however, the number and types of registers directly determine how powerful a vertex shader program can be. Need more details? The register count and types affect the performance of a vertex shader program, the flexibility available when coding, how well the HLSL compiler can optimize, and even the number of real-time rendering effects that are feasible. Programmers who have written ASM code will agree with this.

Input                         r     16      16      16      16      16
Temp                          r/w   12      12      32      16      32
Float Constant                r     96      256     256     256     256
Integer Constant              r     -       16      16      -*      16
Boolean Constant              r     -       16      16      -**     16
Address                       u/w   1(s)    1(v)    1(v)    2(v)    1(v)
Loop Counter                  u     -       1       1       -*      1***
Sampler                       r     -       4****   ?       -       4
Predicate (Conditional Code)  r/w   -       -       -       1       1
Position                      w     1       1       1       1       1
Point Size                    w     1       1       1       1       1
Fog Coordinate                w     1       1       1       1       1
Texture Coordinate            w     8       8       8       8       8
Diffuse/Specular Color        w     2       3       2       2       2
Back-facing Color             w     -       2       -       2       -
ClipPlane Distance            w     -       32      -       6       -

Note:
1) - No Support; r Read; w Write; u Use; s Scalar; v Vector.
2) * Although NV30 doesn't provide integer constant registers and loop counter register, any float constant register component can be used as a loop counter.
3) ** NV30 has a different scheme to implement static flow control.
4) *** In VS2.0, only the constant register bank can be indexed; in VS3.0, the input register and texture coordinate register banks can also be indexed using the loop counter register.
5) **** It seems that sampler registers are used for hardware Displacement Mapping.

Temp Register: The Temporary Registers are 4-component floating-point vector registers used to hold intermediate results during vertex program execution. As we can see, both R300 and NV30 exceed the DX9 VS2.0 requirement of 12 Temp Registers. R300 even has 32 Temp Registers, which gives ASM shader programmers a great deal of flexibility.

Float Constant Register: Also known as program parameter registers, these are a set of 4-component floating-point vector registers containing the vertex program parameters. They can hold both integer and float values. R300 and NV30 appear to be the same here.

Integer Constant and Loop Counter Register: Integer Constant Registers serve as flow control registers and are currently used only by the LOOP and REP instructions (covered in the next part). When one is used as an argument to the LOOP instruction:

.x is the iteration count. (REP uses only this component).

.y is the initial value for the loop counter.

.z is the increment step for the loop counter.

The single loop counter register is incremented automatically on each pass through the LOOP...ENDLOOP block. When loops are nested, any use of the Loop Counter register refers to the current (innermost) loop. It can therefore be used inside the block for relative addressing if needed, and it is invalid outside the loop.
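The LOOP semantics described above can be illustrated with a minimal Python sketch. It only models how the loop counter evolves from the three components of an integer constant register and how that counter might drive relative addressing; the function name `loop_counter_values`, the tuple layout and the `constants` list are illustrative assumptions, not part of any DirectX API.

```python
def loop_counter_values(int_const):
    """Return the loop counter value seen on each pass through the loop body.

    int_const models one integer constant register as (x, y, z):
    x = iteration count, y = initial counter value, z = increment step.
    """
    count, start, step = int_const
    values = []
    counter = start
    for _ in range(count):
        values.append(counter)  # value visible inside the LOOP...ENDLOOP block
        counter += step         # incremented automatically at the end of each pass
    return values


# Relative addressing inside the loop: read constants[base + counter].
constants = [float(i) for i in range(32)]  # stand-in for the float constant bank
base = 4
fetched = [constants[base + c] for c in loop_counter_values((3, 0, 2))]
print(fetched)  # prints [4.0, 6.0, 8.0]
```

With 16 integer constant registers, a program modelled this way could describe at most 16 distinct (count, start, step) triples, which is the limit discussed for R300.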

R300 follows the DX9 VS2.0 standard exactly. The big difference in NV30 is that it can use any float constant component as a loop counter. This cuts both ways. In an extreme situation, a vertex program could use up all of NV30's float constant registers for program parameters, leaving none for a loop counter, so NV30 could not execute loops. On the other hand, R300 can have only 16 independent loops in one vertex program because of the number of integer constant registers; NV30 does not seem to have such a limitation.

Boolean Constant Register: This is essentially a collection of bits used by the static flow control instructions (e.g. IF-ELSE-ENDIF) in DX9 VS2.0/3.0. There are 16 of them, so a shader program can have only 16 independent branch conditions. R300 provides 16 boolean constant registers (16 read-only bits). However, because it lacks IF-ELSE-ENDIF instructions, it seems that R300 will use JNZ and JUMP instead.

NV30 uses the more powerful BRA instruction to implement flow control and doesn't need such registers.

Address Register: NV30 provides two address registers. Moreover, the Address Registers in NV30 are 4-component vector registers with signed 10-bit integer components. Having two vector address registers instead of one allows better indirect addressing, including 2D array accesses.

R300 and DX9 VS3.0 provide one vector address register. That is no more than DX8 VS1.x offers, except that components other than .x can be used for relative addressing.

Predicate Register: This is a boolean vector register that can be modified only via the new SETP instruction; both were introduced by DX9 VS3.0 to implement predication. R300 doesn't support this. In NV30, the same function is implemented by the Condition Code Register, which provides both conditional write masking and dynamic flow control. The Condition Code Register is a single four-component vector, and each of its components holds one of four enumerated values: GT (greater than), EQ (equal), LT (less than), or UN (unordered). Most vertex program instructions can optionally update the Condition Code Register, which plays a key role in NV30's instruction set.
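A rough Python sketch may make the condition code idea more concrete. It assumes that each component's code is derived by comparing the corresponding result component with zero (NaN giving UN), and it models conditional write masking as "write only the components whose code is in an accepted set". The helper names `condition_codes` and `masked_write` are hypothetical and do not correspond to real NV30 instructions.

```python
import math


def condition_codes(result):
    """Derive one code per component by comparing the result with zero (assumed rule)."""
    codes = []
    for value in result:
        if math.isnan(value):
            codes.append("UN")  # unordered: comparison with NaN
        elif value > 0:
            codes.append("GT")
        elif value < 0:
            codes.append("LT")
        else:
            codes.append("EQ")
    return tuple(codes)


def masked_write(dest, src, codes, accepted):
    """Write src into dest only where the component's code is in `accepted`."""
    return tuple(s if c in accepted else d for d, s, c in zip(dest, src, codes))


cc = condition_codes((1.0, 0.0, -2.0, float("nan")))
print(cc)  # prints ('GT', 'EQ', 'LT', 'UN')

# Per-component conditional write: update only where the test "greater or equal" holds.
print(masked_write((0, 0, 0, 0), (9, 9, 9, 9), cc, {"GT", "EQ"}))  # prints (9, 9, 0, 0)
```

The same four codes could equally feed a branch decision, which is how one register can serve both write masking and dynamic flow control.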

Output Register: The Output Registers in DX9 VS2.0 and DX9 VS3.0 offer nothing more than those in DX8 VS1.1. However, R300 and NV30 go beyond this.

NV30 provides 2 Back-facing color registers (like NV2x) and 6 new Clip Distance Registers. These new registers hold the transformed vertex's clip distances. The floating-point values are used by the clipping stage that follows the vertex program, so that only the portion of the primitive where the clip coordinate is greater than zero is rasterized. Moreover, NV30 performs a fast trivial reject in hardware if all clip coordinates of a primitive are negative. NV2x and NV30 use up texture components to send the 6 clip plane distances.
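As a hedged illustration of how clip distances drive trivial rejection, the Python sketch below computes a clip distance as the dot product of a plane equation with a vertex position, and rejects a triangle when, for at least one plane, every vertex has a negative distance. That per-plane rule is the usual clipping convention and is an assumption here; the source does not spell out NV30's exact test. The names `clip_distance` and `trivially_rejected` are illustrative only.

```python
def clip_distance(plane, position):
    """Signed distance term: dot product of plane (a, b, c, d) and position (x, y, z, w)."""
    return sum(p * v for p, v in zip(plane, position))


def trivially_rejected(planes, triangle):
    """True if all vertices lie on the negative side of at least one clip plane."""
    return any(
        all(clip_distance(plane, vertex) < 0 for vertex in triangle)
        for plane in planes
    )


# One user clip plane keeping the half-space x >= 0.
planes = [(1.0, 0.0, 0.0, 0.0)]

outside = [(-1.0, 0.0, 0.0, 1.0), (-2.0, 1.0, 0.0, 1.0), (-3.0, 0.0, 1.0, 1.0)]
straddling = [(-1.0, 0.0, 0.0, 1.0), (2.0, 1.0, 0.0, 1.0), (3.0, 0.0, 1.0, 1.0)]

print(trivially_rejected(planes, outside))     # prints True
print(trivially_rejected(planes, straddling))  # prints False
```

A straddling triangle is not rejected; it would go on to be clipped so that only the part with positive clip distance is rasterized.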

R300 supports 2 colors for back faces, but can use up to 3 for front faces: Diffuse, Specular and Other. In addition, R300 can support up to 32 clip planes if texture components are used up for them. However, R300 also has fast native user clip planes, which reject whole polygons lying outside them instead of just groups of rendered pixels. Clipped polygons will also be anti-aliased.

In theory, NV30 registers that go beyond the DX9 VS2.0/3.0 specification, such as the second address register, probably cannot be used by 3D games and applications written for the DX9 platform, but the NV extension in OpenGL 1.4 can make use of all of them.

Both R300 and NV30 have their own advantages. In my opinion, NV30 comes out a little ahead of R300 in this register analysis because of the Condition Code Register.