Registers
The following table compares the pixel shader register resources available from DX8.1 PS1.4 through DX9 PS3.0.
Register Type | R/W | DX8.1 | DX 9 PS2.0 | R300 | NV30 | DX 9 PS3.0 |
Temp | r/w | 6 | 12 | 32 | 32 | 32 |
Temp Pseudo | w | - | - | - | 2 | - |
Position Input | r | - | - | - | 1 | 1 |
Input | r | - | - | - | - | 10* |
Color Input | r | 2 | 2 | 2 | 2 | - |
Texture Coordinate | r | 6 | 8 | 8 | 8 | - |
Fog Distance/Coordinate | r | - | - | - | 1 | - |
Float Constant | r | 8 | 32 | 32 | 512** | 240 |
Integer Constant | r | - | - | - | - | 16 |
Bool Constant | r | - | - | - | - | 16 |
Sampler | r | - | 16 | 16 | 16*** | 16 |
Backface Bit | r | - | - | - | - | 1 |
Loop Counter | u | - | - | - | - | 1 |
Predicate (Conditional Code) | r/w | - | - | - | 1 | 1 |
Output Register | | | | | | |
Color | w | 1 | #MRT | 4 | 2**** | #MRT |
Depth | w | 1 | 1 | 1 | 1 | 1 |
Texture | w | - | - | - | 4 | - |
Note:
1) - = not supported; r = read; w = write; u = use; s = scalar; v = vector; #MRT = number of multiple render targets.
2) * General-purpose input registers that replace the separate color and texture coordinate registers.
3) ** NV30 does not provide constant registers. Constants in NV30 pixel shaders are stored in instruction slots.
4) *** Known as texture image units in NV30.
5) **** NV30 does not support MRT; these two registers cannot be used simultaneously.
Temporary Register: Both R300 and NV30 exceed the DX9 PS2.0 specification and reach the DX9 PS3.0 specification, yet NV30’s temporary registers are more flexible than R300’s. When using FP16, NV30 can store up to 64 four-component FP16 values. In the same situation, R300 can store only 32 values.
Temp Pseudo Register: In addition to the normal temporary registers, there are two temporary pseudo-registers, "RC" and "HC", in NV30. RC and HC are treated as unnumbered, write-only temporary registers. The components of RC have an FP32 data type; the components of HC have an FP16 data type. The sole purpose of these registers is to permit instructions to modify the condition code register without overwriting the values in any temporary register.
Position Input Register: This holds the window-space coordinate of the pixel, that is, the position after the perspective divide and the viewport transform. Both NV30 and PS3.0 support it. It is very useful for render-to-texture passes, since it gives the exact screen location being rendered to, which is exactly the location that the previous pass rendered to. It not only avoids spending a texture coordinate slot on multipass rendering, but also makes the sample location more exact: it is hard to ensure that we are sampling at the right sub-pixel position if we perform the viewport transform ourselves.
Constant Register: R300 follows the PS2.0 specification exactly, providing 32 float constant registers. NV30 has no constant registers at all; instead, it stores all required constants in instruction slots. A slot can hold either a float constant or an integer constant. Each constant occupies one slot, and NV30 allows at most 512 constants. NV30’s scheme is better suited to the complex pixel shader programs used in DCC (digital content creation), which need many more parameters.
Encoding constants in the instructions of the program carries a performance risk. Say that we have a single pixel shader program and want to render multiple objects with only small variations between them. This can be done by changing a couple of constants in the pixel shader program between the rendering of each object. Changing a constant is easy and quick on R300. On NV30, however, it can be a complex and expensive operation, since every instruction that uses the changed constant must be modified. If a program uses a constant 5 times, NV30’s scheme needs to modify 5 instructions, while R300 only needs to update one constant register. Encoding constants in the instructions saves on-chip storage space, but it can carry a costly performance penalty when constants change between submitted primitives.
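The difference in update cost can be illustrated with a toy model. The Python sketch below is not driver code for either chip; it assumes a program can be represented simply as a list of instructions, each listing the names of the constants it reads. The function names and the sample program are hypothetical and exist only for this illustration.

```python
# Toy model only: each instruction is a tuple of the constant names it reads.
# It illustrates the cost argument; it is not how either chip's driver works.

def register_file_updates(program, changed):
    """Constant-register scheme (R300 style): one write per changed constant in use."""
    used = {name for instr in program for name in instr}
    return len(used & set(changed))


def embedded_constant_patches(program, changed):
    """Embedded scheme (NV30 style): one patch per instruction reading a changed constant."""
    changed = set(changed)
    return sum(1 for instr in program if changed & set(instr))


# Hypothetical shader: "tint" is read by five instructions, "scale" by one.
program = [("tint",), ("tint", "scale"), ("tint",), (), ("tint",), ("tint",)]

print(register_file_updates(program, {"tint"}))      # 1
print(embedded_constant_patches(program, {"tint"}))  # 5
```

In this model the register-file cost grows with the number of distinct constants changed, while the embedded cost grows with the number of instructions that read them.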
Sampler Registers and Texture Coordinate Registers: In the pixel shader programming model of the R300, samplers and texture coordinates are completely decoupled. Texture coordinates are iterated across polygons and may be used to sample data through a sampler. At any one time, a given sampler is associated with a specific texture map in memory as well as a set of filtering state and texture coordinate clamping state. R300 can iterate up to 8 4D texture coordinates and has 16 samplers. Naturally, it is possible to sample from a single sampler multiple times in a given shader using different texture coordinates. This is common when performing image processing operations.
The fragment program execution environment in NV30 accesses textures via arbitrarily computed texture coordinates. As such, there is no necessary correspondence between the texture coordinates and texture maps previously lumped into a single "texture unit". Just like R300, NV30 also separates the notion of "texture coordinate sets" and "texture image units" (texture maps and associated parameters), allowing implementations with a different number of each. The NV30 implementation will support 8 texture coordinate sets and 16 texture image units.
Fog Distance/Coordinate: NV30 uses this register to hold the associated eye distance or fog coordinate normally used for fog computations.
Backface Bit Register: This register is new in the PS3.0 model. When set, the bit indicates that the primitive is back-facing (its area is negative, that is, its winding is counterclockwise). Inside the pixel shader, the application can therefore decide which lighting technique to use, which makes two-sided lighting possible. The bit is not set for line and point primitives, and it can be used in place of a constant Boolean register.
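As an illustration of the kind of decision the backface bit enables, the Python sketch below models one common way to do two-sided lighting: flip the interpolated normal for back-facing primitives before computing a diffuse term. This is an assumed technique chosen for the example, not something prescribed by PS3.0, and the function name `two_sided_diffuse` and its parameters are hypothetical.

```python
# Sketch of per-pixel two-sided diffuse lighting driven by a backface flag.
# Assumes unit-length vectors given as 3-tuples; this models shader logic in Python.

def two_sided_diffuse(normal, light_dir, is_back_face):
    """Return the clamped N.L term, using the flipped normal on back faces."""
    if is_back_face:
        normal = tuple(-c for c in normal)
    n_dot_l = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, n_dot_l)


light = (0.0, 0.0, 1.0)
print(two_sided_diffuse((0.0, 0.0, 1.0), light, False))   # front face, lit
print(two_sided_diffuse((0.0, 0.0, -1.0), light, True))   # back face, lit after the flip
print(two_sided_diffuse((0.0, 0.0, -1.0), light, False))  # same normal without the flip: unlit
```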
Output Register: NV30 does not appear to support the MRT capability required by DX9 directly. Its two color output registers correspond to the two floating-point formats, FP32 and FP16, and cannot be used simultaneously. However, it is reasonable to assume that NV30 can implement MRT functionality through pack/unpack operations.
Output registers and temporary registers share the same space in NV30. That is to say, a fragment program fails to load if its total temporary and output register count exceeds 64. Each FP32 temporary or output register used by the program counts as two registers, and each FP16 temporary or output register counts as a single register. R300 has a similar limitation.
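The counting rule above is simple arithmetic and can be sketched in a few lines of Python. The limit of 64 and the weights (2 per FP32 register, 1 per FP16 register) come from the text; the function names are hypothetical, and the sketch only models the rule rather than any real driver check.

```python
# Model of the NV30 register budget described above.
NV30_REGISTER_BUDGET = 64  # combined temporary + output limit stated in the text


def nv30_register_cost(fp32_regs, fp16_regs):
    """Each FP32 register counts as two, each FP16 register as one."""
    return 2 * fp32_regs + fp16_regs


def nv30_program_loads(fp32_regs, fp16_regs):
    """A program fails to load only if its cost exceeds the budget."""
    return nv30_register_cost(fp32_regs, fp16_regs) <= NV30_REGISTER_BUDGET


print(nv30_program_loads(32, 0))   # True: 32 FP32 registers use the whole budget
print(nv30_program_loads(0, 64))   # True: 64 FP16 registers also fit
print(nv30_program_loads(32, 1))   # False: cost is 65
```

The same rule explains the temporary register figures given earlier: the budget holds either 32 FP32 values or 64 FP16 values.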
Four texture output registers are used by NV30’s combiner fragment programs to generate the initial texture register values for the register combiners. After a combiner fragment program is executed, register combiner operations are performed and can use these computed values. The R, G, B, and A components of the combiner registers are taken from the x, y, z, and w components of the corresponding output registers.
The pixel shading unit of the R300 can output up to four colors to different render targets. The ability to output to multiple render targets simultaneously allows multiple intermediate values to be saved out between rendering passes and allows for the implementation of G-buffer techniques. An image-space outlining technique using multiple simultaneous pixel shader outputs is shown in the SIGGRAPH 2002 sketch Real-Time Image-Space Outlining for Non-Photorealistic Rendering.