Modifiers



Modifier | Type and description | PS2.0 | R300 | NV30 | PS3.0
- (negate) | Source modifier. Negation operator. | Y | Y | Y | Y
_abs | Source modifier. Absolute value of the source. | - | Y | Y | Y
Arbitrary swizzles | Source modifier. | - | Y* | Y | Y
_sat | Instruction modifier. Clamp to [0,1]. | Y | Y | Y | Y
_pp | Instruction modifier. Lower precision (at least s10e5). | Y | Y | Y | Y
[!](p[.swizzle]) | Instruction modifier. Predicate support. | - | - | Y | Y

Note: - = not supported; Y = supported; ? = unknown.
* R300 can perform any swizzle, except those that replicate two source components.

Instructions in NV30 may have up to sixteen variants, formed by combining three optional suffixes: "R" (FP32), "H" (FP16), or "X" (FX12) to specify arithmetic precision; "C" to allow an update of the condition code register; and "_SAT" to clamp the result vector components to the range [0,1]. For example, the sixteen forms of the "ADD" instruction are "ADD", "ADDR", "ADDH", "ADDX", "ADDC", "ADDRC", "ADDHC", "ADDXC", "ADD_SAT", "ADDR_SAT", "ADDH_SAT", "ADDX_SAT", "ADDC_SAT", "ADDRC_SAT", "ADDHC_SAT", and "ADDXC_SAT".

Some mathematical instructions that support precision suffixes, typically those that involve complicated floating-point computations such as SIN and LG2, do not support the "X" precision suffix.
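The suffixes combine independently, so the count is simply 4 precision choices (none, R, H, X) x 2 (with or without C) x 2 (with or without _SAT) = 16, and 3 x 2 x 2 = 12 for an instruction such as SIN that cannot take "X". A minimal Python sketch that enumerates the spellings; the function name instruction_variants and its allow_fx12 flag are my own hypothetical names, not part of any NVIDIA tool.

```python
from itertools import product


def instruction_variants(base, allow_fx12=True):
    """Enumerate NV30 opcode spellings for one base instruction.

    Each spelling is base + optional precision suffix (R/H/X)
    + optional "C" (update condition codes) + optional "_SAT" (clamp to [0,1]).
    allow_fx12=False models instructions such as SIN or LG2 that reject "X".
    """
    precisions = ["", "R", "H", "X"] if allow_fx12 else ["", "R", "H"]
    return [base + prec + cc + sat
            for prec, cc, sat in product(precisions, ["", "C"], ["", "_SAT"])]


add_forms = instruction_variants("ADD")
print(len(add_forms))                                       # 16
print(len(instruction_variants("SIN", allow_fx12=False)))   # 12
```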

DX9 PS2.0/3.0 also support the saturate instruction modifier, which does not cost any additional instruction slots. The _SAT modifier can be used with any arithmetic instruction (including macro instructions) except FRC and SINCOS. _SAT cannot be used with texture addressing instructions (TEXLD*, TEXKILL), nor with instructions that write to output registers.
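The rules above are easy to state as code. A minimal sketch, assuming registers are plain tuples of floats; saturate, sat_allowed and the NO_SAT table are hypothetical helpers that only encode the restrictions listed in the previous paragraph, not a real assembler check.

```python
def saturate(vec):
    """_SAT semantics: clamp every result component to [0, 1]."""
    return tuple(min(max(c, 0.0), 1.0) for c in vec)


# Opcodes that, per the PS2.0/3.0 rules above, may not take _SAT.
NO_SAT = {"FRC", "SINCOS", "TEXKILL"}


def sat_allowed(opcode, writes_output_register=False):
    """Return True if _SAT may be applied to this instruction."""
    op = opcode.upper()
    if op in NO_SAT or op.startswith("TEXLD"):  # TEXLD* covers TEXLD, TEXLDB, TEXLDP, ...
        return False
    return not writes_output_register


print(saturate((-0.25, 0.5, 1.75, 1.0)))  # (0.0, 0.5, 1.0, 1.0)
```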

In DX9 PS2.0, the application can use the partial precision hint (written as _pp in the assembly) to tell the device that the operation may be performed, and its result stored, at a lower precision (at least s10e5, aka FP16). Note: this is only a hint, and many implementations may choose to ignore it.
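To get a feel for what "at least s10e5" costs in accuracy: s10e5 is the IEEE half-precision layout (1 sign bit, 5 exponent bits, 10 mantissa bits), which Python's struct module exposes as the "e" format code (Python 3.6+). This sketch only models rounding a value to storage precision; real hardware may compute internally at a higher precision, and the helper names are my own.

```python
import struct


def round_to_fp16(x):
    """Round a Python float to the nearest s10e5 (IEEE half) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x):
    """Round to the nearest s23e8 (IEEE single) value, for comparison."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


third = 1.0 / 3.0
print(round_to_fp16(third))  # 0.333251953125, i.e. 1365/4096
# The FP16 error is several orders of magnitude larger than the FP32 error.
print(abs(round_to_fp16(third) - third) > abs(round_to_fp32(third) - third))  # True
```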

As we all know, PS1.4 in R200 supports many register modifiers (negate, invert, bias, scale, bias and scale, and channel replication) and instruction modifiers (_x2, _x4, _x8, _d2, _d4, _d8, and _sat). R300 will almost certainly continue to support them. Whether NV30 supports all of them, for example _x4, _x8 and _d8, is not yet known.
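These modifiers are simple per-component arithmetic. A minimal sketch of their semantics in Python, treating register components as floats and ignoring the fixed-point range limits of R200-class hardware; all function names here are my own, and applying the shift before _sat is my reading of the PS1.x modifier order.

```python
# Source register modifiers, applied per component when a register is read.
def negate(x):
    return -x


def invert(x):
    return 1.0 - x          # 1-r


def bias(x):
    return x - 0.5          # r_bias


def scale_x2(x):
    return 2.0 * x          # r_x2 (PS1.4)


def bias_scale(x):
    return 2.0 * (x - 0.5)  # r_bx2: maps [0,1] to [-1,1], e.g. for normal maps


def replicate(vec, channel):
    """Channel replication: broadcast one component (r, g, b or a) to all four."""
    value = vec["rgba".index(channel)]
    return (value, value, value, value)


# Instruction modifiers, applied to the result before it is written.
RESULT_SCALE = {"_x2": 2.0, "_x4": 4.0, "_x8": 8.0,
                "_d2": 0.5, "_d4": 0.25, "_d8": 0.125}


def apply_instruction_modifiers(result, scale=None, sat=False):
    """Apply the shift modifier first, then optional saturation."""
    if scale is not None:
        result = result * RESULT_SCALE[scale]
    if sat:
        result = min(max(result, 0.0), 1.0)
    return result


print(bias_scale(0.75))                                 # 0.5
print(apply_instruction_modifiers(0.3, "_x4", sat=True))  # 1.0
```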

In addition, R300 automatically converts from its internal format to the required external format and hides this conversion from the shader, but the details are not yet known.

Instruction Sets

Instruction sets made big progress from PS1.x to PS2.0/3.0, but the execution model of the pixel shader changed even more. In the previous generation of GPUs, if the number of instructions exceeded the number of texture stages, the pixel fillrate dropped dramatically because pipelines had to be combined. R300 and NV30 no longer have this problem: the last barrier to complex pixel shaders has been broken.

Compared with PS1.x, DX9 PS2.0/3.0 introduces several new instructions, some of them macros: FRC, EXP, LOG, POW, CRS, ABS, RCP, RSQ, NRM, MIN, MAX, and LRP.

Category | PS2.0 | R300 | NV30 | PS3.0
Add & multiply instructions | ADD, DP2ADD, DP3, DP4, MAD, MOV, MUL | ADD, DP2ADD, DP3, DP4, LRP, MAD, MOV, MUL | ADD, DP3, DP4, LRP, MAD, MOV, MUL, SUB, X2D | ADD, DP2ADD, DP3, DP4, MAD, MOV, MUL
Texturing instructions | TEXLD, TEXLDB, TEXLDP | TEXLD, TEXLDB, TEXLDP | TEX, TXD, TXP | TEXLD, TEXLDB, TEXLDP, TEXLDD
Partial derivative instructions | - | - | DDX, DDY | DSX, DSY
Math functions | EXP, FRC, LOG, RCP, RSQ | EXP, FRAC, LOG, RCP, RSQ | COS, EX2, FLR, FRC, LG2, POW, RCP, RSQ, SIN | EXP, FRC, LOG, RCP, RSQ
Compare instructions | CMP | CMP, CND | - | CMP
Set on instructions | - | - | SEQ, SFL, SGE, SGT, SLE, SLT, SNE, STR | -
Graphics-oriented instructions | - | - | DST, LIT, RFL | -
Minimum / maximum instructions | - | MAX, MIN | MAX, MIN | -
Pack instructions | - | - | PK2H, PK2US, PK4B, PK4UB | -
Unpack instructions | - | - | UP2H, UP2US, UP4B, UP4UB | -
Kill instructions | TEXKILL | TEXKILL | TEXKILL | TEXKILL
Static flow control instructions | - | - | - | IF, ELSE, ENDIF, CALL, CALLNZ, LOOP, ENDLOOP, LABEL, REPEAT, ENDREP, RET
Dynamic flow control instructions | - | - | - | IFC, BREAK, BREAKC
Predicate instructions | - | - | - | SETP

DX9 PS2.0 provides macros such as MIN, MAX, LRP, POW, CRS, NRM, ABS, SINCOS, M4X4, M4X3, M3X4, M3X3, and M3X2. Microsoft says the macros were added for ISV convenience, but encourages IHVs to implement them more efficiently (ideally natively).
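To see why native support matters, it helps to write out what a macro stands for. A minimal Python sketch of three common expansions: NRM as DP3, RSQ, MUL; CRS as a swizzled MUL followed by a MAD; and POW as LOG, MUL, EXP on |x|. These expansions are my own illustration of the idea, not a transcript of what any driver emits, and NRM here returns only the xyz components.

```python
import math


def dp3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rsq(x):
    return 1.0 / math.sqrt(x)


def nrm(v):
    """NRM macro: DP3 (squared length), RSQ, then MUL (xyz only in this sketch)."""
    inv_len = rsq(dp3(v, v))
    return tuple(c * inv_len for c in v[:3])


def crs(a, b):
    """CRS macro as two instructions with swizzles."""
    # MUL t.xyz, a.yzx, b.zxy
    t = (a[1] * b[2], a[2] * b[0], a[0] * b[1])
    # MAD r.xyz, -a.zxy, b.yzx, t
    return (-a[2] * b[1] + t[0], -a[0] * b[2] + t[1], -a[1] * b[0] + t[2])


def pow_macro(x, y):
    """POW macro: LOG, MUL, EXP, i.e. 2^(y * log2|x|)."""
    return 2.0 ** (y * math.log2(abs(x)))


print(crs((1, 0, 0), (0, 1, 0)))  # (0, 0, 1)
print(pow_macro(2.0, 3.0))        # 8.0
```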

Add & multiply instructions: Although I cannot find the LRP instruction in ATI's own specification, I've received confirmation that R300 supports LRP natively, executing it in one cycle. NV30 also supports LRP natively. X2D performs a 2D coordinate transformation.
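A sketch of what these two instructions compute, per component. LRP follows the standard definition dest = s0*s1 + (1 - s0)*s2. The X2D formula is my reading of the NV_fragment_program specification (s0.xy plus a 2x2 matrix held in s2.xyzw applied to s1.xy), so treat it as an assumption; this sketch returns only the x and y results and leaves out how the specification fills z and w.

```python
def lrp(s0, s1, s2):
    """LRP: dest = s0*s1 + (1 - s0)*s2, per component."""
    return tuple(a * b + (1.0 - a) * c for a, b, c in zip(s0, s1, s2))


def x2d(s0, s1, s2):
    """X2D (assumed): translate s0.xy plus the 2x2 matrix
    [[s2.x, s2.y], [s2.z, s2.w]] applied to s1.xy. Returns (x, y) only."""
    x = s0[0] + s1[0] * s2[0] + s1[1] * s2[1]
    y = s0[1] + s1[0] * s2[2] + s1[1] * s2[3]
    return (x, y)


# Blend a quarter of the way from black toward white.
print(lrp((0.25,) * 4, (1.0,) * 4, (0.0,) * 4))  # (0.25, 0.25, 0.25, 0.25)
# Rotate the 2D coordinate (1, 0) by 90 degrees.
print(x2d((0.0, 0.0), (1.0, 0.0), (0.0, -1.0, 1.0, 0.0)))  # (0.0, 1.0)
```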

Texturing instructions: As the table shows, NV30 provides a TXD instruction (texture lookup with derivatives) instead of the TEXLDB instruction (texture lookup with LOD bias). However, TEXLDB in PS2.0/3.0 can be emulated with the PS3.0 TEXLDD or NV30 TXD instruction:

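dSdX = ddx(texcoord)               // derivative of the texture coordinate along screen x
dSdY = ddy(texcoord)               // derivative along screen y
multFactor = 2^bias                // a bias of "bias" mip levels scales the footprint by 2^bias
dSdX = multFactor * dSdX           // scale the derivatives computed above
dSdY = multFactor * dSdY
color = txd(texcoord, dSdX, dSdY)  // lookup with explicit derivatives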

So for cases where all the shader author wants is to add a mip LOD bias, TEXLDB would be faster; however, the same functionality is available through TXD if desired. For anisotropic filtering, TXD also provides control over the direction of anisotropy, which allows some pretty nice effects.
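Why scaling the derivatives by 2^bias acts as an LOD bias: under the common isotropic approximation, the mip level is log2 of the longer screen-space footprint in texels, and log2(2^bias * L) = bias + log2(L). A minimal Python sketch of that approximation; real hardware LOD formulas differ in detail (and anisotropic filtering uses the derivatives differently), and the helper names are my own.

```python
import math


def mip_lod(dtdx, dtdy, tex_size):
    """Approximate isotropic LOD: log2 of the longer footprint, in texels.

    dtdx, dtdy are (du, dv) derivatives in normalized [0,1] texture units.
    """
    len_x = math.hypot(dtdx[0] * tex_size, dtdx[1] * tex_size)
    len_y = math.hypot(dtdy[0] * tex_size, dtdy[1] * tex_size)
    return math.log2(max(len_x, len_y))


def biased_derivatives(dtdx, dtdy, bias):
    """Scale both derivatives by 2^bias, as in the TXD emulation of TEXLDB."""
    mult = 2.0 ** bias
    return ((dtdx[0] * mult, dtdx[1] * mult), (dtdy[0] * mult, dtdy[1] * mult))


# One texel per pixel on a 256x256 texture.
dx, dy = (1 / 256, 0.0), (0.0, 1 / 256)
print(mip_lod(dx, dy, 256))                                # 0.0
print(mip_lod(*biased_derivatives(dx, dy, 1.5), 256))      # approximately 1.5
```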

Partial derivative instructions: These can calculate partial derivatives with respect to screen-space x or y and are useful for anti-aliasing, height-field bump mapping and computing parameters for the TXD texture lookup with partial derivatives.
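Hardware is commonly described as computing these derivatives by finite differences between neighbouring pixels of a 2x2 quad that is shaded together. The exact scheme (which pixels are differenced, and the sign convention for y) is implementation-specific, so this coarse-difference sketch is an assumption; quad_derivatives is a hypothetical name.

```python
def quad_derivatives(quad):
    """Coarse ddx/ddy for a 2x2 pixel quad.

    quad[row][col] holds the value of some shader expression at each pixel;
    row 0 is the upper pair, col 0 the left pair.
    """
    ddx = quad[0][1] - quad[0][0]  # right minus left, along screen x
    ddy = quad[1][0] - quad[0][0]  # lower minus upper, along screen y
    return ddx, ddy


# Texture coordinate u = 0.25 + 0.125*x + 0.0625*y evaluated at the four pixels.
quad = [[0.25 + 0.125 * x + 0.0625 * y for x in (0, 1)] for y in (0, 1)]
print(quad_derivatives(quad))  # (0.125, 0.0625)
```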

Compare instructions: CMP implements a conditional write (a per-component select) in a shader. NV30 does not natively support the simple and useful CMP instruction introduced by R200. To be blunt, the engineers at ATI and NVIDIA are rather disgusted by each other these days.
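CMP selects, per component, src1 where src0 >= 0 and src2 otherwise. Without a native CMP, one way to get the same result from instructions NV30 does have is a set-on instruction followed by LRP. The SGE + LRP emulation below is my own illustration, not something taken from NVIDIA documentation, and it only matches CMP for finite inputs (0 * infinity gives NaN).

```python
def cmp(s0, s1, s2):
    """DX9 CMP: per component, dest = s1 if s0 >= 0 else s2."""
    return tuple(b if a >= 0.0 else c for a, b, c in zip(s0, s1, s2))


def sge(s0, s1):
    """Set-on-greater-or-equal: 1.0 where s0 >= s1, else 0.0."""
    return tuple(1.0 if a >= b else 0.0 for a, b in zip(s0, s1))


def lrp(f, s1, s2):
    """LRP: f*s1 + (1 - f)*s2, per component."""
    return tuple(a * b + (1.0 - a) * c for a, b, c in zip(f, s1, s2))


def cmp_emulated(s0, s1, s2):
    """Two-instruction emulation: a 0/1 mask from SGE, then LRP between s1 and s2."""
    mask = sge(s0, (0.0,) * len(s0))
    return lrp(mask, s1, s2)


src0 = (-1.0, 0.0, 2.0, -0.5)
print(cmp(src0, (1, 2, 3, 4), (5, 6, 7, 8)))           # (5, 2, 3, 8)
print(cmp_emulated(src0, (1, 2, 3, 4), (5, 6, 7, 8)))  # (5.0, 2.0, 3.0, 8.0)
```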

Graphics-oriented instructions: The DST instruction computes a distance vector, which is useful for per-fragment light attenuation calculations: a DOT3 operation between the distance vector and a vector of attenuation constants yields the denominator of the attenuation factor. The LIT instruction accelerates per-fragment lighting by computing the lighting coefficients for ambient, diffuse, and specular light contributions. The RFL instruction computes the reflection of the "direction" vector about the "axis" vector. R300 can support these instructions as macros.
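A sketch of what the three instructions compute, written from my reading of the DirectX and NV_fragment_program definitions; treat the exact formulas as assumptions. DST turns (_, d*d, d*d, _) and (_, 1/d, _, 1/d) into (1, d, d*d, 1/d), so a DP3 with constants (k0, k1, k2) gives k0 + k1*d + k2*d*d. LIT here omits the clamping of the specular exponent that the real instruction performs. RFL keeps the component of "direction" along "axis" and negates the rest; the axis need not be normalized.

```python
def dp3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dst(s0, s1):
    """DST: (1, s0.y*s1.y, s0.z, s1.w)."""
    return (1.0, s0[1] * s1[1], s0[2], s1[3])


def lit(s):
    """LIT: s = (N.L, N.H, _, specular power) -> (1, diffuse, specular, 1)."""
    n_dot_l, n_dot_h, power = s[0], s[1], s[3]
    diffuse = max(n_dot_l, 0.0)
    specular = max(n_dot_h, 0.0) ** power if n_dot_l > 0.0 else 0.0
    return (1.0, diffuse, specular, 1.0)


def rfl(axis, direction):
    """RFL: 2*(axis.dir)/(axis.axis)*axis - dir, xyz only."""
    k = 2.0 * dp3(axis, direction) / dp3(axis, axis)
    return tuple(k * a - d for a, d in zip(axis[:3], direction[:3]))


# Distance d = 2 with attenuation constants k0 = 1, k1 = 0.5, k2 = 0.25.
dist = dst((0.0, 4.0, 4.0, 0.0), (0.0, 0.5, 0.0, 0.5))
print(dist)                                # (1.0, 2.0, 4.0, 0.5)
print(1.0 / dp3(dist, (1.0, 0.5, 0.25)))   # attenuation factor 1/3
```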

Packing and unpacking instructions: Do not doubt the usefulness of packing and unpacking instructions. They pack and unpack multiple components into a single channel of a floating-point frame buffer. For example, a 128-bit "RGBA" frame buffer could hold 16 8-bit quantities or 8 16-bit quantities per pixel, all of which could be used in later rasterization passes.
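The arithmetic behind "16 8-bit or 8 16-bit quantities": each 32-bit channel of a 128-bit pixel can hold four bytes (PK4UB) or two FP16 values (PK2H). A minimal Python sketch of those two packings and their inverses; the bit layout (x in the lowest bits) and the round-to-nearest quantization are my assumptions about the NV30 behaviour, not confirmed details.

```python
import struct


def pk4ub(v):
    """Pack four values, clamped to [0,1], into one 32-bit word (x in the low byte)."""
    b = [round(min(max(c, 0.0), 1.0) * 255.0) for c in v]
    return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24)


def up4ub(word):
    """Unpack a 32-bit word into four values in [0,1]."""
    return tuple(((word >> shift) & 0xFF) / 255.0 for shift in (0, 8, 16, 24))


def pk2h(x, y):
    """Pack two FP16 values into one 32-bit word (x in the low half)."""
    return struct.unpack("<I", struct.pack("<ee", x, y))[0]


def up2h(word):
    """Unpack a 32-bit word into two FP16 values."""
    return struct.unpack("<ee", struct.pack("<I", word))


# A 128-bit pixel is four 32-bit channels: here 8 bytes plus 4 halves.
pixel = (pk4ub((0.0, 1.0, 0.5, 2.0)), pk4ub((0.25, 0.5, 0.75, 1.0)),
         pk2h(1.5, -2.0), pk2h(0.125, 3.0))
print(hex(pixel[0]))    # 0xff80ff00
print(up2h(pixel[2]))   # (1.5, -2.0)
```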

In addition, pack instructions should be able to emulate some of DX9's MRT functionality, albeit in a rather ugly way. MRTs render into METs (multiple element textures), where each component must be bound to a separate sampler, so the 128-bit packed floating-point color output of a shader would need to be copied into 4 different element textures to support MRT. Of course, this depends on how NV30's DX9 driver implements it. For shaders that render to 4 color buffers and then bind all 4 of those buffers to a different shader in a subsequent pass, the pack instructions will work well.

We must admit that NV30 has a powerful instruction set. Although it lacks the important flow control instructions and one texture instruction, it provides slightly more instructions than the PS3.0 specification.