Pixel Shader Core

Each of the quad pipelines features four pixel rendering pipelines, each of which has its own pixel (fragment) shader core.

The pixel shader core in R420 remains largely the same as R300's: a texture address unit plus two ALUs. Each ALU can co-issue a vector and a scalar operation, and the two ALUs can dual-issue together, so in the best case a pipeline executes up to five floating point operations per cycle (a vector and a scalar operation in each ALU, plus one texture address operation). The second ALU is fully featured, whereas the first is a smaller ALU whose capabilities ATI hasn't fully detailed: it supports the PS1.4 input modifiers, and ATI states that it has other instruction capabilities, although we don't know what they are. Unlike the vertex shader, the vector ALU is only three components wide, so a full four-component operation also occupies the scalar unit, and no co-issue is possible on that ALU in the same cycle. The precision of the shader core remains FP24 per component.
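To make the issue rules above concrete, here is a minimal Python sketch of the slot accounting for a single pixel pipeline. It is a counting model under stated assumptions, not a description of ATI's scheduler: both ALUs are treated as identical (ignoring the first ALU's undisclosed restrictions), the instruction stream is assumed to have no dependencies, and a scalar operation is assumed to use only the scalar slot. The names `InstructionMix`, `min_cycles` and `ops_per_cycle` are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumption: both ALUs are modelled as identical, even though the first ALU
# is a reduced unit whose full capabilities ATI has not disclosed.
ALUS_PER_PIPE = 2


@dataclass
class InstructionMix:
    """Counts of independent operations for one pixel (hypothetical model)."""
    vec4: int = 0    # 4-component ops: occupy both the vec3 and scalar slots of one ALU
    vec3: int = 0    # 3-component ops: vec3 slot only
    scalar: int = 0  # 1-component ops: scalar slot only
    tex: int = 0     # texture address ops: handled by the separate texture address unit


def min_cycles(mix: InstructionMix) -> int:
    """Minimum issue cycles for an instruction stream with no dependencies."""
    # One ALU-cycle runs either a single vec4 op, or a vec3 op co-issued with a scalar op.
    alu_cycles_needed = mix.vec4 + max(mix.vec3, mix.scalar)
    # The two ALUs dual-issue, so the ALU work is split across them.
    alu_cycles = math.ceil(alu_cycles_needed / ALUS_PER_PIPE)
    # The texture address unit issues one op per cycle in parallel with the ALUs.
    return max(alu_cycles, mix.tex)


def ops_per_cycle(mix: InstructionMix) -> float:
    """Average operations issued per cycle under this model."""
    cycles = min_cycles(mix)
    if cycles == 0:
        return 0.0
    total = mix.vec4 + mix.vec3 + mix.scalar + mix.tex
    return total / cycles


if __name__ == "__main__":
    best_case = InstructionMix(vec3=2, scalar=2, tex=1)
    print(min_cycles(best_case), ops_per_cycle(best_case))    # 1 5.0
    vec4_heavy = InstructionMix(vec4=2, tex=1)
    print(min_cycles(vec4_heavy), ops_per_cycle(vec4_heavy))  # 1 3.0
```

Two vector operations, two scalar operations and one texture operation pack into a single cycle (5.0 operations per cycle, the best case described above). Two four-component operations plus a texture operation also take one cycle but only reach 3.0, because each four-component operation blocks co-issue on its ALU.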

Although the shader core remains for the most part the same as R300's, which fundamentally means that it is still a Shader 2.0 core, it isn't identical and some modifications have been made. R420's shader core increases the number of constant and temporary registers, and adds a facing register for two-sided lighting. The instruction limits have been raised so that the vector, scalar and texture ALUs each have a 512-instruction limit, which is native to the hardware rather than a tweak via the F-Buffer. The F-Buffer itself has also changed: only the pixels operated on in a pass are accessed, rather than the entire frame regardless of whether its pixels are affected. This gives better memory management and performance than the Radeon 9800's F-Buffer.
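The effect of the F-Buffer change can be estimated with a back-of-the-envelope model. The Python sketch below is hypothetical: it assumes each pass boundary writes the intermediate values for every stored pixel once and the next pass reads them back once, and the per-pixel size (`bytes_per_pixel`) and the other parameter names are illustrative assumptions, not ATI's figures.

```python
from dataclasses import dataclass


@dataclass
class MultipassSetup:
    """Hypothetical parameters for a long shader split into several passes."""
    width: int            # render target width in pixels
    height: int           # render target height in pixels
    touched_pixels: int   # pixels actually covered by the multipass shader
    bytes_per_pixel: int  # assumed size of the intermediate values stored per pixel
    passes: int           # number of passes the shader is split into


def intermediate_traffic(setup: MultipassSetup, touched_only: bool) -> int:
    """Bytes written and read back between passes (illustrative model only)."""
    # Older behaviour as described: the whole frame is accessed in every pass.
    # Newer behaviour: only the pixels a pass operates on are accessed.
    pixels = setup.touched_pixels if touched_only else setup.width * setup.height
    boundaries = max(setup.passes - 1, 0)
    # Each boundary writes the intermediates once and the next pass reads them once.
    return boundaries * 2 * pixels * setup.bytes_per_pixel


if __name__ == "__main__":
    setup = MultipassSetup(width=1024, height=768, touched_pixels=256 * 256,
                           bytes_per_pixel=12, passes=2)
    print(intermediate_traffic(setup, touched_only=False))  # 18874368
    print(intermediate_traffic(setup, touched_only=True))   # 1572864
```

For a 1024x768 target where the multipass shader covers a 256x256 region, with 12 bytes per pixel and two passes, the whole-frame scheme moves 18,874,368 bytes while the touched-only scheme moves 1,572,864. The 12-fold reduction simply mirrors the ratio of frame size to covered area, which is why the gain depends heavily on how much of the screen the long shader actually touches.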

At GDC, a new HLSL compilation profile, PS_2_b, was discussed in connection with R420's increased instruction limits.

HLSL Compilation Profiles

  PS_2_0  PS_2_a  PS_2_b  PS_3_0  PS_4_0 
Dependent Texture Limit  No Limit  No Limit  No Limit 
Texture Instruction Limit  32  Unlimited  Unlimited  Unlimited  Unlimited 
Position Register 
Instruction Slots  32+64  512  512  >=512  >=65535 
Executed Instructions  32+64  512  512  >=65535  Unlimited 
Texture Indirections  No Limit  No Limit  No Limit 
Interpolated Registers  2+8  2+8  2+8  10  32 
Instruction Predication 
Index Input Registers 
Temp Registers  12  22  32  32  4096 
Constant Registers  32  32  32  224  16x4096 
Arbitrary Swizzling 
Gradient Instructions 
Loop Count Register 
Face Register (2-sided Lighting) 
Dynamic Flow Control  24
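The numeric rows of the table can be used to check which profiles a given shader's resource usage fits. The following Python sketch encodes only the instruction-slot, temporary-register and constant-register figures from the table. It treats PS_3_0's ">=512" slots as 512 and PS_2_0's "32+64" as 32 texture plus 64 arithmetic slots, and it does not model feature rows such as predication or arbitrary swizzling. `ProfileLimits`, `ShaderUsage`, `fits` and `fitting_profiles` are hypothetical helpers, not part of any compiler API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProfileLimits:
    tex_slots: Optional[int]  # separate texture-instruction limit, None if not separately limited
    alu_slots: Optional[int]  # separate arithmetic limit, None if slots are a shared pool
    total_slots: int          # overall instruction slots (minimum figure from the table)
    temps: int                # temporary registers
    constants: int            # constant registers


# Numbers taken from the table above; ">=" entries are treated as their minimum.
PROFILES = {
    "ps_2_0": ProfileLimits(32, 64, 32 + 64, 12, 32),
    "ps_2_a": ProfileLimits(None, None, 512, 22, 32),
    "ps_2_b": ProfileLimits(None, None, 512, 32, 32),
    "ps_3_0": ProfileLimits(None, None, 512, 32, 224),
    "ps_4_0": ProfileLimits(None, None, 65535, 4096, 16 * 4096),
}


@dataclass
class ShaderUsage:
    tex_instructions: int
    alu_instructions: int
    temps: int
    constants: int


def fits(usage: ShaderUsage, limits: ProfileLimits) -> bool:
    """True if the shader's counts stay within every numeric limit of the profile."""
    if limits.tex_slots is not None and usage.tex_instructions > limits.tex_slots:
        return False
    if limits.alu_slots is not None and usage.alu_instructions > limits.alu_slots:
        return False
    if usage.tex_instructions + usage.alu_instructions > limits.total_slots:
        return False
    return usage.temps <= limits.temps and usage.constants <= limits.constants


def fitting_profiles(usage: ShaderUsage) -> List[str]:
    """Names of the profiles (in table order) whose numeric limits the shader fits."""
    return [name for name, limits in PROFILES.items() if fits(usage, limits)]


if __name__ == "__main__":
    print(fitting_profiles(ShaderUsage(8, 200, 16, 24)))
    print(fitting_profiles(ShaderUsage(8, 200, 28, 24)))
```

A shader with 8 texture and 200 arithmetic instructions, 16 temporaries and 24 constants fits ps_2_a, ps_2_b, ps_3_0 and ps_4_0 but not ps_2_0. Raise its temporaries to 28 and ps_2_a drops out, which is where PS_2_b's 32 temporary registers matter for R420-class hardware.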