Pixel Shader Core
Each quad pipeline features four pixel rendering pipelines, each of which has its own Pixel (fragment) Shader core.

The pixel shader core in R420 remains largely the same as R300's, with a texture address unit and two ALUs, each of which is co-issue capable, and which are dual-issue capable together; in the best case this can result in up to 5 floating point operations executed per cycle. The second ALU is fully featured, whereas the first is a smaller ALU whose capabilities ATI haven't fully detailed: it does feature PS1.4 input modifiers, and ATI state that it has other instruction capabilities, but we don't know what they are. Unlike the Vertex Shader, the Vector ALU is only 3 components wide, so if a full 4 component operation is required then the scalar unit is utilised as well, and hence a co-issue is not possible at the same time on that ALU. The precision of the shader core remains at FP24 per component.
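The co-issue rule described above can be sketched as a simple model. This is only an illustration of the 3-component vector plus scalar split as the text describes it, not ATI's actual scheduling logic; the function name `can_co_issue` and the idea of describing an operation purely by how many components it writes are assumptions made for the example.

```python
VECTOR_WIDTH = 3  # the Vector ALU handles up to 3 components (from the text)
SCALAR_WIDTH = 1  # the scalar unit handles a single component


def can_co_issue(components_a: int, components_b: int) -> bool:
    """Return True if two operations could share one ALU in the same cycle.

    Simplified model (an assumption, not documented hardware behaviour):
    one operation must fit the scalar unit and the other the vector unit.
    A 4-component operation needs both units, so nothing can pair with it.
    """
    for n in (components_a, components_b):
        if not 1 <= n <= VECTOR_WIDTH + SCALAR_WIDTH:
            raise ValueError("an operation writes between 1 and 4 components")
    smaller = min(components_a, components_b)
    larger = max(components_a, components_b)
    return smaller <= SCALAR_WIDTH and larger <= VECTOR_WIDTH


if __name__ == "__main__":
    print(can_co_issue(3, 1))  # 3-component op alongside a scalar op
    print(can_co_issue(4, 1))  # full 4-component op occupies the scalar unit
```

In this model a 3-component colour operation can be paired with a scalar operation, while a full 4-component operation leaves no unit free for a second instruction on that ALU.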
Although the shader core remains, for the most part, the same as R300's, which fundamentally means that it's still a Shader 2.0 core, it isn't exactly the same and some modifications have been made. R420's shader core increases the number of constant and temporary registers, and a facing register has been added for two-sided lighting. The instruction limits have been increased so that the Vector, Scalar and Texture ALUs now have 512-instruction limits, which are native to the hardware and not just a tweak via the F-Buffer. The F-Buffer has also changed: only the pixels that are operated on in a pass are accessed, rather than the entire frame regardless of whether its pixels are affected; this results in better memory management and performance than the 9800's F-Buffer.
At GDC a new HLSL compilation profile, PS_2_b, was discussed in relation to R420's new instruction lengths.
HLSL Compilation Profiles
Feature | PS_2_0 | PS_2_a | PS_2_b | PS_3_0 | PS_4_0 |
Dependent Texture Limit | 4 | No Limit | 4 | No Limit | No Limit |
Texture Instruction Limit | 32 | Unlimited | Unlimited | Unlimited | Unlimited |
Position Register | - | - | - | o | o |
Instruction Slots | 32+64 | 512 | 512 | >=512 | >=65535 |
Executed Instructions | 32+64 | 512 | 512 | >=65535 | Unlimited |
Texture Indirections | 4 | No Limit | 4 | No Limit | No Limit |
Interpolated Registers | 2+8 | 2+8 | 2+8 | 10 | 32 |
Instruction Predication | - | o | - | o | o |
Index Input Registers | - | - | - | o | o |
Temp Registers | 12 | 22 | 32 | 32 | 4096 |
Constant Registers | 32 | 32 | 32 | 224 | 16x4096 |
Arbitrary Swizzling | - | o | - | o | o |
Gradient Instructions | - | o | - | o | o |
Loop Count Register | - | - | - | o | o |
Face Register (2-sided Lighting) | - | - | - | o | o |
Dynamic Flow Control | - | - | - | 24 | o |
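
As a rough illustration of how the limits in the table constrain profile choice, the sketch below checks a shader's resource needs against three of the rows (Instruction Slots, Temp Registers and Texture Indirections) for the Shader 2.x profiles. The `Profile` type and the `candidate_profiles` helper are hypothetical names invented for this example, and a real compiler would consider the other rows as well.

```python
from typing import List, NamedTuple, Optional


class Profile(NamedTuple):
    name: str
    instruction_slots: int
    temp_registers: int
    texture_indirections: Optional[int]  # None stands for "No Limit"


# Values copied from the table above (Shader 2.x columns only).
PROFILES = [
    Profile("PS_2_0", 32 + 64, 12, 4),
    Profile("PS_2_a", 512, 22, None),
    Profile("PS_2_b", 512, 32, 4),
]


def fits(profile: Profile, slots: int, temps: int, indirections: int) -> bool:
    """Check one shader's needs against a single profile's limits."""
    if slots > profile.instruction_slots or temps > profile.temp_registers:
        return False
    limit = profile.texture_indirections
    return limit is None or indirections <= limit


def candidate_profiles(slots: int, temps: int, indirections: int) -> List[str]:
    """Return the names of all listed profiles the shader would fit in."""
    return [p.name for p in PROFILES if fits(p, slots, temps, indirections)]


if __name__ == "__main__":
    # A long shader with many temporaries but few texture indirections.
    print(candidate_profiles(slots=200, temps=28, indirections=3))
```

Under these three limits alone, a shader needing 28 temporary registers fits only PS_2_b, one needing more than 4 texture indirections fits only PS_2_a, and a shader needing both fits neither.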