Pixel Shader

Along with the Vertex Shader capabilities, the next major part of any Shader Model 3.0 hardware is the Pixel Shader engine.



Pixel Shaders


Although the colours in the Pixel Shader core's diagram have changed since the past few architecture releases, the organisation and arrangement have not, because the ALU organisation itself remains unchanged. Everything in the pipeline has been re-engineered to hit the new target clocks that the 90nm process enables, and the capabilities have been extended for Pixel Shader 3.0 operation. The same ALU structure has been kept partly because ATI already have a highly optimised shader instruction compiler, which would need to be rewritten for any different ALU organisation. To reiterate, the maximum per-clock functionality of the ALUs breaks down as follows:
  • ALU 1
    • 1 Vec3 ADD + Input Modifier
    • 1 Scalar ADD + Input Modifier
  • ALU 2
    • 1 Vec3 ADD/MUL/MADD
    • 1 Scalar ADD/MUL/MADD
  • Branch Execution Unit
    • 1 Flow Control Instruction

This grouping of operators is what ATI terms a "Pixel Shader Pipeline" in the R520 architecture.
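To make the per-clock breakdown concrete, here is a rough sketch that counts peak scalar floating-point operations for one such Pixel Shader Pipeline. It assumes the common convention that a MADD counts as two operations (multiply plus add) and that ALU 1 and ALU 2 can both issue their vector and scalar halves in the same clock. Input modifiers and the branch unit are ignored because they add no arithmetic throughput under this convention. The `AluSlot` class and the resulting figures are illustrative, not ATI's own numbers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AluSlot:
    """Hypothetical model of one ALU in the R520 Pixel Shader Pipeline."""
    name: str
    vec_width: int          # components in the vector half (Vec3 -> 3)
    scalar_width: int       # components in the scalar half (1)
    ops_per_component: int  # 1 for ADD, 2 for MADD (multiply + add)

    def flops_per_clock(self) -> int:
        return (self.vec_width + self.scalar_width) * self.ops_per_component

# Peak configuration of each ALU, following the list above.
ALU1 = AluSlot("ALU 1 (ADD + input modifier)", 3, 1, 1)
ALU2 = AluSlot("ALU 2 (ADD/MUL/MADD)", 3, 1, 2)

def peak_flops_per_clock(num_shader_pipelines: int) -> int:
    """Peak scalar FP ops per clock across a number of shader pipelines."""
    per_pipeline = ALU1.flops_per_clock() + ALU2.flops_per_clock()
    return per_pipeline * num_shader_pipelines

print(ALU1.flops_per_clock(), ALU2.flops_per_clock())  # 4 8
print(peak_flops_per_clock(1))   # 12 per Pixel Shader Pipeline
print(peak_flops_per_clock(12))  # 144, e.g. RV530's 12 shader pipelines
```

Multiplying the per-clock figure by a core clock gives a theoretical peak. Real shaders rarely fill every slot, so this is an upper bound rather than an expected rate.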

In addition to the ALUs there is a separate Texture Address Unit, which processes texture instructions for the texture sampler unit. Because this unit is separate from the ALUs, instructions can be issued to it at the same time as instructions are issued to the ALUs, so there should be no contention between the two.
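The benefit of a separate Texture Address Unit can be illustrated with an idealised issue model. It compares a hypothetical design where ALU and texture instructions share a single issue slot against one where each clock can issue one ALU instruction (itself possibly a co-issued ALU 1 + ALU 2 group) and one texture instruction. The model assumes no data dependencies between instructions and ignores texture latency, so it only shows the upper bound on overlap. Both function names are illustrative.

```python
from collections import deque

def cycles_shared_port(program: list[str]) -> int:
    """Hypothetical baseline: ALU and texture instructions share one issue slot."""
    return len(program)

def cycles_dual_issue(program: list[str]) -> int:
    """Idealised separate Texture Address Unit: each clock may issue one 'alu'
    and one 'tex' instruction. Assumes no dependencies and ignores latency."""
    alu = deque(op for op in program if op == "alu")
    tex = deque(op for op in program if op == "tex")
    cycles = 0
    while alu or tex:
        if alu:
            alu.popleft()  # ALU slot used this clock
        if tex:
            tex.popleft()  # texture address slot used this clock
        cycles += 1
    return cycles

program = ["tex", "alu", "alu", "tex", "alu"]
print(cycles_shared_port(program), cycles_dual_issue(program))  # 5 3
```

Under these assumptions the cost tends towards the larger of the ALU and texture instruction counts rather than their sum.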

The above ALU structure, or Pixel Shader pipeline, is the nominal organisation for the R520 architecture, but ATI have designed the architecture to be flexible in the number of functional units a whole pipeline can support. While the R520 and RV515 chips have a 1:1 ratio of all functional units down the raster pipeline, including texture units, Pixel Shader pipelines and ROPs, some chips in the line feature more Pixel Shader pipelines within each pipe. For instance, RV530 is composed of 4 texture units, 12 Pixel Shader pipelines and 4 ROPs; essentially, each of RV530's rendering pipelines can simultaneously handle three pixels within its Pixel Shader stage.



R520 / RV515 Pipeline



RV530 Pipeline


The above is a conceptual representation of the difference between the R520/RV515 and RV530 pipelines, with RV530 able to handle three "Shader Pipelines" in each pixel pipeline. In RV530 there is a single Texture Address Unit and Texture Sampler Unit per three Pixel Shader units. Note that although the representation above shows a single "pixel pipeline", this organisation applies across an entire quad.
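The difference between the two pipeline layouts comes down to a simple unit ratio. The sketch below records each chip's unit counts and derives how many Pixel Shader pipelines share each texture unit, on the assumption, taken from the diagrams, that there is one texture unit per pixel pipeline. The RV530 counts come from the text. The section only gives the 1:1:1 ratio for R520 and RV515, so their absolute counts (16 and 4) are included here purely for illustration. `ChipConfig` is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChipConfig:
    """Hypothetical record of per-chip pixel pipeline unit counts."""
    name: str
    texture_units: int
    shader_pipelines: int
    rops: int

    def shader_pipes_per_pixel_pipe(self) -> int:
        # Assumes one texture unit per "pixel pipeline", as in the diagrams,
        # so this is how many Pixel Shader pipelines each pixel pipeline holds.
        if self.shader_pipelines % self.texture_units:
            raise ValueError(f"{self.name}: uneven shader/texture split")
        return self.shader_pipelines // self.texture_units

CHIPS = [
    ChipConfig("R520", 16, 16, 16),  # counts illustrative; 1:1:1 ratio from the text
    ChipConfig("RV515", 4, 4, 4),    # counts illustrative; 1:1:1 ratio from the text
    ChipConfig("RV530", 4, 12, 4),   # counts stated in the text
]

for chip in CHIPS:
    ratio = chip.shader_pipes_per_pixel_pipe()
    print(f"{chip.name}: {chip.texture_units} TEX, {chip.shader_pipelines} PS, "
          f"{chip.rops} ROP -> {ratio} shader pipeline(s) per pixel pipeline")
```

For RV530 this yields three Pixel Shader pipelines per texture unit, matching the three pixels each pipeline can hold in its shader stage.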

Although the general layout of the ALUs is the same as in previous generations, there are a few changes. Shader Model 3.0 parts have to cope with full dynamic branching in the Pixel Shader, so each shader pipeline has a branch execution unit; this works in conjunction with the dispatch processor, so we'll take a closer look at it later. Additionally, because the Shader Model 3.0 specification calls for at least FP32 processing throughout the shader pipelines, all the ALUs in the Pixel Shader have been extended to FP32. As with ATI's previous architectures, R520 operates at a single precision, this time FP32, so even when the FP16 partial precision hint is specified the R520 series calculates the instructions at FP32 precision. ATI have also added single-cycle SIN/COS functionality to the Pixel Shader ALUs, rather than using multi-cycle macros.
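Ignoring the partial precision hint changes results whenever FP16 would have rounded differently. The sketch below emulates both precisions with the standard `struct` module (`'e'` is IEEE 754 half precision, `'f'` is single precision). It models a single ADD where `honours_hint=False` stands for R520's behaviour and `honours_hint=True` stands for an imagined part that drops to FP16 when hinted. The function names and the flag are hypothetical.

```python
import struct

def to_fp16(x: float) -> float:
    """Round to IEEE 754 binary16 using struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round to IEEE 754 binary32 using struct's 'f' format."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def shader_add(a: float, b: float, partial_precision: bool = False,
               honours_hint: bool = False) -> float:
    """Hypothetical ADD. honours_hint=False models R520 (always FP32);
    honours_hint=True models an imagined part that drops to FP16."""
    rnd = to_fp16 if (partial_precision and honours_hint) else to_fp32
    return rnd(rnd(a) + rnd(b))

a, b = 1.0, 2.0 ** -12
print(shader_add(a, b, partial_precision=True))                     # 1.000244140625
print(shader_add(a, b, partial_precision=True, honours_hint=True))  # 1.0
```

FP16 has only 10 explicit mantissa bits, so 2^-12 is lost when added to 1.0, while FP32's 23 bits keep it. On R520 the hinted instruction therefore gives the same answer as the unhinted one.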

Texture processing is handled in a similar fashion to ATI's previous parts. Although the earlier pipeline diagram indicates a texture sampler array, the texture units cannot be reallocated to different pipelines; instead, 4 are dedicated to each quad. ATI currently see the primary use of floating point texture sampling as lookup data, which only requires point sampling, so they have not implemented floating point texture filtering and rely instead on filtering in the shader where required. For now the developer has to code this, although if there is demand ATI may build it into the driver. The texture address processors can now address textures up to 4096x4096. The texture pipeline still handles S3TC/DXTC and 3Dc normal map compression, with 3Dc upgraded to 3Dc+: it can now also compress single-channel textures at a 2:1 ratio, which suits luminance maps, shadow maps, material properties and HDR textures. Finally, the texture samplers support a new, angle-invariant Anisotropic Filter, which we will take a closer look at later.
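Because floating point textures are only point sampled, a developer who needs smooth results has to build the filter from individual fetches in the shader. The sketch below shows one way to do bilinear filtering from four point samples of a single-channel float texture. It assumes Direct3D-style texel centres at +0.5 and clamp-to-edge addressing. The helper names are hypothetical, and a real shader would express the same arithmetic in HLSL using four texture fetches.

```python
import math

Texture = list[list[float]]  # texels[y][x], single-channel float data

def point_sample(tex: Texture, x: int, y: int) -> float:
    """Unfiltered fetch with clamp-to-edge addressing (an assumption)."""
    h, w = len(tex), len(tex[0])
    x = min(max(x, 0), w - 1)
    y = min(max(y, 0), h - 1)
    return tex[y][x]

def bilinear_in_shader(tex: Texture, u: float, v: float) -> float:
    """Bilinear filter built from four point samples, as shader code would."""
    h, w = len(tex), len(tex[0])
    # Normalised coordinates to texel space, with texel centres at +0.5.
    tx, ty = u * w - 0.5, v * h - 0.5
    x0, y0 = math.floor(tx), math.floor(ty)
    fx, fy = tx - x0, ty - y0  # fractional weights
    t00 = point_sample(tex, x0, y0)
    t10 = point_sample(tex, x0 + 1, y0)
    t01 = point_sample(tex, x0, y0 + 1)
    t11 = point_sample(tex, x0 + 1, y0 + 1)
    top = t00 + (t10 - t00) * fx
    bottom = t01 + (t11 - t01) * fx
    return top + (bottom - top) * fy

hdr = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear_in_shader(hdr, 0.5, 0.5))    # 1.5, the average of all four texels
print(bilinear_in_shader(hdr, 0.25, 0.25))  # 0.0, exactly on a texel centre
```

The cost of this approach is four texture fetches and several ALU instructions per filtered sample, instead of one filtered fetch.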