Capabilities

As we mentioned, Xenos has capabilities that exceed those of a pure Shader Model 3.0 implementation, in DirectX terms. Whilst ATI are not yet giving out the full instruction set openly, they have broken down a number of the capabilities of Xenos so that we can compare them against the Shader Model 2.0 and 3.0 capabilities. Note that the table below breaks the operations into Pixel and Vertex Shader models for SM2.0 and SM3.0, as the capabilities are still quite distinct between the two. With the unified shader architecture of Xenos, however, these differences are removed, such that the capabilities available to one type of shader processing are available to the other as well.

ATI Xenos Capabilities

  VS2.0  PS2.0  VS3.0  PS3.0  Xenos* 
Instruction slots  256  32 + 64  >= 512  >= 512  4K shared pool (nominally 2K VS + 2K PS) 
Max instructions executable  65535  32 + 64  >= 65535  >= 65535  > 500K 
Instruction predication  No  No  Yes  Yes  Yes 
Temp registers  12  12  32  32  64 
Constant registers  >= 256  32  >= 256  224  512 shared pool (nominally 256 VS + 256 PS) 
Static flow control  Yes  No  Yes  Yes  Yes 
Dynamic flow control  No  No  Yes  Yes  Yes 
Dynamic flow control depth  No  No  24  24  4 for loops/calls, 2^23 if nesting 
Vertex texture fetch  No  Yes  Yes, dependent fetches, all formats 
Texture samplers  None  16  16  32 surface shared pool, where each texture surface consumes 1 entry and each vertex surface consumes 1/3 of an entry (max 32 PS, 96 VS, nominally 24 each) 
Geometry instancing  No  Yes  Yes 
Dependent texture limit  No limit  No limit 
Texture instruction limit  32  No limit  No limit 
Position register  No  Yes  Yes 
Interpolated registers  2 + 8  10  16 
Arbitrary swizzling  No  Yes  Yes 
Gradient instructions  No  Yes  Yes 
Loop count register  No  No  Yes  Yes  Yes 
Face register  No  Yes  Yes 


* Note: We are listing here the Xenos hardware capabilities, which may or may not be the same as what is exposed through the API for the XBOX 360 hardware. However, as this is a closed system with a custom API for the hardware, we would expect them to be exposed for use by developers.
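
The arithmetic behind the "Texture samplers" row of the table can be checked with a small sketch. It assumes our reading of that row: a 32-entry shared pool, one entry per texture surface and one third of an entry per vertex surface. The function and constant names are ours for illustration and are not part of any ATI or XBOX 360 API.

```python
from fractions import Fraction

POOL_ENTRIES = 32               # shared sampler pool size, from the table
TEXTURE_COST = Fraction(1)      # one entry per texture surface
VERTEX_COST = Fraction(1, 3)    # one third of an entry per vertex surface


def pool_entries_used(texture_surfaces: int, vertex_surfaces: int) -> Fraction:
    """Pool entries consumed by a mix of texture and vertex surfaces."""
    return texture_surfaces * TEXTURE_COST + vertex_surfaces * VERTEX_COST


def fits_in_pool(texture_surfaces: int, vertex_surfaces: int) -> bool:
    """True if the mix fits in the 32-entry shared pool (assumed model)."""
    return pool_entries_used(texture_surfaces, vertex_surfaces) <= POOL_ENTRIES
```

Under this model the quoted maxima are the two extremes (32 texture surfaces alone, or 96 vertex surfaces alone), and the nominal split of 24 each uses the pool exactly: 24 + 24/3 = 32 entries.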

Some additional capabilities that are included on the Xenos graphics processor are:

  • Multiple Render Targets (MRT)
    Four render targets are supported as output and, as an addition over current processors, each target can have different blend capabilities.
  • Hierarchical Stencil Buffer
    Operates similarly to the Hierarchical Z buffer to quickly cull unnecessary stencil writes.
  • Alpha-to-Mask
    Converts Pixel Shader output alpha value to a sample mask for sort-independent translucency.
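
To illustrate the Alpha-to-Mask idea, here is a minimal sketch of how an alpha value could be turned into a multisample coverage mask. The default of 4 samples, the rounding rule and the choice of which bits to set are our assumptions. The actual Xenos mapping has not been described and may well use a dither pattern instead.

```python
def alpha_to_mask(alpha: float, samples: int = 4) -> int:
    """Return a bitmask with roughly alpha * samples of the low bits set.

    Hypothetical mapping for illustration only; real hardware may dither
    or pick different sample positions.
    """
    alpha = min(max(alpha, 0.0), 1.0)        # clamp alpha to [0, 1]
    covered = int(alpha * samples + 0.5)     # round half up to a sample count
    return (1 << covered) - 1                # set the lowest `covered` bits
```

With this mapping a half-transparent pixel covers two of the four samples, so overlapping translucent surfaces blend through the multisample resolve without needing to be sorted first.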

An additional functional element that Xenos provides to developers is a Geometry Tessellation Unit. The tessellation unit is a fixed function engine that accepts triangles, rectangles and quads as its primitive input, along with a tessellation level per edge such that the level of tessellation is completely variable across the surface of the original primitive.

Current graphics processor architectures can mark a pixel to be "killed" in the pixel shader, and this is also the case with Xenos. However, as the architecture unifies the shaders, the capabilities of both shader program types (vertex and pixel) are available to each other, so the kill command will also operate on vertices. The vertex isn't retired in the ALU, but as it goes through the rest of the geometry pipeline to be set up, vertices marked as killed will be ignored, effectively reducing the level of detail in the resultant geometry.
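
The effect of a vertex kill on the resulting geometry can be sketched as a simple filter. This assumes that a primitive referencing any killed vertex is dropped at setup. ATI have only said that killed vertices are ignored, so the exact rule is our guess, and `cull_killed` is a hypothetical helper rather than anything in the hardware or API.

```python
def cull_killed(triangles, killed):
    """Keep only triangles whose three vertex indices are all alive.

    triangles: list of (i, j, k) vertex-index tuples
    killed:    set of vertex indices the shader marked as killed
    Assumption: one killed vertex discards the whole triangle.
    """
    return [tri for tri in triangles if not any(v in killed for v in tri)]
```

For example, with triangles `[(0, 1, 2), (2, 1, 3), (3, 1, 4)]` and vertex 4 killed, only the first two triangles survive.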

Although 4000 is a reasonably large number of instructions to support in a single code block, it is a limit on the number of instructions that can be applied in a single shader program, because the full program is stored on the chip and never partially retrieved from memory. However, should the developer wish to exceed that limit in a single block, ATI's F-Buffer technology is included to increase the shader length. Alternatively, ATI's "MEMEXPORT" (see the "MEMEXPORT" section) could be used to extend a shader program beyond the nominal 4000 instructions.

The combination of the shader array and tessellation unit can now make the oft-spoken-of but rarely seen capability of displacement mapping an attainable method, as it truly becomes a single-pass algorithm on Xenos. A simple primitive can be sent to the tessellation unit, which subdivides it into a vertex mesh. That mesh is then passed to a vertex shader program that does displacement map lookups via the vertex fetch texture units and alters the geometry mesh according to the values sampled from the texture. Alternatively, if the screen-space projection of the input primitive is calculated prior to tessellation, then the per-edge tessellation level can be derived from that projection, such that displacement mapping with correct, dynamic level of detail can be achieved.
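
The two steps described above can be sketched roughly as follows: choosing a per-edge tessellation level from the projected edge length, then displacing a vertex along its normal by a sampled height. The simple pinhole projection, the `pixels_per_segment` target and the `max_level` clamp are all our assumptions for illustration. None of them are documented Xenos parameters.

```python
import math


def project(point, focal_px, width, height):
    """Perspective-project a camera-space point (x, y, z), z > 0, to pixels."""
    x, y, z = point
    return (width / 2 + focal_px * x / z, height / 2 - focal_px * y / z)


def edge_tess_level(a, b, focal_px, width, height,
                    pixels_per_segment=8.0, max_level=16.0):
    """Pick a level so each tessellated segment spans ~pixels_per_segment.

    max_level is a hypothetical clamp, not a stated hardware limit.
    """
    ax, ay = project(a, focal_px, width, height)
    bx, by = project(b, focal_px, width, height)
    length_px = math.hypot(bx - ax, by - ay)     # edge length on screen
    return min(max(length_px / pixels_per_segment, 1.0), max_level)


def displace(position, normal, height, scale=1.0):
    """Move a vertex along its normal by a sampled displacement height."""
    return tuple(p + n * height * scale for p, n in zip(position, normal))
```

An edge that covers 80 pixels on screen would get level 10 with these defaults, while the same edge far from the camera falls back to level 1, so distant primitives are barely subdivided before the displacement lookup is applied.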