Texture Processing


There are 16 texture fetch units (filtered texture units, with LOD) and 16 vertex fetch units (unfiltered, point-sample units), giving 16 texture samplers of each type. Because the output data from the texture samplers is supplied to the unified shader arrays, both types of texture lookup are available to either Vertex or Pixel Shader programs as needed, and there are no limitations on the number of dependent texture reads. All of the texture address processing is handled locally by the texture processing array, with each texture unit having its own texture address processor, so this functionality does not consume any cycles in the ALU shader array.

Each of the filtered texture units can deliver a Bilinear sample per clock; for Trilinear and other higher order (Anisotropic) filtering techniques, each individual unit loops through multiple cycles of sampling until the requested sampling and filtering level is complete. The texture address processor has some general purpose shader ability and is able to apply offsets to the input texture co-ordinates, which can be used with higher order filtering techniques. The Anisotropic filtering adapts the number of samples taken depending on the gradient of the surface being sampled, which is fairly normal for Anisotropic filtering mechanisms; ATI says that the anisotropic filtering quality is improved from previous generations of hardware. As Xenos is the controller of a UMA, the entirety of system RAM is available to the texture samplers, although they will not perform any operations on the eDRAM memory.
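The looping behaviour described above can be illustrated with a small sketch. This is not ATI's implementation: the probe count uses the common textbook approach of taking the ratio of the major to minor axis of the pixel's footprint in texture space, and the 16:1 ceiling, the two-fetches-per-Trilinear-probe cost, and the function names `aniso_sample_count` and `estimated_cycles` are all assumptions made for illustration.

```python
import math

def aniso_sample_count(dudx, dvdx, dudy, dvdy, max_aniso=16):
    """Probes along the axis of anisotropy: footprint ratio, clamped.

    The derivatives are texel-space gradients per screen pixel.
    max_aniso=16 is an assumed ceiling, not a figure from the article.
    """
    px = math.hypot(dudx, dvdx)   # footprint length along screen x, in texels
    py = math.hypot(dudy, dvdy)   # footprint length along screen y, in texels
    major, minor = max(px, py), min(px, py)
    if minor == 0.0:
        # Degenerate footprint: clamp to the ceiling, or 1 if there is no gradient
        return max_aniso if major > 0.0 else 1
    return min(max_aniso, max(1, math.ceil(major / minor)))

def estimated_cycles(samples, trilinear):
    """One bilinear fetch per clock; Trilinear is assumed to cost two per probe."""
    return samples * (2 if trilinear else 1)

# A surface viewed head-on needs one probe; a steeply angled one needs more.
head_on = aniso_sample_count(1.0, 0.0, 0.0, 1.0)
oblique = aniso_sample_count(4.0, 0.0, 0.0, 1.0)
print(head_on, estimated_cycles(head_on, trilinear=False))
print(oblique, estimated_cycles(oblique, trilinear=True))
```

Under these assumptions the cost of a lookup scales with how oblique the surface is, which is why a single unit may occupy several clocks on one request.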

Xenos texture capabilities include support for DXTC (S3TC) texture compression as well as various other compression schemes that are DXTC-like in their operation. ATI2N (3Dc) is supported, as this is more or less just a twist on DXTC operation, as are other compression formats that would be useful for normal maps. There are no compression methods available for float texture formats, although a total of 64 different texture formats are supported.
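To give a sense of what these block formats save, the sketch below computes the storage for a single mip level. The block sizes are those of the standard DXT1, DXT3, DXT5 and ATI2N (3Dc) formats, which all encode 4x4 texel blocks; the table and function names are hypothetical, and Xenos-specific tiling or padding is not modelled.

```python
# Bytes per 4x4 texel block in the standard formats
BLOCK_BYTES = {"DXT1": 8, "DXT3": 16, "DXT5": 16, "ATI2N": 16}

def compressed_size(width, height, fmt):
    """Bytes needed for one mip level; partial blocks are rounded up."""
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * BLOCK_BYTES[fmt]

# A 256x256 texture: 32-bit RGBA uncompressed versus the block formats
uncompressed = 256 * 256 * 4
for fmt in ("DXT1", "DXT5", "ATI2N"):
    size = compressed_size(256, 256, fmt)
    print(fmt, size, "bytes,", uncompressed // size, ": 1 against RGBA8")
```

ATI2N stores only two channels, so for a normal map the fairer comparison is against a two-byte-per-texel source, where the saving is 2:1.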

The design of the Xenos processor is such that latency within operations is hidden as much as possible. Texture lookups are usually one of the highest latency operations in a graphics pipeline, and possibly the least predictable in terms of the number of cycles between a request being made and the data becoming available. Xenos interleaves a large number of independent threads of vertex and pixel workload in order to achieve high utilisation of all of the processing units while hiding the latency of fetches. The net result is that although a thread may need to wait for a texture sample to be returned, that thread need not stall the ALUs while it waits; instead, other threads operate on the ALUs, which should maximise utilisation of the available texture and ALU resources.
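The effect of interleaving can be seen in a toy model. This is a deliberately simplified sketch rather than a description of the Xenos scheduler: it assumes a single ALU, a fixed fetch latency, unlimited outstanding fetches, and threads that alternate a short burst of ALU work with one texture fetch. The function name and all the cycle counts are invented for illustration.

```python
def alu_utilisation(num_threads, alu_cycles=4, fetch_latency=20, iterations=10):
    """Fraction of cycles the ALU is busy when threads are interleaved."""
    ready_at = [0] * num_threads            # cycle at which each thread's data arrives
    remaining = [iterations] * num_threads  # ALU bursts left per thread
    clock = 0
    busy = 0
    while any(remaining):
        runnable = [t for t in range(num_threads)
                    if remaining[t] and ready_at[t] <= clock]
        if not runnable:
            # Every live thread is waiting on a fetch: the ALU idles
            clock = min(ready_at[t] for t in range(num_threads) if remaining[t])
            continue
        # Run the thread that has been ready the longest
        t = min(runnable, key=lambda i: ready_at[i])
        clock += alu_cycles
        busy += alu_cycles
        remaining[t] -= 1
        ready_at[t] = clock + fetch_latency  # issue the next fetch
    return busy / clock

for n in (1, 2, 4, 6, 8):
    print(n, "threads:", round(alu_utilisation(n), 3))
```

In this model a single thread leaves the ALU idle for most of the time, while utilisation climbs as threads are added until there is enough independent work in flight to cover the fetch latency completely.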