Vertex Shader

Unlike ATI's Xenos graphics processor for the XBOX 360, the R520 architecture is a little more traditional in that it doesn't make use of a unified architecture and hence has distinct vertex shader and pixel shader processors. R520 itself features 8 vertex shader units, which translates into transformation capabilities of two vertices per clock. This gives all the currently announced R520 configurations transformation rates of over 1G vertices per second, although inevitably they will be limited by the setup rates.
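The arithmetic behind that figure can be sketched as follows. This is only an illustration: the assumption that each vertex occupies a unit for four clocks (one vector operation per row of a 4x4 transform) is ours, chosen because it reproduces the quoted two vertices per clock, and the 600MHz core clock is a hypothetical value rather than the speed of any announced board.

```python
def vertex_rate(vertex_units, unit_clocks_per_vertex, core_clock_hz):
    """Peak transform rate in vertices per second for a bank of vertex shader units."""
    # Each unit finishes one vertex every `unit_clocks_per_vertex` clocks,
    # so the whole bank finishes this many vertices per clock.
    vertices_per_clock = vertex_units / unit_clocks_per_vertex
    return vertices_per_clock * core_clock_hz


# Assumed: 4 clocks per vertex (a 4x4 matrix transform, one vector op per row).
# Hypothetical: a 600MHz core clock, used purely for illustration.
peak = vertex_rate(vertex_units=8, unit_clocks_per_vertex=4, core_clock_hz=600_000_000)
print(f"{peak / 1e9:.1f}G vertices per second")
```

On these assumptions any core clock above 500MHz gives a peak rate of more than 1G vertices per second, before the setup limit is taken into account.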



Vertex Shaders


The basic organisation of the vertex shaders for the R520 generation doesn't change significantly from those featured in R420 or even R300. The ALU arrangement stays with the familiar structure of a full vector (4 component) ALU with a scalar ALU alongside, meaning that a scalar instruction can be dual issued alongside another instruction, up to a full vector, in a clock cycle on each vertex shader. All of ATI's DirectX9 parts have utilised FP32 width ALUs in the vertex shaders, so there are no precision changes required for R520's vertex shaders. Although ATI supported geometry instancing with their VS2.0 vertex shader engines, with the VS3.0 model they are able to expose it in the manner expected of a Shader Model 3.0 part.

Obviously the capabilities of the vertex shader have been increased in order to meet the specifications of the Vertex Shader 3.0 model, which include long instruction lengths (1024 instructions), dynamic flow control instructions (branches, loops and subroutines) and a larger temporary register space. However, one element of Vertex Shader 3.0 compliance is the capability for vertex texturing, yet there appears to be an absence of any texture lookup capabilities from ATI's diagrams above. A curious loophole of the VS3.0 specification is that although the capability bit for vertex texturing must be enabled for compliance, there are no actual texture formats dictated for support, so if the capability bit is enabled but no texture formats are exposed, VS3.0 compliance can still be met. Indeed, this is the case with R520, as it has no direct vertex texturing capabilities. ATI's statement is that engineering vertex texturing in a non-unified architecture to a point where it is actually usable and beneficial would require so much die area for extra texture caching and other associated elements to reduce the texture latency costs that it would be very costly for the frequency with which it is likely to be used, and that cost would inevitably have come at the detriment of something else; instead ATI are looking to leverage the pixel pipeline more for such vertex operations.
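For developers, the practical consequence of the loophole is that the shader version alone says nothing about whether vertex texturing is usable; each texture format has to be queried individually. The toy model below sketches that logic. The `Device` class, its fields and the format names are hypothetical illustrations, not part of any real API; under Direct3D 9 the equivalent per-format query is made through `IDirect3D9::CheckDeviceFormat` with the `D3DUSAGE_QUERY_VERTEXTEXTURE` usage flag.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for a graphics device's reported capabilities."""
    name: str
    vs_version: tuple                     # e.g. (3, 0) for Vertex Shader 3.0
    vertex_texture_formats: set = field(default_factory=set)


def usable_vertex_texture_formats(device, wanted_formats):
    """Return the wanted formats that the device can actually sample in the vertex shader."""
    if device.vs_version < (3, 0):
        return []                         # no vertex texture samplers before VS3.0
    # VS3.0 compliance does not guarantee any format, so check each one.
    return [f for f in wanted_formats if f in device.vertex_texture_formats]


# Illustrative devices: one exposes no formats at all, as described for R520.
no_formats = Device("VS3.0 part, no formats exposed", (3, 0))
some_formats = Device("VS3.0 part, float formats exposed", (3, 0), {"R32F", "A32B32G32R32F"})

wanted = ["R32F", "A8R8G8B8"]
print(usable_vertex_texture_formats(no_formats, wanted))    # []
print(usable_vertex_texture_formats(some_formats, wanted))  # ['R32F']
```

An application that takes an empty result as its cue to fall back to another path will behave correctly on both kinds of device.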

Part of the point of vertex texturing is to be able to expose pixel format data to the vertex shader, and as something of an alternative to vertex texturing ATI will be promoting the use of a new extension to DirectX known as Render to Vertex Buffer. As the name implies, Render to Vertex Buffer (R2VB) allows all of the operations within the pixel shader to be utilised, but rather than rendering to a displayable surface or texture, the results are rendered to a buffer in memory that can be directly read as an input to the vertex shader. The upshot of this process is that an application can have access to the capabilities of the pixel shader, the results of which can then be fed back into the geometry processing pipeline. This should result in a superset of the capabilities of vertex texturing and should actually perform better than current vertex texturing schemes, because the pixel pipelines are inherently built to cope with, and hide, texture latencies.
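The data flow can be illustrated with a small CPU-side simulation. This is a conceptual sketch only, not ATI's implementation: the function names are hypothetical, the "pixel shader" is an ordinary Python function run once per texel, and the "vertex shader" simply reads the resulting buffer as a second per-vertex input stream.

```python
def pixel_pass(width, height, shade):
    """Run a per-texel function over a grid and return the results as a flat buffer."""
    return [shade(x, y) for y in range(height) for x in range(width)]


def vertex_pass(base_vertices, r2vb_stream, transform):
    """Treat the buffer written by pixel_pass as an extra per-vertex input stream."""
    if len(base_vertices) != len(r2vb_stream):
        raise ValueError("one buffer element is needed per vertex")
    return [transform(v, d) for v, d in zip(base_vertices, r2vb_stream)]


def height_texture(x, y):
    """Stand-in for texture sampling and any other pixel shader work."""
    return 0.5 * x + 0.25 * y


W, H = 4, 2
# Pass 1: the pixel pipeline renders displacement values into a buffer.
displacements = pixel_pass(W, H, height_texture)
# Pass 2: the vertex pipeline reads that buffer alongside the base grid.
grid = [(float(x), 0.0, float(y)) for y in range(H) for x in range(W)]
displaced = vertex_pass(grid, displacements, lambda v, d: (v[0], v[1] + d, v[2]))
print(displaced[5])  # (1.0, 0.75, 1.0)
```

The point of the two-pass shape is that all texture access happens in the first pass, on the hardware that is built to hide its latency, while the second pass sees nothing more exotic than an ordinary vertex stream.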

R2VB is actually a subset of ATI's OpenGL "Uberbuffers" extensions (but a superset of current vertex texturing methods). Without specific support from DirectX to expose or enable R2VB, however, ATI have to use a "backdoor" method for enabling it, as they did with geometry instancing on VS2.0 hardware. Developers will first have to check for the FOURCC "R2VB" format to see if R2VB support is enabled by the device and then check the specific device format in order to see what capabilities it has; R2VB also cannot be enabled when N-Patches are in use. We asked ATI about the possibility of exposing R2VB functionality in the driver as you would vertex texturing, with the driver unrolling all the specifics automatically. ATI suggested that they have thought about this and will look into it; however, ATI maintain that Microsoft are happy for them to expose it in the current manner. Although it now seems that there are at least three different implementations of vertex texturing across the latest generations of PC and console graphics, this implementation, apart from the current method of checking for R2VB capabilities, does bring R520 closer to the Xenos chip in the XBOX 360 in terms of vertex texturing capabilities, in that virtually all the functionality of the pixel shader can be exposed and used.
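A FOURCC is simply four ASCII characters packed into a 32-bit value, first character in the lowest byte, which is what Direct3D's `MAKEFOURCC` macro produces. The sketch below computes the "R2VB" code and models the check described above; `r2vb_usable` and its parameters are hypothetical names for illustration, and in a real application the format query would go through the Direct3D 9 device format check rather than a Python set.

```python
def make_fourcc(code):
    """Pack a four-character ASCII code into a 32-bit integer, first character lowest."""
    if len(code) != 4:
        raise ValueError("a FOURCC is exactly four characters")
    b = code.encode("ascii")
    return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24)


R2VB_FOURCC = make_fourcc("R2VB")


def r2vb_usable(device_formats, n_patches_enabled):
    """Hypothetical check: the device reports the R2VB format and N-Patches are off."""
    return R2VB_FOURCC in device_formats and not n_patches_enabled


print(hex(R2VB_FOURCC))                        # 0x42563252
print(r2vb_usable({R2VB_FOURCC}, False))       # True
print(r2vb_usable({R2VB_FOURCC}, True))        # False
```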

Render to Vertex Buffer should be supportable by any DirectX9 (SM2.0 or 3.0) board, should the vendors choose to expose it. As pre-VS3.0 hardware does not have any vertex texture sampler support, only one R2VB sampler can be specified on such boards. With ATI's VS3.0 hardware, however, the driver exposes up to five samplers, and we believe that this could be equally supportable in NVIDIA's NV4x/G7x series.