With R300, ATI were quick to adopt the 'VPU' (Visual Processing Unit) term first introduced with 3Dlabs' P10 chip. Presumably ATI feel that the level of programmability their part offers warrants it, so let's dig a little deeper to see what's behind the slew of marketing names.

TRUFORM 2.0

'TRUFORM' on Radeon 8500 was essentially the marketing name for N-Patch support. N-Patches increase the level of geometric detail by tessellating a simple surface, with the shape of the new geometry derived from the vertex normal information.
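To make the idea concrete, here is a rough sketch of the widely published "Curved PN Triangles" construction (Vlachos et al.), the scheme generally associated with N-Patches. Each flat triangle's three positions and normals become a cubic Bezier triangle, which can then be evaluated at as many points as the tessellation level requires. All function names are our own, only the position patch is built (the normal patch is skipped), and this is not a claim about the exact arithmetic inside ATI's hardware.

```python
import math

Vec = tuple[float, float, float]

def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(a: Vec) -> Vec:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def edge_point(pi: Vec, pj: Vec, ni: Vec) -> Vec:
    """Point one third of the way from pi to pj, projected onto the
    tangent plane at pi (the plane through pi with unit normal ni)."""
    w = dot(sub(pj, pi), ni)
    return scale(sub(add(scale(pi, 2.0), pj), scale(ni, w)), 1.0 / 3.0)

def pn_control_points(p1: Vec, p2: Vec, p3: Vec,
                      n1: Vec, n2: Vec, n3: Vec) -> dict[str, Vec]:
    """Ten control points of a cubic Bezier triangle; key 'ijk' weights
    barycentric coordinates a1**i * a2**j * a3**k."""
    b = {
        "300": p1, "030": p2, "003": p3,
        "210": edge_point(p1, p2, n1), "120": edge_point(p2, p1, n2),
        "021": edge_point(p2, p3, n2), "012": edge_point(p3, p2, n3),
        "102": edge_point(p3, p1, n3), "201": edge_point(p1, p3, n1),
    }
    e = (0.0, 0.0, 0.0)
    for key in ("210", "120", "021", "012", "102", "201"):
        e = add(e, b[key])
    e = scale(e, 1.0 / 6.0)                     # average of the six edge points
    v = scale(add(add(p1, p2), p3), 1.0 / 3.0)  # centroid of the flat triangle
    b["111"] = add(e, scale(sub(e, v), 0.5))    # centre point pushed outwards
    return b

def pn_evaluate(b: dict[str, Vec], a1: float, a2: float, a3: float) -> Vec:
    """Evaluate the patch at barycentric coordinates (a1 + a2 + a3 == 1)."""
    point = (0.0, 0.0, 0.0)
    for key, cp in b.items():
        i, j, k = (int(c) for c in key)
        bernstein = (math.factorial(3)
                     / (math.factorial(i) * math.factorial(j) * math.factorial(k))
                     * a1 ** i * a2 ** j * a3 ** k)
        point = add(point, scale(cp, bernstein))
    return point

P1, P2, P3 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
up = (0.0, 0.0, 1.0)
flat = pn_control_points(P1, P2, P3, up, up, up)
print(pn_evaluate(flat, 1/3, 1/3, 1/3))    # normals match the face: stays on z = 0

centre = (1/3, 1/3, 0.0)
n1, n2, n3 = (normalize(add(sub(p, centre), up)) for p in (P1, P2, P3))
curved = pn_control_points(P1, P2, P3, n1, n2, n3)
print(pn_evaluate(curved, 1/3, 1/3, 1/3))  # normals splay outwards: centre bulges to z > 0
```

When all three normals agree with the face normal, the patch reproduces the flat triangle, so N-Patches only add curvature where the supplied normals ask for it.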

TRUFORM 2.0 takes this a step further. N-Patch support is still present, but it now also supports continuous tessellation. On Radeon 8500 the tessellation level was restricted to a few predefined settings, whereas continuous tessellation allows floating point tessellation levels, giving smoother transitions between levels of detail. TRUFORM 2.0 also supports adaptive tessellation, which dynamically adjusts the level of tessellation according to an object's Z (depth) position. An object close to the viewer needs a high level of tessellation to look smooth, while the extra detail on a distant object would not be seen anyway, so a lower level can be used rather than wasting geometry processing.
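A minimal sketch of the difference between continuous and fixed-step adaptive tessellation follows. The falloff rule (full detail up to a reference depth, then shrinking as 1/depth, roughly how on-screen size shrinks) and every parameter value are assumptions chosen for illustration; ATI have not published the formula TRUFORM 2.0 uses.

```python
def tessellation_level(depth: float, near_level: float = 8.0,
                       far_level: float = 1.0, ref_depth: float = 10.0) -> float:
    """Hypothetical adaptive rule: full detail at or nearer than ref_depth,
    then falling off as 1/depth, never dropping below far_level.
    Returns a continuous (floating point) tessellation level."""
    if depth <= 0.0:
        raise ValueError("depth must be positive (in front of the viewer)")
    level = near_level * ref_depth / depth
    return max(far_level, min(near_level, level))

def stepped_level(depth: float) -> int:
    """Fixed-level variant: snap the same rule to whole-number settings."""
    return round(tessellation_level(depth))

for z in (5.0, 10.0, 12.0, 14.0, 16.0, 20.0, 40.0, 100.0):
    print(f"z={z:6.1f}  continuous={tessellation_level(z):5.2f}  stepped={stepped_level(z)}")
```

The continuous column changes a little at every depth, while the stepped column holds a value and then jumps, which is where visible "popping" between detail levels comes from.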

Alongside N-Patch support, TRUFORM 2.0 adds Displacement Mapping, which Matrox have been talking about and which is included in DirectX9. Displacement Mapping is another form of Higher Order Surface support, but rather than deriving the extra geometry from the normal information of the supplied vertices, it uses a 'Displacement Map' texture. The displacement map records how far each point on the surface should be moved (typically along the surface normal); during tessellation the map is sampled and the returned values are used to position the newly generated vertices of the polygon being rendered.
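The sampling-and-displacing step can be sketched as below, assuming the simplest possible case: a flat unit square whose normal is +z, with the vertex position doubling as the texture coordinate, and a small list-of-lists height map filtered bilinearly. The function names and the `scale` parameter are hypothetical; real hardware works on its own tessellated patches and texture formats.

```python
def sample_bilinear(height_map: list[list[float]], u: float, v: float) -> float:
    """Bilinearly filter a height map at texture coordinates u, v in [0, 1]."""
    rows, cols = len(height_map), len(height_map[0])
    x = min(max(u, 0.0), 1.0) * (cols - 1)
    y = min(max(v, 0.0), 1.0) * (rows - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = height_map[y0][x0] * (1 - fx) + height_map[y0][x1] * fx
    bottom = height_map[y1][x0] * (1 - fx) + height_map[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def displaced_grid(height_map: list[list[float]], segments: int,
                   scale: float = 1.0) -> list[tuple[float, float, float]]:
    """Tessellate the unit square (normal +z) into a (segments + 1)^2 vertex
    grid and push each vertex along the normal by the sampled displacement."""
    verts = []
    for j in range(segments + 1):
        for i in range(segments + 1):
            u, v = i / segments, j / segments
            verts.append((u, v, scale * sample_bilinear(height_map, u, v)))
    return verts

bumps = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]  # a single raised texel in the middle
for vertex in displaced_grid(bumps, segments=4, scale=0.5):
    print(vertex)
```

Raising `segments` adds vertices without changing the base surface; the extra detail comes entirely from the map, which is the key contrast with N-Patches.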



Displacement Mapping


It would appear from this that the TRUFORM 2.0 engine is not a flexible, programmable unit, but one limited to these two types of Higher Order Surface processing.

SMARTSHADER 2.0

SMARTSHADER 2.0 is ATI's name for the Vertex and Pixel Shading elements of the Radeon 9700 PRO. The use of '2.0' in the name, rather than Roman numerals, presumably reflects the fact that these units have full hardware support for both DirectX9's Vertex Shader 2.0 and Pixel Shader 2.0 specifications.

Vertex Pipeline & Vertex Shaders 2.0

The R300 hardware features four parallel DirectX9 Vertex Shader 2.0 units.

 


R300 Vertex Engine


ATI state that, in conjunction with its powerful triangle setup engine, this is the first geometry engine able to process one vertex and one triangle per clock cycle.

The following diagram details the operation of a single Vertex Shader within the Vertex Shading engine:



R300 Vertex Pipe


What's interesting here is that ATI pair a standard Vec4 processing engine with a parallel scalar processor. If we recall 3Dlabs' P10 Vertex Shader, one of its interesting aspects was that 3Dlabs had opted for 16 parallel scalar processors rather than the Vec4 processors traditionally used for 3D processing. Vec4 processors are tuned to 4-element vector operations, as are common in 3D; however, an operation on fewer than four components still occupies the Vec4 processor for as long as a full Vec4 operation, even though there is less work to do. Presumably 3Dlabs' thinking was that as more arbitrary processing is required of vertex processors, the proportion of non-Vec4 operations will increase, and using scalar processors throughout the Vertex Shader will be more efficient, since each executes one instruction per clock cycle and none of its capacity is wasted on non-Vec4 operations.

ATI have opted to stick with Vec4 processors for the bulk of the processing, but with a scalar processor operating in parallel so that each Vertex Shader engine can execute a Vec4 operation and a scalar operation simultaneously. With four Vec4 units this matches the processing ability of P10's 16 scalar processors during Vec4 operations. However, because each engine also contains a scalar processor, under R300's optimal conditions (a Vec4 and a scalar operation issued together in every engine) P10 has 4/5 of R300's Vertex Shader throughput per clock: 16 scalar operations against 20. Under P10's optimal conditions (purely scalar code) R300 has half of P10's throughput per clock, since each of the four Vec4 units can still execute one scalar operation alongside each scalar processor, giving 8 scalar operations per clock against P10's 16.
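The per-clock comparison above is simple arithmetic on unit counts, and a small model makes the two best cases explicit. This is a deliberately crude sketch built only from the numbers quoted in the text: it assumes every unit issues one instruction every clock and ignores instruction scheduling, latency and register limits on both chips.

```python
from fractions import Fraction

# Unit counts quoted in the text; everything else about the chips is ignored.
R300_ENGINES = 4        # each engine: one Vec4 ALU plus one scalar ALU
P10_SCALAR_UNITS = 16   # 3Dlabs P10: sixteen scalar processors

def r300_ops_per_clock(vec4_lanes_used: int) -> int:
    """Useful scalar results per clock if every R300 engine issues one
    instruction to its Vec4 ALU (doing useful work on `vec4_lanes_used`
    of its 4 lanes) and one scalar instruction to its scalar ALU."""
    if not 1 <= vec4_lanes_used <= 4:
        raise ValueError("a Vec4 ALU has between 1 and 4 useful lanes")
    return R300_ENGINES * (vec4_lanes_used + 1)

def p10_ops_per_clock() -> int:
    """Each P10 scalar processor retires one scalar operation per clock,
    whether the work arrives as Vec4 or scalar instructions."""
    return P10_SCALAR_UNITS

# R300's best case: full Vec4 work issued alongside scalar work.
p10_vs_r300 = Fraction(p10_ops_per_clock(), r300_ops_per_clock(4))
# P10's best case relative to R300: purely scalar code, 3 of 4 Vec4 lanes idle.
r300_vs_p10 = Fraction(r300_ops_per_clock(1), p10_ops_per_clock())

print("R300-friendly mix, P10 / R300 =", p10_vs_r300)   # 4/5
print("scalar-only code,  R300 / P10 =", r300_vs_p10)   # 1/2
```

Intermediate values of `vec4_lanes_used` (2 or 3 component operations) fall between these two extremes.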

One of the main advancements in the DirectX9 VS2.0 specification is increased programmability, giving the developer more control over what the Vertex Shader processes. When R300 was initially announced, ATI described their Vertex Shader in terms of DX9 VS2.0 functionality, suggesting that this was all R300 supports, when in fact R300 exceeds those figures in several areas. Here's a small table comparing the DirectX9 VS2.0 requirements with what the R300 hardware supports:

Feature | DX9 VS2.0 | R300
Max Instructions | 1024 | 255 (max instructions in a loop) * 255 (max loop count) + 1 (last instruction) = 65026
Max Constants | 256 | 256
Temp Registers | 12 | 32
Flow Control | Yes | Yes
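Reading the R300 instruction figure as a count of executed instructions rather than program length, the table's formula is a loop body of up to 255 instructions run up to 255 times, plus one final instruction. The helper below is hypothetical (our own name, not part of any API) and just checks that arithmetic.

```python
def executed_instructions(loop_body: int, loop_count: int, tail: int) -> int:
    """Instructions actually executed by a hypothetical shader made of one
    loop (loop_body instructions run loop_count times) plus `tail`
    straight-line instructions after it."""
    return loop_body * loop_count + tail

DX9_VS20_MAX = 1024                      # DX9 VS2.0 column of the table
R300_MAX = executed_instructions(loop_body=255, loop_count=255, tail=1)

print(R300_MAX)                          # 65026
print(R300_MAX / DX9_VS20_MAX)           # roughly 63.5 times the DX9 figure
```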

As the table shows, DX9 Vertex Shader 2.0 adds flow control, with new commands including loops, jumps and subroutines. These, alongside the increased instruction count over DX8, significantly add to the programmability of the Vertex Shader, allowing much more complex operations to be performed. Additionally, R300's Vertex pipeline can perform advanced shader operations, such as SIN/COS, via DX9 macros. DX9 macros have no specified minimum instruction count, but each must execute within a ceiling of 8 instructions. R300 supports all of the specified DX9 macros, generally in fewer instructions than the ceiling Microsoft specifies.
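To show how a macro such as SIN can expand into a short run of basic instructions, here is a hypothetical expansion: range reduction followed by an odd Taylor polynomial evaluated with Horner's rule, so the polynomial part is a chain of multiplies and multiply-adds (MADs). This is not the sequence Microsoft's reference or ATI's hardware uses, and the range reduction is written with ordinary Python conditionals for readability rather than as shader instructions.

```python
import math

# Taylor coefficients of sin(x) about 0 for x**3, x**5, x**7, x**9.
C3, C5, C7, C9 = -1 / 6, 1 / 120, -1 / 5040, 1 / 362880

def macro_sin(x: float) -> float:
    """Hypothetical SIN expansion: fold x into [-pi/2, pi/2], then evaluate
    x * (1 + C3*x^2 + C5*x^4 + C7*x^6 + C9*x^8) in Horner form."""
    # Reduce to [-pi, pi], then fold using sin(pi - x) == sin(x)
    # and sin(-pi - x) == sin(x).
    x = x - 2 * math.pi * round(x / (2 * math.pi))
    if x > math.pi / 2:
        x = math.pi - x
    elif x < -math.pi / 2:
        x = -math.pi - x
    x2 = x * x           # MUL
    p = C9 * x2 + C7     # MAD
    p = p * x2 + C5      # MAD
    p = p * x2 + C3      # MAD
    p = p * x2 + 1.0     # MAD
    return p * x         # MUL

def macro_cos(x: float) -> float:
    """COS via the identity cos(x) == sin(x + pi/2)."""
    return macro_sin(x + math.pi / 2)

for angle in (0.0, 0.5, 2.0, 4.0):
    print(angle, macro_sin(angle), math.sin(angle))
```

On [-pi/2, pi/2] the first omitted Taylor term is at most (pi/2)^11 / 11!, about 3.6e-6, which bounds the error of this particular expansion; a hardware macro would trade accuracy against instruction count differently.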