Intellisample Technology

With increased pixel precision, increasing texture sampling rates, and higher FSAA depths, more and more solutions are being sought to maximise the available bandwidth for the graphics core to ensure these ‘feature rich’ pixels can be read and written to with as minimal impact on performance as possible. In efforts to attain this goal NVIDIA introduces the ‘Intellisample’ suite of complementary technologies. Here’s a rundown of the major areas of Intellisample.

Colour Compression – 4:1 Z compression techniques have been implemented in prior generations of hardware; however, Intellisample now extends that into colour compression. Again, a 4:1 loss-less compression technique is enabled on the pixel colour information. The process is enabled entirely through hardware and is transparent to the application. The biggest benefits of colour compression will be seen when FSAA is enabled.

Fast Colour Buffer Clear – Again, as we’ve seen with prior generations fast Z clear techniques have been used to speed up the Z clearing process for faster rendering. This hardware clearing technology is also extended to the colour buffer to further aid rendering time.

Dynamic Gamma Correction – Color operations in the Shaders are calculated in linear space and then gamma corrected on output to give a truer final representation of the original artwork.

The picture on the right has been corrected using gamma correction. 

Adpative Texture Filtering – Alongside GeForce3/4’s Anisotropic filtering technique, an adaptive technique has also been implemented alongside the traditional one that should offer higher performance. NVIDIA says that this method analyses the geometry and texture content and varies the number of samples accordingly and both methods will be available for the user to select their preference.

Note: Clarification on this point is required since in recent articles NVIDIA have stated they already have adaptive sampling.

New Antialiasing Modes – To compliment the FSAA modes already available two new FSAA outputs are enabled via Intellisample: 6XS (for DirectX) and 8X (available in both DirectX and OpenGL).

The details of Intellisample above are the full suite of features; however, not all of them may be adopted in all variants of products based on the CineFX architecture, dependant on market positioning and hence chip costs. Although it's unclear exactly what will or won’t be used it's likely that GeForce FX, being the high end part, will feature all these performance and image quality enhancing areas.

GeForce FX Shaders & DirectX9

As we know from the CineFX documentation, and Beyond3D's NV30 & R300 Technical Comparison, the main target of interest for NV30 is the large advancement in shader technology. Here we'll take an overview of what GeForce FX brings to us.

Vertex Processor

The new vertex processing engine within the CineFX architecture is greatly improved from previous DirectX8 generation hardware. Here's a few key elements that GeForce FX brings forth:

  • Up to 65536 vertex instructions per vertex via loops and subroutines, up to 256 static instructions per shader - the CineFX vertex shaders doubles the instruction storage from DX8 class hardware and also introduces flow control allowing many more instructions to be carried out per vertex.
  • Up to 256 Vector Constants - While GeForce 4 provided 96 constants, GeForce FX allows 256 which could, for example, facilitate more bones in matrix palette skinning and more simultaneous light sources.
  • 16 Temporary Vector Registers - GeForce FX sees an extra four vector registers than its predecessor allowing for temporary storage of large programs.
  • Up to 64 Separate Loops - By supporting fully dependant looping and branching GeForce FX allows for much simpler shader programs.

The CineFX architecture also includes several new shader features not available on NVIDIA's previous generations of hardware:

  • Per-component conditional codes and write masks - allowing conditionals in hardware can simplify code and potentially improve performance.
  • Call and return (subroutines) - CineFX also allows subroutines to be called within code, returning values. Subroutines can be called up to 4 deep.
  • Loops and branching for both static and dynamic control flow - allows the vertex processing to be both powerful and flexible.

For a more complete run-down of the vertex processing features, abilities and instruction support see the Vertex Processing section of our NV30 and R300 Technical Comparison article.

To give an example of how this extra functionality may be useful to a developer NVIDIA use the following matrix palette skinned model.

With CineFX, rather than having to use separate shaders to animate the character, the generalised loops and branches of the vertex shader allow for a single shader to be written to perform all the skinning methods and operations. This makes for a much simpler program for the developer to create, as well as improving overall performance.

Click for a bigger version

With CineFX, rather than having to use separate shaders to animate the character, the generalised loops and branches of the vertex shader allow for a single shader to be written to perform all the skinning methods and operations. This makes for a much simpler program for the developer to create, as well as improving overall performance.