HOS & Shaders

There's been lots of talk about Displacement Mapping support (although we know this isn't a key feature for determining 'Compliancy' with DirectX9). Can you clarify what level of support GeForce FX has for it?

GeForce FX supports Geometry Displacement Mapping.

DirectX9 documents two forms of Displacement Mapping support: Matrox's method, which filters the Displacement Map and is able to cater for dynamic LOD, and a simpler pre-filtered method. If there is no native support for the full method in GeForce FX, can a multi-pass method be implemented whereby the Pixel Shader does the Displacement Map filtering and the results are then passed back to the Vertex Shader to do the necessary tessellation (similar to what 3Dlabs' P10/Wildcat VP can do)?

GeForce FX will not support Matrox’s Displacement Mapping. The GeForce FX method for displacement mapping uses a two-pass technique: a pixel shader program filters the displacement map and passes the results to a vertex shader program that offsets the geometry accordingly.
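
As a rough illustration of the two-pass idea described in this answer, here is a minimal CPU-side sketch in Python. It is not NVIDIA's implementation: the function names, the choice of bilinear filtering, and the `scale` parameter are all assumptions made for the example. The first function pair stands in for the pixel shader pass that filters the displacement map, and the last stands in for the vertex shader pass that offsets the geometry.

```python
import math


def bilinear_sample(height_map, u, v):
    """Sample a 2D height map (list of rows) at normalised (u, v) in [0, 1].

    Bilinear filtering is an assumption; the interview does not say
    which filter the hardware path uses.
    """
    rows, cols = len(height_map), len(height_map[0])
    x = u * (cols - 1)
    y = v * (rows - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = height_map[y0][x0] * (1 - fx) + height_map[y0][x1] * fx
    bottom = height_map[y1][x0] * (1 - fx) + height_map[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def filter_pass(height_map, uvs):
    """Pass 1 (stand-in for the pixel shader): one filtered height per vertex."""
    return [bilinear_sample(height_map, u, v) for u, v in uvs]


def displace_pass(positions, normals, heights, scale=1.0):
    """Pass 2 (stand-in for the vertex shader): move each vertex along its normal."""
    return [
        tuple(p + n * h * scale for p, n in zip(pos, nrm))
        for pos, nrm, h in zip(positions, normals, heights)
    ]
```

In use, the heights returned by `filter_pass` for a mesh's texture coordinates would be fed straight into `displace_pass` together with the mesh positions and normals.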

What forms of HOS beyond N-Patches and Displacement mapping is NV30 capable of?

GeForce FX does not support n-patches or other HOS techniques. 

What's NVIDIA's general stance on hardware HOS support?

HOS technology is great in theory; however, it is too early to implement these techniques cost-effectively in hardware. NVIDIA believes that current HOS implementations are inadequate and that is why there is virtually no developer support. When the time is right to put HOS in hardware and we can do so in a way that adds value to the GPU, NVIDIA will be a leader in hardware HOS.

We know that NV30 supports the ability to produce two FP16 (64-bit) shader instructions in the time it takes to do one FP32 (128-bit) instruction; however, the number of cycles these take is unclear. At one launch presentation an NVIDIA representative mentioned that 32-bit integer instructions can be done at two per clock, which was twice as fast as a single FP16 instruction. This suggests that FP16 instructions operate in one clock cycle, hence FP32 instructions would take two clock cycles. Is this the case?

We have not disclosed these details of our architecture.


One of the bandwidth-saving techniques in NV30 is colour compression. In the press material it was quoted as giving 'free' FSAA; how 'free' is it realistically? For instance, we've seen that R300 also implements colour compression, and while its FSAA performs faster than most other solutions it's not entirely free. Is this the same type of thing we'll see with GeForce FX? If it was really 'free', why not just make 8X (or greater) sampling the default mode?

Indeed, AA is not totally free. There is a small impact on frame rate, but the impact is much smaller than it has been for previous architectures. You will see how fast our AA modes are when the first boards are benchmarked and reviewed in the press.

If the colour compression is lossless, this implies that a buffer the entire size of the AA mode being used needs to be reserved in memory (so 4X FSAA requires 4 times the memory of rendering at the same resolution without FSAA). What will the memory requirements be for 8X FSAA? Previous NVIDIA cards have not given a warning when the maximum resolution is exceeded and have automatically dropped back to fewer samples. Will there be some indication of the maximum modes available?

You are correct that a full-size colour buffer must be reserved in the graphics memory. The colour compression technique reduces bandwidth, but does not reduce the maximum possible size of the frame in memory (so the maximum must be reserved). The GeForce FX Reviewers Guide has a table that shows which modes are possible given available memory. However, some applications may run poorly if there is insufficient memory left for texture storage, so the possible modes are subject to some variation on an application-by-application basis.
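
To make the "maximum must be reserved" point concrete, here is a back-of-the-envelope sketch in Python. The figures are illustrative assumptions, not numbers from the GeForce FX Reviewers Guide: it assumes 4 bytes of colour and 4 bytes of depth/stencil per stored sample, assumes every sample is stored, and ignores the displayed front/back buffers, textures, and any driver overhead. The function and parameter names are hypothetical.

```python
def aa_framebuffer_bytes(width, height, samples, colour_bytes=4, depth_bytes=4):
    """Worst-case bytes reserved for an anti-aliased render target.

    Assumes lossless compression saves bandwidth only, so every sample
    of every pixel still needs its full uncompressed footprint reserved.
    The per-sample byte counts are assumptions for this example.
    """
    return width * height * samples * (colour_bytes + depth_bytes)


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    for samples in (1, 4, 8):
        size = aa_framebuffer_bytes(1024, 768, samples)
        print(f"{samples}X at 1024x768: {to_mib(size):.0f} MiB")
```

Under these assumptions, 1024x768 needs 6 MiB without AA, 24 MiB at 4X, and 48 MiB at 8X, which shows why the highest modes can crowd out texture storage on smaller boards.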

We've seen that the 'xS' FSAA modes (4xS and 6xS) are not available under OpenGL because OpenGL doesn't natively support the mixed Supersample / Multisample modes these are based on. However, 8X is also said to be mixing both multisampling and super sampling because NV30's pipelines are still only able to produce 4 AA samples per pipeline, so how does 8X differ from 4xS/6xS that enables it to operate under OpenGL?

You can mix multisampling and supersampling in OpenGL modes as long as you meet some very specific restrictions. Our 8X mode meets those restrictions, while our “xS” modes do not.

Other than compression, it would seem that the FSAA configuration of NV30 remains essentially the same as on NV20/NV25. Why is this? Did you think about implementing a more flexible approach allowing for sparse sampling positions?

NVIDIA added new AA display modes with GeForce FX that are not available on NV20/NV25. We will continue to drive AA technology to higher levels in future products.

The Intellisample documentation made note of the Shader Pipeline's Gamma correction abilities. Is it only Pixel Shader output that is Gamma corrected, or are FSAA samples Gamma corrected as well?

All of the gamma correction is done in the pixel shader, but this is perfectly compatible with FSAA rendering.

Is the Gamma correction value set at a specific value or is this programmable?

The gamma correction value is programmable because it is controlled by pixel shader programs.
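
A minimal sketch of what "programmable" means here, written in Python rather than a real shader language: because the correction is just arithmetic in the pixel shader program, the exponent is an ordinary parameter that the program can set to any value. The function name, the simple power-law curve, and the default of 2.2 are assumptions for illustration, not details NVIDIA has given.

```python
def apply_gamma(rgb, gamma=2.2):
    """Gamma-encode a linear RGB triple as a shader's final step might.

    Each channel is clamped to [0, 1] and raised to 1/gamma. The
    power-law curve and the 2.2 default are assumptions for this sketch.
    """
    return tuple(min(max(c, 0.0), 1.0) ** (1.0 / gamma) for c in rgb)
```

Calling `apply_gamma(colour, gamma=1.8)` instead of the default simply changes the curve, which is the sense in which a shader-controlled value is not fixed.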