The debut of DirectX 9 (DX9) introduces a feature that is highly desirable for approaching cinematic rendering quality: high-precision rendering within the pixel/fragment shader pipeline.

With DX9 and its compliant hardware, we now have floating point ("FP") calculation choices available to developers. Note the word "choices". DX9 specifies that for hardware to be DX9 "compliant", the minimum floating point precision is 24-bit. 24-bit floating point is essentially equivalent to 96-bit color -- we have four color channels/components (Red, Green, Blue and Alpha) at 24-bit each (4 color channels * 24-bit = 96-bit color). There is, however, the matter of IEEE-32 floating point -- the IEEE standard was ratified over 16 years ago, and its 32-bit format consists of 1 sign bit, 8 exponent bits and 23 mantissa bits. To be IEEE-32 "compliant", we're talking 128-bit color (4 color channels * 32-bit = 128-bit color).
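
For those who want to see these bit fields concretely, here is a minimal C sketch of our own that pulls apart an IEEE-754 single-precision value and tallies the per-pixel bit counts mentioned above; nothing in it is taken from any vendor SDK.

    /* A minimal sketch (our own illustration) of the IEEE-754 single-precision
       layout described above: 1 sign bit, 8 exponent bits and 23 mantissa bits
       per component, giving 4 * 32 = 128 bits for an RGBA color versus
       4 * 24 = 96 bits at the DX9 minimum of FP24. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float f = 0.5f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);     /* raw bit pattern of the float */

        printf("sign     : %u\n", (unsigned)((bits >> 31) & 0x1));
        printf("exponent : %u (biased by 127)\n", (unsigned)((bits >> 23) & 0xFF));
        printf("mantissa : 0x%06X (23 bits)\n", (unsigned)(bits & 0x7FFFFF));

        printf("RGBA at FP32: %d bits, at FP24: %d bits\n", 4 * 32, 4 * 24);
        return 0;
    }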

A potential "problem" rears its head -- do developers produce their pixel shaders with IEEE-32 precision in mind, or the base DX9 precision? This is a genuine problem because the DX9-compliant hardware available now, primarily NVIDIA's NV3x and ATI's R3x0, differs in the precision modes supported in hardware and drivers.

The entirety of ATI's R300 series pipeline is 32-bit per component, bar the pixel shader processors, which run at 24 bits per component of precision, or 96-bit total. Developers have the option of writing out float values to off-screen render targets of various float precisions, and if a 128-bit precision target is selected (4 channels * 32-bit, each 32-bit value being IEEE single precision FP) then the pixel shader output is upsampled to that precision. Basically, the R300 series' internal pixel pipeline is a mix of 24-bit and 32-bit per component, but the main pixel shader core is 24-bit FP per component. When the full 96 bits (24-bit * 4 channels) are written to main memory, they are written as a 4 * 32-bit IEEE single precision FP, or 128-bit, value. Some of the texture addressing operations are done at 32-bit instead of 24-bit precision.
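
To make that upsampling step a little more concrete, the following C sketch widens a 24-bit float to an IEEE-754 single. The 1-bit sign / 7-bit exponent / 16-bit mantissa layout with a bias of 63 is our assumption about what an FP24 format looks like, not a statement of R300's actual internals, and specials (denormals, infinities, NaNs) are ignored for clarity.

    /* Sketch: widening a 24-bit float to an IEEE-754 single, assuming an FP24
       layout of 1 sign bit, 7 exponent bits (bias 63) and 16 mantissa bits.
       The exact hardware bit layout is an assumption; specials are ignored. */
    #include <stdint.h>
    #include <string.h>

    float fp24_to_fp32(uint32_t fp24) {           /* FP24 packed in the low 24 bits */
        uint32_t sign     = (fp24 >> 23) & 0x1;
        uint32_t exponent = (fp24 >> 16) & 0x7F;  /* assumed bias of 63 */
        uint32_t mantissa =  fp24        & 0xFFFF;

        uint32_t bits;
        if (exponent == 0 && mantissa == 0) {
            bits = sign << 31;                    /* signed zero */
        } else {
            bits = (sign << 31)
                 | ((exponent - 63 + 127) << 23)  /* rebias to IEEE-754 (bias 127) */
                 | (mantissa << 7);               /* pad 16-bit mantissa to 23 bits */
        }

        float out;
        memcpy(&out, &bits, sizeof out);          /* reinterpret the bit pattern */
        return out;
    }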

NVIDIA's NV30 series internal pixel pipeline provides three options -- 12-bit integer ("FX") per component, 16-bit FP per component, or 32-bit FP per component.
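
The gap between those modes is easy to visualise with a small C sketch: the snippet below quantizes a value onto a 12-bit fixed-point grid and compares it with the full single-precision value. The assumed [-2, 2) range for the "FX" format is our own guess for illustration, not a documented NV30 detail.

    /* Hypothetical illustration: snapping a value to a 12-bit fixed-point grid
       (assumed range [-2, 2), i.e. steps of 1/1024) versus keeping it in
       IEEE-754 single precision. */
    #include <stdio.h>
    #include <math.h>

    static float quantize_fx12(float x) {
        const float step = 4.0f / 4096.0f;        /* 12 bits over an assumed [-2, 2) range */
        if (x < -2.0f)        x = -2.0f;
        if (x >  2.0f - step) x = 2.0f - step;
        return roundf(x / step) * step;
    }

    int main(void) {
        float v = 0.123456f;
        float q = quantize_fx12(v);
        printf("FP32 value : %.7f\n", v);
        printf("FX12 value : %.7f (error %.7f)\n", q, fabsf(v - q));
        return 0;
    }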

With the differences between the two architectures from these two fierce competitors, what are the opinions of game and game engine developers? What are their preferences? What sort of model do they target?

We asked a few developers about this. We are also not entirely in the loop regarding what NVIDIA actually does with its "The Way It's Meant To Be Played" games marketing campaign -- how, and if, it ties in with the whole "multiple precision choices" issue with regard to NVIDIA and ATI DX9 hardware. The following are some of the responses we received.


Which shader model do you prefer - a model that allows different precision modes with different performance and quality characteristics, or a model that picks a single precision as a reasonable compromise between high quality and performance while remaining generic across all previous shader models?

Dean Calver, Computer Artworks
That's a hard one... I think I prefer not having to think about it too much, so I tend to like a single 'reasonable' precision. It's not that I don't understand the reasons for having multiple precisions, but basically I want the easy coding option when I'm writing shaders.

Chris Egerter, PowerRender
I would like to be able to control the precision, like we can control 16 or 32-bit color. It really should be a setting in the API, or at least in the control panel for the video card.

Tom Forsyth, Muckyfoot
I prefer a model that gives me control. If I say "I need this in precision X", it had better be in precision X and no lower, even if it goes faster. If I wanted a less precise shader, I'd have written it that way. If I say "you're allowed to drop to precision Y", then I'm not too worried if they do it at higher precision. Just as long as everyone knows the rules and sticks by them.

The "partial precision" hint in D3D's pixel shaders does exactly this, and I'm fine with that bit of the API.

Obviously a single precision is easier to use, but in the real world we have multiple precisions, and we deal with them and use them sensibly - bytes, words, dwords, floats, doubles. I don't think the minor complexity of two different precisions in a pixel shader is going to make people's heads explode.

I think this is a bit of a leading question though :-) - as far as I am aware, both D3D and OpenGL give you control over the precision you use for your shaders. The exception is PixelShader version 1.4 and earlier, where there's no choice because the hardware cannot switch precisions at will. But I'm not an expert on the OGL standards.

Jake Simpson, EA/Maxis
I want control. Definitely the model that allows different precision modes. The second is easier as a coder because it shifts the weight from me to them, but potentially at the cost of graphical precision, and this is only going to get more and more important given the number of shader passes and potential shader collapsing that's going to happen.

Tim Sweeney, Epic Games
For games shipping in 2003-2004, the multiple precision modes make some sense. They're a stopgap to allow apps to scale from DirectX8 to DirectX9 while dealing with precision limits.

Long-term (looking out 12+ months), everything's got to be 32-bit IEEE floating point. With the third generation Unreal technology, we expect to require 32-bit IEEE everywhere, and any hardware that doesn't support that will either suffer major quality loss or won't work at all.

"MrMezzMaster" (Developer of well known FPS title who wishes to remain anonymous)
Ultimately, the trade-off between quality and performance should lie in the hands of the coder and be exposed through the shader language (or model, if you like). Let's take the example of coding a numerical algorithm using a standard C compiler. Would you want the compiler to arbitrarily decide quality and performance for you? The answer is: no. This is why the C language has language features such as single precision float vs. double precision float, and of course a default floating point precision.

If the shader language can't specify a precision, the shader compiler (API and/or driver-level) needs to make an arbitrary decision based on context and usage. Then, the question is whether the shader compiler puts quality or performance first. I can bet that performance will typically win over quality in most situations, but that will be fine for most uses. There will always be cases where quality will suffer, but it's up to the API and/or driver-level compiler to test on a reasonable set of applications and adjust their optimization accordingly.
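
As a closing aside, the float-versus-double analogy above is straightforward to demonstrate; this minimal C sketch of our own shows the programmer, not the compiler, deciding where precision is lost.

    /* Illustration of C's programmer-chosen precision: the same value stored
       as a single-precision float and as a double. 16777217 (2^24 + 1) needs
       25 significant bits; float's 24-bit significand cannot hold it, while
       double's 53-bit significand can. */
    #include <stdio.h>

    int main(void) {
        float  f = 16777217.0f;   /* rounds to 16777216.0 -- precision lost */
        double d = 16777217.0;    /* stored exactly */

        printf("float  : %.1f\n", (double)f);
        printf("double : %.1f\n", d);
        return 0;
    }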