For the technically minded among us, can you fill in a little more detail on how they work? We know they are post-processing effects, which tends to imply they only have the data in the color buffer and the Z-buffer to work from, and some effects appear to require both elements to operate.

Maurice/Roger - That's right. At swap time, just before we perform the swap operation, we do some processing work on the back buffer. Most of the time we allocate a temporary buffer and perform the effects there, but some effects, like greyscale, don't depend on neighbouring pixel values, so we just perform them in place.
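A minimal sketch of why an effect like greyscale can run in place, assuming a simple luminance conversion (illustrative only, not SMARTSHADER's actual code): each output pixel depends only on that same pixel's colour, so overwriting the back buffer as you go is safe.

```python
def greyscale_in_place(buffer):
    """buffer: flat list of (r, g, b) tuples in 0.0-1.0, modified in place.

    Because each result reads only its own pixel, no temporary buffer
    (and no neighbouring-pixel data) is needed.
    """
    for i, (r, g, b) in enumerate(buffer):
        # Common luminance weights (an assumption here; any weighting works).
        y = 0.299 * r + 0.587 * g + 0.114 * b
        buffer[i] = (y, y, y)
    return buffer
```

An effect with a filter kernel could not do this: it would read neighbours that had already been overwritten, which is why those effects go through a temporary buffer.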

Actually, we don't expose the z-buffer in SMARTSHADER. Even the edge detection algorithm doesn't use it - we use a Sobel edge-detection filter that works only on the contents of the color buffer. There are some cool effects that a depth buffer would allow, but we decided against it because of the possibility of cheating. Since the z-buffer isn't normally directly visible to the user, exposing it would give some users new viewable information. Certain effects like fog and smoke often don't write anything to the depth buffer, so someone would be able to write a shader that highlighted non-visible objects hidden behind such effects.
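For reference, a Sobel filter of the kind described needs only colour-derived luminance: it convolves a 3x3 neighbourhood with two fixed kernels to estimate horizontal and vertical gradients, then takes the gradient magnitude. A sketch under those assumptions (not the SMARTSHADER source):

```python
def sobel_magnitude(lum, x, y):
    """lum: 2D list of luminance values taken from the colour buffer.

    Returns the Sobel gradient magnitude at (x, y); assumes (x, y) is
    at least one pixel away from every border.
    """
    gx_kernel = [(-1, 0, 1), (-2, 0, 2), (-1, 0, 1)]   # horizontal gradient
    gy_kernel = [(-1, -2, -1), (0, 0, 0), (1, 2, 1)]   # vertical gradient
    gx = gy = 0.0
    for dy in range(3):
        for dx in range(3):
            p = lum[y + dy - 1][x + dx - 1]   # one of nine neighbourhood reads
            gx += gx_kernel[dy][dx] * p
            gy += gy_kernel[dy][dx] * p
    return (gx * gx + gy * gy) ** 0.5
```

A uniform region yields zero, and a hard edge yields a large magnitude, which is all the effect needs - no depth information is involved anywhere.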

So all that leaves is the color buffer, which SMARTSHADER gives you full access to. We also allow the user to pass in user-defined textures; I used these to create the ASCII effect. Other uses I can come up with off the top of my head would be logos and noise textures. This programmable version of SMARTSHADER uses ARB_fragment_program, which means you get the same limits you'd normally get with this extension on R300, namely 64 ALU instructions, 32 texture instructions, 32 constants, 32 temporaries, 8 texture units, and 4 levels of texture indirection. SMARTSHADER makes multipassing extremely easy, so you can always choose to do that to exceed these limits.
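One plausible way a user-defined texture drives an ASCII-style effect - and this is a guess at the general idea, not the actual "Green ASCII.pps" shader - is to average each character-sized cell's brightness and use that average to index into a supplied glyph texture. A toy sketch, with a string standing in for the glyph texture:

```python
# Hypothetical "user texture": a ramp of glyphs ordered dark to bright.
GLYPH_RAMP = " .:-=+*#%@"

def ascii_cell(pixels):
    """pixels: flat list of luminance values (0.0-1.0) for one character cell.

    Averages the cell and picks the glyph whose density matches it.
    """
    avg = sum(pixels) / len(pixels)
    index = min(int(avg * len(GLYPH_RAMP)), len(GLYPH_RAMP) - 1)
    return GLYPH_RAMP[index]
```

In the shader the same lookup would be a texture fetch into the user-supplied glyph sheet, with the cell average computed from colour-buffer reads.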

What shader lengths are being used for the more complicated effects and are there any PS2.0 shaders being used?

Maurice - The simple answer to the length portion of your question is that if you look at "Green ASCII.pps" you can count that the main shader has 12 ALU instructions and 2 texture instructions, and if you look at "Emboss (Black and White)" you see it has 18 ALU and 9 texture instructions. How those break down into instructions for our hardware after our shader compiler has converted them is something we can't disclose. Besides, I've seen some interesting discussions on the Beyond3D forums about this sort of thing, and it wouldn't be nearly as much fun for you guys if you knew how everything worked. :)

In the OpenGL world, ARB_fragment_program is the equivalent of PS2.0. Since all of our shaders are written with ARB_fragment_program, one could argue that all of them are PS2.0 shaders. Another person might not call a program true PS2.0 unless it uses instructions that require floating-point operations. The algorithm for the ASCII shader is fundamentally based on floating-point math, so that one would certainly be classified as a PS2.0 shader. All the programs use 24-bit floating-point math through the pixel pipeline. If you take the time to understand how some of these shaders work, you'll learn to appreciate always being able to work with floating-point math, not only because it guarantees higher precision, but also because it makes the algorithms simpler and easier to understand.

Roger - The most complicated effect we currently have is the edge-detection filter. It takes a lot of instructions to average pixel values and find the differences between them. It requires nine texture lookups, and more interpolants than we have available coming into the pixel shader, so we have to perform a bunch of math operations to make it work.

Many of the shaders make use of a filter kernel, where you look up a pixel and its neighbours and compute a result from their values. To do this, you have to perform many texture lookups - more than you have interpolants. We dynamically compute the texture coordinates for each pixel, and that requires floating-point math with more than a few bits of precision.
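The workaround described above can be sketched like this (our reading of it, not the actual shader): only the centre texture coordinate arrives through an interpolant, and the pixel shader derives the other eight coordinates by adding one-texel offsets in floating point.

```python
def kernel_coords(center_u, center_v, texel_w, texel_h):
    """Derive the 3x3 grid of texture coordinates around one interpolated
    centre coordinate, given the texel size (assumed passed in as constants).

    Returns the nine (u, v) pairs a 3x3 filter kernel would sample.
    """
    return [(center_u + dx * texel_w, center_v + dy * texel_h)
            for dy in (-1, 0, 1)
            for dx in (-1, 0, 1)]
```

Because texel sizes are small fractions of 1.0, these additions need enough floating-point precision that adjacent sample coordinates don't collapse onto the same texel.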

Some shaders require large data ranges. Some of them, for example, sum up values, and the results may be negative, or may be much larger than 1.0. The floating-point registers help with that as well.

We had to discard some effects as well. One of them was a four-pass shader with fifty or so instructions in each pass, where I was trying to apply a Kuwahara filter to give a cel-shaded effect. In the end it just looked like a very slow blur filter.
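For readers unfamiliar with it, the Kuwahara filter is an edge-preserving smoother: each pixel examines four overlapping quadrants around itself and outputs the mean of the quadrant with the lowest variance. A per-pixel sketch of the standard algorithm (the discarded four-pass shader itself was never published):

```python
def kuwahara_pixel(img, x, y, r=1):
    """img: 2D list of luminance values; assumes (x, y) is at least r pixels
    from every border.

    Returns the mean of the least-varying of the four (r+1)x(r+1)
    quadrants that overlap at (x, y).
    """
    quadrants = [
        [(x + dx, y + dy) for dx in range(-r, 1) for dy in range(-r, 1)],     # top-left
        [(x + dx, y + dy) for dx in range(0, r + 1) for dy in range(-r, 1)],  # top-right
        [(x + dx, y + dy) for dx in range(-r, 1) for dy in range(0, r + 1)],  # bottom-left
        [(x + dx, y + dy) for dx in range(0, r + 1) for dy in range(0, r + 1)],  # bottom-right
    ]
    best_mean, best_var = None, None
    for quad in quadrants:
        vals = [img[qy][qx] for qx, qy in quad]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        if best_var is None or var < best_var:
            best_mean, best_var = mean, var
    return best_mean
```

The per-pixel variance comparisons and the sheer number of neighbourhood reads hint at why a shader version ran to four passes of fifty-odd instructions.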