Post-processing Shaders

This last phase is optional and, at its most complex, may be divided into two sub-phases. Here we take the fully lit scene and add any other ‘effects’: examples would be fog, depth of field or tone mapping. Non-lit particles and transparent surfaces are also added here.

The inputs are all the buffers generated earlier (the G-Buffers and the lit buffer) and the final output goes to the back-buffer for display. At its simplest this phase doesn’t occur at all and the lit buffer is the back-buffer (super-sampling can be handled inside the light shaders with some modifications); at its most complex it writes to temporary surfaces before finally processing into the back-buffer.

The potential sub-phases come about if super-sampling anti-aliasing is used. Fast anti-aliasing is currently done via multi-sampling, which has a high-resolution depth/stencil but only a standard-resolution colour buffer; the high-resolution depth/stencil is used to work out how much to filter the normal-resolution colour buffer. This cannot be used with a G-Buffer, as you cannot safely filter the values stored there (to be absolutely correct, you can filter the G-Buffer, but it would require a completely programmable custom filter).

The main form of anti-aliasing for G-Buffers is to generate everything at a higher resolution and filter down as a post-process (another approach would be to have the light shaders use custom filters when they read the data out of the G-Buffer). Unfortunately, when this is done there is currently no way of filtering the depth/stencil buffer, so anything done after the down-filter must work without the depth buffer. This marks the boundary between the sub-phases: in the 1st sub-phase any post-processing has access to the depth buffer and will undergo super-sampling, whereas the 2nd sub-phase outputs at the actual display resolution.
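As an illustration of the down-filter at the end of the 1st sub-phase, a simple box filter over the super-sampled lit buffer is enough. The sketch below uses the same pseudo-notation as the other shaders in this article (Lit[uv] standing for a texture read of the super-sampled lit buffer); the 2x2 pattern and the halfTexel parameter are assumptions, the exact filter and sample count are up to you.

// Minimal 2x2 box-filter resolve from the super-sampled lit buffer
// down to display resolution. halfTexel is half a source texel
// (assumed parameter, passed in by the application).
float4 DownFilter( float2 uv, float2 halfTexel )
{
  float4 colour;
  colour  = Lit[ uv + float2(-halfTexel.x, -halfTexel.y) ];
  colour += Lit[ uv + float2( halfTexel.x, -halfTexel.y) ];
  colour += Lit[ uv + float2(-halfTexel.x,  halfTexel.y) ];
  colour += Lit[ uv + float2( halfTexel.x,  halfTexel.y) ];
  return colour * 0.25f;
}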

Fog

Fog can be simple distance-based fog or a much more complex volumetric effect (though true volumetric fog should be integrated into the light shaders). Simple distance-based fog looks up the position from the G-Buffer and attenuates the colour in the lit buffer based on it. Depending on the exact form of fogging and the surfaces involved, it may be possible to do the actual blending in the alpha blender.

float4 DistanceFog( float4 fogColour, float2 uv )
{
  // projected depth from the G-Buffer position
  float projDist = Gbuffer.Pos[ uv ].z
                 / Gbuffer.Pos[ uv ].w;
  // fog amount from a 1D distance-to-fog lookup texture (assumed sampler name)
  float t = tex1D( fogLookupSampler, projDist );
  // blend the lit buffer colour towards the fog colour
  return lerp( Lit[ uv ], fogColour, t );
}

Depth of Field

My favourite depth of field effect currently available in real-time is the technique developed by Guennadi Riguer [19], which is easily adapted to our deferred shading system. The basic idea is to calculate the distance from the focal plane and use this to control the size of a ‘circle of confusion’; this circle is used to take taps from the image, and the bigger the circle, the more blurred the final image.

The algorithm adapts easily to our deferred lighting system but runs into instruction limits under PS_2_0 (Riguer’s version calculated some values, such as the blur factor, in the geometry pass, whereas we are using ‘generic’ parameters and so calculate them on the fly). You can either reduce the number of taps used in the ‘circle of confusion’, split the work into multiple passes, or move to longer shader versions.

// blur factor in [0,1]: 0 at the focal plane, rising with distance from it
float ComputeBlur( float depth, float focalDist, float focalRange )
{
  return saturate( abs(depth - focalDist) * focalRange );
}

float4 RiguerDOF( float focalDist,
                  float focalRange,
                  float maxCoC,
                  float2 tapOffset[NUM_OF_TAPS],
                  float2 uv )
{
  // blur factor and circle of confusion size for the centre sample
  float depth = Gbuffer.Pos[ uv ].z;
  float blur = ComputeBlur( depth, focalDist,
                            focalRange );
  float4 colourSum = Lit[ uv ];
  float sizeCoC = blur * maxCoC;
  float totalContrib = 1.0f;

  for( int i = 0; i < NUM_OF_TAPS; i++ )
  {
    // sample the lit buffer inside the circle of confusion
    float2 tapUV = uv + tapOffset[ i ] * sizeCoC;
    float4 tapColour = Lit[ tapUV ];
    float tapDepth = Gbuffer.Pos[ tapUV ].z;
    // taps behind the centre pixel contribute fully; nearer taps are
    // weighted by their own blur to stop sharp foregrounds leaking
    float tapContrib = (tapDepth > depth) ? 1.0f :
                       ComputeBlur( tapDepth, focalDist, focalRange );
    colourSum += tapColour * tapContrib;
    totalContrib += tapContrib;
  }

  return colourSum / totalContrib;
}

Practical Considerations

The first thing you notice when you have this system up and running is that it is very pixel-shader limited: expensive pixel shaders run multiple times per pixel. This is one of the reasons the light shader geometries work so well; even if you render lots of triangles to get the shape right, the gains from accurate occlusion culling and fewer wasted pixels usually outweigh the vertex shader cost of transforming the geometries.

Shadow maps fit into this architecture extremely well. Most of the cost of a shadow map lies in generating it; the cost of actually using it is a few cycles per pixel, and even percentage closer filtering is only slightly more expensive. As shadow maps don’t have to be updated every frame, it’s easy to have lots of shadows: the extra cost of a light with a shadow over a light without one can be as low as 6 pixel shader instructions per screen pixel.
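To give a feel for where those few cycles go, the per-pixel work is just a transform of the G-Buffer position into light space followed by a depth compare. The sketch below uses the same pseudo-notation as the rest of this article (Gbuffer.Pos[uv] and ShadowMap[uv] standing for texture reads); the shadowMatrix and shadowBias parameters and the ShadowMap name are assumptions, and it assumes the G-Buffer position is stored in world space.

// A minimal shadow-map test a light shader might perform per pixel.
// shadowMatrix (world space to light projective space) and shadowBias
// are assumed parameters; ShadowMap[uv] reads the stored light-space depth.
float ShadowTerm( float2 uv, float4x4 shadowMatrix, float shadowBias )
{
  float4 lightPos = mul( Gbuffer.Pos[ uv ], shadowMatrix );
  lightPos.xyz /= lightPos.w;                   // projective divide
  float2 shadowUV = lightPos.xy * 0.5f + 0.5f;  // NDC to texture space (y flip is API dependent)
  float storedDepth = ShadowMap[ shadowUV ];
  // 0 = in shadow, 1 = lit; percentage closer filtering would average several such tests
  return (lightPos.z - shadowBias > storedDepth) ? 0.0f : 1.0f;
}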

Deferred lighting totally changes the behaviour you’d expect from complex lighting algorithms: lots of small lights are faster than big lights that cover a lot of screen space. If your world is densely occluded, many of your lights won’t cost much at all, which causes a strange performance effect: lighting a flat plane is a lot more expensive than lighting a complex environment.

Problem Areas

Transparency

Transparency is a major weakness: there is no cheap solution to ‘standard’ transparencies. The best we can currently do in speed terms is to fall back to a non-deferred lighting system for transparent surfaces and blend them in during post-processing. The best in image-quality terms on current hardware is depth peeling: each peeled layer runs the complete set of light shaders, using the stencil buffer to limit pixel coverage, and the results are then blended together [20].
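As a rough sketch of how the peeling itself works: each geometry pass after the first rejects any fragment that is not strictly behind the depth captured by the previous layer, so successive passes peel off successive transparent layers before the light shaders run on them. PrevLayerDepth below is an assumed name for a copy of the previous layer’s depth, read with the same pseudo-notation as the other buffers; real implementations differ in how that depth is stored and compared.

// Peel test at the start of the geometry pass for layer N.
// PrevLayerDepth holds the depth written by layer N-1 (assumed name);
// the small epsilon stops the same surface being peeled twice.
void PeelTest( float2 uv, float fragDepth )
{
  float prevDepth = PrevLayerDepth[ uv ];
  clip( fragDepth - (prevDepth + 1e-5f) );  // discard fragments not behind the previous peel
}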

Memory

No solutions here, just a warning: deferred lighting requires a number of large render-targets, and if super-sampling is used they are even bigger. With 16 bits per component in the G-Buffer, the standard 16 components take up 256 bits per pixel, with another 32 bits for the depth/stencil, for a total of 288 bits per pixel. At a resolution of 1024x768 the G-Buffer alone takes over 28 MB of video memory (now factor in super-sampling!). On top of that there are the lit buffer and any spare surfaces used by the post-processing.
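For reference, that figure breaks down as follows (same numbers as above):

16 components x 16 bits      = 256 bits
depth/stencil                =  32 bits
total per pixel              = 288 bits = 36 bytes
1024 x 768 x 36 bytes        = 28,311,552 bytes (roughly 28 MB)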