Another trick to solve the bandwidth problem is compression. Compression allows you to send more information using the same bandwidth. In our analogy, you could see it as sending more raw materials using the same belt space and speed (for example, by pressing them together). Today's texture compression techniques allow compression ratios up to 1:8, which means that eight times as much information can be sent as without compression. These techniques, their qualities and their problems deserve their own in-depth article, which will appear on Beyond3D soon (or later).
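As a quick illustration of what that ratio buys you, here is a minimal sketch of the arithmetic. The 100 MB/s bus figure is an illustrative assumption, not a number from the article:

```python
# Hedged sketch: effective texture bandwidth over a compressed link.
# The 100 MB/s raw figure is an assumed example value.
def effective_bandwidth(raw_bandwidth_mb_s: float, compression_ratio: float) -> float:
    """Uncompressed texel data delivered per second when the data on the
    bus is compressed by the given ratio."""
    return raw_bandwidth_mb_s * compression_ratio

print(effective_bandwidth(100.0, 8.0))  # prints 800.0: 8x the texels, same bus
```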

T-Buffer effects and data re-use


The T-Buffer effects can be generated in a way that allows a lot of data to be re-used. We need to render four buffers, each representing a slightly modified version of the same scene. One way to do this would be to render the buffers sequentially: first buffer 1, then buffer 2, buffer 3 and finally buffer 4. This approach would be very inefficient in terms of data re-use. After all, every version of the scene contains roughly the same objects, and each object uses the same textures. If you render buffer per buffer, you end up fetching the same textures for the same objects once per buffer, but those identical fetches are spread far apart in time. So, it makes much more sense to render triangle per triangle, writing each triangle to all the buffers at the same time. Imagine drawing a triangle:

The triangle we render has a texture X. This triangle has that same texture in every buffer, since it's part of the same object in every version of the scene. The only thing that changes between the buffers is its position (e.g. for FSAA it is shifted by 1/4th of a screen pixel). The same texture information is thus needed to render that triangle in each buffer, be it slightly shifted or skewed. So, you render that triangle to buffers 1, 2, 3 and 4 in a row. In this process there is a lot of texture data re-use, resulting in high cache efficiency. After all, drawing the same object at a slightly different location uses the same source (texture) data.
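The rendering order described above can be sketched as two nested loops: triangles on the outside, buffers on the inside, so each triangle's texture is reused four times in a row while it is still hot in the cache. The sub-pixel offsets, the `draw_triangle` stub and the data shapes are hypothetical stand-ins, not 3dfx's actual hardware interface:

```python
# Assumed FSAA-style jitter offsets, one per T-Buffer copy (illustrative values).
SUBPIXEL_OFFSETS = [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25), (0.25, 0.25)]

def draw_triangle(buf, tri, dx, dy):
    # Placeholder for actual rasterization: just record what was drawn where.
    buf.append((tri, dx, dy))

def render_scene(triangles, buffers):
    for tri in triangles:                          # outer loop: geometry
        # Inner loop: the SAME triangle (and thus the same texture) goes to
        # every buffer back-to-back, maximizing texture-cache reuse.
        for buf, (dx, dy) in zip(buffers, SUBPIXEL_OFFSETS):
            draw_triangle(buf, tri, dx, dy)

bufs = [[] for _ in range(4)]
render_scene(["triangle_A", "triangle_B"], bufs)
```

The inefficient alternative the article warns against would swap the two loops, so every texture is fetched, evicted, and fetched again once per buffer.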

The same efficiency can be maintained for the output of data. If we store the buffers in an interleaved way, we can write the results to all buffers at the same time. The pixels of the buffers (we assume two buffers here) are then stored like this:

(P1-B1) (P1-B2) (Z-value) (P2-B1) (P2-B2) (Z-value)

P1-B1 stands for pixel 1 in buffer 1. So, when we output data, we can write to both buffers in one continuous stream. This is much more efficient than storing the data like this:

(P1-B1) (P2-B1) ... (PN-B1) (P1-B2) (P2-B2) ... (PN-B2) (Z-values)

In this representation we have to jump around in memory to write/read the pixels of the different buffers and the Z-information. By interleaving, we use the available bandwidth more efficiently.
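The difference between the two layouts can be made concrete by computing word addresses for each. This is a sketch under assumed parameters (three words per pixel, a 640x480 screen), not the actual memory map of any 3dfx chip:

```python
# Interleaved layout: (P1-B1) (P1-B2) (Z) (P2-B1) (P2-B2) (Z) ...
# Planar layout:      all of buffer 1, then all of buffer 2, then all Z.
WORDS_PER_PIXEL = 3  # buffer-1 sample, buffer-2 sample, Z value

def interleaved_addr(pixel, field):
    """field: 0 = buffer 1 sample, 1 = buffer 2 sample, 2 = Z value."""
    return pixel * WORDS_PER_PIXEL + field

def planar_addr(pixel, field, num_pixels):
    """Each buffer stored as its own contiguous plane."""
    return field * num_pixels + pixel

# Interleaved: pixel 1's three words are adjacent, one sequential burst.
print([interleaved_addr(1, f) for f in range(3)])              # [3, 4, 5]
# Planar: the same three words are a full plane apart (640*480 assumed).
print([planar_addr(1, f, 640 * 480) for f in range(3)])        # [1, 307201, 614401]
```

Writing pixel 1 to both buffers plus Z touches three adjacent words in the interleaved layout, versus three locations hundreds of kilobytes apart in the planar one, which is exactly the "jumping around" the article describes.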

This interleaving does not have to happen at the pixel level. As we all know, 3dfx has been using tile-based frame buffers since the Voodoo1 board. This means that the screen is divided into small, rectangular subparts, and these subparts are stored linearly in memory. It is therefore possible that 3dfx decided to interleave at this tile level, meaning that you render tile per tile. For more information, check this extract from our T&L article.
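To make the tile idea concrete, here is a minimal sketch of tile-based addressing: each tile's pixels are stored contiguously, so neighbouring pixels in a tile stay close in memory. The 32x32 tile size and 640-pixel screen width are assumptions for illustration, not documented 3dfx values:

```python
# Assumed parameters (illustrative, not 3dfx's actual tile geometry).
TILE_W = TILE_H = 32
SCREEN_W = 640

def tile_linear_addr(x, y):
    """Word address of screen pixel (x, y) when the frame buffer is stored
    as a row of tiles, each tile laid out linearly."""
    tiles_per_row = SCREEN_W // TILE_W
    tile_index = (y // TILE_H) * tiles_per_row + (x // TILE_W)
    offset_in_tile = (y % TILE_H) * TILE_W + (x % TILE_W)
    return tile_index * TILE_W * TILE_H + offset_in_tile
```

With this layout, pixel (0, 1) sits only 32 words from pixel (0, 0), whereas in a plain scanline layout it would be a full 640-word screen row away, which is why rendering tile per tile keeps memory accesses local.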