Maybe all of this sounds a bit complex, but it does make sense. The easiest way to picture it is as follows: imagine a large grid representing your texture. Now imagine a small outlined square that contains 4 (2 by 2) of those small grid squares — these are the texels used for bilinear filtering. For every new pixel you render, this square moves one texel to the right. If you keep moving right, you pass so many texels that the first ones can no longer be stored in your chip's cache. Now instead imagine moving left to right only a small distance, say 8 grid units, then moving the square down one line (one texel unit) and covering the same 8 texels left to right again. Notice how the top 2 texels of each footprint can be re-used from the texels we had on the previous line? This is the difference between a normal linear frame buffer and a tile-based one. In short, we reduce the number of page breaks, and once the system is up and running we only need 1 new texel for every new pixel we render (as long as we continue to use the same texture).
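To make the difference concrete, here is a toy sketch (not modeled on any specific chip) that walks 2x2 bilinear footprints over a small block of pixels and counts how many distinct memory pages are touched under a linear (scan line) layout versus an 8x8 tiled layout. The texture size, page size and tile size are all made-up assumptions for illustration.

```python
TEX_W = 128          # texels per texture row (assumption)
PAGE_TEXELS = 256    # texels per memory page (assumption)
TILE = 8             # tile edge in texels (assumption)

def linear_addr(x, y):
    """Row-major (scan line) layout: one long row after another."""
    return y * TEX_W + x

def tiled_addr(x, y):
    """Tiled layout: 8x8 blocks of texels stored contiguously."""
    tiles_per_row = TEX_W // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    offset = (y % TILE) * TILE + (x % TILE)
    return tile_index * TILE * TILE + offset

def pages_touched(addr_fn, positions):
    """Count distinct pages hit by the 2x2 bilinear footprints."""
    pages = set()
    for (x, y) in positions:
        for dy in (0, 1):
            for dx in (0, 1):
                pages.add(addr_fn(x + dx, y + dy) // PAGE_TEXELS)
    return len(pages)

# Walk an 8x8 block of pixels, left to right, line by line.
walk = [(x, y) for y in range(8) for x in range(8)]
print(pages_touched(linear_addr, walk))  # linear layout -> 5 pages
print(pages_touched(tiled_addr, walk))   # tiled layout  -> 2 pages
```

With the linear layout every new line of the walk lands 128 texels further along in memory, so the small block straddles several pages; with the tiled layout the whole block sits in a couple of contiguous tiles, which is exactly the page-break reduction described above.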

Naturally, some critical minds will ask: how do you know that subsequent trace positions will be so close together that we can re-use previous texels? Well, the answer is: we don't. The traced positions depend on the location of the 3D triangle in space. But there are some tricks to make sure that traces do end up close together. Let me illustrate that. Imagine a small textured square whose texture is 128x128 texels. This square is rather far away from the camera, so it's small... so small that it only spans 8 by 8 pixels on the screen. This means we only need 8 times 8 traced positions in our 128x128 wide texture map. It's pretty obvious that those samples (traced positions) will be so far apart that there is virtually no re-use. This complete lack of re-use is caused by a very large texel-to-pixel ratio: we need to jump over way too many texels to reach the next ones we need.

Now how can we solve this? The answer is mip-mapping. Mip-maps are downsampled versions of the original high-resolution texture. So if we take the 128x128 map and create mip-maps, we get a 64x64, 32x32, 16x16, 8x8, 4x4, 2x2 and a 1x1 map. Now we need to select the correct map based on the ratio. Our square is 8x8 pixels large on the screen, so why not use the 8 by 8 texture map? Now our texel-to-pixel ratio is equal to 1, and we get fully optimal re-use of texels. In reality, the correct mip level (the correct size of map) is selected using special math involving the depth positions, but generally, by using mip-maps, we guarantee a more optimal re-use of texels. Naturally, the situation isn't always this ideal, since sometimes we end up in a situation where no map is optimal. We then need to take the second-best map, and this means that re-use efficiency drops. That is thus one of the reasons why texture caches never reach 100% efficiency.
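The ratio idea above can be sketched in a few lines. This is a simplified illustration with made-up function names, not how real hardware computes the level (which, as noted, involves per-pixel math): it builds the mip chain sizes for a 128x128 texture and picks the level whose texel-to-pixel ratio is closest to 1:1.

```python
import math

def build_mip_sizes(base):
    """Full mip chain, e.g. 128 -> [128, 64, 32, 16, 8, 4, 2, 1]."""
    sizes = [base]
    while sizes[-1] > 1:
        sizes.append(sizes[-1] // 2)
    return sizes

def select_mip(texture_size, screen_size):
    """Pick the mip level whose texel-to-pixel ratio is nearest 1."""
    ratio = texture_size / screen_size   # e.g. 128 / 8 = 16
    level = round(math.log2(ratio))      # each mip level halves the ratio
    max_level = int(math.log2(texture_size))
    return max(0, min(level, max_level)) # clamp to the available chain

print(build_mip_sizes(128))   # [128, 64, 32, 16, 8, 4, 2, 1]
print(select_mip(128, 8))     # level 4, i.e. the 8x8 map
```

For a screen size that falls between two levels (say 10 pixels), no map gives an exact 1:1 ratio, so the nearest level is taken and re-use efficiency drops a little — the "second best map" case from the text.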

I hope that after all of this you understand why a tile-based memory structure for the frame buffer is more optimal than a simple linear scan-line-based one. I also hope you understand why it's important to have mip-maps. This is also why so many driver sets (like those from NVIDIA) contain automatic mip-map generation for games that don't provide mip-maps: a game without mip-maps will have very inefficient cache use. For those who are interested, the 3dfx Voodoo1 already contained a tile-based memory structure for its buffers, so I would say the tweak described here is a pretty basic and important one. Note that this whole advantage is inherently present in the PowerVR design.