Maybe all of this is sounding a bit complex, but it does make sense.
The easiest way to picture this is as follows: imagine a large grid. This
grid represents your texture. Now imagine a small outlined square that contains
4 (2 by 2) of those small grid cells. Now imagine that for every new pixel
you render you move to the right one texel. Do you see the texels used for
the bilinear filtering (the texels inside the large outlined square)? Now
if you keep moving to the right you will see that many texels are passed.
So many that the first ones can no longer be stored inside your chip cache.
Now imagine moving from left to right again, but only a short distance, say
8 grid units. Then move the outlined square down to the next line (one texel
unit) and sweep the same 8 texels from left to right. Notice how the top two
texels of each footprint can be re-used from the previous line? This is the difference between
a normal linear frame buffer and a tile based one. In short, we can reduce
the number of page breaks and once the system is running we only need 1
new texel for every new pixel we render (as long as we continue to use the
same texture).
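The sweep above can be sketched with a small simulation. This is only an illustration: the page size, tile size, and a 1:1 texel-to-pixel ratio are assumptions picked to match the example, not the parameters of any real chip. It counts how many distinct memory pages a 2x2 bilinear footprint touches while rendering an 8x8 pixel block, once with a linear scan-line layout and once with a tiled one.

```python
# Hypothetical layout parameters, chosen to match the example in the text.
TEX_W = 128          # texture width in texels
PAGE = 64            # texels per page in the linear layout (assumed)
TILE = 8             # tile side in the tiled layout: 8x8 = 64 texels per page

def linear_page(x, y):
    # Linear layout: texels are stored row after row.
    return (y * TEX_W + x) // PAGE

def tiled_page(x, y):
    # Tiled layout: each 8x8 block of texels shares one page.
    return (y // TILE) * (TEX_W // TILE) + (x // TILE)

def pages_touched(page_fn, block=8):
    # Walk an 8x8 pixel block at a 1:1 texel-to-pixel ratio; every pixel
    # samples a 2x2 bilinear footprint of texels.
    pages = set()
    for py in range(block):
        for px in range(block):
            for dy in (0, 1):
                for dx in (0, 1):
                    pages.add(page_fn(px + dx, py + dy))
    return len(pages)

print("linear pages:", pages_touched(linear_page))  # 9
print("tiled pages: ", pages_touched(tiled_page))   # 4
```

The linear layout crosses a page on every texel row, while the tiled layout keeps the whole footprint inside a handful of pages, which is exactly the page-break reduction described above.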
Naturally, some critical minds will say: How do you know that subsequent
trace positions will be so close together that we can re-use previous texels?
Well the answer is: we don't. The traced positions depend on the location
of the 3D triangle in space. But there are some tricks to make sure that
traces do end up close together. Let me illustrate that. Imagine a small
textured square. The texture used has 128x128 texels. Now this square
is rather far away from the camera, so it's small... so small that it only
spans 8 by 8 pixels on the screen. Now this means we only need 8 times 8
traced positions in our 128x128 wide texture map. It's pretty obvious that
those samples (traced positions) will be so far apart that there is virtually
no re-use. This complete lack of re-use is caused by a very large texel
to pixel ratio. We need to jump over way too many texels to reach the next
ones we need. Now how can we solve this? The answer is mip-mapping. Mip-maps
are downsampled versions of the original high-resolution texture. So if
we take the 128x128 map and create mip-maps then we get a 64x64, 32x32,
16x16, 8x8, 4x4, 2x2 and a 1x1 map. Now we need to select the correct map
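Building that chain is just repeated halving; a minimal sketch (the function name is my own, not from any particular API):

```python
def mip_chain(size):
    # Halve the resolution until we reach the 1x1 map.
    chain = [size]
    while size > 1:
        size //= 2
        chain.append(size)
    return chain

print(mip_chain(128))  # [128, 64, 32, 16, 8, 4, 2, 1]
```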
based on the ratio. Our square is 8x8 pixels large on the screen, so why
not use the 8 by 8 mip-map? Now our texel to pixel ratio is equal to
1! We will have full, optimal re-use of texels. In reality, the correct mip level
(correct size map) is selected using math based on how fast the texture coordinates
change from one pixel to the next (which depends on depth and orientation).
But generally, by using mip-maps, we guarantee a more optimal re-use of
texels. Naturally, the situation isn't always this optimal since sometimes
we end up having a situation where no map is optimal. So we need to take
the second best map and this means that re-use efficiency drops. That is
thus one of the reasons why texture caches never have 100% efficiency.
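The selection step can be sketched as follows. Real hardware derives the level per pixel from screen-space texture-coordinate derivatives; this simplified per-primitive version (with hypothetical function and parameter names) just rounds the texel-to-pixel ratio to the nearest power of two, which also shows the "second best map" case from the text.

```python
import math

def select_mip(tex_size, screen_size):
    # Texel-to-pixel ratio along one axis (simplified: one ratio per primitive).
    ratio = max(1.0, tex_size / screen_size)
    # Round to the nearest available level and clamp to the chain.
    level = min(round(math.log2(ratio)), int(math.log2(tex_size)))
    return level, tex_size >> level

print(select_mip(128, 8))   # (4, 8): the 8x8 map, a 1:1 ratio
print(select_mip(128, 12))  # (3, 16): no map is exact, take the nearest one
```

In the second call the ideal map would sit between the 16x16 and 8x8 levels, so whichever we pick the ratio is no longer exactly 1 and some re-use efficiency is lost.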
I hope that after all of this you understand that using a tile based memory
structure for the frame buffer is more optimal than a simple linear scan
line based one. I also hope you understand why it's important to have mip-maps.
This is also why so many driver sets (like those from NVIDIA) contain auto
mip-map generation for games that don't provide mip-maps. A game without
mip-maps will have very inefficient cache use. For those who are interested,
the 3dfx Voodoo1 already contained a tile based memory structure for its
buffers so I would say that the tweak described here is a pretty basic and
important one. Note that this whole advantage is inherently present in the
PowerVR design.