Theory is not Reality
All my previous talk about fill-rate has been purely theoretical. If I know one thing, then it is this:
"Theory is never equal to reality"
You can put that one on a piece of paper and hang it above your bed or your desk, because it's always true, and especially so for everything that involves electronics. If you have ever tried to build something electronic, you will know that the first time you run it, it needs a hell of a lot of tweaking, mainly because all components have specs, but those specs are ranges, so your calculations have to be adapted to them. But now back to fill-rate: why isn't the theoretical number achievable?
1. Z- and Frame Buffer Clearing
Traditional renderers use a Z-buffer to figure out what is visible; this buffer stores a depth value for every pixel. Every time you start a new frame, this buffer has to be reset. It's a bit like starting with a fresh white piece of paper to draw on. This clearing requires time, and during that time your 3D accelerator is idle. While the 3D core is idle, it is not calculating pixels. If you remember our fill-rate definitions, and especially our Theoretical Peak fill-rate, you know that we assumed in those calculations that 100% of our clock cycles were used for rendering pixels! During buffer clearing this is impossible, so you lose fill-rate, and thus theory is not equal to reality. Theory doesn't care about buffer clears; reality does.
Now how big is the impact of this?
Well, the impact is related to the kind of memory the card uses, the screen resolution and, of course, the frame-rate. First of all, memory type: SGRAM supports special block write modes that allow whole memory zones to be cleared at once. SDRAM has no such block write modes, so clearing the buffer takes more cycles. The influence of screen resolution is pretty trivial: the higher the resolution, the more memory you have to clear! After all, at 640x480 you have only one quarter of the number of pixels of the 1280x960 resolution. Frame-rate is a bit funny: the higher your frame-rate, the more often you clear the buffers per second, so the bigger the total impact of the clears.
Telling you exactly how many cycles you lose because of this is hard, since it depends on all of these factors. But the fact is that the influence is there.
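To get a feel for the order of magnitude anyway, here is a small back-of-the-envelope sketch in C. Every figure in it (resolution, memory clock, bytes cleared per clock) is an invented example value, not the spec of any real card:

    #include <stdio.h>

    int main(void)
    {
        /* Example assumptions -- NOT the specs of any real card. */
        const double width  = 1024.0, height = 768.0;  /* resolution    */
        const double bytes_per_pixel = 2.0;            /* 16-bit values */
        const double fps             = 60.0;           /* frame-rate    */
        const double mem_clock_hz    = 100e6;          /* memory clock  */
        const double bytes_per_cycle = 16.0;  /* cleared per clock; SGRAM
                                                 block writes raise this,
                                                 plain SDRAM lowers it  */

        /* Both the color buffer and the Z-buffer are cleared each frame. */
        double bytes_per_frame  = width * height * bytes_per_pixel * 2.0;
        double cycles_per_frame = bytes_per_frame / bytes_per_cycle;
        double cycles_per_sec   = cycles_per_frame * fps;

        printf("Clear cycles per frame : %.0f\n", cycles_per_frame);
        printf("Share of memory cycles : %.1f%%\n",
               100.0 * cycles_per_sec / mem_clock_hz);
        return 0;
    }

With these made-up numbers more than a tenth of all memory cycles disappear into clearing alone, and the share grows with resolution and frame-rate, exactly as described above.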
Isn't there any way to avoid it?
Yes, there are some tricks, but all of them require software support: if the game doesn't use these tricks, you are stuck with the impact of buffer clearing. About the tricks: you can avoid clearing the frame buffer if the game writes to every screen pixel during every frame. If every 3D scene generates polygons that cover the whole screen, then the rendering effectively does its own buffer clear. However, not all games write polygons to all on-screen pixels, so it's a risk. Quake2, for example, does support this trick. The Z-buffer clearing can also be avoided, but again it requires heavy involvement of the game/application. Basically it involves setting the Z-buffer compare mode to Always, which means that a pixel is written no matter what the old stored Z-value is.
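Just to show what these tricks look like from the software side, here is a minimal sketch using OpenGL (my choice of API purely for illustration; draw_scene() is a hypothetical stand-in for whatever the game renders):

    #include <GL/gl.h>

    /* Sketch: render a frame without clearing any buffers. This only
     * works if the scene's polygons cover every pixel on screen. */
    void render_frame_without_clears(void)
    {
        /* Trick 1: simply never call glClear() -- the full-screen
         * geometry overwrites whatever color values the last frame
         * left behind. */

        /* Trick 2: with the depth compare mode set to Always, every
         * incoming pixel is written no matter what stale Z-value is
         * stored, so the Z-buffer never needs a reset either. Note
         * that this also disables real depth testing, so the app has
         * to sort its polygons itself. */
        glDepthFunc(GL_ALWAYS);

        /* draw_scene(); */
    }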
2. Page Breaks
This effect is again memory related. Let me introduce the problem with an analogy:
A memory component can be seen as a small notebook. This book contains pages with a certain number of lines, and writing to memory is like writing to the pages of this notebook. Naturally, all the info doesn't fit on one page, so you use all the pages of your book. At some point you will need to retrieve info you have written into it. Assume that all lines in the book are numbered. If I ask you for the info at page 5, line 10, you can find it by going to page 5 and reading line 10. If I then ask for page 5, line 20, you will be able to tell me very quickly; however, if I ask for page 37, line 7, you need to flick through the pages until you find page 37. The message I am trying to bring across is that finding info on the page you already have open is easy and fast compared to finding info on a different page. This same idea of pages and lines is present in memory technology, and so is the speed impact of accessing a different page. When the information you need is at a different location, on a different page, you suffer a page break. A page break always results in extra cycles being needed to find and retrieve that other page... now the question is: how many page breaks do you suffer when doing 3D graphics?
3D rendering is a very random process: you render a polygon here, you render a polygon there. The locations of these polygons can be completely different, and the textures used for rendering them can also be completely different. Basically, a 3D accelerator needs to:
- Read Texture Data
- Read/Write Z-Buffer Info
- Read/Write Frame Buffer Info
This should give you the idea that you need to access several different locations in memory. In practice, however, there may be only two buffer locations, because the Z- and frame buffer are often interleaved, meaning that for every pixel the color and Z-info are stored linearly, right next to each other. This also has an impact on the clearing time. Still, since most new 3D accelerators support dual texturing in a single cycle, they need to access two textures at the same time, and that means an extra memory page: the two textures are normally at different locations.
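To picture what this interleaving means, here is a minimal sketch of such a layout in C (the exact arrangement is my assumption; real chips may organize memory differently):

    #include <stdint.h>

    /* Interleaved color + depth: each pixel's 16-bit color value and
     * 16-bit Z-value sit right next to each other, so touching both
     * stays within the same memory page. */
    struct pixel {
        uint16_t color;  /* frame buffer value */
        uint16_t z;      /* Z-buffer value     */
    };

    struct pixel screen[480][640];  /* e.g. a 640x480 buffer */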
So what is the impact?
Well, during the whole process of rendering you need to access various positions in memory, and the swapping between these locations results in page breaks... the more breaks, the more cycles you lose. Again, giving you an exact number is impossible, but the impact is there without any doubt.
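To make this concrete, here is a toy model in C that replays a made-up access pattern (texture 1, texture 2, Z-buffer, frame buffer for every pixel) and counts how often two consecutive accesses land on different memory pages. The addresses and the page size are invented example values:

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SIZE 2048  /* bytes per memory page -- example value */

    int main(void)
    {
        /* Four regions that live far apart in memory; the addresses
         * are purely illustrative. If Z and color were interleaved,
         * the zbuf/fbuf streams would merge into one region. */
        const uint32_t tex0 = 0x000000, tex1 = 0x100000,
                       zbuf = 0x200000, fbuf = 0x300000;
        uint32_t last_page = UINT32_MAX;
        long breaks = 0;

        for (int pixel = 0; pixel < 1000; pixel++) {
            uint32_t access[4] = { tex0 + pixel * 8, tex1 + pixel * 8,
                                   zbuf + pixel * 4, fbuf + pixel * 4 };
            for (int i = 0; i < 4; i++) {
                uint32_t page = access[i] / PAGE_SIZE;
                if (page != last_page)  /* new page => page break */
                    breaks++;
                last_page = page;
            }
        }
        printf("Page breaks for 1000 pixels: %ld\n", breaks);  /* ~4000 */
        return 0;
    }

With four separate regions, nearly every single access is a page break, and each break costs extra cycles on top of the transfer itself.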
3. Plain Lack of Data
Many people just accept single cycle dual texturing without thinking about it... but rendering two textures at the same time requires a serious amount of bandwidth. Look at the basic data needs: 4 texels for texture one, 4 texels for texture two, plus a read and a write of both a Z and a frame buffer value. If you add all of this up, you see that the bandwidth needed for rendering one single dual-textured, bilinear filtered pixel is huge. Naturally this is the worst case, but look at the numbers:
Texels: 16 bits color x 4 (bilinear) = 64 bits => x2 since we have two textures = 128 bits
Z-Buffer: 16 bits read and 16 bits write = 32 bits
Frame Buffer: 16 bits read and 16 bits write = 32 bits
This gives us a grand total of 192 bits per pixel that you need to render. It means that caching MUST work; if it doesn't, you run out of data. NVIDIA's RIVA TNT has already shown that caching can fail in a game (cf. Unreal), and the results of this failure are dramatic: a huge slowdown and hiccups in the frame-rate.
Keep in mind that my numbers are only partly worst case (I could have gone for a 32-bit frame and Z-buffer, but I didn't). Even if you assume that cache efficiency is 50%, you still need a huge amount of new data every clock, and I guess most of you will agree that sometimes things will go wrong, and then you lose cycles.
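To translate those 192 bits per pixel into raw bandwidth, here is a quick worked example in C (the 100 Mpixels/s fill-rate is an arbitrary example figure, not a specific card):

    #include <stdio.h>

    int main(void)
    {
        /* Worst-case per-pixel traffic from the breakdown above. */
        const double bits_per_pixel = 128.0 + 32.0 + 32.0;  /* 192 bits */
        const double fill_rate      = 100e6;  /* pixels/s -- example    */
        const double cache_hit_rate = 0.5;    /* assumed 50% efficiency */

        double raw_bw    = bits_per_pixel / 8.0 * fill_rate;  /* bytes/s */
        double cached_bw = raw_bw * (1.0 - cache_hit_rate);

        printf("Without caching : %.1f GB/s\n", raw_bw    / 1e9);
        printf("With 50%% hits   : %.1f GB/s\n", cached_bw / 1e9);
        return 0;
    }

Even the halved figure of 1.2 GB/s is a huge amount of data to move every second, which is why a run of cache misses shows up immediately as a frame-rate hiccup.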
I think the previous 3 points have given you an idea of why theory and reality are NOT equal: you lose cycles, while the theoretical numbers assumed that 100% of the cycles could be used for filling pixels to the frame buffer.