As I understand it, DOOM3's rendering pipeline works by first rendering the complete scene with neither ambient light nor textures (so that this pass is very quick). This fills the z-buffer with correct (and final) depth information for each pixel on screen, after which z-writes are turned off.

Yes.
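
The depth-first pass described above can be sketched in software. This is only an illustration, not code from the engine: the fragment tuples, the function names, and the use of an "equal" depth comparison in the later passes are all assumptions made for the example.

```python
# Illustrative sketch only. A "fragment" here is a hypothetical tuple:
# (pixel_index, depth, surface_name). Smaller depth means closer to the eye.

def depth_prepass(fragments, num_pixels):
    """Pass 1: no textures, no lighting; keep only the nearest depth per pixel."""
    depth = [float("inf")] * num_pixels
    for pixel, z, _surface in fragments:
        if z < depth[pixel]:
            depth[pixel] = z  # z-writes are enabled only in this pass
    return depth

def fragments_for_lighting(fragments, depth):
    """Later passes: the depth buffer is read-only (z-writes off).

    Assumption: an "equal" depth test selects exactly the visible surface,
    so hidden fragments never reach the expensive lighting math.
    """
    return [(p, z, s) for p, z, s in fragments if z == depth[p]]

fragments = [(0, 0.5, "wall"), (0, 0.2, "crate"), (1, 0.9, "floor")]
depth = depth_prepass(fragments, num_pixels=2)
visible = fragments_for_lighting(fragments, depth)
```

With these sample fragments the wall at pixel 0 is rejected, because the crate in front of it already owns that depth value.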

And then, for each per pixel light :
1. Render shadow volumes of all shadow casters into stencil buffer. This is again a quick pass (no textures used), but potential fill rate burn because of overdraw (btw, what sort of optimizations are you doing to reduce overdraw?).
2. Add the light's contribution to pixels that have stencil=0 (that is, pixels that are not in shadow). For a diffuse point light this looks something like:
Temp=NormalMap dot3 LightDirection
Temp=Temp mul AttenuationMap
Temp=Temp mul LightColor
Result=Temp mul MaterialTexture

That is basically correct for the diffuse map, although you have to sort by light and clear the affected regions of the stencil buffer as well. Adding the specular map requires a half angle cube map, another access of the normal map, some random math, and a specular map access. There are some subtleties with sorting to allow some extra shadowing features, but they aren't critical aspects.
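
As a rough per-pixel sketch of the diffuse term listed in the question, the same four steps can be written out in Python. The function and parameter names are hypothetical, the clamp of the dot product to zero is an assumption (the pseudocode does not mention it), and the specular term is left out entirely.

```python
def dot3(a, b):
    """Three-component dot product, as in the 'dot3' combiner operation."""
    return sum(x * y for x, y in zip(a, b))

def diffuse_contribution(normal, light_dir, attenuation, light_color, material, stencil):
    """Diffuse contribution of one light to one pixel.

    normal and light_dir are assumed to be unit vectors; attenuation is the
    value fetched from the attenuation map for this pixel.
    """
    if stencil != 0:
        return (0.0, 0.0, 0.0)               # pixel is in shadow for this light
    temp = max(0.0, dot3(normal, light_dir))  # Temp = NormalMap dot3 LightDirection (clamp assumed)
    temp *= attenuation                       # Temp = Temp mul AttenuationMap
    # Temp = Temp mul LightColor; Result = Temp mul MaterialTexture
    return tuple(temp * lc * m for lc, m in zip(light_color, material))

def add_to_framebuffer(pixel, contribution):
    """Each light's result is added to what earlier lights already wrote."""
    return tuple(p + c for p, c in zip(pixel, contribution))

lit = diffuse_contribution((0, 0, 1), (0, 0, 1), 0.5, (1.0, 1.0, 1.0), (0.5, 0.25, 1.0), stencil=0)
shadowed = diffuse_contribution((0, 0, 1), (0, 0, 1), 0.5, (1.0, 1.0, 1.0), (0.5, 0.25, 1.0), stencil=1)
```

Running this once per light and summing the results mirrors the "for each per pixel light" loop in the question.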

Basically, your engine draws the z-buffer in one quick pass, and then the z-buffer does not change anymore. The total number of render passes is (greatly) influenced by the number of per pixel lights used. Correct?

Yes.

You said there are two passes on GeForce 3/4 and one on Radeon 8500. Is that the number of passes *per light*? Say we take 30fps as a basis: a GeForce3 or 4 could handle about 3 or 4 per pixel lights per frame, which actually means 8-10 passes! A Radeon 8500 would take 5-6 passes, which saves some T&L work. Again, correct?

Yes, the primitive code path is a surface+light "interaction". We guesstimate 2x lights per surface average and 2x true overdraw, for 4x interaction overdraw (times 5, 3, 2, or 1 passes, depending on the card and features enabled).
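
The arithmetic in that estimate can be spelled out. The sketch below simply multiplies the figures quoted in the answer; the function name is made up for the example, and the result covers only the lighting interactions, not the initial depth pass or the shadow volume rendering.

```python
def interaction_draws_per_pixel(passes_per_interaction, lights_per_surface=2, true_overdraw=2):
    """Approximate number of times an average screen pixel is drawn while lighting.

    The defaults are the guesstimates from the answer: 2 lights per surface
    and 2x true overdraw, giving 4x interaction overdraw.
    """
    interaction_overdraw = lights_per_surface * true_overdraw
    return interaction_overdraw * passes_per_interaction

# 5, 3, 2, or 1 passes per interaction, depending on the card and features enabled.
estimates = {passes: interaction_draws_per_pixel(passes) for passes in (5, 3, 2, 1)}
# estimates == {5: 20, 3: 12, 2: 8, 1: 4}
```

So under these assumed averages, a single-pass card draws each pixel about 4 times for lighting, while a five-pass card draws it about 20 times.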

The Kyro (or specifically, the Kyro2). Given its lack of cube map support, and with LightDirection being a cube map texture, would disabling per pixel normalization of LightDirection enable the Kyro2 to run DOOM3? Would you do this?

I doubt it, but if they impress me with a very high performance OpenGL implementation, I might consider it.

GeForce1/GeForce2/Radeon7500. All would be able to run DOOM3 at lower resolutions with fewer per pixel lights per frame. What about per pixel normalized LightDirection? Without cube maps, LightDirection can be stored in the diffuse or specular color component of the vertex, but I'd appreciate a clarification/confirmation from you.

That is actually on my list of things to benchmark, but I rather doubt it will make a difference. I don't think there are enough combiners on a GF1/2 to do it, and I don't expect much of a speedup by skipping a rather low-res cube map access on GF3/4.

John Carmack

Thanks to John for taking the time.