On the R300 the Hierarchical-Z is disabled whenever the stencil buffer isn't cleared with the Z-buffer. This hurts performance. Is there any way around this?

Don't know.

(Note: ATI has since stated that the Hierarchical-Z buffer is not disabled, but since there is no hierarchical stencil test, it is simply of no use during the actual stencil operation. It is, however, still enabled.)

Your .plan indicates that the NV30-path you use implements only 16-bit floating point (FP), i.e. half-precision FP, for most computation, which should be sufficient for most pixel shading. The ARB2-path does not have 16-bit FP, so all computations are done with 32-bit FP on the NV30. For the R300 there shouldn't be a difference, since it always uses 24-bit FP. According to your .plan, the NV30 is half as fast with 32-bit FP, which is why the NV30 is slower than the R300 on the ARB2-path but faster on the NV30-path. What sort of quality difference are we talking about (in DOOM3) between these FP formats?

There is no discernable quality difference, because everything is going into an 8 bit per component framebuffer. Few graphics calculations really need 32 bit accuracy. I would have been happy to have just 16 bit, but some texture calculations have already been done in 24 bit, so it would have been sort of a step back in some cases. Going to full 32 bit will allow sharing the functional units between the vertex and pixel hardware in future generations, which will be a good thing.
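As a rough illustration of the point about the 8-bit framebuffer, the sketch below rounds values in [0, 1] to 16-bit and 32-bit floats and then quantizes both to 8 bits. It assumes IEEE 754 half precision as a stand-in for the hardware's 16-bit format, and the function names are hypothetical. It only covers storing a single value, not error accumulated over a long chain of shader operations.

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (32-bit) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_8bit(x: float) -> int:
    """Quantize a [0, 1] color value to an 8-bit framebuffer component."""
    return max(0, min(255, round(x * 255.0)))

def max_framebuffer_error(samples: int = 4096) -> int:
    """Largest 8-bit difference between the 16-bit and 32-bit results."""
    worst = 0
    for i in range(samples + 1):
        x = i / samples
        diff = abs(to_8bit(to_half(x)) - to_8bit(to_single(x)))
        worst = max(worst, diff)
    return worst

if __name__ == "__main__":
    print("worst difference in 8-bit steps:", max_framebuffer_error())
```

For a single stored value the two precisions can differ by at most one 8-bit step, because half precision's rounding error in [0, 1] is far smaller than 1/255.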

My interpretation from your .plan:

In terms of performance:
NV30+NV30-path is faster than NV30+ARB2
NV30+NV30-path is faster than R300+ARB2
R300+ARB2 is faster than NV30+ARB2
R300+R200-path is faster than R300+ARB2

In terms of quality:
NV30+ARB2 is better than NV30+NV30-path
NV30+ARB2 is better than R300+ARB2
R300+ARB2 is better than NV30+NV30-path
R300+ARB2 is better than R300+R200-path

Am I correct?

Correct.

Why is the ARB2 path so slow on NV30? The higher precision alone doesn't seem to account for the NV30's performance, given that it runs at a much higher clock speed than the R300 and that 96-bit vs 128-bit is only a 33% difference.

Apparently, the R300 architecture is a better target for a straightforward assembler / compiler, while the NV30 is twitchy enough to require more serious analysis and scheduling, which is why they expect significant improvements with later drivers.

What about the difference between NV30+NV30-path and R300+R200-path in terms of performance and quality?

Very close. The quality differences on the ARB2 path are really not all that significant; most people won't be able to tell the difference without having it pointed out to them.

Why do you have NV30-specific code paths and none for the R300?

There aren't any R300-specific fragment extensions, so I really can't make an R300-specific back end. I do support their two sided stencil extension (unfortunately, slightly different than NVIDIA's...), which is orthogonal to the back end selection.
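To make "orthogonal to the back end selection" concrete, here is a minimal sketch of choosing a back end from the advertised OpenGL extensions, with the two-sided stencil check done separately. This is not the engine's actual code: the function names, the priority order, and the mapping of each path to a single extension string are assumptions for illustration.

```python
# Hypothetical priority order; the real engine's selection logic may differ.
BACK_ENDS = [
    ("NV30", "GL_NV_fragment_program"),   # vendor-specific fragment path
    ("ARB2", "GL_ARB_fragment_program"),  # vendor-neutral fragment path
    ("R200", "GL_ATI_fragment_shader"),   # older ATI fragment path
]

# Two vendor flavors of two-sided stencil (assumed extension mapping).
TWO_SIDED_STENCIL = ("GL_EXT_stencil_two_side", "GL_ATI_separate_stencil")

def choose_back_end(extensions):
    """Return the first back end whose required extension is present."""
    for name, required in BACK_ENDS:
        if required in extensions:
            return name
    return None

def two_sided_stencil_extension(extensions):
    """Pick a two-sided stencil extension independently of the back end."""
    for ext in TWO_SIDED_STENCIL:
        if ext in extensions:
            return ext
    return None

if __name__ == "__main__":
    r300_like = {"GL_ARB_fragment_program", "GL_ATI_fragment_shader",
                 "GL_ATI_separate_stencil"}
    print(choose_back_end(r300_like), two_sided_stencil_extension(r300_like))
```

With no R300-specific fragment extension in the list, an R300-like extension set lands on the ARB2 back end, while the stencil choice is made by a separate lookup.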

In your opinion, why is it that the existing framebuffer content for a given pixel isn't a standard input to pixel shaders (which is an S3 DeltaChrome feature/advantage)?

All software developers want this, but the hardware developers insist that dedicated blenders vastly simplify the write ordering hazards.

Our thanks go to John for taking the time out from his extremely busy schedule to attend to Reverend's emails!

