Analysis: Sampling Throughput


Here we were quite conservative, simply sampling from 16 Texture Objects in order to shade a fullscreen quad. We also tried Texture Arrays (results not shown), but found no worthwhile differences. First, let's see what happens when we vary the texture format:



Not much to see here, really. The samplers don't behave all that differently from their venerable siblings in the 8800GT (which interestingly has a similar theoretical sampling rate, but much lower practical performance) -- they are faster, yes, but they don't seem to have gained or lost much functionally, apart from support for the new DirectX 11-specific formats.

On one hand, they seem to have lost a bit: single-channel 32-bit sampling is now half rate versus the previous implementation, which suggests that it is no longer special-cased and is instead fetched through the same path, at the same throughput, as wider formats. On the other hand, they can now sample at full speed from 16-bit per component surfaces, and from the somewhat wackier format that stores 9 bits per channel with a shared 5-bit exponent.
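To make the shared-exponent format less abstract, here is a minimal sketch of how one such 32-bit texel decodes to three floats. It assumes the bit layout and exponent bias of the common RGB9E5 definition (red in the low 9 bits, then green, then blue, with the 5-bit exponent in the top bits and a bias of 15). The function name `decode_rgb9e5` is ours, purely for illustration, and says nothing about how the hardware actually implements the conversion.

```python
from typing import Tuple

MANTISSA_BITS = 9   # bits per colour channel
EXPONENT_BIAS = 15  # assumed bias, as in the common RGB9E5 definition


def decode_rgb9e5(packed: int) -> Tuple[float, float, float]:
    """Decode one 32-bit shared-exponent texel into (r, g, b) floats."""
    if not 0 <= packed <= 0xFFFFFFFF:
        raise ValueError("packed value must fit in 32 bits")

    # Three 9-bit mantissas, one per channel, with no implicit leading one.
    r = packed & 0x1FF
    g = (packed >> 9) & 0x1FF
    b = (packed >> 18) & 0x1FF
    # One 5-bit exponent shared by all three channels.
    e = (packed >> 27) & 0x1F

    # Every channel is scaled by the same power of two.
    scale = 2.0 ** (e - EXPONENT_BIAS - MANTISSA_BITS)
    return (r * scale, g * scale, b * scale)


# Example texel: exponent 16, red mantissa 256, green mantissa 128, blue 0.
texel = (16 << 27) | (0 << 18) | (128 << 9) | 256
print(decode_rgb9e5(texel))  # (1.0, 0.5, 0.0)
```

Since all three channels share one exponent, the texel still fits in 32 bits, which makes full-rate sampling from it plausible in a way it would not be for a 3x32-bit float surface.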

Also note that Slimer can't sample at full rate if BC6 is used, probably because decompressing it takes two cycles rather than the single cycle needed for the other compressed formats (remember, NVIDIA keeps texture samples in compressed form even in the highest level of the texture cache hierarchy, unlike ATI, which decompresses when fetching into L1).
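The relationship between cycles per sample and observed rate can be sketched as a trivial model. The per-format cycle counts below simply restate what is described above for the unfiltered case (the BC6 figure being a guess, as noted), while `peak_rate` is a hypothetical normalised value and not a measured or quoted number.

```python
# Cycles spent per sample, restating the unfiltered behaviour described above.
CYCLES_PER_SAMPLE = {
    "16-bit per component": 1,
    "9-bit per channel, shared 5-bit exponent": 1,
    "single-channel 32-bit": 2,   # half rate
    "BC6": 2,                     # assumed: two-cycle decompression
    "other compressed formats": 1,
}


def effective_rate(peak_rate: float, cycles_per_sample: int) -> float:
    """Sampling rate once each sample occupies the unit for N cycles."""
    if cycles_per_sample < 1:
        raise ValueError("a sample takes at least one cycle")
    return peak_rate / cycles_per_sample


def relative_rates(peak_rate: float = 1.0) -> dict:
    """Rate per format; peak_rate=1.0 gives fractions of full rate."""
    return {
        fmt: effective_rate(peak_rate, cycles)
        for fmt, cycles in CYCLES_PER_SAMPLE.items()
    }


for fmt, rate in relative_rates().items():
    print(f"{fmt}: {rate:.2f}x of full rate")
```

Under this model a two-cycle format can never exceed half rate, however well the caches behave, which is consistent with the BC6 result.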

Compared to the samplers found in Cypress, Slimer's look slightly worse in the unfiltered case, since Cypress's are full rate everywhere. Let's see if turning on bilinear filtering changes things:



No big surprises with filtering either, so we can safely move on to the next section, closing this one by showing you the impact of varying texture resolution for 8-bit per component and BC2 compressed textures:
