Analysis: ROP Throughput


The name of the game here is rendering many fullscreen quads while varying the render-target format. As you'll soon see, the ROPs are quite juicy morsels for someone looking for writing topics. Let's start by disabling depth and stencil, and measuring how fast Slimer can fill rectangles with colours:



As we said, juicy! Maximum ROP throughput clearly tops out at ~16.8 GPixels/s, which is eerily close to 16.996 GPixels/s, the figure we'd expect if there were only 28 ROPs on-chip, except we know that there are 40 of them. This is the point where we urge you to look upstream, at the ROP analysis, where we had already told you so.
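For those who like to check our sums, here is a minimal sketch of the arithmetic behind that comparison, using only the figures quoted above. It assumes each ROP retires one pixel per clock, and it derives the clock from the 16.996 GPixels/s figure rather than taking it from a spec sheet. The function and constant names are ours, purely for illustration.

```python
# Back-of-the-envelope check using only the figures quoted in the text.
# Assumption: one pixel per ROP per clock; the clock is derived, not measured.

MEASURED_GPIX = 16.8        # approximate measured peak colour fill, GPixels/s
EXPECTED_28_GPIX = 16.996   # rate quoted for a hypothetical 28-ROP chip
EFFECTIVE_ROPS = 28         # what the measurement suggests
PHYSICAL_ROPS = 40          # what is actually on-chip


def implied_clock_ghz(rate_gpix, pixels_per_clock):
    """Clock (GHz) implied by a fill rate at a given pixels-per-clock width."""
    return rate_gpix / pixels_per_clock


def peak_fill_gpix(pixels_per_clock, clock_ghz):
    """Theoretical fill rate (GPixels/s) for a given width and clock."""
    return pixels_per_clock * clock_ghz


clock = implied_clock_ghz(EXPECTED_28_GPIX, EFFECTIVE_ROPS)
peak_40 = peak_fill_gpix(PHYSICAL_ROPS, clock)
eff_vs_28 = MEASURED_GPIX / EXPECTED_28_GPIX
eff_vs_40 = MEASURED_GPIX / peak_40

print(f"implied clock:          {clock:.3f} GHz")
print(f"40-ROP theoretical:     {peak_40:.2f} GPixels/s")
print(f"measured vs 28 px/clk:  {eff_vs_28:.1%}")
print(f"measured vs 40 px/clk:  {eff_vs_40:.1%}")
```

Under these assumptions, the measured rate is within a couple of percent of the 28-pixels-per-clock figure, but only around 69% of what 40 ROPs could deliver at the same clock. That is why the 40-ROP number looks like the wrong yardstick here.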

Also look at what happens when writing out to the sRGB format: in this case the limitation is at the ROP level. It appears that the required linear-to-sRGB conversion forces a slow hardware path, unlike Cypress, which seems to handle the conversion at full speed. Turning on blending improves the picture a bit, with the use of the L2 as a colour cache showing benefits here, and we also notice there's a fast path for the basic R8G8B8A8 format, for which blends are full-rate. Other formats range between half-rate (most) and quarter-rate (in the case of 128 bits per pixel). Here's how things look with Z-only rendering:



Everyone performs as expected, so the only thing worth noting is that Slimer is the more efficient of the two here, achieving almost 90% of its theoretical throughput whereas Cypress reaches only 80%. If you're thinking that we're now done, you're cute, but also wrong!