Memory Optimisations

Beyond the new memory controller, the R520 architecture has several other areas in which ATI have tried to make more effective use of memory bandwidth.



 

Caches and R520 Cache Misses Relative to X850 (ATI Data)


For one, R520 moves to fully associative caches. Caches on previous designs were directly mapped to a fixed set of RAM locations, meaning that if the data in those locations didn't need to be cached at that time, the cache block couldn't be used by other data. Fully associative caches allow any cache line to be mapped to any location in external memory. All of R520's texture, colour, Z and Stencil caches have moved to fully associative designs.
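To illustrate why this matters, here is a minimal Python sketch that counts cache misses for a direct mapped cache and a fully associative one when two blocks happen to collide. It is a toy model, not ATI's design: the function names, the four-line cache size, the access pattern and the LRU replacement policy are all assumptions made for the example, and nothing here reflects R520's actual replacement scheme.

```python
from collections import OrderedDict


def direct_mapped_misses(addresses, num_lines):
    """Count misses when each block address can live in exactly one line."""
    lines = [None] * num_lines
    misses = 0
    for addr in addresses:
        slot = addr % num_lines  # the only line this address may occupy
        if lines[slot] != addr:
            misses += 1
            lines[slot] = addr  # evict whatever was there
    return misses


def fully_associative_misses(addresses, num_lines):
    """Count misses when any block can occupy any line (LRU is assumed)."""
    lines = OrderedDict()
    misses = 0
    for addr in addresses:
        if addr in lines:
            lines.move_to_end(addr)  # mark as most recently used
        else:
            misses += 1
            if len(lines) >= num_lines:
                lines.popitem(last=False)  # evict the least recently used
            lines[addr] = True
    return misses


# Hypothetical access pattern: blocks 0 and 4 both map to line 0
# of a four-line direct mapped cache, so they keep evicting each other.
pattern = [0, 4] * 8
print("direct mapped misses:    ", direct_mapped_misses(pattern, 4))
print("fully associative misses:", fully_associative_misses(pattern, 4))
```

In this contrived pattern the direct mapped cache misses on every access while three of its four lines sit idle, whereas the fully associative cache misses only on the first touch of each block.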

ATI's Hierarchical Z-Buffer has been further tweaked such that the precision is now floating point, for better accuracy, and can cache up to 60% more pixels than R420 can. The Z Buffer compression routines have also been optimised to give higher compression ratios than previous parts.


Render Backend

For the most part the ROPs on R520 retain many of the properties of the entire R300-R480 line, but with a few key developments.

ATI have supported various HDR methods since the introduction of R300 with its floating point texturing capabilities; however, NVIDIA have supported a more optimal method, High Dynamic Range blending, which ATI's parts have not previously been able to do. With the entire range of chips using the R520 architecture ATI will now support HDR blending, and will do so under a number of formats:

  • FP16 - 64-bit floating point
  • Int16 - 64-bit integer
  • Int10 - 32-bit 10-10-10-2
  • Custom formats (eg Int10+L16)

The 64-bit formats will provide the highest quality, but come with the highest bandwidth and memory footprint costs, seeing as each pixel is twice the size of a single 32-bit pixel. The Int10 mode provides a degree of higher dynamic range at the same performance and memory costs as standard 32-bit pixels, but comes at the trade-off of leaving only 2 bits for the Alpha channel, which allows for just four values: totally transparent, totally opaque and two intermediate levels.
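The Alpha trade-off of the 10-10-10-2 layout can be seen by packing a pixel by hand. The Python sketch below is illustrative only: the function names are hypothetical, and the bit ordering (Alpha in the top two bits, red in the lowest ten) is an assumption, since the ordering used by the hardware isn't stated here.

```python
def quantise(value, bits):
    """Map a normalised [0, 1] value to an unsigned integer of the given width."""
    levels = (1 << bits) - 1
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * levels + 0.5)


def pack_1010102(r, g, b, a):
    """Pack four normalised channels into one 32-bit word (assumed bit order)."""
    return (
        (quantise(a, 2) << 30)
        | (quantise(b, 10) << 20)
        | (quantise(g, 10) << 10)
        | quantise(r, 10)
    )


def unpack_alpha(word):
    """Recover the normalised Alpha value from the top two bits."""
    return ((word >> 30) & 0x3) / 3


# Sweep Alpha from 0.00 to 1.00 and collect the distinct values that survive.
alpha_levels = sorted(
    {unpack_alpha(pack_1010102(0.0, 0.0, 0.0, a / 100)) for a in range(101)}
)
print(len(alpha_levels), "distinct alpha levels:", alpha_levels)
```

Each colour channel keeps 1024 levels against the 256 of a standard 8-bit channel, while Alpha collapses to four.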

Although ATI have provided these HDR blending capabilities in the R520 architecture, they haven't removed any orthogonality, meaning that all modes of operation that run through the ROPs work with one another. The net result is that all of the FSAA options that are provided under standard blending buffers are equally supported under any of the HDR blending modes; however, the costs of running the 64-bit modes with FSAA enabled are likely to be fairly high due to the bandwidth utilisation and memory space requirements.
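A rough feel for the memory space involved can be had with some simple arithmetic. The Python sketch below assumes an uncompressed multisampled colour buffer that stores every sample at full size, and ignores the Z/Stencil buffer, any resolve buffer and any compression the hardware applies; the function name and the 1600x1200 resolution are illustrative choices rather than figures from ATI.

```python
def colour_buffer_mib(width, height, bytes_per_pixel, samples=1):
    """Size in MiB of an uncompressed colour buffer storing every sample."""
    return width * height * bytes_per_pixel * samples / (1024 * 1024)


# Bytes per pixel for the two pixel sizes discussed in the text.
FORMATS = {"32-bit (standard or Int10)": 4, "64-bit (FP16 or Int16)": 8}

for name, size in FORMATS.items():
    for samples in (1, 4, 6):
        mib = colour_buffer_mib(1600, 1200, size, samples)
        print(f"{name}, {samples}x samples: {mib:.1f} MiB")
```

Under these assumptions a 64-bit buffer always needs twice the space of a 32-bit one at the same sample count, and the multisample factor multiplies on top of that.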


 Far Cry - Normal Rendering                   Far Cry - HDR + AA

 

The images above are taken from a version of Far Cry that enables its HDR rendering mode to operate with FSAA on ATI's new boards, a change that took only a few hours to make. It is expected that this capability will be fully integrated through the engine and interface and that, once it has been properly quality assured, a patch will be issued. Due to the early nature of this code we are unable to provide benchmarks at this juncture.

Although only the RV530 chip features it in the new line, another feature of the R520 architecture is its ability to support two quads in the ROP at once when rendering Z and Stencil samples. RV530 features double the Z/Stencil fill-rate in relation to its colour fill-rate.

ATI's engineers state that they have placed a lot of emphasis on being able to run with high quality options on. To this end they state that they have put a lot of effort into their high quality 6x Anti-Aliasing mode (despite this not being a popular benchmarking mode), and its cost relative to 4x FSAA should be lower than it was in previous architectures. Testing will bear out how much this is the case.