Analysis: Tessellator Throughput, PCIe Bandwidth and Conclusions
The tessellator in ATI hardware has been with us for 2 years, and we've yet to see any tests done on it. That's not too surprising, though, given that ATI has only recently provided the tessellation SDK, and no demos of any sort existed, at least not for public consumption. We're not yet ready to do a full investigation of it, but we can certainly get our feet wet with a quick test. We'll be using one of the two demos ATI provides, namely the CharacterTessellation one. Here's what came out at the other end:


Looking at the relative performance (normalised against the result without tessellation), it seems quite clear that nothing has changed on this front in the R6xx-to-R7xx transition, at least as far as tessellator use in DX9 is concerned (the demo is a DX9 one, and for the time being only DX9 programming is exposed). As a minor bit of trivia, with 15x tessellation you're taking an 8884-triangle mesh as input and outputting 3651324 triangles. At 133 frames per second, this means the little bugger is pushing roughly 485 Mtris/s, which is impressive after a fashion. Of course, the maximum quoted rate is 750 Mtris/s, but real workloads always carry overhead (in this case, more vertices to run through the evaluation shader, greater setup pressure, less efficient hier-Z with high densities of small triangles, etc.). However, with DX11 so close, it's likely that this first tessellator iteration on the PC will go down in history as an interesting quirk and little more.
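For those who like to check our sums, here is a minimal sketch of the arithmetic behind the trivia above. It only uses the figures quoted in the text (triangle counts, frame rate and the quoted peak rate). The function and variable names are our own invention for illustration and are not part of ATI's SDK or the demo.

```python
# Back-of-the-envelope check of the tessellator numbers quoted in the text.
# All names here are hypothetical; only the four constants come from the article.

INPUT_TRIANGLES = 8_884          # mesh fed to the tessellator
OUTPUT_TRIANGLES = 3_651_324     # triangles produced at 15x tessellation
FRAMES_PER_SECOND = 133          # measured frame rate at 15x
QUOTED_PEAK_MTRIS = 750.0        # maximum quoted rate, in Mtris/s


def tessellation_stats(input_tris, output_tris, fps, peak_mtris):
    """Return amplification, achieved Mtris/s and the fraction of peak reached."""
    amplification = output_tris / input_tris      # output triangles per input triangle
    mtris_per_second = output_tris * fps / 1e6    # triangles per second, in millions
    fraction_of_peak = mtris_per_second / peak_mtris
    return {
        "amplification": amplification,
        "mtris_per_second": mtris_per_second,
        "fraction_of_peak": fraction_of_peak,
    }


stats = tessellation_stats(
    INPUT_TRIANGLES, OUTPUT_TRIANGLES, FRAMES_PER_SECOND, QUOTED_PEAK_MTRIS
)
print(f"amplification:    {stats['amplification']:.0f}x")
print(f"throughput:       {stats['mtris_per_second']:.1f} Mtris/s")
print(f"fraction of peak: {stats['fraction_of_peak']:.3f}")
```

With the quoted figures this works out to 411 output triangles per input triangle, and a throughput that sits at roughly 65% of the quoted peak, which is what you would expect once the overheads listed above are taken into account.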
Before ending the bore, there's still one thing we'd like to show you, namely read-from-host-memory/write-to-host-memory rates -- basically, a PCI-Express throughput test:

The final surprise used to be here, as RV740 did quite badly when it came to writing back to the host, not even managing to hit 1 GB/s. However, driver updates seem to have taken care of this, at least in our case, so there's not much left to see these days.
Conclusions
Overall, the first 40nm effort from ATI has proven to be a pretty solid mainstream chip. Gobs of shading potential (albeit not simple to leverage, mind you), balanced by a good chunk of sampling ability and coupled with capable ROPs, make it an easy-to-like GPU. Sure, it's hampered by its laughably low bandwidth, but given the target market, the gamble might've paid off due to the chip's size. That is, if delays and poor availability hadn't put a fly in the ointment.
As it is, RV740 is caught in a somewhat disadvantageous position due to market dynamics: the slightly faster 4850 has already fallen in price, undercutting it somewhat, and the entire line-up is moving downwards in order to make room for newcomers. Availability is still not great, something hard to forgive in a volume-oriented part. Had it come out according to initial plans, and with good availability, it would've been a pretty cool little guy. Right now, we're not sure it's that interesting from a strictly economic standpoint. As a GPU, though, it's nice to see how the lower end of the market has evolved over the years, and to notice that nowadays it's more than capable. Getting more than twice the overall shader horsepower per clock of R600 at the lower end of the market is brilliant.
Hoping that no one fell asleep whilst reading this, we'll leave you to your own devices as we get back to planning future B3D endeavours with new DX11 hardware from both big IHVs.