Architecture Summary

Because GT200 doesn't implement a brand new architecture or change attainable image quality compared to G80 or G92, we've been able to skip discussion of large parts of the chip simply because they're unchanged. There's nothing new to say about maximum per-pixel IQ, because the parts of the chip responsible for it carry no improvements or changes to speak of. It's purely a question of performance and how that performance is derived.

If you've got your wonky monocle on (every graphics enthusiast has one, so they can squint and replicate Quincunx in real-time, with REAL pixels), it's possible to look at GT200's 1.4B transistors and wonder why it isn't 2x G92 across the board, since it's nearly double the transistor count. The reality, however, is that transistors have been spent elsewhere in the chip, on CUDA among other things. Furthermore, and perhaps more importantly, some potential minor bottlenecks such as triangle setup seem unchanged, while clocks have also come down.

The stark reality is that GT200 has more of an eye on CUDA and non-graphics compute than any NVIDIA processor before it, and that speaks volumes, especially as the company continues to ramp up the CUDA message and deploy Tesla into outposts of industry it would previously have had no business visiting. Oil and gas, computational finance, medical imaging, seismic exploration, bioinformatics and a whole host of other little niches are starting to open up, and what's primarily a processor designed for drawing things on your screen is now tasked with doing a lot more. The base computational foundation laid down by G80 now has DP and full-speed denormal support, which is no small matter as a new industry grows up. We'll cover that separately, since we took in a recent Editor's Day in Santa Clara related to just that.

We've not been able to spend much time with hardware to date, but we've been able to throw some graphics and CUDA programs at a chip or two (and a special thank you to the guy who helped me run some simple D3D9 on a GTX 280 last night, très bien monsieur!). Performance is high when testing theoretical rates in the shader core, and although we can't see NVIDIA's claimed 93-94% efficiency when scheduling the SFU, ~1.5 MULs/clock/SP is easy enough with graphics code, and we see a fair bit higher under CUDA. That contrasts with the same shaders on G80, where the MUL is resolutely missing in graphics mode, regardless of what you dual-issue with it.
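For the curious, here's a minimal sketch of the kind of CUDA kernel we point at the shader core to probe MAD+MUL co-issue. The kernel name, loop count and launch shape are our own choices and the timing glue is omitted, so treat it as illustrative rather than our exact test code.

    // Dependent chains of MAD and MUL; the final store keeps the compiler
    // from throwing the work away. Launch enough blocks to fill the chip,
    // time with CUDA events, and divide MULs issued by SPs x hot clock
    // x seconds to get a MULs/clock/SP figure.
    __global__ void mad_mul_issue(float *out, float a, float b)
    {
        float x = a, y = b;
    #pragma unroll 16
        for (int i = 0; i < 4096; ++i)
        {
            x = x * a + b;   // MAD on the SP
            y = y * b;       // MUL the scheduler can co-issue via the SFU path
        }
        out[blockIdx.x * blockDim.x + threadIdx.x] = x + y;
    }

A long pixel shader doing the same dance gives you the graphics-mode number; the same idea through CUDA is where we see the rate creep higher.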

We can almost hit the bilinear peak when texturing, which backs up the general performance claims there, and if I could be bothered to reboot to Windows and fire up Photoshop, I could make a version of the old 70Gpixel/sec Z-only image, only this time it'd be around the 1/10th of a terazixel mark. That's right, I said terazixel. We haven't measured blend rate yet, and doing that properly is next on our list, but we're confident the theoretical figures will be borne out during testing.
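For reference, the theoretical peaks those measurements chase fall out of simple arithmetic. The snippet below uses GTX 280's published 602MHz core clock, 80 bilinear filter units and 32 ROPs at 8 Z-only samples per clock each; those are the SKU numbers we're assuming, and the Z-only result is a peak, with our ~1/10th-of-a-terazixel figure being what shows up in practice, a little under it.

    // Back-of-the-envelope theoretical rates for GeForce GTX 280.
    #include <cstdio>

    int main()
    {
        const double core_mhz       = 602.0;         // published core clock
        const double bilinear_units = 80.0;           // texture filter units
        const double z_per_clock    = 32.0 * 8.0;     // 256 Z-only writes/clock

        printf("Peak bilinear: %.1f Gtexels/s\n", bilinear_units * core_mhz / 1000.0);
        printf("Peak Z-only:   %.1f Gzixels/s\n", z_per_clock * core_mhz / 1000.0);
        return 0;
    }

That works out to roughly 48 Gtexels/s of bilinear filtering and a hair over 150 Gzixels/s of Z-only fill at the theoretical limit.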

With 3D games (and it's prudent to think of the chip's split execution personalities, a graphics mode and a compute mode, because they affect performance and power consumption), GeForce GTX 280 is some 30-100% faster than a single GeForce 9800 GTX, depending on the game and resolution tested (caveats ahoy here, of course). Quite a swing, and the upper end of that range seems to be occupied by D3D9 app performance. Compared to the multi-GPU product du jour, GeForce 9800 GX2, it does less well and is often beaten at high resolution, depending on memory usage. Yes, we're waving our hands around a bit here without graphs to back it up, so we forward you to the fine fellows at The Tech Report and Hardware.fr for more data.

As the single-chip paragon of graphics virtue and greater internets justice, GT200 has no peers, and GeForce GTX 280 looks like a fine caddy for this latest of our silicon overlords. We seriously question whether the asking price is worth it, with GTX 260 looking like much better value if you must buy GT200 right now, but we daren't say much more until we've had proper hands-on testing with games and some more CUDA codes. There's something about having a 512-bit bus and 1GiB of memory hanging off one chip, though.

We mentioned at the top of the article that G80 has finally been truly usurped in our eyes for 3D. At a simple level, 1.4B transistors holding huge peak bilinear texturing rates, 256 Z-only writes per clock and around 780 available Gflops in graphics mode will tend to do that. More on GT200 and its competition over the next couple of weeks, and keep your eyes peeled on the forums for more data before new articles show up.
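For the record-keepers, that ~780 Gflops graphics-mode figure is just the MUL behaviour discussed earlier multiplied out across the shader core, assuming GTX 280's 240 SPs at the 1296MHz hot clock and counting a MAD as two flops plus roughly half a flop for the partially-exposed MUL.

    // Where ~780 graphics-mode Gflops comes from, under our assumptions above.
    #include <cstdio>

    int main()
    {
        const double sps       = 240.0;        // scalar processors on GT200
        const double hot_ghz   = 1.296;        // GTX 280 hot clock
        const double flops_clk = 2.0 + 0.5;    // MAD + ~half the SFU MUL

        printf("Graphics-mode peak: %.0f Gflops\n", sps * hot_ghz * flops_clk);
        return 0;
    }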