Beyond3D - NVIDIA Fermi GPU and Architecture Analysis

NVIDIA Fermi GPU and Architecture Analysis - Page 2

Published on 23rd Oct 2010, written by Alex Voicu for Consumer Graphics - Last updated: 28th Oct 2010

Chippery

The retail, desktop-oriented moniker for the Fermi family is GeForce GTX 4xx. The SKU we bought for evaluation was the GTX 470, a step down from the top-end GTX 480 and powered by the first Fermi processor, GF100. It's worth noting that even the highest-end current GeForce GTX 4xx SKU isn't based on a fully specced GF100, having only 480 ALUs enabled out of a potential 512.

It's reasonably clear that NVIDIA is not a company that embraces the petite solution to anything, not least in its graphics processors. We were rather dumbstruck by the 2.15 B transistors Cypress touted, so you can imagine our reaction on hearing that Fermi packs some 3 B of them into its rather robust ~550 square millimetre die. Density is clearly lower than the competition's (5.55 M versus 6.43 M transistors per square millimetre), but then again ATI seems to have released the density genie (as well as finding a love for putting lots of nice, dense SRAM into its recent designs), so that's hardly surprising.
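
As a rough cross-check on those density figures, the sketch below works backwards from the transistor counts and densities quoted above to the die area each pair implies. It only uses numbers from the text; the chip labels, dictionary layout and function names are our own, and since the 3 B figure is a round one, the result for GF100 should be read as approximate.

```python
# Figures quoted in the text (transistor counts in millions,
# densities in millions of transistors per square millimetre).
# The die areas computed here are derived, not measured.
CHIPS = {
    "GF100": {"transistors_m": 3000.0, "density_m_per_mm2": 5.55},
    "Cypress": {"transistors_m": 2150.0, "density_m_per_mm2": 6.43},
}


def implied_die_area_mm2(transistors_m, density_m_per_mm2):
    """Die area (mm^2) implied by a transistor count and a density."""
    return transistors_m / density_m_per_mm2


def implied_areas(chips=CHIPS):
    """Map each chip label to its implied die area in mm^2."""
    return {name: implied_die_area_mm2(**spec) for name, spec in chips.items()}


if __name__ == "__main__":
    for name, area in implied_areas().items():
        print(f"{name}: ~{area:.0f} mm^2 implied")
```

The GF100 figure lands a little under the ~550 square millimetres mentioned above, which is about what one would expect from a rounded transistor count.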

Putting together that particular die size and TSMC's woes with its 40nm process, it becomes obvious that manufacturing Slimer is no piece of cake, but we won't dwell on that, since enough virtual ink has flowed and much Internet drama has happened over the topic. Suffice to say that being considerably behind ATI in launching a DirectX 11 capable part, and not being able to field a SKU based on a fully enabled chip, is illustrative of how things went.

GF100 is fully DirectX 11 compliant, and one of the consequences of that is that it's quite different from its DirectX 10 predecessors. Moreover, in its quest to preempt Intel's perennial paper tiger, Larrabee, NVIDIA's architects made a number of design choices that serve to further differentiate Fermi (and here we're talking about implementation details rather than feature set, since the feature set is quite tightly matched to what DirectX 11 mandates).

Whilst we won't espouse any nonsense about GF100 sacrificing graphics in favour of compute (and that is nonsense: one does not sacrifice one's core market and prime income driver for a fledgling and still emerging one, sorry), we do think that Fermi would've looked a bit different had NVIDIA not underestimated ATI's competitiveness (an easy error to make; at the time Fermi was being laid out ATI seemed quite incapable of competing), and had it realised that LRB1 would maintain its semi-comatose state well beyond any reasonable expectation. Which is not to say that those design choices wouldn't have eventually materialised, but it's probable that'd have happened at a later date.

Another novelty on the NVIDIA side of the fence is support for GDDR5. We do know that reaching this goal was reasonably challenging and took a bit longer than initially anticipated (GDDR5-sporting products from NV were planned for a much earlier time-frame), and the comparatively low speed at which GF100 derivatives drive their DRAMs suggests that there's still room for improvement in this particular relationship.

Power is a somewhat thorny topic: the maximum board power for our 470 is pegged by NVIDIA at a somewhat high 215 watts, with the 480 being an even thirstier beast at 250 watts. To combat this appetite for power to a certain extent, down-clocking is quite aggressive -- see below -- so staring at your desktop should be a green experience indeed!

NVIDIA GTX 470 Clock Table

State	Base Clock (MHz)	Hot Clock (MHz)	Memory Clock (MHz)
Default	50	101	135
2D Desktop	405	810	324
3D Applications 1	405	810	1674
3D Applications 2	607	1215	1674
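
To put numbers on how aggressive the down-clocking is, here is a small sketch that takes the clock table as data and computes two things: the ratio of hot clock to base clock in each state, and how far each state sits below the full "3D Applications 2" clocks. The values are copied from the table and assumed to be in MHz; the variable and function names are purely illustrative.

```python
# Rows from the clock table: (state, base, hot, memory), assumed MHz.
CLOCKS = [
    ("Default", 50, 101, 135),
    ("2D Desktop", 405, 810, 324),
    ("3D Applications 1", 405, 810, 1674),
    ("3D Applications 2", 607, 1215, 1674),
]


def hot_to_base_ratios(rows=CLOCKS):
    """Ratio of hot clock to base clock for each state."""
    return {state: hot / base for state, base, hot, _mem in rows}


def relative_to_full_3d(rows=CLOCKS):
    """Base and memory clocks as fractions of the last (full 3D) row."""
    _state, full_base, _hot, full_mem = rows[-1]
    return {
        state: (base / full_base, mem / full_mem)
        for state, base, _hot, mem in rows
    }


if __name__ == "__main__":
    for state, ratio in hot_to_base_ratios().items():
        print(f"{state}: hot/base = {ratio:.2f}")
    for state, (base_frac, mem_frac) in relative_to_full_3d().items():
        print(f"{state}: base {base_frac:.0%}, memory {mem_frac:.0%} of full 3D")
```

In every row the hot clock comes out at roughly twice the base clock, and on the 2D desktop the memory clock is cut far more steeply than the base clock.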

That digested, it's now time to meet the board we tortured.
