NVIDIA GF100 graphics architecture details

Thursday 21st January 2010, 08:26:00 AM, written by Rys

Late last year I wrote a speculative piece on NVIDIA Fermi and GF100 for The Tech Report, outlining a best guess at the graphics capabilities based on research, NVIDIA's historical progression in graphics and good old journalism, probing the company's engineers where possible. Along with a roll of the dice for the clocks, it gave a reasonably good estimate of where things would land at launch.

I say reasonably good estimate since NVIDIA recently released a whitepaper containing lots of good details, and they also wheeled out their big graphics guns in the form of Henry Moreton, Emmett Kilgariff (whoa!) and Jonah Alben (a twelve-year veteran at NV, now their Senior VP of GPU Engineering after joining from SGI way back when) to brief selected members of the press after CES.

The Tech Report were there, and they've published their findings, correcting where I got things wrong to give a better, fuller picture of where the graphics architecture stands. The analysis is typically great, with Scott painting a very nice picture of what it's capable of when drawing pixels.

Check out their analysis in full; we'll follow with our own look as soon as we can.


Latest Thread Comments (4019 total)
Posted by trinibwoy on Monday, 02-Aug-10 03:08:24 UTC
Did Nvidia beef up GF104's texture units? Was just browsing Damien's English review (http://www.behardware.com/articles/795-4/report-nvidia-geforce-gtx-460.html) and it seems FP16 and RGB9E5 are now full speed as opposed to half speed on GF100.

Image: http://img153.imageshack.us/img153/3681/texturing.png

Posted by Alexko on Monday, 02-Aug-10 08:39:45 UTC
Damien's reviews usually deserve a bit more than a quick browsing… :p

Quote
Moreover, the texturing units have been improved to filter FP16 textures (as well as FP11, FP10 and RGB9E5) at full speed.
http://www.behardware.com/articles/795-2/report-nvidia-geforce-gtx-460.html

Posted by trinibwoy on Tuesday, 03-Aug-10 04:05:24 UTC
Of course, thanks. Saw it on my second read through :) Wonder why they bothered.

Posted by Chalnoth on Tuesday, 03-Aug-10 06:23:29 UTC
Quoting trinibwoy
Of course, thanks. Saw it on my second read through :) Wonder why they bothered.
My first guess would be that it was something that was intended for the GF100 all along, but there was a bug in the hardware implementation that forced them to implement these modes with reduced performance.

As for why they would have wanted to go this route in the first place, well, that would make sense if they feel that these modes will become more and more common as time goes forward, and if the added hardware cost was minimal.

Posted by mczak on Tuesday, 03-Aug-10 12:54:13 UTC
Maybe the full-speed fp16 was just a later addition which didn't make it for GF100.
That said, it would imho make more sense for GF100 than GF104, since GF100 has lower tex:alu ratio (and also higher memory bandwidth / tex). Unless you think it doesn't matter for GF100 since it looks more useful for non-gaming usages anyway..

Posted by ShaidarHaran on Tuesday, 03-Aug-10 14:00:00 UTC
Interesting that the fp formats have seen performance increases from GF100->GF104, but the int formats have seen performance decreases. Also, there appears to be a hard cap @ 33.3 GTexels/s for 3 of the formats. Any thoughts as to what might be causing this? Is it a lack of cache or cache bandwidth? Some other architectural limitation? I don't think it's a lack of VRAM or VRAM bandwidth since GF104 out-performs GT200b in 2 of the 3 formats.

Posted by TKK on Tuesday, 03-Aug-10 15:38:10 UTC
Quoting ShaidarHaran
I don't think it's a lack of VRAM or VRAM bandwidth since GF104 out-performs GT200b in 2 of the 3 formats.
Also, if that were the case there should be a difference between the two GTX 460 variants, and there isn't.

Posted by Gipsel on Tuesday, 03-Aug-10 16:57:42 UTC
Quoting ShaidarHaran
Also, there appears to be a hard cap @ 33.3 GTexels/s for 3 of the formats. Any thoughts as to what might be causing this? Is it a lack of cache or cache bandwidth? Some other architectural limitation? I don't think it's a lack of VRAM or VRAM bandwidth since GF104 out-performs GT200b in 2 of the 3 formats.
It's the theoretical max throughput of the 56 TMUs * 0.675 GHz = 37.8 GTexel/s. Obviously the efficiency (88%) is slightly lower than on AMD GPUs (~98% or so) for these simple tasks.

Posted by mczak on Tuesday, 03-Aug-10 18:34:21 UTC
Quoting Gipsel
It's the theoretical max throughput of the 56 TMUs * 0.675 GHz = 37.8 GTexel/s. Obviously the efficiency (88%) is slightly lower than on AMD GPUs (~98% or so) for these simple tasks.
I think the more interesting comparison is GTX470/480 - 60 TMUs * 0.7 GHz = 42 GTexels/s and it is achieving 41.4 GTexels/s (for int8 only though) - 99%. So for some odd reason GF104 can achieve less of the peak potential of the TMUs.
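
The peak-rate arithmetic in the two posts above can be reproduced with a short sketch. It assumes the usual rule of thumb of one bilinear-filtered texel per TMU per clock; the function names are hypothetical, and the TMU counts, clocks and measured rates are the ones quoted in the thread.

```python
def peak_gtexels(tmus: int, clock_ghz: float) -> float:
    """Theoretical texel rate in GTexels/s.

    Assumption: each TMU delivers one bilinear-filtered texel per clock.
    """
    return tmus * clock_ghz


def efficiency(measured_gtexels: float, tmus: int, clock_ghz: float) -> float:
    """Fraction of the theoretical peak that a measured rate reaches."""
    return measured_gtexels / peak_gtexels(tmus, clock_ghz)


# GF104 figures quoted by Gipsel: 56 TMUs at 0.675 GHz, cap at 33.3 GTexels/s.
gf104_peak = peak_gtexels(56, 0.675)
gf104_eff = efficiency(33.3, 56, 0.675)

# GF100 figures quoted by mczak: 60 TMUs at 0.7 GHz, 41.4 GTexels/s for int8.
gf100_peak = peak_gtexels(60, 0.7)
gf100_eff = efficiency(41.4, 60, 0.7)

print(f"GF104: peak {gf104_peak:.1f} GTexels/s, efficiency {gf104_eff:.0%}")
print(f"GF100: peak {gf100_peak:.1f} GTexels/s, efficiency {gf100_eff:.0%}")
```

Under that assumption the 88% and 99% figures from the posts fall out directly, which is why the gap between the two chips looks like a real utilisation difference rather than a rounding artefact.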

Posted by CarstenS on Tuesday, 03-Aug-10 19:20:50 UTC
I'm showing (almost) the same here. 33.8 GTex/s is the maximum I can get out of a stock GF104 with bilinear filtering. With trilinear it's a more expected 18.9 GTex/s. Together with the point sampling result of - again - 33.8 GTex/s, I'm guessing it's maybe interpolation or address bound.

An HD5830 is literally miles away at 43.6 and 22.4 GTex/s.
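
A quick check of why the trilinear figure is the "expected" one, as a rough sketch: it assumes trilinear filtering costs two bilinear samples per texel (so half the peak rate), and reuses the 56 TMUs * 0.675 GHz peak given earlier in the thread. The variable names are hypothetical; the measured rates are the ones in the post above.

```python
# Theoretical GF104 peak from earlier in the thread: 56 TMUs * 0.675 GHz.
GF104_PEAK = 56 * 0.675  # GTexels/s

# Measured rates (GTexels/s) as reported in the post above.
gf104 = {"point": 33.8, "bilinear": 33.8, "trilinear": 18.9}
hd5830 = {"bilinear": 43.6, "trilinear": 22.4}

# Fraction of theoretical peak reached by each GF104 filtering mode.
gf104_fraction = {mode: rate / GF104_PEAK for mode, rate in gf104.items()}

# Assumption: trilinear = two bilinear samples, so the ideal ratio is 0.5.
gf104_tri_to_bi = gf104["trilinear"] / gf104["bilinear"]
hd5830_tri_to_bi = hd5830["trilinear"] / hd5830["bilinear"]

for mode, frac in gf104_fraction.items():
    print(f"GF104 {mode}: {frac:.1%} of peak")
print(f"trilinear/bilinear: GF104 {gf104_tri_to_bi:.3f}, HD5830 {hd5830_tri_to_bi:.3f}")
```

On these numbers trilinear lands at half of the theoretical peak, while point and bilinear both stall at the same lower fraction, which is consistent with the suggestion that something other than the filtering hardware is the limit.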

