AMD Shanghai die size disclosed; Fudzilla hints at 0MiB L3 chip

Monday 17th March 2008, 11:00:00 AM, written by Arun

AMD displayed wafers of its 45nm Shanghai chip at CeBIT, and Hans de Vries took the opportunity to create a comparison picture between Shanghai and Nehalem (scroll to the middle of the page). Surprisingly, their die sizes are just about identical, so AMD doesn't have the previously expected die size advantage. Or does it? Fudzilla claims that a 0MiB L3 version of Shanghai/Deneb, codenamed Propus, is also coming and that it should be more than 30% smaller than Nehalem.

Shanghai's die size is 243mm² (or 263mm² including test logic) and Nehalem's is 246mm² (or 265mm² including test logic). The number of dies per wafer therefore barely differs, if at all. Hans' analysis of both die shots seems plausible to us, although as far as we can tell the Nehalem pictured there is the 192-bit version aimed at the ultra-high-end and servers, not the 128-bit + PCI Express version.
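
To put a rough number on the dies-per-wafer remark, here is a minimal sketch using a common first-order approximation (wafer area divided by die area, minus an edge-loss term). The 300mm wafer diameter is our assumption and is not stated above; the function name is hypothetical, and the estimate ignores yield, scribe lines and die aspect ratio.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumption: 300mm wafers; not stated in the article


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """First-order gross die estimate: wafer area / die area minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return math.floor(gross - edge_loss)


# Die areas including test logic, as quoted in the article
shanghai = dies_per_wafer(263.0)
nehalem = dies_per_wafer(265.0)
print(shanghai, nehalem)
```

Under these assumptions the two chips land within a couple of candidate dies of each other per wafer, which is consistent with the "barely different" reading.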

Interestingly, Nehalem's core size excluding L2 is massively larger than Shanghai's: 24.4mm² versus 15.3mm². Multiple factors are at play there, including Nehalem's support for two threads per core (à la Hyper-Threading, but rumoured to be much more efficient). However, it seems incredibly unlikely that Shanghai will come anywhere near performance leadership against Nehalem given that core size deficit and Intel's leaked performance estimates.

On the other hand, it looks like AMD will have one chip that gives them a cost advantage against Nehalem: Propus. It's a 0MiB L3 version of Shanghai/Deneb according to Fudzilla (which has been surprisingly accurate on AMD rumours lately) and should be 30%+ smaller than Nehalem, which matches our own estimate of ~170mm² based on Shanghai's die shot. Even more positive is Fudo's claim that L3 only improves performance by 5-10% on average and often less in real-world benchmarks. We believe this might be due to the increase in memory latency for cache misses caused by L3, as experienced on Barcelona.
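
As a quick sanity check on the "30%+ smaller" figure, the sketch below compares the ~170mm² Propus estimate against the die sizes quoted earlier (excluding test logic). The 170mm² value is only our rough estimate from the die shot, not a measured or confirmed number, and the helper name is hypothetical.

```python
NEHALEM_MM2 = 246.0     # from the article, excluding test logic
SHANGHAI_MM2 = 243.0    # from the article, excluding test logic
PROPUS_EST_MM2 = 170.0  # rough estimate from the die shot, not a confirmed figure


def percent_smaller(area_mm2, reference_mm2):
    """Return how much smaller area_mm2 is than reference_mm2, in percent."""
    return (1.0 - area_mm2 / reference_mm2) * 100.0


vs_nehalem = percent_smaller(PROPUS_EST_MM2, NEHALEM_MM2)
vs_shanghai = percent_smaller(PROPUS_EST_MM2, SHANGHAI_MM2)
print(f"{vs_nehalem:.1f}% smaller than Nehalem")
print(f"{vs_shanghai:.1f}% smaller than Shanghai")
```

With these inputs the reduction works out to roughly 31% against Nehalem and about 30% against Shanghai itself, so the estimate and Fudzilla's claim are at least mutually consistent.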




Latest Thread Comments (16 total)
Posted by 3dilettante on Monday, 17-Mar-08 15:09:18 UTC
Fusion doesn't seem applicable to the concerns the L3 was supposed to address, though the L3's presence outside of the server market may have more to do with AMD's limited ability to design multiple cores.

A shared last-level cache is helpful with reducing coherency traffic, and any high capacity cache is very useful for servers, even if the L3 is slow.
Barcelona's L3 was less than impressive because it was slow and small.

The shared cache is helpful in some cases on the desktop, though the implementation is so dog slow in some cases that it was only marginally better than the shared FSB Core2 uses.

Posted by Jawed on Monday, 17-Mar-08 16:01:15 UTC
Quoting 3dilettante
Fusion doesn't seem applicable to the concerns the L3 was supposed to address, though the L3's presence outside of the server market may have more to do with AMD's limited ability to design multiple cores. A shared last-level cache is helpful with reducing coherency traffic, and any high capacity cache is very useful for servers, even if the L3 is slow. Barcelona's L3 was less than impressive because it was slow and small. The shared cache is helpful in some cases on the desktop, though the implementation is so dog slow in some cases that it was only marginally better than the shared FSB Core2 uses.
I'm thinking of coherency between CPU and GPU cores quite specifically as well as whether a GPU core would benefit in its own right in using L3 (bearing in mind that the GPU in a Fusion configuration is stuck with a miserly 10-25GB/s).

I'd hope by the time Fusion arrives the GPU performs dramatically better than 780G's GPU...

Jawed

Posted by Arun on Monday, 17-Mar-08 16:20:17 UTC
3dilettante, Fudo has said explicitly there would be a chip without the L3. It won't just be disabled.

Posted by 3dilettante on Monday, 17-Mar-08 17:04:34 UTC
Quoting Jawed
I'm thinking of coherency between CPU and GPU cores quite specifically as well as whether a GPU core would benefit in its own right in using L3 (bearing in mind that the GPU in a Fusion configuration is stuck with a miserly 10-25GB/s).

I'd hope by the time Fusion arrives the GPU performs dramatically better than 780G's GPU...

Jawed
I dunno.
My expectations for the early iterations of Fusion are pretty low when it comes to the level of integration we can expect.

AMD's more conservative approach might mean they'll just slap a GPU core on-die or on-package with no real attempt to add any real coherence.
Since the mobile chips that Fusion debuts on are also the first chips with on-die PCI-E, it might mean that the GPU will just sit on one side of the PCI-E bridge, which would probably rule out any coherency at all.

If the GPU can take advantage of the cache hierarchy, I hope there would be some way to control what is kept coherent. Only certain kinds of data would be expected to be shared between the CPUs and the GPU, and any other traffic would probably pollute the L3.

Quoting Arun
3dilettante, Fudo has said explicitly there would be a chip without the L3. It won't just be disabled.
I think this makes sense, but I'm waiting for more data or confirmation.

An L3-free 65nm Phenom would have made sense too, but it hasn't happened for some reason.

Posted by Farhan on Tuesday, 18-Mar-08 02:15:46 UTC
Quoting Jawed
I'm thinking of coherency between CPU and GPU cores quite specifically as well as whether a GPU core would benefit in its own right in using L3 (bearing in mind that the GPU in a Fusion configuration is stuck with a miserly 10-25GB/s).

I'd hope by the time Fusion arrives the GPU performs dramatically better than 780G's GPU...

Jawed
I don't know if you would want your GPU to compete with your CPU for L3 cache if you're not sharing much data. Current GPUs don't use a cache-coherent connection to the CPU anyway, right? So unless it is something more integrated than just a CPU and GPU together on one package/die, I don't see why you would want to unify the caches. You'd get more bandwidth if the GPU had its own cache anyway.

Edit: oops, didn't notice that 3dilettante pretty much said everything I wanted to say already :P

Posted by bearmoo on Wednesday, 19-Mar-08 07:18:51 UTC
But I do remember reading somewhere just a couple of days ago where an AMD guy said what they are doing is much more than slapping two cores in one package. Who knows, it could be just marketing talk. I also recall a recent Fusion slide where they showed 3 squares representing cores (2 CPU, 1 GPU) with a big rectangular chunk of cache underneath them.

Posted by bearmoo on Wednesday, 19-Mar-08 07:55:51 UTC
aha! found it

“You can integrate a CPU and a GPU by having an internal PCI-E bus,” said Hester. “But we’re trying to do a much tighter integration so that we get the best possible power efficiency. Putting more and more cores that use up more power but don’t change the user experience is not a good thing.” This tighter integration apparently involves having all the accelerators on one die.

http://channel.hexus.net/content/item.php?item=12024&search=AMD%20Fusion

Posted by 3dilettante on Wednesday, 19-Mar-08 14:15:50 UTC
Now we just need to know if he means Swift or he's saying their eventual goal is to have higher integration.

AMD's being forced to be more conservative about platform introductions makes me wonder if they can resist the temptation to slap the GPU on PCI-E, or whether another design delay will leave them with little choice.

Posted by Sound_Card on Friday, 21-Mar-08 17:16:49 UTC
All presentation slides indicate to me that there is high integration in Fusion.
http://static1.photo.sina.com.cn/bmiddle/4d7e9f774416dbe585d10
http://images.tomshardware.com/2006/12/14/slide_apu.jpg
http://64-bit-computers.com/wp-content/amd_apu.gif

Posted by 3dilettante on Friday, 21-Mar-08 17:24:27 UTC
I only got the last link to load, but anything dealing with the APU stuff is long-term.
The conceptual drawings are not exact enough to indicate much about what the first Fusion product will look like.

AMD previously set out a pretty gradual route for Fusion, starting with mostly separate and becoming more integrated over time.

