AMD Shanghai die size disclosed; Fudzilla hints at 0MiB L3 chip
Monday 17th March 2008, 11:00:00 AM, written by Arun
Shanghai's die size is 243mm² (or 263mm² including test logic) and Nehalem's is 246mm² (or 265mm² including test logic), so the number of dies per wafer barely differs, if at all. Hans' analysis of both die shots seems plausible to us, although as far as we can tell the Nehalem pictured there is the 192-bit version aimed at the ultra-high-end and servers, not the 128-bit + PCI Express version.
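To put a rough number on the dies-per-wafer comparison, here is a minimal sketch using a common textbook approximation for gross dies per wafer. The 300mm wafer diameter and the formula itself are our assumptions, not figures from either vendor, and the estimate ignores yield, scribe lines and edge exclusion.

```python
import math

# Assumption: 300mm wafers; the wafer size is not stated in the article.
WAFER_DIAMETER_MM = 300.0


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """Approximate gross dies per wafer (no yield, scribe or edge exclusion)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    # Correction term for partial dies lost around the wafer's circumference.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


# Die areas in mm² as quoted in the article.
for name, area in [
    ("Shanghai", 243),
    ("Nehalem", 246),
    ("Shanghai incl. test logic", 263),
    ("Nehalem incl. test logic", 265),
]:
    print(f"{name}: {area} mm^2 -> ~{dies_per_wafer(area)} gross dies")
```

Under these assumptions the two chips end up within a handful of gross dies of each other, which is consistent with the claim that the per-wafer counts barely differ.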
Interestingly, Nehalem's core size excluding L2 is massively larger than Shanghai's: 24.4mm² versus 15.3mm². Multiple factors are at play there, including Nehalem's support for two threads per core (à la Hyper-Threading, though it is rumoured to be much more efficient). However, it seems incredibly unlikely that Shanghai will come anywhere near performance leadership against Nehalem given that core size deficit and Intel's leaked performance estimates.
On the other hand, it looks like AMD will have one chip that gives them a cost advantage against Nehalem: Propus. It's a 0MiB L3 version of Shanghai/Deneb according to Fudzilla (which has been surprisingly accurate on AMD rumours lately) and should be 30%+ smaller than Nehalem, which matches our own estimate of ~170mm² based on Shanghai's die shot. Even more encouraging, Fudo claims that L3 only improves performance by 5-10% on average, and often less in real-world benchmarks. We believe this might be due to the extra memory latency that the L3 adds to cache misses, as experienced on Barcelona.
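As a quick sanity check on the "30%+ smaller" figure, the sketch below compares our ~170mm² Propus estimate against the die sizes quoted above (excluding test logic). The helper name is ours, and the Propus area is only an estimate from the die shot, not a disclosed number.

```python
# Die areas in mm² from the article; the Propus figure is an estimate only.
NEHALEM_MM2 = 246.0
SHANGHAI_MM2 = 243.0
PROPUS_EST_MM2 = 170.0


def percent_smaller(area_mm2, reference_mm2):
    """Return how much smaller area_mm2 is than reference_mm2, in percent."""
    return (1.0 - area_mm2 / reference_mm2) * 100.0


print(f"Propus vs Nehalem:  {percent_smaller(PROPUS_EST_MM2, NEHALEM_MM2):.1f}% smaller")
print(f"Propus vs Shanghai: {percent_smaller(PROPUS_EST_MM2, SHANGHAI_MM2):.1f}% smaller")
# Area implied for the L3 and anything else removed along with it.
print(f"Implied area removed from Shanghai: {SHANGHAI_MM2 - PROPUS_EST_MM2:.0f} mm^2")
```

With these inputs the saving against Nehalem works out to just under 31%, so the rumoured figure and our estimate agree.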


A shared last-level cache helps reduce coherency traffic, and any high-capacity cache is very useful for servers, even if the L3 is slow.
Barcelona's L3 was less than impressive because it was slow and small.
The shared cache is helpful in some cases on the desktop, though the implementation is so dog slow that at times it was only marginally better than the shared FSB Core2 uses.