[Analysis] TSMC 40G to deliver up to 3.76x the perf/mm^2 of 65G & Power Implications

Thursday 01st May 2008, 09:27:00 AM, written by Arun

It turns out TSMC's 40nm general-purpose process will be even more impressive than previously expected: we knew it was going to sport 2.35x the gate density of 65nm, but now it turns out it will also deliver up to 60% higher performance, for a theoretical 3.76x perf/mm² boost (2.35 × 1.6). Power, however, improves much less.

TSMC's 55nm process improves power efficiency by up to ~10% but leaves performance unchanged, so its perf/mm² improvement over 65nm comes from density alone: about 1.23x. The 65nm node, in turn, was a 30-50% performance improvement over 90nm (and a bit less over 80nm), while also sporting ~2x 90nm's gate density. Similarly, 90nm was a 30-35% performance improvement over 130nm and achieved ~2x 130nm's gate density.
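The arithmetic behind these comparisons is easy to check. Here is a minimal sketch that treats theoretical perf/mm² as gate density gain multiplied by per-gate performance gain. The function name is made up for illustration, and the 0.9x linear shrink used for 55nm is an assumption that happens to reproduce the 1.23x figure; the other inputs are the numbers quoted above.

```python
def perf_per_area_gain(density_gain, perf_gain):
    """Theoretical perf/mm^2 multiplier: more gates per mm^2, each one faster."""
    return density_gain * perf_gain

# 40G vs 65G: 2.35x the gate density, up to 60% higher performance.
gain_40g = perf_per_area_gain(2.35, 1.60)

# 55nm vs 65nm: performance unchanged; density from an ASSUMED 0.9x linear shrink.
gain_55 = perf_per_area_gain(1 / 0.9**2, 1.00)

# 65nm vs 90nm: ~2x the gate density, 30-50% higher performance.
gain_65_low = perf_per_area_gain(2.0, 1.30)
gain_65_high = perf_per_area_gain(2.0, 1.50)

print(f"40G vs 65G:   {gain_40g:.2f}x")
print(f"55nm vs 65nm: {gain_55:.2f}x")
print(f"65nm vs 90nm: {gain_65_low:.1f}x to {gain_65_high:.1f}x")
```

By the same measure, the 65nm node was worth roughly 2.6x to 3.0x over 90nm, against 3.76x for 40G over 65G.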

So the 40nm transition is clearly a bigger step than usual. It's worth pointing out, however, that SRAM scaling from 65/55 to 45/40 is less impressive than gate density scaling; it's only ~2x. That's a pretty sharp contrast to the 130->90nm transition, which was the exact opposite: ~2.45x the SRAM density and ~2x the gate density! It is unclear if this is mostly for technical reasons, or if customer focus on logic-rich chips had something to do with it.

Generally speaking, TSMC's gate density and performance figures often seem slightly overoptimistic, but they remain fairly accurate and aren't massive exaggerations, so this is certainly good news for TSMC's customers, including NVIDIA and AMD. However, there's a catch: even if practical perf/mm² might improve by more than 3x over 65nm, power-per-transistor won't magically go down 67%, which is what it would take to keep power density constant. So if you thought perf/watt was important in the last few years, you haven't seen anything yet.
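To see where the 67% comes from, here is a rough sketch. It assumes power density at full load scales as the perf/mm² gain multiplied by the energy spent per unit of work. Both function names and the 30% energy improvement in the second call are hypothetical, chosen only to illustrate the trend.

```python
def required_energy_cut(perf_per_area_gain):
    """Fractional drop in energy per unit of work needed to hold W/mm^2 constant."""
    return 1 - 1 / perf_per_area_gain

def power_density_growth(perf_per_area_gain, energy_scale):
    """W/mm^2 multiplier when energy per unit of work scales by energy_scale."""
    return perf_per_area_gain * energy_scale

# A 3x perf/mm^2 gain needs a ~67% cut to keep power density flat.
cut_needed = required_energy_cut(3.0)

# Hypothetical: if energy per unit of work only falls 30%, W/mm^2 roughly doubles.
growth = power_density_growth(3.0, 0.70)

print(f"cut needed: {cut_needed:.0%}, W/mm^2 growth at -30% energy: {growth:.1f}x")
```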

From an economic perspective, this also has very severe implications for NVIDIA and AMD's strategy and for what constitutes the 'perf/mm² vs perf/watt' sweet spot. The problem is simple: if you double both performance and wattage for a given chip cost, your PCB and cooler costs go up. If that happens, your overall Bill of Materials (BoM) for the final end-consumer product goes up. And that means the chip's share of the cost goes down for a given retail price tag; so assuming the end-consumer market doesn't grow, the amount of money that goes to TSMC goes down, and assuming constant gross margins for NVIDIA and AMD, their revenue and profits also go down.

This is obviously a major problem and we predict that in the coming years, executives and the investment community will slowly realize that gross margins are an outdated concept for semiconductor companies; while they do matter, they shouldn't be the real focus. The real focus should be 'gross margins * percentage of BoM' because it turns out there are a lot of strategic dynamics that can affect the latter by a significant amount in the coming years. This is true at both the board level and the overall system level.
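A toy calculation makes the 'gross margins * percentage of BoM' metric concrete. Every dollar figure below is hypothetical: a fixed $300 BoM budget, a chip vendor with a constant 45% gross margin, and a hotter board whose cooler and PCB eat $40 that used to go to the chip.

```python
def chip_gross_profit(bom_usd, chip_share, gross_margin):
    """Gross profit the chip vendor extracts from one end product."""
    return bom_usd * chip_share * gross_margin

BOM_USD = 300.0       # hypothetical BoM budget at a fixed retail price
GROSS_MARGIN = 0.45   # hypothetical, held constant in both cases

# Baseline board: the chip takes $150 of the BoM.
baseline = chip_gross_profit(BOM_USD, 150.0 / BOM_USD, GROSS_MARGIN)

# Hotter board: cooler and PCB cost $40 more, leaving $110 for the chip.
hot_board = chip_gross_profit(BOM_USD, 110.0 / BOM_USD, GROSS_MARGIN)

print(f"baseline: ${baseline:.2f} per board, hot board: ${hot_board:.2f} per board")
```

The gross margin is identical in both cases, yet gross profit per board falls by more than a quarter, which is exactly the effect a margin-only view hides.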

Those who take the naive approach and focus on gross margins exclusively, rather than how much gross profit they can extract out of a final BoM, are destined to fail horribly. This is why chip design & synthesis will be more and more about perf/watt rather than perf/mm², because it turns out the most perf/mm²-efficient design might actually result in lower gross profits for your company. Similarly, other more exotic ideas including embedded memory might prosper in certain segments of the market in such an environment, while the cost-efficiency of others (such as multi-chip designs) might become much more complex to estimate.

And it's not just about cost efficiency; it's also about thermal limits and the implications for the ultra-high-end. We're already very near those thermal limits today, so if perf/mm² grows much faster than perf/watt and cost-per-mm² doesn't grow too much either, then we risk being completely limited by thermals in the $500+ market. And don't kid yourself: there's no evidence of 32nm or high-k magically fixing this either. Once again, the only real solution is to emphasize perf/watt throughout the design and synthesis process at the expense of perf/mm². Both dynamic power and leakage are important, although obviously the former remains the predominant issue.

In related news, it looks like TSMC's plans for 32nm and high-k have changed a bit in the last few months: it was previously indicated that there would be a 40nm variant with high-k, and that the first 32nm variant would be low-power for handhelds. Now it turns out that only 32nm will have a high-performance variant with high-k (aimed at the CPU market), and the first 32nm variant will be... general-purpose, which means usable for PC GPUs. This is in sharp contrast to the 90nm and 65nm nodes, where the low-power variant came significantly earlier than the general-purpose one. 32G risk production is slated for 4Q09, and 32LP is slated for 1Q10.

Why this sudden change? Likely two things. First, Larrabee: NVIDIA is TSMC's largest customer right now, and AMD is also amongst the largest ones. So if they want TSMC to prioritize the 32G process to improve their chances against Intel, they carry a lot of weight to make that happen. Second, the power benefit from 40nm to 32nm when considering both dynamic power and leakage may not be so impressive, and many handheld SoC manufacturers are moving to single-chip solutions that integrate some RF and analogue. In that context, wafer prices and yields are likely more important than gate density improvements.

In conclusion, TSMC looks like they're delivering very well on their roadmap and getting ahead of everyone else in the industry, with the obvious exception of Intel. That doesn't allow them, however, to break the laws of physics; silicon still leaks, and power density is still going up. Both will have increasingly important effects on the semiconductor industry, and executive-level strategic decisions must be made with proper consideration of the severe changes ahead.




Latest Thread Comments (13 total)
Posted by Time on Tuesday, 06-May-08 06:56:27 UTC
Quoting Mart
There is a certain ceiling for the clock speed in the amount of power you can use and the heat you can dissipate. AMD's Phenom cannot be clocked much higher because it's already using more than 125 W. Correct so far, or do I get something wrong?
No, no, no, no, no. Check out the last slide on this page: http://www.bit-tech.net/news/2008/03/27/amd_announces_new_phenom_processors/1 . See the massive jump in TDP when going from 2.4GHz to 2.5GHz?

That's because AMD have set the allowable voltage for the 9850 higher than for the rest of their quad core line. That's also why AMD's tri cores are stuck in the same TDP as their quad core speed-equivalents. As a result AMD are then able to grab any chips that didn't quite make the grade and sell them anyway.

But as to the reason for Phenom being clocked so low: it was designed to be efficient, as it's targeting the (fast growing) high efficiency server area. That's maybe not the only reason why it is clocked so low; it also has a weird idle instability issue where it's unstable at idle but stable under load.

Anyway the only point that I'm trying to make is that it isn't a thermal issue that stops Phenom (or to be more accurate K10, K10h, Barcelona or Agena) being clocked higher.

Posted by Mart on Tuesday, 06-May-08 16:31:18 UTC
Thanks Arun, that really cleared things up for me :)

Quoting Time
Anyway the only point that I'm trying to make is that it isn't a thermal issue that stops Phenom (or to be more accurate K10, K10h, Barcelona or Agena) being clocked higher.
Point taken, thanks for clarifying!

Posted by aca on Wednesday, 07-May-08 09:57:58 UTC
Quote
In conclusion, TSMC looks like they're delivering very well on their roadmap and getting ahead of everyone else in the industry, with the obvious exception of Intel. *That doesn't allow them, however, to break the laws of physics; silicon still leaks, and power density is still going up.* Both will have increasingly important effects on the semiconductor industry, and executive-level strategic decisions must be made with proper consideration of the severe changes ahead.
The article provides a good discussion on the power matter. But I cannot really place the sarcastic remark that I put in bold above. To me, it kind of lowers the level of the article. In order to preserve the same content, it would be better (imho) to write it like: "Despite good prospects concerning integration and bla bla bla, the power density was not improved. It is reckoned that the latter will have important effects bla bla bla". But anyways, the article provided good content. Interesting to read, and I hope to see more of these coming.

Posted by Arun on Wednesday, 07-May-08 12:52:00 UTC
Quoting aca
The article provides a good discussion on the power matter. But I cannot really place the sarcastic remark that I put in bold above. To me, it kind of lowers the level of the article. In order to preserve the same content, it would be better (imho) to write it like: "Despite good prospects concerning integration and bla bla bla, the power density was not improved. It is reckoned that the latter will have important effects bla bla bla".
Well, in my mind, that doesn't really say the same thing. I could have been more detailed there though, I admit - my point was that obviously TSMC would be able to add a lot of value for their customers if power/perf scaled down as fast as perf/(mm²*wafer cost). So if there was a way to achieve that, they would have done it - but they clearly didn't see one.

So arguably lower power consumption would add more value for many of TSMC's customers than higher performance per transistor, yet they just can't achieve that. TSMC's statements about high-k indicate they don't think it'll be a major help either; so there are just incremental improvements on the horizon, and it'll just become more and more of a problem.

Quote
But anyways, the article provided good content. Interesting to read, and I hope to see more of these coming.
Thanks! :) I've been thinking of doing more of that kind of analysis in the form of articles, rather than as part of news pieces. Stay tuned...

Posted by Time on Wednesday, 07-May-08 23:49:14 UTC
Quoting aca
The article provides a good discussion on the power matter. But I cannot really place the sarcastic remark that I put in bold above. To me, it kind of lowers the level of the article. In order to preserve the same content, it would be better (imho) to write it like: "Despite good prospects concerning integration and bla bla bla, the power density was not improved. It is reckoned that the latter will have important effects bla bla bla". But anyways, the article provided good content. Interesting to read, and I hope to see more of these coming.
Are you kidding? It's sarcastic remarks like that that keep me awake through press releases :razz:.

Back to the article: So heat density is going up and IHS can't handle it? Will ATI's ringbus design now be more useful compared to NVIDIA's crossbar?

Posted by aca on Thursday, 08-May-08 09:51:04 UTC
Quoting Arun
Well, in my mind, that doesn't really say the same thing. I could have been more detailed there though, I admit - my point was that obviously TSMC would be able to add a lot of value for their customers if power/perf scaled down as fast as perf/(mm²*wafer cost). So if there was a way to achieve that, they would have done it - but they clearly didn't see one

So arguably lower power consumption would add more value for many of TSMC's customers than higher performance per transistor, yet they just can't achieve that. TSMC's statements about high-k indicate they don't think it'll be a major help either; so there are just incremental improvements on the horizon, and it'll just become more and more of a problem.
Probably they know how they could add more value. Still, there is a difference between knowing the path and walking it. It seems like TSMC is focussing more on the 32nm node. Maybe they had issues with metal gates/high-k for 45nm, and decided to move their efforts to the next node. And knowing the troublesome introduction of high-k dielectrics, there is no shame in that either.
So I think they will make a step forward again with their 32nm products. These look quite promising: triple gates, very low-k isolation, high-k dielectric completed with copper interconnects and metal gates. Basically all of these should provide power reduction in the sense of less capacitive loads (leakage, less current drive) and resistive losses. But it will definitely be interesting to see how the W/µm² evolves and with what kind of designs customers will respond.

Posted by Arun on Thursday, 08-May-08 09:59:29 UTC
32LP and 32G won't support high-k/metal gates, only 32HP will. And TSMC in public statements didn't seem overly excited about high-k; i.e. it's an advantage, but not an overwhelming one given the wafer cost difference.

Anyhow, one thing I realize I might have wanted to make clearer in my news piece: while higher performance won't magically improve perf/watt, it *will* improve (perf/watt)/$ if used towards that goal rather than raw perf/$. This is because you can then sacrifice die area and performance in favour of power throughout the design and synthesis processes, and this gets compensated by the higher performance. So higher performance is always good; but it means the 'free lunch' for engineers is over (arguably has been for a while, but it's really gradual imo) and they'll need to start thinking about more complex trade-offs and start using that perf/mm² advantage as a way to improve (perf/watt)/$ instead.

Posted by aca on Saturday, 10-May-08 12:29:11 UTC
I agree. So how about the development in this power consumption synthesis? Haven't seen much of it. Aren't current approaches merely qualitative? If we are going this route, I expect that there should be more quantitative analyses for doing EDA. Especially in the field of layout.

Posted by silent_guy on Saturday, 10-May-08 13:18:15 UTC
Quoting aca
I agree. So how about the development in this power consumption synthesis? Haven't seen much of it. Aren't current approaches merely qualitative? If we are going this route, I expect that there should be more quantitative analyses for doing EDA. Especially in the field of layout.
I'm not entirely sure what you're hinting at. At this point, it is possible to estimate final power consumption with, say, 10% accuracy, based on just your RTL code. In many cases, this can be done even without any simulation stimuli, if you're being smart about toggle factors. (The current tools, like PT-PI, can estimate toggle rates of combinational gates based on the logic cone that drives it.)

When you're talking about power EDA tools for layout, you're really already so far ahead in the process that there's no room for design iterations: your estimates will just become more accurate.

What's really needed are guidelines about saving power at the high-level architectural level. This is much, much harder and doesn't go much farther than 'don't move data all around the chip' or 'try not to recalculate what you've already calculated'. And even then I wouldn't expect miracles: at the end of the day, transforming the same input into the same final output will still require a certain minimum of logical operations.
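For readers who have not seen how toggle factors feed into a power estimate, here is a minimal sketch of the textbook dynamic power sum. It is not how PT-PI or any other tool actually works: the `Net` type, the capacitances and the toggle rates are all hypothetical, and each toggle is assumed to dissipate 0.5·C·V².

```python
from dataclasses import dataclass

@dataclass
class Net:
    capacitance_f: float  # switched capacitance in farads (hypothetical values below)
    toggle_rate: float    # average toggles per clock cycle (the "toggle factor")

def dynamic_power_w(nets, vdd_v, freq_hz):
    """Sum 0.5 * C * V^2 per toggle over all nets, times toggles per second."""
    return sum(0.5 * n.capacitance_f * vdd_v**2 * n.toggle_rate * freq_hz
               for n in nets)

# Two made-up nets: a quiet data net and a clock net that toggles twice per cycle.
nets = [Net(capacitance_f=2e-15, toggle_rate=0.1),
        Net(capacitance_f=5e-15, toggle_rate=2.0)]

power_w = dynamic_power_w(nets, vdd_v=1.0, freq_hz=1e9)
print(f"estimated dynamic power: {power_w * 1e6:.2f} uW")
```

The hard part in practice is getting believable toggle rates without simulation stimuli, which is the point made above; the sum itself is trivial.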

Posted by soylent on Saturday, 17-May-08 12:02:49 UTC
It was my understanding that the (non-leakage) power output of CMOS was effectively proportional to the cube of frequency (because P is proportional to V^2*f, and the minimum voltage at which the IC is capable of operating is roughly proportional to frequency), but only linear in surface area. Given that 3D graphics is inherently so embarrassingly well suited for parallelization, can't GPUs capture a big increase in performance from a smaller process, even if each transistor has the same power output as last generation, by just using more of them operating at a slightly lower frequency? (Assuming leakage is kept under control.)
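That argument can be put into numbers with a small sketch. It takes the post's assumptions at face value: dynamic power per unit scales as f³ (P ∝ V²·f with V ∝ f), total power is linear in the number of units, performance is units × frequency, and leakage is ignored. The function name is made up, and the model breaks down once voltage nears its practical floor.

```python
def perf_at_constant_power(unit_scale):
    """Relative performance when unit count grows by unit_scale at a fixed power budget.

    Power = units * f**3, so holding power constant means f = unit_scale ** (-1/3).
    Performance = units * f, which works out to unit_scale ** (2/3).
    """
    freq_scale = unit_scale ** (-1 / 3)
    return unit_scale * freq_scale

# Doubling the number of units at the same power: clocks drop ~21%, perf rises ~59%.
freq_2x = 2.0 ** (-1 / 3)
perf_2x = perf_at_constant_power(2.0)

print(f"frequency: {freq_2x:.2f}x, performance: {perf_2x:.2f}x")
```

So under this idealized model a parallel workload still gains from a shrink at constant power, just sublinearly in the transistor count.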

