This is the first of what may be several articles following my trip to Mobile World Congress 2011. To put everything in context, this one also includes an overview of various companies which did not show anything publicly at MWC. The last page (on discrete vs. integrated basebands and smartphone market predictions) is my personal opinion, formed after discussing these subjects with a number of different people and companies at MWC and hearing very different takes on them.

NVIDIA Kal-El: Earlier than Expected


The chip formerly known in the industry as Tegra 3 (final name yet to be decided) gets the honour of being discussed first for a very simple reason: it will be the first to market. NVIDIA claims that it will be available in tablets by August and in phones by Christmas, even though the first silicon only got back from the fab on February 3rd. That may seem too aggressive given the usual design times in the handheld market, but it's actually much more realistic than it looks! I had the opportunity to speak at length with Mike Rayfield (General Manager for the Mobile Business Unit) at MWC and grilled him on the seemingly unbelievable timeframes, and I came out convinced that they can probably pull it off.

Tegra 2 started sampling to tablet customers in July 2009, but that was only the 23x23 package with 0.8mm ball pitch (and DDR2) for lower-cost tablet PCBs. The 12x12 0.4mm ball pitch package (with stacked LPDDR2) for phones only started sampling in Q4 2009 (NVIDIA wouldn't confirm it's actually the same chip, but presumably it is and the difference in timeframe is only a question of priorities). The smartphone OEMs apparently only started showing very strong interest at CES 2010, and NVIDIA only became aware of the details of specific projects in exec talks at MWC 2010 (although PCB design probably started a fair bit before that). As for tablets, the main delay was obviously on the software side; had it not been for that, Tegra 2 could have shipped in that market much earlier.

Kal-El is already sampling to lead customers for both smartphones and tablets (they'll even be part of the debug process), but here's the critical bit: many customers already have their own PCBs ready and will immediately be testing the chip on them rather than only on NVIDIA's reference platform. This requires a level of customer engagement and confidence that is not only massively beyond what NVIDIA had with Tegra 2 but also significantly higher than the industry average for application processors. One reason for this is that OEMs are realising that ultra-high-end devices without an ultra-high-end processor just won't sell, so it has become critical to be amongst the first on new high-end SoCs; NVIDIA is making it possible to go from chip to product much faster for those who are willing to make the effort.

At this stage, NVIDIA claims that their partners are further along in the development of end-user devices than they were for Tegra 2 at MWC 2010 - in fact, NVIDIA has already known the details of specific projects for some time. All the auxiliary chips on customer PCBs have been tested by NVIDIA and should work with Kal-El. Another advantage is that the software is extremely compatible with Tegra 2 - practically all the new features are exposed with no extra effort, whereas going from Tegra 1 to Tegra 2 required quite a few software changes to benefit from the extra functionality. NVIDIA is therefore extremely confident that the software will already be very mature for the first devices, because they know Android very well by now and the internal architecture is very similar to Tegra 2.

Kal-El might be on track to disrupt the industry's time-to-market dynamics, but what exactly is it anyway? Here are the specifications we know so far:
  • TSMC 40LPG process (because 28LPT and 28HPM wouldn't be ready in time for Q2 mass production)
  • Quad-Core Cortex-A9 with NEON (unlike Tegra 2) and Asynchronous Clocks (unlike all other A9s)
  • 12-core GPU (3x perf vs 8-core Tegra 2, probably just 2x the Pixel Shaders and higher clocks)
  • 1440p Video Decoder (1080p Blu-Ray including 60Mbps peaks, probably some dual-stream support)
  • 32-bit DRAM (no confirmation of DDR3 support, but likely up to 1066MHz LPDDR2 & 1333MHz DDR3)
  • 80mm² die size according to AnandTech (likely based on dinner conversation with Phil Carmack)
  • 14x14 package for smartphones (too many pins to keep 0.4mm pitch with Tegra 2's 12x12 package)
  • 5x Tegra 2 performance (does not match other specs, NV claims they'll justify it later...)

Kal-El is a very impressive chip relative to the competition for 2011 tablets and smartphones, but NVIDIA also showed a roadmap that goes all the way up to 2014. Their goal was clearly to impress the press, but it had quite the opposite effect on me: if Kal-El in 2011 is a 5x boost over Tegra 2 but Wayne in 2012 is only a 2x boost over Kal-El despite being on 28nm, then Wayne is likely an even smaller architecture change. That also means the new GPU architecture, which was rumoured for this generation of hardware, possibly won't come until Logan in 2013, and neither will the Cortex-A15 (which will lead to interesting comparisons between 4xA9 and 2xA15). The good news is that this would allow Wayne to be cheap enough to target much more than just the ultra-high-end market. On the other hand, the competition certainly isn't standing still with their first 28nm chips...
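
As a rough sanity check on why the 5x claim "does not match other specs", the numbers above can be run through some back-of-the-envelope arithmetic. This is only a sketch: it assumes performance scales linearly with the number of units times the clock, which real workloads rarely do, and the variable names are purely illustrative.

```python
# Figures taken from the spec list and roadmap discussion above.
KAL_EL_CPU_CORES = 4
TEGRA2_CPU_CORES = 2        # Tegra 2 is a dual-core Cortex-A9
GPU_SPEEDUP_CLAIM = 3.0     # "3x perf" vs Tegra 2
PIXEL_SHADER_RATIO = 2.0    # the guess above: 2x the pixel shaders
KAL_EL_OVER_TEGRA2 = 5.0    # NVIDIA's headline claim
WAYNE_OVER_KAL_EL = 2.0     # from the roadmap


def implied_clock_ratio(total_speedup: float, unit_ratio: float) -> float:
    """Clock ratio needed to reach total_speedup, ASSUMING performance
    scales linearly with (number of units x clock)."""
    return total_speedup / unit_ratio


cpu_core_ratio = KAL_EL_CPU_CORES / TEGRA2_CPU_CORES
gpu_clock_ratio = implied_clock_ratio(GPU_SPEEDUP_CLAIM, PIXEL_SHADER_RATIO)
cpu_clock_ratio_for_5x = implied_clock_ratio(KAL_EL_OVER_TEGRA2, cpu_core_ratio)
wayne_over_tegra2 = KAL_EL_OVER_TEGRA2 * WAYNE_OVER_KAL_EL

print(f"CPU core ratio:                 {cpu_core_ratio:.1f}x")
print(f"GPU clock uplift implied by 3x: {gpu_clock_ratio:.1f}x")
print(f"CPU clock uplift needed for 5x: {cpu_clock_ratio_for_5x:.1f}x")
print(f"Wayne vs Tegra 2 (cumulative):  {wayne_over_tegra2:.0f}x")
```

Under these assumptions, doubling the CPU cores and tripling GPU throughput both fall well short of 5x unless the CPU clock also rose 2.5x, which is why the headline figure needs the justification NVIDIA has promised.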

ST-Ericsson A9600: Stealing the Show


It's not often that a product announcement gets a bit of spontaneous applause at an analyst event, but in the ST-Ericsson Nova A9600's case it was clearly warranted. The A9600 will likely be the most impressive chip of its generation, with not only class-leading performance (dual-core 2.5GHz A15s and Imagination's Series 6 GPU cores) but also extremely innovative power saving techniques, which ST-Ericsson said they will talk about later this year. I managed to hear about several of those (mostly off the record), and I think it's fair to say they've exceeded even the long-term(!) expectations I set in my article on Handheld CPUs, especially on the overall system front.

Here's just one bit that's pretty easy to figure out with public information: ST-Ericsson claims 20K DMIPS, but there's no way to reach that number with only 2xA15. It's actually a heterogeneous CPU subsystem with one or more smaller CPU cores (Cortex-A5 or Cortex-A9?) next to the Cortex-A15s. They can be used to run the OS and background tasks (or just any lightweight workload) whereas the OMAP5's Cortex-M4s can only handle specific SoC subsystems. The easiest way to get to 20K DMIPS seems to be 2xA15+1xA9, but it could be more complicated than that... And as I said, there are quite a few more power saving innovations here than just heterogeneous multiprocessing.
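
The DMIPS argument above can be sketched numerically. The per-core DMIPS/MHz values below are not from ST-Ericsson; they are the approximate figures commonly quoted for these ARM cores and should be treated as assumptions, as should the idea that the claimed total is a simple sum over all cores. The companion core's clock is unknown, so the sketch solves for it instead of guessing.

```python
CLAIMED_DMIPS = 20_000    # ST-Ericsson's claim
A15_CORES = 2
A15_CLOCK_MHZ = 2_500     # 2.5GHz, as announced

# ASSUMED, commonly quoted approximations (not from the announcement).
DMIPS_PER_MHZ = {
    "Cortex-A15": 3.5,
    "Cortex-A9": 2.5,
    "Cortex-A5": 1.57,
}

# What the two A15s deliver on their own, and what is left to explain.
a15_total = A15_CORES * A15_CLOCK_MHZ * DMIPS_PER_MHZ["Cortex-A15"]
shortfall = CLAIMED_DMIPS - a15_total


def companion_clock_mhz(core: str, count: int = 1) -> float:
    """Clock each companion core would need to cover the shortfall."""
    return shortfall / (count * DMIPS_PER_MHZ[core])


print(f"2xA15 @ 2.5GHz: {a15_total:.0f} DMIPS, shortfall {shortfall:.0f}")
for core in ("Cortex-A9", "Cortex-A5"):
    print(f"1x {core} would need ~{companion_clock_mhz(core):.0f} MHz")
```

With these assumed figures the two A15s account for 17,500 DMIPS, and a single A9 at about 1GHz closes the gap exactly, which is consistent with the 2xA15+1xA9 guess; a single A5 would need roughly 1.6GHz.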

Of course, the other very exciting part of the A9600 is Imagination's PowerVR 'Rogue' GPU (aka Series 6). It is implemented in a multicore configuration and ST-Ericsson claims 210GFlops, more than 5GPixel/s visible fillrate (more than 13GPixel/s with PowerVR's 2.5x marketing multiplier for TBDR) and 350 million 'real' triangles/s. These numbers imply either 8 TMUs @ ~667MHz or 12 TMUs @ ~450MHz, and given the 2.5GHz frequency on the CPU and some of the power saving tricks I know about, I'd much rather bet on the former. It also implies roughly 40 flops per clock per TMU, which may seem strange but could easily be explained in at least three different ways: 1) 5-way FMAC with 4 shader cores per TMU, 2) 4-way FMAC with 4 shader cores per TMU + interpolation (on or off shader core?), 3) 4-way FMAC with 5 shader cores per TMU. All of them (and more) are perfectly plausible, so let's leave it at that.
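
The arithmetic behind those GPU figures can be sketched as follows. It assumes one pixel per TMU per clock and counts an FMAC as two flops per lane; both are my assumptions for illustration, not details confirmed by ST-Ericsson or Imagination.

```python
CLAIMED_GFLOPS = 210.0
TBDR_MULTIPLIER = 2.5                 # PowerVR's marketing multiplier
CANDIDATES = [(8, 667), (12, 450)]    # (TMUs, clock in MHz)
FLOPS_PER_FMAC_LANE = 2               # ASSUMED: multiply + add


def fillrate_gpix(tmus: int, mhz: int) -> float:
    """Visible fillrate in GPixel/s, ASSUMING 1 pixel per TMU per clock."""
    return tmus * mhz / 1000.0


def flops_per_tmu_clock(tmus: int, mhz: int) -> float:
    """Flops per clock per TMU needed to reach the claimed GFlops."""
    return CLAIMED_GFLOPS * 1000.0 / (tmus * mhz)


for tmus, mhz in CANDIDATES:
    fill = fillrate_gpix(tmus, mhz)
    print(f"{tmus} TMUs @ {mhz}MHz: {fill:.2f} GPixel/s visible, "
          f"{fill * TBDR_MULTIPLIER:.2f} with multiplier, "
          f"{flops_per_tmu_clock(tmus, mhz):.1f} flops/clock/TMU")

# The three decompositions of ~40 flops/clock/TMU listed above.
option_1 = 5 * FLOPS_PER_FMAC_LANE * 4        # 5-way FMAC, 4 shader cores
option_2_alu = 4 * FLOPS_PER_FMAC_LANE * 4    # 4-way FMAC, 4 shader cores
option_2_interp = 40 - option_2_alu           # left over for interpolation
option_3 = 4 * FLOPS_PER_FMAC_LANE * 5        # 4-way FMAC, 5 shader cores
print(option_1, option_2_alu, option_2_interp, option_3)
```

Both candidate configurations land just above 5GPixel/s visible and 13GPixel/s with the multiplier, and both need a little under 40 flops per clock per TMU, so the public numbers alone cannot distinguish between them.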

Compared to the SGX544s in OMAP5, that's more than twice the GFlops per TMU, and also significantly improved API support (OpenGL ES Halti and OpenCL Full Profile with on-chip local memory, from what I've heard through the grapevine). Apparently the two key reasons why ST-Ericsson went for Rogue rather than ARM's Mali-T604 are time-to-market (implying later RTL delivery for T604 despite the earlier announcement) and ST-Ericsson's confidence in the OpenCL implementation, but they will definitely stick to Mali-400 in the low-end and will continue to evaluate future ARM cores. They had very good things to say about the Mali-400's area efficiency and don't see any problem with sticking with a dual-supplier strategy going forward if necessary.

There's little information on the other subsystems, but it does support 1080p 120fps video decode (i.e. 60fps 3D decode and presumably 1080p30 3D encode, maybe 1440p 2D decode like NVIDIA Kal-El) and at least dual-channel/64-bit memory (presumably LPDDR2/DDR3). It's made on a 28nm High-K process with Triple Gate Oxide (like Kal-El and many other modern SoCs, to be able to combine both low leakage and high performance transistors on the same chip), probably multi-sourced (GlobalFoundries and maybe TSMC). It's not clear if it supports some of the most interesting next-gen I/O features like USB3 (for faster multimedia sideloading and, interestingly, also faster charging) but it seems pretty likely. Sampling is expected in late 2011, which makes it the first SoC based on Imagination's Series 6 and probably the second with Cortex-A15, behind only Texas Instruments and maybe Samsung. That small delay looks to be worth it given the amount of innovation and sheer performance of the A9600.
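
The video decode figures can be compared on raw pixel throughput, as sketched below. This assumes "1440p" means 2560x1440 and that the 1440p frame rates shown are hypothetical; it also treats pixels per second as a rough proxy for decoder load, ignoring bitrate, codec profile and memory bandwidth.

```python
def pixel_rate(width: int, height: int, fps: int, streams: int = 1) -> int:
    """Decoded pixels per second across all streams."""
    return width * height * fps * streams


# Quoted capability: 1080p at 120fps.
budget = pixel_rate(1920, 1080, 120)

# Stereoscopic 3D at 60fps is two 1080p60 streams, the same load.
stereo_1080p60 = pixel_rate(1920, 1080, 60, streams=2)

# ASSUMED resolution and frame rates for a hypothetical 1440p 2D mode.
qhd_60 = pixel_rate(2560, 1440, 60)
qhd_30 = pixel_rate(2560, 1440, 30)

print(f"1080p120 budget: {budget / 1e6:.1f} Mpixel/s")
print(f"1080p60 3D:      {stereo_1080p60 / 1e6:.1f} Mpixel/s")
print(f"1440p60 2D:      {qhd_60 / 1e6:.1f} Mpixel/s "
      f"({qhd_60 / budget:.0%} of budget)")
print(f"1440p30 2D:      {qhd_30 / 1e6:.1f} Mpixel/s "
      f"({qhd_30 / budget:.0%} of budget)")
```

On pixel rate alone, even 1440p at 60fps would fit within the 1080p120 budget, which is why 1440p 2D decode seems plausible, although the decoder's maximum supported resolution could still rule it out.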