Intellectual Property, Programmability and Transistor-Level Efficiency

Because many handheld chipsets are SoCs and there are plenty of different companies developing such products, it makes sense that not everything in them is developed in-house. That's where intellectual property (IP) comes in; it saves development costs and reduces risks, so most of the time there's not much to complain about if its capabilities fit your needs. Licensable IP blocks range from DSPs right through to wireless functionality and CPUs.

One IP company every hardcore 3D enthusiast will most likely know about is Imagination Technologies, whose divisions include PowerVR. The business model is simple: if someone wants to embed your design in a SoC, you receive a license fee and shipment-based royalties. This reduces expenses and risks for both the IP provider and the SoC manufacturer, although selling IP tends to be less profitable than selling actual chips. Furthermore, licensing might not make that much sense for very high-volume parts, since at those volumes the economies of scale might support in-house development instead.
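That volume trade-off can be made concrete with a bit of arithmetic. The sketch below is purely illustrative: every figure (license fee, per-unit royalty, in-house engineering cost) is invented for the example and does not reflect any real company's pricing.

```python
# Hypothetical break-even sketch: licensing an IP block vs. developing the
# equivalent functionality in-house. All figures are invented for
# illustration only, not real pricing.

def licensing_cost(units, license_fee=1_000_000, royalty_per_unit=0.50):
    """Total cost of licensing: one-time fee plus per-unit royalties."""
    return license_fee + royalty_per_unit * units

def in_house_cost(units, nre=10_000_000):
    """Total cost of in-house development: a fixed engineering (NRE) cost,
    independent of how many units ship."""
    return nre

def break_even_units(license_fee=1_000_000, royalty_per_unit=0.50,
                     nre=10_000_000):
    """Shipment volume above which in-house development becomes cheaper."""
    return (nre - license_fee) / royalty_per_unit

for units in (1_000_000, 10_000_000, 50_000_000):
    lic, own = licensing_cost(units), in_house_cost(units)
    cheaper = "license" if lic < own else "in-house"
    print(f"{units:>11,} units: license ${lic:>12,.0f} "
          f"vs in-house ${own:>12,.0f} -> {cheaper}")

print(f"break-even at {break_even_units():,.0f} units")
```

With these made-up numbers, licensing wins comfortably at a few million units, while in-house development starts paying for itself somewhere in the tens of millions, which is exactly why only the very highest-volume players consider bypassing the IP route.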

There is one IP company which you generally can't escape from when making a SoC, though: ARM. They've got a very simple competitive advantage: they own the rights to the leading handheld CPU instruction set. This is somewhat similar to Intel's advantage with the x86 instruction set, although to an even larger extent: the only other company with rights to the instruction set is Marvell, through their XScale product line (purchased from none other than Intel in 2006), and they only sell standalone chips, not SoC-licensable IP. As such, ARM has fundamentally cornered a large and mildly lucrative market. The only companies which can bypass them are those who either do not care about application compatibility, or have their own in-house instruction set. One notable example of the latter is Renesas, with the SH-Mobile application processor.

So, the next question is - what are you going to use an ARM or an XScale core for, anyway? Well, let us once again consider the examples of the iPod 5G and the iPod Shuffle 1G. The PortalPlayer chip in the former is composed of two distinct ARM7 cores, while the SigmaTel chip in the latter includes a DSP, but no real general-purpose CPU. Both approaches handle sound processing and decoding just fine - but the DSP can't run an OS or mini-games, while an ARM7 core can. Obviously, even setting aside the necessary LCD controller, the iPod Shuffle would need an extra chip to handle CPU-like functionality if it were to support a screen. So it's a perfect fit for that specific product line, but not much more than that.

Clearly, an ARM7 CPU can do sound decoding and a fair bit more than that - and there are companies out there using ARM11s or ARM Cortex cores for video processing, too. But capabilities are only one (albeit big) part of the equation - power consumption also matters a lot, and even more so with certain customers such as Apple. And, generally speaking, general-purpose processors and power efficiency are two things that don't go well together.

When focusing on power-efficiency, you'd ideally want to hand-design everything and make all blocks fixed-function. This is generally not viable, though, for two big reasons: first of all, this would significantly increase engineering expenses and negatively affect time to market; and secondly, this would greatly reduce design flexibility. Imagine if you had no programmable processor, just fixed-function blocks for three major video codecs. If potential customers are looking for something that supports a wider variety of codecs, you are at an obvious competitive disadvantage. A company with at least some programmable logic might be able to handle that situation much better thanks to some software magic.

Furthermore, it has to be considered that if you need to support a larger number of codecs, creating a fixed-function block dedicated to each and every one of them would waste die space and reduce cost-competitiveness. Thus, it makes sense to share functionality between the various codecs when possible. As a logical extension of that, this can justify adding one or more programmable CPU cores; that might just be the easiest way to share functionality and minimize die size. It might not be the most efficient way to do it, but on the plus side, it does buy you a lot of software-level flexibility.
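The structural idea - codec-specific front ends feeding shared back-end stages - can be sketched in a few lines. This is a toy illustration, not a real decoder: the two "codecs" and their bitstream formats are entirely hypothetical, and the shared transform is a stand-in for something like an IDCT stage that real codecs have in common.

```python
# Toy sketch of functionality sharing between codecs on a programmable
# core. Both codecs and their "bitstream" formats are hypothetical.

def shared_inverse_transform(coeffs):
    # Stand-in for an IDCT-like stage that several codecs could reuse,
    # whether implemented in software or as one shared hardware block.
    return [c * 2 for c in coeffs]

def parse_codec_a(bitstream):
    # Hypothetical codec A: coefficients live in the low nibble.
    return [b & 0x0F for b in bitstream]

def parse_codec_b(bitstream):
    # Hypothetical codec B: coefficients live in the high nibble.
    return [b >> 4 for b in bitstream]

DECODERS = {
    "codec_a": parse_codec_a,
    "codec_b": parse_codec_b,
}

def decode(codec, bitstream):
    # Supporting a new codec means registering a new parser in software,
    # not spinning a new dedicated fixed-function block.
    parse = DECODERS[codec]
    return shared_inverse_transform(parse(bitstream))

print(decode("codec_a", bytes([0x12, 0x34])))
print(decode("codec_b", bytes([0x12, 0x34])))
```

The design choice this mirrors is exactly the one in the text: only the codec-specific parsing is duplicated per format, while the expensive common stage exists once, and flexibility comes from the dispatch being software.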