Beyond3D - Stream Processors, Inc. at ISSCC '07

Stream Processors, Inc. at ISSCC '07 - Page 5

Published on 22nd Mar 2007, written by TeamB3D for JPR - Last updated: 4th Apr 2007

New, more conventional video codec offerings

Providing an interesting context at ISSCC for comparison to the general-purpose, highly programmable Storm-1 were two video codec chips, both from National Chung-Cheng University in Taiwan.

In stark contrast to the Storm-1, both chips are highly focused, fixed-function ASICs. We hesitate to draw precise comparisons between either of these chips and the Storm-1, as we're not exactly talking apples-to-apples. But both are compelling evidence of what can be achieved when designers are pushed to reduce die size and power budget as far as possible, with a more limited set of codecs to support and no requirement for programmability.

The first chip can encode a 1280 x 720p (HD) video stream while occupying a relatively small 18 square mm of core die area. The second is 100% dedicated hardware that can decode MPEG-1, MPEG-2, MPEG-4 (Simple) and H.264 (Baseline) at up to 1920 x 1080 resolution (not clear, but we assume interlaced). And it does so in a minuscule 5 square mm (0.13µ) while drawing only 71 mW of power.

Sorting through the promises - past and present - for programmable media processors

SPI's Storm-1 is not the first programmable media processor pitched in the marketplace, not by a long shot. Such chips have been pitched for years, and they've certainly taken their share of the limelight in the press and at conferences like ISSCC. Many vendors of seemingly countless programmable media processors have made the argument before: that, being configurable in software, their chips can be more or less future-proof in the world of video processing. Vendors like Cradle, BOPS, Equator, Philips with TriMedia, and lots of others I'm probably forgetting.

Despite repeated attempts to capture market share with programmable video processors, few if any have succeeded, at least if you define success as selling millions of chips and contributing to a healthy bottom line. (You could argue Sony's PS2 Emotion Engine and PS3 Cell Broadband Engine fit the bill as programmable video processors, but they first and foremost justify themselves for 3D processing, not video.)

Being future-proof requires more than just programmability; it requires forward-engineered performance as well. A big selling point of a programmable video processor is the flexibility to adapt to future codec standards, profiles and levels. But the problem with building a future-proof programmable video processor is that the goal is inherently paradoxical. Build for tomorrow and you're not optimal for today.

Forget about guaranteeing future codec compatibility, though that has been a problem. Let's assume you were smart enough to design your programmable processor today - streaming, general-purpose, whatever - compatible with a new hot codec or profile to be standardized three years from now. Tough, but doable with a media-centric set of instructions.

But there's a bigger problem here. Compared to a focused, fixed-function approach, building in more general-purpose programmability will to varying degrees mean a bigger die, often a much bigger die. And that's been one big Achilles heel for past programmable media processors looking for success in volume markets.

Alright, we said we'd hesitate to compare the Storm-1 to the simpler fixed-function chips at ISSCC, but we can't help it. Just take a look at Chung-Cheng U's H.264 video encoder chip. It can do 1280 x 720p H.264 Baseline encode with scalable (unspecified max) bitrate out. The Storm-1 can also do H.264 Baseline encode, but at the higher 1920 x 1080p resolution, and also with scalable bitrate (up to 25 Mbps). Going by pixels per frame alone, Storm-1's workload is at a minimum on the order of 2.25 times what the simpler fixed-function encoder can handle (again, caveats with the roughness of the comparison here).

Now take a look at the difference in die size. The Taiwanese H.264 video encoder consumes about 470 kgates and 18 square mm. The Storm-1 consumes 34 million transistors, and a mere 6 lanes of the DPU - just a small portion of the overall die - consume about 19.2 square mm (both on 0.13 micron processes), more than the simpler chip's entire core size. Though SPI declined to specify the Storm-1's die size, we estimate it from the micrograph to be in the range of 100 square mm, say 5X the size of the Taiwanese chip. Rough, sure, but we're looking at something like 2¼ times the performance at a cost of 5 times the size (or even more).
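For readers who want to check the back-of-the-envelope numbers, here is a minimal sketch of the arithmetic. It uses only the figures quoted above, takes pixels per frame as a stand-in for encode workload, and assumes both chips run at the same frame rate (which neither vendor's figures confirm). The variable names are ours, and the 100 square mm Storm-1 area is our micrograph estimate, not an SPI number.

```python
# Rough comparison using only figures quoted in the article.
# Assumption: workload scales with pixels per frame at equal frame rates.
ENCODER_PIXELS = 1280 * 720     # fixed-function H.264 encoder, per frame
STORM1_PIXELS = 1920 * 1080     # Storm-1 H.264 Baseline encode, per frame

ENCODER_AREA_MM2 = 18.0         # quoted core area of the fixed-function encoder
STORM1_AREA_MM2 = 100.0         # our estimate from the micrograph, not from SPI

# How much more pixel throughput the Storm-1 handles.
workload_ratio = STORM1_PIXELS / ENCODER_PIXELS

# How much more silicon it spends to get there.
area_ratio = STORM1_AREA_MM2 / ENCODER_AREA_MM2

# Silicon spent per unit of workload, relative to the fixed-function chip.
area_per_workload = area_ratio / workload_ratio

print(f"workload ratio:    {workload_ratio:.2f}x")
print(f"area ratio:        {area_ratio:.2f}x")
print(f"area per workload: {area_per_workload:.2f}x")
```

On these assumptions the Storm-1 spends roughly two and a half times as much die area per pixel encoded, which is the gap the rest of this page is about.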

But for the sake of argument, let's go one step further, forget these comparisons and instead accept that both of the following statements are true: a fully-programmable processor costs no more die area than an ASIC, and it's architecturally equipped to handle whatever operations some new codec or profile will throw at it.

It's still not enough. Because even if you've come up with a more-programmable solution without the typical downsides compared to a less-programmable solution, you still have to make sure the performance will suffice for that application of tomorrow for which you're promising the flexibility to support. Take a look back at the Storm-1. Fully programmable, it should be able to decode any profile or level of H.264, holding promise for future-proofness, right? But when you figure performance into the equation, it could certainly fall short. In its current incarnation, the chip is limited to Baseline profile for 1080p H.264. So what happens when Baseline profile is not enough?

When MPEG-4 came out, lots of vendors jumped on Simple profile. But then months later, service providers said sorry, Simple isn't enough, we need the Advanced Video Coding tools (which eventually led to H.264, but that's another story). Even today, Baseline isn't enough for high-definition Blu-ray, for example, which specifies High profile. Ultimately, if the performance level isn't there, having the compatibility to support the standard or profile is moot.

But wait, a programmable processor vendor might say, we'll build in today the ability to scale to higher performance for tomorrow. Well, OK, but shooting for higher performance - what you'll need not today but a few years down the road - brings us back again to taking on more chip area and more cost (it probably means longer development time as well, but let that slide). And while every customer will tell you they want flexibility to adapt to foreseen - or unexpected - shifts in standards, profiles or levels in the future, mass market volume demands mass market pricing. If you want big volume, the price rules.

Selecting the higher cost programmable solution offering great flexibility that you may or may not need is risky. You might choose it, but your competitor may not. He might gamble that - say at least for the next 2-3 years - he'll be safe limiting himself to today's lower performance point, and save critical dollars on a minimal, fixed-function ASIC. Then you end up with a chip that costs significantly more than your competitor's in order to run the codecs you can both run today, and still no guarantee you can run the profiles and levels of a codec that will arrive some day, if ever.
