NVIDIA is demoing a new in-house 3D GUI to showcase the APX 2500's capabilities.

Introduction

NVIDIA has just announced the 65nm APX 2500, an application processor supporting 720p H.264 video, OpenGL ES 2.0, and HDMI output. On the processing side, it sports an ARM11 core at 750MHz. We had a quick chat with Mike Rayfield and touched on a variety of subjects and interesting design choices...

We’ll begin with a concise list of the chip’s specifications:

• ARM11 MPCore CPU @ 750MHz, Single-Core.
• OpenGL ES 2.0 GPU, including CSAA support.
• 720p H.264 Baseline/MPEG-4/VC-1/WMV9 Decode.
• 720p H.264 Baseline/MPEG-4 Encode.
• 12MPixel Camera Sensor support.
• Dedicated hardware blocks for Audio & JPEG.
• Image Signal Processor (superset of the GoForce 5500’s).
• Dual Display support: 720p HDMI + 1280x1024 LCD/CRT.

A very good way to understand what this chip is capable of is to look at what NVIDIA is demoing at the Mobile World Congress: a prototype APX 2500 is connected via HDMI to a 60” LCD and is decoding a 720p 30FPS H.264 Baseline video stream with a bitrate of 14Mbps. Products based on the chip should be able to do that non-stop for 10 hours per battery charge (when using, IIRC, an iPhone-sized battery). Runtime would obviously be lower when playing back on the phone’s own screen, but there’s not much to be done about that.
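To put that claim in perspective, here’s a quick back-of-the-envelope estimate of our own (the battery figures are assumptions on our part, not NVIDIA’s numbers): an iPhone-class pack of roughly 1400mAh at 3.7V holds about 5.2Wh, so ten hours of playback implies an average system draw of only around half a watt.

```python
# Rough estimate of the average power draw implied by the demo claim.
# Battery capacity and voltage are our assumptions (iPhone-class pack),
# not figures provided by NVIDIA.
battery_mah = 1400        # assumed battery capacity, mAh
battery_voltage = 3.7     # nominal Li-ion cell voltage, V
runtime_hours = 10        # claimed 720p playback time per charge

battery_wh = battery_mah / 1000 * battery_voltage   # ~5.2 Wh of stored energy
avg_power_w = battery_wh / runtime_hours             # ~0.52 W average draw

print(f"Battery energy: {battery_wh:.1f} Wh")
print(f"Implied average system draw: {avg_power_w:.2f} W")
```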

There are a number of things to discuss, but first let’s begin with a disclaimer: because of a lack of time, we didn’t get much information on the 3D part of the chip yet. Don’t worry though, we hopefully will in the next couple of days! As for availability, the chip is sampling today and will enter mass production in June. It is expected to be used in portable navigation devices by late 2008, in portable media players by early 2009, and in mobile phones by late 2009.

Application Processing

NVIDIA uses an ARM11 MPCore in the APX 2500, but it's only single-core. So why even bother with MPCore? The reasons are simple: they already had the license from a few years ago, it's in no way worse than a plain ARM11, and they thought it'd be good experience for the chips they'll make in the future.

At the same time, given that this is a high-end product, why not use a Cortex-A8? Because it's much larger and perf/watt is not necessarily better or even as good. You're also forced to use ARM's NEON unit, which is massive overkill if you've got your own hardware for multimedia processing.

However, since ARM11 is slower per cycle, you risk not being fast enough. NVIDIA's solution there is to clock the ARM11 at up to 750MHz, which we believe to be the fastest 65nm ARM11 announced in the handheld industry. How did they do it? No full custom, but plenty of advanced circuit techniques and gate engineering.

Based on a little bit of Googling, we found that this should result in roughly 920 Dhrystone MIPS, while TI's OMAP3430 should deliver about 1100 Dhrystone MIPS with its 550MHz Cortex-A8 (which lines up with the commonly quoted ratings of roughly 1.2 DMIPS/MHz for the ARM11 versus 2.0 DMIPS/MHz for the Cortex-A8). So that's definitely pretty close, and likely a decent trade-off given the noticeably smaller die size and possibly lower power.
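For those who want to check the arithmetic, here’s a trivial sketch using those per-MHz ratings (approximate public figures, not numbers confirmed by NVIDIA or TI):

```python
# Dhrystone back-of-the-envelope using commonly quoted per-MHz ratings.
# The ratings below are approximate public figures, not vendor-confirmed data.
chips = {
    "APX 2500 (ARM11 @ 750MHz)":     (1.2, 750),  # ~1.2 DMIPS/MHz
    "OMAP3430 (Cortex-A8 @ 550MHz)": (2.0, 550),  # ~2.0 DMIPS/MHz
}

for name, (dmips_per_mhz, clock_mhz) in chips.items():
    print(f"{name}: ~{dmips_per_mhz * clock_mhz:.0f} DMIPS")
# Prints roughly 900 vs 1100 DMIPS, in line with the figures quoted above.
```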

What gate engineering implies is that the ARM11 core uses gates with higher performance, but also higher leakage, than those in other parts of the chip. That's bad, right? Well, not really, because it also implies a lower voltage for a given frequency, and thus potentially lower dynamic power.
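Here’s a minimal sketch of why that works, using the classic switching-power approximation P ≈ a·C·V²·f (all numbers below are invented purely for illustration):

```python
# Toy illustration: faster (leakier) gates that reach the target clock at a
# lower supply voltage can still reduce dynamic power, since it scales with
# the square of the voltage. All figures here are made up.
def dynamic_power(switched_cap, voltage, freq_hz, activity=0.15):
    """Classic approximation: P_dyn = a * C * V^2 * f."""
    return activity * switched_cap * voltage ** 2 * freq_hz

freq = 750e6    # target CPU clock
cap = 1e-9      # hypothetical switched capacitance

p_slow_gates = dynamic_power(cap, 1.2, freq)  # slower gates need a higher Vdd
p_fast_gates = dynamic_power(cap, 1.0, freq)  # faster gates hit 750MHz at a lower Vdd

print(f"Relative dynamic power: {p_fast_gates / p_slow_gates:.2f}x")  # ~0.69x
```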

Rayfield also mentioned that the way they handled memory gave them an advantage in achieving higher clocks. We aren’t sure, but we wouldn’t be surprised if that meant they’re using ARM’s Advantage Memories (or something similar) for the caches, rather than the foundry’s standard SRAM.

The effects of leakage on standby power aren’t a problem either, because NVIDIA deployed power islands/shutoff aggressively (more on that later). Overall, it’s hard to judge how good all these trade-offs really are in practice without the raw data that only NVIDIA’s engineers have, but it definitely sounds very sensible on paper.
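As a toy illustration of why aggressive power gating tames leaky high-speed gates in standby (every figure here is invented for illustration, not measured data):

```python
# Toy model: leakage only matters while an island is actually powered, so
# gating the fast ARM11 island off most of the time keeps standby power low.
# Every number here is invented for illustration.
cpu_island_leakage_mw = 30.0   # hypothetical leakage of the fast-gate CPU island
always_on_leakage_mw = 5.0     # hypothetical leakage of the always-on logic
cpu_powered_fraction = 0.02    # fraction of standby time the CPU island stays on

without_gating = cpu_island_leakage_mw + always_on_leakage_mw
with_gating = cpu_island_leakage_mw * cpu_powered_fraction + always_on_leakage_mw

print(f"Standby leakage without gating: {without_gating:.1f} mW")
print(f"Standby leakage with gating:    {with_gating:.1f} mW")
```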