IBM unveils HPC-targeted Cell processor at Cool Chips

Friday 20th April 2007, 06:06:00 PM, written by Carl Bender

IBM formally introduced the HPC-targeted version of the Cell Broadband Engine processor at this year's recently concluded Cool Chips conference in Tokyo, featuring greatly expanded double-precision (DP) FLOPS performance and the ability to address 16GB of DDR2. To be used within Los Alamos National Laboratory's Roadrunner supercomputing project - expected to be the world's fastest at 1.6 petaflops - the HPC revision of Cell's SPEs brings the double-precision capability of Cell to just over 100 GFLOPS, up nearly four-fold from the original. Changes to the revised Synergistic Processing Elements (SPEs) include a reduction in DP latency from thirteen cycles to nine and the addition of DP pipelining with full dual-issue capability. The SPEs also move closer to full IEEE compliance with the introduction of 'Denormal' and 'Expected NaNs' support, and make use of five new DP "compare" instructions introduced in SPE ISA 1.2.
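The "just over 100 GFLOPS" figure is consistent with a simple peak-rate calculation. The sketch below is illustrative only: it assumes eight SPEs, each retiring one two-wide double-precision fused multiply-add per cycle at a 3.2GHz clock. The SPE count and the per-cycle rate are assumptions not stated in this article, and the function name is hypothetical.

```python
def peak_dp_gflops(spes, clock_ghz, simd_lanes=2, flops_per_fma=2):
    """Theoretical peak double-precision GFLOPS for a set of SPEs.

    Assumes (not stated in the article) that every SPE issues one
    SIMD fused multiply-add per cycle: simd_lanes doubles per vector,
    and each fused multiply-add counts as flops_per_fma operations.
    """
    return spes * clock_ghz * simd_lanes * flops_per_fma


# Assumed configuration: 8 SPEs at 3.2GHz.
hpc_cell = peak_dp_gflops(spes=8, clock_ghz=3.2)
print(f"Peak DP throughput: {hpc_cell:.1f} GFLOPS")
```

This is a theoretical ceiling; sustained throughput on real workloads depends on memory bandwidth and how well the code keeps the pipelines full.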

The cost in transistors to accomplish this shift has been minimal, with the new revision weighing in at 250 million: only 9 million transistors more than the original.

The move to address a larger memory pool via the introduction of a DDR2 memory controller may be as important a selling point to potential Cell HPC customers as the DP performance itself. Previously, the low cap on the memory that could be attached to each socket - a consequence of the original design's use of Rambus XDR - severely limited the Cell's utility in memory-intensive computing applications where its processing throughput might otherwise have been quite appealing. The move to DDR2 has come at a price in terms of packaging complexity, however, with the required pin count making an enormous leap in order to compete with the memory bandwidth of the previous XDR-based design.

The chip also reflects what may be a problematic move at IBM down to 65nm fabrication. Cell in particular was called out by IBM at ISSCC 07 as being a difficult chip to shrink due to the arrangement of its cores and their associated memories, the primary concerns being power/current related. Even so, the new revision has a die size of 212 square millimeters, a meager 10% reduction from the original's 235. In addition, for operation at the same frequency of 3.2GHz, chip wattage is pegged at 100W vs a claim of 110W for the original. Since the wattage figures seem to stand in contradiction to the 65nm operating numbers released by IBM at ISSCC 07, it could be that part of the reason the HPC variant of the Cell runs hotter than its non-HPC 65nm cousin (and not much cooler than its 90nm predecessor) is that it relies on a thirstier memory controller.
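To put the "meager" shrink in perspective, the quoted figures can be compared against a naive ideal. The sketch below assumes that die area scales with the square of the feature size when moving from 90nm to 65nm. That is a deliberate simplification: it ignores the extra transistors in the HPC revision and the fact that I/O and analog circuitry rarely shrink in step with logic. The helper name is hypothetical.

```python
def ideal_shrink_area(area_mm2, old_nm, new_nm):
    """Die area after a perfect optical shrink (area ~ feature size squared).

    Naive assumption for illustration only; real designs never scale
    this cleanly.
    """
    return area_mm2 * (new_nm / old_nm) ** 2


original_area = 235.0   # mm^2, 90nm original (from the article)
hpc_area = 212.0        # mm^2, 65nm HPC revision (from the article)

actual_reduction = 1 - hpc_area / original_area
ideal_area = ideal_shrink_area(original_area, old_nm=90, new_nm=65)
power_reduction = 1 - 100 / 110   # 100W vs 110W at 3.2GHz

print(f"Actual area reduction: {actual_reduction:.1%}")
print(f"Ideal 65nm area:       {ideal_area:.0f} mm^2")
print(f"Power reduction:       {power_reduction:.1%}")
```

Under these assumptions a perfect shrink would land far below the reported 212 square millimeters, which is why the roughly 10% reduction reads as modest.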

Quibbles aside, the new HPC Cell variant still compares very favorably along the DP FLOP/watt axis, with IBM pegging FLOP/watt performance at nearly three times that of Intel's Clovertown. With Cell having gained some traction of late in the fields of imaging, video encoding, and defense, IBM feels the changes made in its new HPC revision will help in further broadening the potential customer base for the architecture.


