As Roadrunner Breaks Petaflop Barrier, Heterogeneous Computing in the Spotlight
Wednesday 11th June 2008, 03:20:00 AM, written by Carl Bender
This week, the world of computing officially entered the petaflop era with the announcement that the Los Alamos National Laboratory's 'Roadrunner' had become the first supercomputer in history to cross the psychologically significant threshold of a thousand trillion calculations per second, achieving an Rmax of 1.026 PFlops in Linpack.
Roadrunner runs at over twice the speed of November 2007's Top500
champion, Lawrence Livermore's BlueGene/L. The milestone is significant
enough that, just a week out from ISC '08 (June 17th-20th), the conference
has added a special panel discussion to its program entitled “RoadRunner - the
First Petaflop/s System in the World and Its Impact on Supercomputing.”
This session will be open to all show attendees. Fittingly, Roadrunner
also serves as the first major example of heterogeneous/hybrid
supercomputing at a time when the trends of massive parallelism,
heterogeneous system architecture, and consumer-driven chip design have
come to be seen by the HPC community as the keys to future application
performance gains in a post-clockspeed age.
At the heart of Roadrunner lies the PowerXCell 8i, a derivative of the STI Cell Broadband Engine with enhanced double-precision support, capable of ~102 GFlops of double-precision performance per chip and featuring greatly expanded memory addressing capabilities. The system as a whole comprises 6,948 dual-core Opteron chips and 12,960 PowerXCell 8i processors. Each compute node combines one LS21 Opteron blade and two QS22 Cell blades in a custom tri-blade configuration. 3,456 of these nodes are present across 288 BladeCenter housings, which in turn are networked to one another by over 55 miles of fiber optic cabling.
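As a rough back-of-the-envelope check on the figures above, the following sketch multiplies out the quoted chip count and per-chip rate. It uses only numbers stated in this article; the function and constant names are illustrative, and the result is an approximate Cell-only double-precision peak that ignores the Opterons and any real-world overheads, not an official Rpeak figure.

```python
# Figures quoted in the article; names are illustrative only.
CELL_CHIPS = 12_960          # PowerXCell 8i processors in the system
CELL_DP_GFLOPS = 102.0       # approximate double-precision peak per chip
NODES = 3_456                # tri-blade compute nodes
BLADECENTERS = 288           # BladeCenter housings
LINPACK_RMAX_PFLOPS = 1.026  # measured Linpack result


def cell_peak_pflops(chips: int = CELL_CHIPS,
                     gflops_per_chip: float = CELL_DP_GFLOPS) -> float:
    """Aggregate Cell-only double-precision peak in PFlops (1 PFlops = 1e6 GFlops)."""
    return chips * gflops_per_chip / 1e6


def nodes_per_housing(nodes: int = NODES, housings: int = BLADECENTERS) -> float:
    """Average number of tri-blade nodes per BladeCenter housing."""
    return nodes / housings


if __name__ == "__main__":
    peak = cell_peak_pflops()
    print(f"Cell-only DP peak estimate: {peak:.3f} PFlops")
    print(f"Linpack Rmax / Cell-only peak: {LINPACK_RMAX_PFLOPS / peak:.1%}")
    print(f"Nodes per BladeCenter: {nodes_per_housing():.0f}")
```

Under these assumptions the Cells alone account for roughly 1.3 PFlops of peak double-precision throughput, which is consistent with the Linpack result landing just above the petaflop mark.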
Rather than the hardware, though, the greatest technical hurdle faced by engineers and scientists working on the project was the task of porting algorithms to a massively parallel environment in which they would be required to run effectively across three disparate architectures (Cell itself being heterogeneous). With no guarantee that the effort required or the resultant speedup would validate the $/Flop potential of the system, a year-long evaluation period was started in September of 2006 to explore the extent to which the Cell architecture could be leveraged as an accelerator within the hybrid environment. The ~$110 million contract was awarded after final evaluation results showed a six-fold performance increase across several key algorithms. Within the system itself, Opterons are tasked with general processing jobs such as handling file system I/O, while Cell processors are assigned the heavy lifting associated with the scientific simulations. In practice, the Cells contribute ~95% of the sustained Flops throughput of which Roadrunner is capable.
Although heterogeneous computing has been viewed with trepidation by the HPC community due to the challenges associated with the programming models involved, Roadrunner exemplifies the gains that may be obtained in both throughput and energy efficiency by leveraging the strengths of different architectures within the same system. Roadrunner is a shoo-in for world's fastest supercomputer when the Top500 list is published on June 18th, and the efficiency gains afforded by the hybrid approach will put it in the running for the title of world's "greenest" as well. At 376 MFlops/watt, Roadrunner is 19 MFlops/watt ahead of the BlueGene/P system operated by the UK's Science and Technology Facilities Council at Daresbury Laboratory, presently ranked as the world's most efficient supercomputer.
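To put the efficiency figures in context, here is a small sketch of the arithmetic they imply. It assumes the quoted MFlops/watt figure is simply Linpack Rmax divided by total power draw, which the article does not state explicitly; the function names are illustrative, and the derived power number is an inference from the article's figures rather than a measured value.

```python
# Figures quoted in the article; the Rmax / power relationship is an assumption.
RMAX_FLOPS = 1.026e15                 # Linpack Rmax, in Flops
ROADRUNNER_MFLOPS_PER_WATT = 376.0    # quoted efficiency
MARGIN_MFLOPS_PER_WATT = 19.0         # quoted lead over the Daresbury BlueGene/P


def implied_power_megawatts(rmax_flops: float, mflops_per_watt: float) -> float:
    """Power draw in MW implied by Rmax and an efficiency in MFlops/watt."""
    watts = rmax_flops / (mflops_per_watt * 1e6)
    return watts / 1e6


def runner_up_efficiency(leader: float, margin: float) -> float:
    """Efficiency of the trailing system, in MFlops/watt."""
    return leader - margin


def relative_gain_percent(leader: float, margin: float) -> float:
    """Leader's advantage as a percentage of the trailing system's efficiency."""
    return 100.0 * margin / runner_up_efficiency(leader, margin)


if __name__ == "__main__":
    power = implied_power_megawatts(RMAX_FLOPS, ROADRUNNER_MFLOPS_PER_WATT)
    print(f"Implied power draw: {power:.2f} MW")
    print(f"BlueGene/P efficiency: "
          f"{runner_up_efficiency(ROADRUNNER_MFLOPS_PER_WATT, MARGIN_MFLOPS_PER_WATT):.0f} MFlops/watt")
    print(f"Relative gain: "
          f"{relative_gain_percent(ROADRUNNER_MFLOPS_PER_WATT, MARGIN_MFLOPS_PER_WATT):.1f}%")
```

Read this way, the quoted figures suggest a power draw of a little under 3 MW during the Linpack run and an efficiency lead of roughly 5% over the BlueGene/P system.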
To be delivered to Los Alamos later this summer, Roadrunner will be open for several months to the scientific community at large, during which time it is expected to contribute to efforts ranging from bioinformatics to climate modeling. Los Alamos National Laboratory is currently taking applications from institutions seeking time with the system. After this window closes, the computer will begin its primary mission under the direction of the National Nuclear Security Administration: analyzing the United States nuclear arsenal through a series of test simulations meant to address both safety and viability.