As Roadrunner Breaks Petaflop Barrier, Heterogeneous Computing in the Spotlight

Wednesday 11th June 2008, 03:20:00 AM, written by Carl Bender

This week, the world of computing officially entered the petaflop era with the announcement that the Los Alamos National Laboratory's 'Roadrunner' had become the first supercomputer in history to cross the psychologically significant threshold of a thousand trillion calculations per second, achieving a Linpack Rmax of 1.026 PFlops.

Roadrunner debuts at over twice the speed of November 2007's Top500 champion, Lawrence Livermore's BlueGene/L. The milestone is significant enough that, just a week out from ISC '08 (June 17th-20th), the conference has added a special panel discussion to its program entitled “RoadRunner - the First Petaflop/s System in the World and Its Impact on Supercomputing.” This session will be open to all show attendees. Fittingly, Roadrunner also serves as the first major example of heterogeneous/hybrid supercomputing, arriving at a time when the HPC community has come to see massive parallelism, heterogeneous system architecture, and consumer-driven chip design as the keys to future application performance gains in a post-clockspeed age.

At the heart of Roadrunner lies the PowerXCell 8i, a derivative of the STI Cell Broadband Engine with enhanced double-precision support, capable of ~102 GFlops of double-precision performance per chip and featuring greatly expanded memory addressing capabilities. The system as a whole comprises 6,948 dual-core Opteron chips and 12,960 PowerXCell 8i processors. Each compute node combines one LS21 Opteron blade and two QS22 Cell blades in a custom tri-blade configuration. 3,456 of these nodes are present across 288 BladeCenter housings, which in turn are networked to one another by over 55 miles of fiber optic cabling.
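
As a rough sanity check on the figures above, the aggregate peak of the Cell processors can be worked out directly from the chip count and the per-chip rating. The sketch below uses only numbers quoted in this article; the function and constant names are illustrative, and the Opterons' contribution is deliberately ignored, so this is not the system's official Rpeak.

```python
# Figures quoted in the article; nothing here is measured independently.
CELL_CHIPS = 12_960           # PowerXCell 8i processors in the system
CELL_PEAK_GFLOPS = 102.0      # approximate double-precision peak per chip
LINPACK_RMAX_PFLOPS = 1.026   # sustained Linpack result


def cell_peak_pflops(chips: int = CELL_CHIPS,
                     gflops_per_chip: float = CELL_PEAK_GFLOPS) -> float:
    """Aggregate theoretical peak of the Cell processors alone, in PFlops."""
    return chips * gflops_per_chip / 1e6  # 1 PFlops = 1e6 GFlops


def linpack_fraction_of_cell_peak() -> float:
    """Linpack Rmax as a fraction of the Cell-only peak (Opterons ignored)."""
    return LINPACK_RMAX_PFLOPS / cell_peak_pflops()


if __name__ == "__main__":
    print(f"Cell-only peak: {cell_peak_pflops():.3f} PFlops")
    print(f"Rmax / Cell-only peak: {linpack_fraction_of_cell_peak():.1%}")
```

On these assumptions the Cells alone account for roughly 1.32 PFlops of theoretical peak, which puts the 1.026 PFlops Linpack result at a little over three quarters of that figure.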

The greatest technical hurdle faced by the engineers and scientists working on the project was not the hardware, though, but the task of porting algorithms to a massively parallel environment in which they would be required to run effectively across three disparate architectures (Cell itself being heterogeneous). With no guarantee that the effort required or the resultant speedup would validate the $/Flop potential of the system, a year-long evaluation period was started in September of 2006 to explore the extent to which the Cell architecture could be leveraged as an accelerator within the hybrid environment. The ~$110 million contract was awarded after the final evaluation results showed a six-fold performance increase across several key algorithms. Within the system itself, Opterons are tasked with general processing jobs such as handling file system I/O, while Cell processors are assigned the heavy lifting associated with the scientific simulations. In practice, the Cells contribute ~95% of the sustained Flops throughput of which Roadrunner is capable.

Although heterogeneous computing has been viewed with trepidation by the HPC community due to the challenges associated with the programming models involved, Roadrunner exemplifies the gains that may be obtained in both throughput and energy efficiency by leveraging the strengths of different architectures within the same system. Roadrunner is a shoo-in for world's fastest supercomputer when the Top500 list is published on June 18th, and the efficiency gains afforded by the hybrid approach will put it in the running for the title of world's "greenest" as well. At 376 MFlops/watt, Roadrunner is 19 MFlops/watt ahead of the BlueGene/P system run by the UK's Science and Technology Facilities Council at Daresbury Laboratory, presently ranked as the world's most efficient supercomputer.
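
To put the efficiency figure in perspective, the power draw it implies can be estimated from the numbers already quoted. This is a back-of-the-envelope sketch: it assumes the 376 MFlops/watt figure was computed against the 1.026 PFlops Linpack result (the article does not say so explicitly), and the function and constant names are illustrative.

```python
# Figures quoted in the article.
RMAX_FLOPS = 1.026e15                 # Linpack Rmax, in Flops
ROADRUNNER_MFLOPS_PER_WATT = 376.0    # quoted efficiency
LEAD_MFLOPS_PER_WATT = 19.0           # quoted margin over the BlueGene/P


def implied_power_megawatts(rmax_flops: float, mflops_per_watt: float) -> float:
    """Power draw implied by a sustained rate and an efficiency figure.

    Assumes the efficiency was measured against the same sustained rate.
    """
    watts = rmax_flops / (mflops_per_watt * 1e6)
    return watts / 1e6


def runner_up_mflops_per_watt() -> float:
    """Efficiency of the BlueGene/P system implied by the quoted margin."""
    return ROADRUNNER_MFLOPS_PER_WATT - LEAD_MFLOPS_PER_WATT


def relative_lead() -> float:
    """Roadrunner's efficiency advantage as a fraction of the runner-up's."""
    return LEAD_MFLOPS_PER_WATT / runner_up_mflops_per_watt()


if __name__ == "__main__":
    power = implied_power_megawatts(RMAX_FLOPS, ROADRUNNER_MFLOPS_PER_WATT)
    print(f"Implied power draw: {power:.2f} MW")
    print(f"BlueGene/P efficiency: {runner_up_mflops_per_watt():.0f} MFlops/watt")
    print(f"Relative lead: {relative_lead():.1%}")
```

Under that assumption the system would be drawing on the order of 2.7 MW during the Linpack run, and the 19 MFlops/watt margin works out to an advantage of roughly 5% over the 357 MFlops/watt BlueGene/P.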

To be delivered to Los Alamos later this summer, Roadrunner will be open for several months to the scientific community at large, and is expected to contribute to efforts ranging from bioinformatics to climate modeling. Los Alamos National Laboratory is currently taking applications from institutions seeking time with the system. After this window closes, the computer will begin its primary mission under the direction of the National Nuclear Security Administration: analyzing the United States nuclear arsenal through a series of test simulations meant to address both safety and viability.

Latest Thread Comments (4 total)
Posted by ShaidarHaran on Wednesday, 11-Jun-08 16:20:42 UTC
Yay for heterogeneous computing! Anyone remember Intel's "Project Z" being discussed back in the Northwood P4C days?

Posted by Carl B on Thursday, 12-Jun-08 00:56:32 UTC
Things seem to arise and disappear all the time in terms of Intel's internal projects. I do remember the Project Z rumors, as I used to pay close attention to all the ex-Alpha movements, but... who knows. According to previous roadmaps of Intel's they should in theory have their own heterogeneous efforts in mind going into the next decade, but who knows the extent to which that is still at play. I suppose that Larrabee grafted onto some more mainstream chip would still qualify, but as it'd be fully ISA compatible, I don't know... I have trouble calling it truly heterogeneous.
It'll be fun to see the Top500 list a couple of years from now though. Specifically, it'll be interesting to see what Intel's efforts are in the supercomputing space, and whether any supercomputers start being built from clustered GPGPU workstations - though it seems a lot more complex from a system/network perspective to harness that computational power at the scale of a supercomputing center than at the desktop level. But there will no doubt be something.
And as a fan of it, I hope that a couple more Cell-based systems make it onto the list as well. With the QS22 essentially the heart of it, it should be easy enough for institutions to mirror the Los Alamos system, with a lot of the library/framework problems already solved and provided in IBM's tools.

Posted by Carl B on Tuesday, 17-Jun-08 01:35:28 UTC
Article on Roadrunner's first set of tasked simulations over on the Los Alamos website:
Less than a week after Los Alamos National Laboratory’s Roadrunner supercomputer began operating at world-record petaflop/s data-processing speeds, Los Alamos researchers are already using the computer to mimic extremely complex neurological processes...
I like the novel tasks/research they are assigning to the system before it goes all-nuclear.

Posted by Sxotty on Wednesday, 25-Jun-08 16:16:14 UTC
Quoting Carl B
Article on Roadrunner's first set of tasked simulations over on the Los Alamos website:

I like the novel tasks/research they are assigning to the system before it goes all-nuclear.
Awesome point. :)
