Justin Boggs

Justin is one of AMD's senior developer relations engineers on the CPU side, helping game developers get the best CPU performance out of their products. He gave two presentations, which we'll coalesce into one page. The morning presentation led into the afternoon's topics. Both focused on AMD processor technology, with special attention given to their upcoming native quad-core implementations.

Corporate Housekeeping and High-Level Bits

Justin started by mentioning that even though AMD bought ATI, most at AMD see it as a merger of expertise, technology and products. Given that he's a CPU guy, hearing him say that x86 was pervasive and had applications in graphics wasn't too surprising. Fusion was next on the agenda. Boggs confirmed that the products would appear as both MCMs and single-die chips. Justin discussed Fusion's tie to Torrenza as well. Torrenza itself was portrayed not only as a socket architecture for coprocessors but also as covering processors on add-in boards in slots, encompassing both common methods of getting new silicon connected to a system.

Fusion was all about increasing the minimum PC spec, Boggs said, to quiet cheers from the crowd. AMD will ensure that Fusion isn't integration for the sake of cost, and Boggs emphasised that the available compute power from a Fusion product would help lift the baseline of performance for systems that use it as their central processing device.

AMD's fabs were up next, with Boggs talking about the company's in-progress 32nm fab in New York, the migration of Fab36 to 300/45 (mm/nm) from 300/65, and Fab30 getting a wafer size boost to 300/45 as AMD sells off the 200mm equipment currently producing 65nm devices. Closer relationships with AMD's foundry partners are also on the cards, as AMD expects to order more wafer starts; next month's Barcelona launch is one of the main drivers of that volume growth.

Roadmap details like the DDR3 transition and Socket AM3, DX10 IGPs in 2008, and HyperTransport 3.0 and PCI Express Gen 2 in 2007 were all mentioned, and Boggs was keen to talk up Griffin on top of that. Griffin is AMD's next-generation mobile processor architecture, a first for AMD according to Boggs in that it has been engineered from the ground up as a mobile processor rather than a binned desktop die. DisplayPort comes in 2008, and indeed AMD have recently tested a GPU implementation with VESA. PCIe Gen2 will appear on mobile platforms in 2008 as well.

Native Quad Core Architecture Highlights

SSE4a support (a subset of 4 instructions from the main SSE4 implementation) in Barcelona has been known for a while, as has the architecture's 128-bit SSE FPU. Boggs mentioned overclocking potential due to the split power plane, while keeping the CPU within its defined TDP. He also mentioned the more efficient memory controller (~85% efficiency, apparently), which AMD have tweaked to better feed four processor cores, and the float IPC rate (four 64-bit IEEE754 ops per clock, eight single precision, split 50:50 ADD:MUL).
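To put the quoted float rate in context, here is a rough peak-throughput calculation in C. It is only a sketch: it assumes the four (double) and eight (single) ops per clock apply per core, and the clock speed passed in is a hypothetical input, since no clock speeds were given in the talk. The function and constant names are ours.

```c
#include <stdio.h>

/* Per-clock float rates quoted in the talk (assumed here to be per core). */
enum { DP_FLOPS_PER_CLOCK = 4, SP_FLOPS_PER_CLOCK = 8 };

/* Theoretical peak in GFLOPS: cores x clock (GHz) x ops per clock.
   clock_ghz is a hypothetical value, not an announced specification. */
double peak_gflops(int cores, double clock_ghz, int flops_per_clock)
{
    return cores * clock_ghz * flops_per_clock;
}

/* Prints the double- and single-precision peaks for one configuration. */
void print_peak(int cores, double clock_ghz)
{
    printf("%d cores @ %.1f GHz: %.1f GFLOPS double, %.1f GFLOPS single\n",
           cores, clock_ghz,
           peak_gflops(cores, clock_ghz, DP_FLOPS_PER_CLOCK),
           peak_gflops(cores, clock_ghz, SP_FLOPS_PER_CLOCK));
}
```

Calling `print_peak(4, 2.0)` would report 32 and 64 GFLOPS for a hypothetical 2.0GHz quad-core part. These are ceilings that assume a perfect mix of ADDs and MULs every clock, so real code will land well below them.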

The software support for the new architecture is what most were interested in, though. Boggs talked about the AMD Performance Library, which will ship with Barcelona microarchitecture support for the performance-critical code sections on launch. APL 1.1 will support updated SSE routines for Barcelona processors, and the library is increasingly popular with game developers according to Boggs, with its support for image and signal processing functions at high speed on the processor.

Looking forward to compilers supporting Barcelona microarchitecture enhancements, Microsoft will have support for Barcelona (in terms of SSE4a, 128-bit SSE operations and knowledge of the cache hierarchy in particular) in their CLR (and presumably C) compiler due to ship with Visual Studio 2008. Indeed, the current beta versions, codenamed Orcas, already have some of that support built in, allowing developers to test performance-critical CPU code on Barcelona systems before the official launch of the tools next year. For cross-platform developers using GCC, that compiler has had support for the new architecture for a little while now, thanks to AMD's engineers, so use a recent GCC4 build to get that.

Note that you don't need one of these compilers to run code on Barcelona; they're just the current compilers that support the specific architecture improvements that'll help software performance on the CPU.
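If you do ship an SSE4a code path, you would normally check for the feature at run time before taking it. A minimal sketch in C, assuming GCC or Clang and their `<cpuid.h>` header; the helper names are ours, not something Boggs presented. SSE4a is reported in CPUID function 0x80000001, ECX bit 6.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>          /* __get_cpuid(), provided by GCC and Clang */
#define HAVE_CPUID_H 1
#endif

/* SSE4a feature flag: CPUID function 0x80000001, ECX bit 6. */
#define SSE4A_ECX_BIT (1u << 6)

/* Decodes the SSE4a flag from a raw ECX value. */
int ecx_has_sse4a(unsigned int ecx)
{
    return (ecx & SSE4A_ECX_BIT) != 0;
}

/* Returns 1 if the running CPU reports SSE4a, 0 otherwise.
   Non-x86 builds always return 0 because CPUID is unavailable there. */
int cpu_has_sse4a(void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 when the requested function is unsupported. */
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return ecx_has_sse4a(ecx);
#endif
    return 0;
}
```

A game would typically call `cpu_has_sse4a()` once at startup and pick between a generic routine and a tuned one, rather than testing inside a hot loop.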

To end, Boggs quickly returned to the hardware side, mentioning that the RDTSC instruction (which reads the CPU's internal timestamp) is now invariant: it will return the right value no matter which core it's run on, and all cores will report the correct value to software at all times. The invariance comes at a cost though, so beware if you're using it for timing in your application. There's now a 60-clock latency, so sampling it repeatedly might cause some slowdowns you weren't previously experiencing.
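One way to keep that cost down is to time whole batches of work rather than individual items. A minimal sketch in C, assuming GCC or Clang and the `__rdtsc()` intrinsic from `<x86intrin.h>`; the helper names are ours, and the 60-clock figure is simply the one quoted in the talk.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>      /* __rdtsc() */
#define HAVE_RDTSC 1
#endif

/* RDTSC latency quoted in the talk, in clocks. */
#define RDTSC_LATENCY_CLOCKS 60u

/* Reads the timestamp counter; returns 0 on non-x86 builds. */
uint64_t read_tsc(void)
{
#ifdef HAVE_RDTSC
    return __rdtsc();
#else
    return 0;
#endif
}

/* Estimated clocks spent purely on reading the counter. */
uint64_t rdtsc_overhead_clocks(uint64_t reads)
{
    return reads * RDTSC_LATENCY_CLOCKS;
}

/* Times a whole batch with two reads, instead of two reads per item. */
uint64_t time_batch(void (*work)(int), int items)
{
    int i;
    uint64_t start = read_tsc();
    for (i = 0; i < items; ++i)
        work(i);
    return read_tsc() - start;
}
```

Timing 10,000 items individually would take 20,000 reads, or roughly 1.2 million clocks of overhead at the quoted latency. Wrapping the same work in one `time_batch` call costs two reads, about 120 clocks.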

Minor architectural details were also presented, but we'll leave those for a more in-depth architecture analysis after Barcelona launches.

So software support at the compiler level should be good for AMD going forward, and the architecture improvements hint at increased performance in gaming workloads, especially those that make heavy use of the FPU and SSE.