CUDA 4.0 Announced
Monday 28th February 2011, 07:36:00 PM, written by William
Consider just how much progress the GPGPU market has made in a relatively short period of time. It was only a couple of years ago that I was playing around with Brook+ (as you can imagine, that was not a pleasant experience). There is certainly plenty of work still to be done, but it is hard to be disappointed in the progress that is being made. NVIDIA's CUDA 4.0 will be the latest example of this progress. In this release, NVIDIA has again lowered the barriers to entry for prospective developers (whether they are low enough is another discussion for another day).
One of the main focuses of CUDA 4.0 is to make developing for multiple GPUs easier. NVIDIA’s big improvement here is the release of GPUDirect 2.0. GPUDirect 2.0 enables peer-to-peer access, transfers, and synchronization between Fermi-based GPUs. Previously, the GPUs had to go through the host’s CPU and main memory in order to communicate with each other. Now that the CPU is no longer involved, transfers between GPUs skip the extra trip through host memory and should be considerably more efficient. In addition, CUDA 4.0 will support modified MPI implementations (this does not mean NVIDIA is providing its own MPI implementation). This will help attract HPC developers who are experienced with or accustomed to dealing with MPI. It should be noted that while support for MPI implementations will work on any Fermi GPU, peer-to-peer transfers will only work on Fermi-based Tesla GPUs (normal desktop users need not apply). [Update 4/12/11] As of RC2, NVIDIA has reversed course and allowed peer-to-peer transfers to work on any Fermi GPU.
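To make the peer-to-peer idea a little more concrete, here is a minimal sketch of what a direct GPU-to-GPU copy might look like with the CUDA runtime API. The function name `peer_copy_matches` and the device numbering (0 and 1) are my own assumptions for illustration; the runtime calls (`cudaDeviceCanAccessPeer`, `cudaDeviceEnablePeerAccess`, `cudaMemcpyPeer`) are the ones I would expect to use, but treat this as a sketch rather than NVIDIA's recommended pattern.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: copies n floats from GPU `src` to GPU `dst`, reads them
// back, and returns true only if every CUDA call succeeded and the data matches.
bool peer_copy_matches(int src, int dst, size_t n) {
    std::vector<float> host_in(n), host_out(n, 0.0f);
    for (size_t i = 0; i < n; ++i) host_in[i] = static_cast<float>(i);
    const size_t bytes = n * sizeof(float);
    float *d_src = nullptr, *d_dst = nullptr;
    bool peer_enabled = false;

    // Ask whether `dst` is able to address memory that lives on `src`.
    int can_access = 0;
    bool ok = cudaDeviceCanAccessPeer(&can_access, dst, src) == cudaSuccess;

    // Fill a buffer on the source GPU.
    ok = ok && cudaSetDevice(src) == cudaSuccess;
    ok = ok && cudaMalloc(&d_src, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(d_src, host_in.data(), bytes,
                          cudaMemcpyHostToDevice) == cudaSuccess;

    // Allocate on the destination GPU and, if supported, enable peer access
    // from the current device (dst) to src.
    ok = ok && cudaSetDevice(dst) == cudaSuccess;
    ok = ok && cudaMalloc(&d_dst, bytes) == cudaSuccess;
    if (ok && can_access) {
        peer_enabled = cudaDeviceEnablePeerAccess(src, 0) == cudaSuccess;
        ok = peer_enabled;
    }

    // With peer access enabled the copy can go GPU to GPU directly;
    // without it the runtime stages the copy through host memory.
    ok = ok && cudaMemcpyPeer(d_dst, dst, d_src, src, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(host_out.data(), d_dst, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    // Clean up so the helper can be called again.
    if (peer_enabled) cudaDeviceDisablePeerAccess(src);
    cudaFree(d_dst);
    cudaFree(d_src);
    return ok && host_out == host_in;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) {
        std::printf("This sketch needs at least two CUDA devices.\n");
        return 0;
    }
    std::printf("peer copy %s\n",
                peer_copy_matches(0, 1, 1 << 20) ? "succeeded" : "failed");
    return 0;
}
```

Note that `cudaMemcpyPeer` is used whether or not peer access is available, so the same code path should work on hardware without direct access, just without the speedup.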
The other big news in CUDA 4.0 is NVIDIA's Unified Virtual Addressing. UVA unifies the system’s memory and each GPU's memory into a single address space. Prior to CUDA 4.0, the host and each GPU had their own separate address spaces. UVA is a significant feature because not only will it immediately make a developer’s life easier, but it is bound to play an enormous role in CUDA’s future. It should be pointed out, though, that since UVA relies on 64-bit addressing, a Fermi GPU is required (my G92 is starting to show its age).
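One practical consequence of a single address space is that the runtime can work out where a pointer lives on its own. The sketch below assumes device 0 and uses a helper name of my own invention (`uva_roundtrip`); it checks the `unifiedAddressing` device property and then lets `cudaMemcpyDefault` infer the copy direction instead of spelling out host-to-device or device-to-host.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: round-trips n ints through device 0 using
// cudaMemcpyDefault. Returns false if UVA is unavailable or a call fails.
bool uva_roundtrip(size_t n) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    if (!prop.unifiedAddressing) return false;  // no UVA on this device/system

    std::vector<int> in(n), out(n, 0);
    for (size_t i = 0; i < n; ++i) in[i] = static_cast<int>(i) * 2;
    const size_t bytes = n * sizeof(int);

    int *d_buf = nullptr;
    bool ok = cudaSetDevice(0) == cudaSuccess;
    ok = ok && cudaMalloc(&d_buf, bytes) == cudaSuccess;

    // No explicit direction: the runtime infers it from the pointer values.
    ok = ok && cudaMemcpy(d_buf, in.data(), bytes, cudaMemcpyDefault) == cudaSuccess;
    ok = ok && cudaMemcpy(out.data(), d_buf, bytes, cudaMemcpyDefault) == cudaSuccess;

    cudaFree(d_buf);
    return ok && out == in;
}

int main() {
    std::printf("UVA round trip %s\n",
                uva_roundtrip(1 << 16) ? "succeeded" : "did not run or failed");
    return 0;
}
```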
NVIDIA has also made some other general improvements for CUDA. They will now bundle the Thrust C++ Template Library with the CUDA SDK. I have been using Thrust extensively on my current research project and I highly recommend it. It has personally saved me a considerable amount of development time (no one likes to reinvent the wheel). If you’re a CUDA developer and do not mind C++ templates, it is definitely worth a look. CUDA 4.0 will also allow multiple CPU threads to share a single context on a GPU. This will make it easier for multi-threaded applications to utilize a single GPU (previously each thread had its own context on the GPU; there was no direct/easy way of sharing pointers between threads). Conversely, a single CPU thread can now control multiple GPUs. Mac developers can rejoice as CUDA-gdb is now supported on OS X. Finally, NVIDIA has added an automatic performance analyzer in their Visual Profiler. This will give developers specific suggestions on how to improve their application’s performance (I have a feeling I will be using this feature a lot).
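For anyone who has not seen Thrust, here is a small sketch of the kind of boilerplate it removes. The helper name `sort_and_sum` is hypothetical; the Thrust pieces (`thrust::device_vector`, `thrust::sort`, `thrust::reduce`, `thrust::copy`) are part of the library. It sorts a host array on the GPU and sums it without a single hand-written kernel.

```cuda
#include <cstdio>
#include <vector>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>

// Hypothetical helper: sorts `values` in place (on the GPU) and returns their sum.
int sort_and_sum(std::vector<int>& values) {
    // Constructing the device_vector copies the data host -> device.
    thrust::device_vector<int> d(values.begin(), values.end());

    thrust::sort(d.begin(), d.end());                   // ascending sort on the GPU
    int total = thrust::reduce(d.begin(), d.end(), 0);  // sum with initial value 0

    // Copy the sorted data device -> host.
    thrust::copy(d.begin(), d.end(), values.begin());
    return total;
}

int main() {
    std::vector<int> values = {5, 3, 9, 1};
    int total = sort_and_sum(values);
    std::printf("sum = %d, smallest = %d, largest = %d\n",
                total, values.front(), values.back());
    return 0;
}
```

The STL-like interface is the main design point here: memory allocation, transfers, and kernel launches are all hidden behind containers and algorithms.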
So is this release a “game changer”? It depends on who you are. If you are a developer that has (or is interested in) a cluster of Tesla GPUs, the answer is a resounding “yes.” This release is also a game changer if you are a CUDA developer that makes use of multiple GPUs. If you are a small-scale CUDA developer (like myself), it is definitely a marked improvement but probably not earth-shattering (personally I am most excited about the potential Thrust and Visual Profiler improvements). If you are a normal PhysX user, you should probably go back to playing Metro 2033. This should not be surprising, though. NVIDIA makes more money selling Tesla cards to larger customers than selling desktop GPUs to small CUDA developers like me. Furthermore, very few end-users factor in CUDA support when purchasing a GPU. Thus, it makes sense that NVIDIA focuses on the market that will yield the most profits. Having said that, I think NVIDIA would like to see a much wider adoption of CUDA. I suspect that in the foreseeable future NVIDIA will continue to broaden its target CUDA audience with each release.
If you’re a registered CUDA developer, you can download the release candidate for CUDA 4.0 this Friday (March 4th).