Conclusion
Given that we took nearly 10,000 words for the rest of the article, it seems appropriate to finish with far fewer so as not to dilute the article’s overall message. In summary:
- The Tesla 10-Series will likely be the first GPGPU-oriented card to ship in significant volumes.
- GT200’s performance improvements for GPGPU workloads are considerably more impressive than those for GeForce.
- The pricing structure has changed, making 4-GPU 1U systems very attractive. This makes sense given that the previous price was likely artificially high, and these systems will likely represent most of the 2008 revenue.
- The FP64 implementation, while unexpected, does make relatively good sense – especially when you consider that performance will go up by the time it becomes more important/gets deployed.
- Being able to run CUDA programs optimally on x86 CPUs is an interesting development from both a technical and a marketing perspective: if it performs well and scales, some heads will turn.
- We should expect revenue for Oil & Gas and ‘Government & Education’ to ramp up before 2009.
- When excluding the (dynamic?) power of the TMUs, ROPs, rasterisation hardware, etc., the ‘TDP’ is only 160W.
- Many of the applications being showcased are quite advanced and represent a substantial time investment; no other massively parallel API is that far along today. CUDA is also ahead for ‘custom’ personal applications by scientists/academics/… but that could change much more rapidly.
- There are quite a few applications ramping up in the next year or two that could represent significant financial opportunities, but the scale isn’t there yet to reach the promised addressable market of several billion dollars by 2011. In due time, however, it will likely get there.
- Consumer CUDA represents a potentially even larger opportunity over the long term, but the question is how to get enough applications (ideally freeware) GPU-accelerated to really shift a lot of value away from the CPU. One theory is to create a vendor-agnostic API; another is to do more in-house application development. Both are far from perfect.
- This is an incredibly hard business to forecast, especially over the long term. Anyone who thinks they can make a reliable estimate today is probably out of their mind.
As GPU architectures become more programmable and more complex, and as Intel enters both the GPU and the GPGPU market with Larrabee, it will be very interesting to see what happens in this field, both technically and financially, and we look forward to some great hardware innovations.
More important still, however, may be the software aspect of massively parallel computing. Every single day, millions of man-hours are spent reading and thinking about how to scale performance beyond traditional serial processors. The CUDA model, with its tens of thousands of threads, shared memory, and straightforward synchronisation, is elegant, scalable, and applicable to a surprisingly large number of problems. It is also very efficient from a hardware perspective (especially if branch divergence penalties can be tolerated), unlike quite a few ideas from the academic community.
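To make those three ingredients concrete, here is a minimal sketch of a classic block-level sum reduction (the kernel and variable names are ours for illustration, not from any application discussed above). It shows the pattern in miniature: thousands of threads launched at once, cooperation through fast shared memory, and barrier synchronisation with __syncthreads():

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: each 256-thread block sums a slice of the input.
__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float partial[256];           // per-block shared memory
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid; // global thread index

    partial[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                         // wait for all loads to land

    // Tree reduction in shared memory; half the threads drop out each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();                     // barrier between steps
    }

    if (tid == 0)
        out[blockIdx.x] = partial[0];        // one partial sum per block
}

// Launching tens of thousands of threads is the norm, not the exception:
//   blockSum<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
```

Note how little of this is about parallelism bookkeeping: the hardware schedules the blocks, and the programmer only reasons about one block’s cooperation at a time.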
It would be naive to believe, however, that the CUDA model of computation is anywhere near optimal. It is but one big step in one interesting direction. It arguably brings more to the table than Brook+ or Ct, but what’s really exciting is what we will be able to learn from this paradigm and how we can generalise its advantages to an even wider problem set. Most of the great ideas in 3D graphics, both past and present, have come from trying to incrementally extend existing designs... Similarly, it is tempting to believe that the holy grail of parallel programming may originate not from academia, but from considering how to extend a concept that is already delivering exaflops of aggregate performance today.