14. Do you believe an API specific to a single hardware generation from a single vendor would ever make practical sense on the PC? If not, how would you maintain compatibility between different vendors and hardware generations – what do you think is the sweet spot in terms of low-level access?

MH : No. Again, I think we can learn a lot from the consoles and try to solve the latency issues in the API, OS, and HW when it comes to interaction between the host code and the GPU. I think something like DirectX/OpenGL with the *option* of more user control over device management is what we want. Direct command-buffer submission, with an API for building and modifying command buffers, would I think be a good start.

AL : I think it’s more a question of the ubiquity of the given hardware. Obviously if every single person had that piece of hardware it would be well worth it to write an optimised path. Even with significantly smaller numbers of users, I think the cost-to-benefit ratio can still work out favourably provided that the cost is not absurd. For instance, I think the recent trend towards putting GPUs on all CPUs both vastly increases the market size for those integrated GPUs and also opens up interesting possibilities for optimisation that graphics APIs (which have historically been targeted more at discrete GPUs with disjoint memory spaces) do not address well.

15. From a graphics connoisseur's standpoint, low-level access is clearly extremely attractive – sadly, the world is often run by bean-counters! Do you think that getting lower-level access would actually increase a game's overall profitability? Isn't the opposite far more likely to happen, given that development costs would obviously rise?

MH : Development costs would clearly rise, but more low-level access would allow game developers to better differentiate. Developers willing to put more in could get more out. I don't think we can honestly talk about low-level only, but I think as an industry we could come up with ways to allow for different levels of entry. This would enable people to use higher-level APIs to get things up and running and then start digging deeper if/where/when they want to.

AL : I am not a game developer so I can’t speak to that with any authority, but I think it’s a bit more complicated than just “it costs more to develop”. It’s not unreasonable to expect a low-level interface to enable playable experiences on some piece of hardware that wouldn’t normally provide it through a portable API, thus increasing the potential market. Furthermore I’m not certain that targeting a new graphics API is necessarily a huge cost, assuming that the game is already a multi-platform title. A large fraction of the cost in game development is the asset pipeline, and a low-level graphics API would not necessarily require any changes to that.

16. Where do you think APIs like DirectX or OpenGL are headed? Are they likely to become somewhat thinner, at least in some spots, in order to allow for some more granular manipulation of the entrails of future GPUs? Will they follow a joint evolutionary path, or will they diverge?

MH : I hope they will become thinner and we see tighter integration between compute shaders and graphics pipelines. There has been a lot of interesting research as of late on advanced rendering pipelines using DirectX/OpenGL augmented with compute shaders, and I think the next round of APIs should embrace those techniques and figure out how to make them work better together.

AL : DirectX certainly seems to be on a path to be as low-level as possible. There are very few facilities left that exist solely to make the programmer’s job easier. OpenGL’s path is a bit less clear as it has been playing catch-up for the past few years. Historically OpenGL has also considered ease of use more than other APIs, which may become relevant if OpenGL moves back into a leadership role in the coming years. It’s hard to say whether they will continue to evolve along similar lines or diverge. I think a lot of that depends on whether someone really takes ownership of OpenGL and starts to drive it forward.

17. Do you think a GPU ISA would make sense? Fragmentation in the desktop CPU world was arguably removed when x86 won its wars against competing ISAs – do you see such a scenario coming for GPUs, or are we more likely to stop at a low-level but intermediate language step, one that's homogeneous from the perspective of ISVs and gets translated into hardware-specific instructions behind the scenes?

MH : I think a standard GPU ISA would make rapid innovation difficult. It's convenient to be behind a JIT layer today, which allows GPU IHVs to innovate aggressively while protecting the developer in terms of compatibility. Standard intermediates such as DXASM or something like LLVM-IR do, I think, make sense, but I also think it should be easier than it is today to "see the wizard behind the curtain" and inspect what is actually being generated on the devices being targeted, so you can better tune your code.

As a digression, I do think in general we need MUCH better cross-vendor/cross-OS development tools.
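As a quick illustration from our side of what "seeing behind the curtain" can look like today: OpenCL already lets you pull the driver-generated code back out of a built program. The sketch below is one way to do it, assuming the program object has been created and built for a single device; depending on the vendor, the resulting blob may be final ISA or an intermediate form such as PTX.

#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Sketch: dump whatever the driver actually generated for a built program.
 * Assumes the program was created and built for exactly one device; the
 * resulting blob may be final ISA or vendor IR (e.g. PTX), depending on
 * the implementation. Error handling omitted for brevity. */
static void dump_program_binary(cl_program program, const char *path)
{
    size_t binary_size = 0;
    clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES,
                     sizeof(binary_size), &binary_size, NULL);

    unsigned char *binary = malloc(binary_size);
    unsigned char *binaries[1] = { binary };
    clGetProgramInfo(program, CL_PROGRAM_BINARIES,
                     sizeof(binaries), binaries, NULL);

    FILE *out = fopen(path, "wb");
    fwrite(binary, 1, binary_size, out);
    fclose(out);
    free(binary);
}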

AL : I think a virtual ISA makes a lot of sense, but a standardised hardware ISA is more challenging. Given the rapid rate of innovation in GPUs as well as the flexibility in feature implementations that they have enjoyed in the past, I doubt that we could completely standardise on one ISA any time soon. It’s possible that in the long run hardware designs will converge to the point that it makes sense, but there’s no guarantee of that. Regardless, a virtual ISA is step one.

18. If you could make your very own software interface to the underlying hardware, what would it look like?

MH : DirectX/OpenGL + DirectCompute/OpenCL to build command buffers I can control directly, plus synchronization primitives. Show me all devices and let me do multi-GPU management myself instead of the IHV driver trying to figure out what I mean. I should have WAY fewer OS interactions to get things running beyond bootstrapping and resource allocation/registration.
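To make that concrete, here is a purely hypothetical C sketch of the kind of interface MH describes; none of these types or functions exist in any real API, and a real design would need far more detail around resources and pipelines.

/* Hypothetical interface sketch only: none of these types or functions exist
 * in any real API. Applications see every device, build their own command
 * buffers, and synchronize explicitly. */
typedef struct hypo_device     hypo_device;     /* one handle per physical GPU      */
typedef struct hypo_cmd_buffer hypo_cmd_buffer; /* application-built command stream */
typedef struct hypo_fence      hypo_fence;      /* explicit CPU/GPU synchronization */

int              hypo_enumerate_devices(hypo_device **out_devices, int max_devices);
hypo_cmd_buffer *hypo_cmd_buffer_create(hypo_device *dev);
void             hypo_cmd_dispatch(hypo_cmd_buffer *cb, int gx, int gy, int gz);
hypo_fence      *hypo_submit(hypo_device *dev, hypo_cmd_buffer *cb);
void             hypo_fence_wait(hypo_fence *fence);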

AL : I don’t claim to have a clear vision of what the “perfect API” would look like, even if such a thing exists. That said, if I were designing a new API I would make it as stateless as possible: every command would reference all relevant parameters (or parameter structures, for convenience). This constraint enables things like parallel execution, work submission from multiple sources, and simple interfacing with library code. I also think it’s important to take a hard look at how memory resources are created, formatted and used. Historically a lot of that has been hidden behind the API, which has often provided significant performance benefits without users having to understand low-level details. However, as we push towards lower-level APIs it becomes necessary to expose these implementations and place them under user control.
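To illustrate the statelessness AL describes, here is another purely hypothetical sketch: every draw carries all of its state with it, so nothing depends on what happens to be "currently bound" at the time.

/* Hypothetical sketch of a stateless draw command; no such API exists.
 * Because the command carries all of its state, it can be recorded from any
 * thread and replayed in any order without relying on hidden device state. */
typedef struct hypo_cmd_buffer hypo_cmd_buffer;

typedef struct {
    const void *pipeline;       /* shaders + fixed-function state, baked up front */
    const void *vertex_buffer;
    const void *index_buffer;
    const void *textures[8];
    const void *constants;      /* per-draw uniform data */
    unsigned    index_count;
} hypo_draw_params;

void hypo_cmd_draw_indexed(hypo_cmd_buffer *cb, const hypo_draw_params *params);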

19. What is your opinion of newcomers like OpenCL, CUDA or DirectCompute, as directions for future evolutions? Who gets what right, in terms of allowing more intimate access to the metal?

MH : All provide abstractions. CUDA is obviously the lowest level, since it only works on a single vendor's hardware and is a very close representation of that hardware. That low-level access is often abused, and developers end up finding out that their code has portability issues across generations, and sometimes even within a generation. DirectCompute and OpenCL represent more portable compute APIs and run across multiple vendors (and, in the case of OpenCL, across multiple OSes), but you can still write very vendor-specific code when it comes to performance. I think OpenCL is a little further ahead than the rest as it's really designed for heterogeneous computing: it is less GPU-centric, so we are seeing it run on all sorts of devices, from CPUs and GPUs in production to FPGAs and DSPs in prototypes. OpenCL also has an extension mechanism that basically all of the vendors are taking advantage of to expose their unique capabilities. Granted, I've spent a lot of time helping develop OpenCL, so I'm not impartial. ;-)
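As a small illustration of that device-agnostic model, the host-side sketch below enumerates every OpenCL device the installed platforms expose, CPU, GPU or otherwise, and prints each device's extension string, which is where the vendor-specific capabilities MH mentions show up. Error handling is omitted for brevity.

#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Sketch: enumerate every OpenCL device on every installed platform and
 * print its name and extension string. Error handling omitted for brevity. */
int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);

    for (cl_uint p = 0; p < num_platforms; ++p) {
        cl_device_id devices[16];
        cl_uint num_devices = 0;
        /* CL_DEVICE_TYPE_ALL picks up CPUs, GPUs and accelerators alike. */
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16, devices, &num_devices);

        for (cl_uint d = 0; d < num_devices; ++d) {
            char name[256] = "";
            size_t ext_size = 0;
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            /* The extension string is where each vendor exposes its unique capabilities. */
            clGetDeviceInfo(devices[d], CL_DEVICE_EXTENSIONS, 0, NULL, &ext_size);
            char *extensions = malloc(ext_size);
            clGetDeviceInfo(devices[d], CL_DEVICE_EXTENSIONS, ext_size, extensions, NULL);
            printf("%s\n  extensions: %s\n", name, extensions);
            free(extensions);
        }
    }
    return 0;
}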

AL : They are all pretty similar in terms of the level of hardware access that they expose. DirectCompute is obviously the most useful for graphics simply because it is a first-class citizen in the graphics API and thus shares all the relevant resource handles and scheduling. Unfortunately, they all still try to hide too much if you want to write really efficient algorithms. For instance, SIMD widths and memory layouts are critically important when choosing and optimising algorithms, but these languages all try to abstract over them. Thus whenever you need something to run really well you end up having to first defeat these abstractions, which produces code that is less portable than if the hardware specifics were directly exposed in the first place.
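For what it's worth, the closest OpenCL comes to exposing the SIMD width AL mentions is the per-kernel preferred work-group size multiple; a minimal sketch of the query, assuming the kernel and device objects have already been created elsewhere:

#include <CL/cl.h>

/* Returns the device's preferred work-group size multiple for a given kernel,
 * which in practice tends to match the SIMD width (warp/wavefront size). */
static size_t query_simd_width(cl_kernel kernel, cl_device_id device)
{
    size_t multiple = 0;
    clGetKernelWorkGroupInfo(kernel, device,
                             CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                             sizeof(multiple), &multiple, NULL);
    return multiple;
}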

20. Intel's Larrabee promised significantly greater programmability, but developers could obviously only exploit it by programming to the metal. While it is difficult to say how optimal the architecture might have been on different levels, a similar level of flexibility seems inevitable in the long-term, whether that's 2 or 20 years. Do you think an API of any kind will still make sense then? Would you at least still want a level of abstraction over the ISA?

MH : Again, I think the trick is being able to supply multiple levels and entry points to each of those levels. If I have to write my whole application in hex to use your platform, nobody is going to use it. So if there can be only one option, it should be the higher-level API. But I think the right answer is that you want things like OpenGL and DirectX, and things higher like game engines, but you also want lower-level access. If you want to go all the way to the metal, you should be able to do so. That being said, as you go lower and lower, fewer and fewer developers will come down with you, so at some point in the stack there will have to be a "standards break". Finding the right levels at which to standardise, while still allowing developers to get lower, will be the trick.

AL : That’s a hard question to answer because it depends a lot on the competitive landscape at that theoretical point in time. It’s reasonable to target a small number of fairly similar architectures or ISAs, but it can quickly get out of hand when there are too many combinations. For instance, it’s probably reasonable for a triple-A game with an emphasis on graphics to include different paths for current PC GPU architectures: there are a small number of relevant architectures and they are all fairly similar. On the other hand, it may not be reasonable to do the same on current mobile platforms, since there are many GPUs, some of which are very different from others. That said, my guess would be that in the long run physical constraints will force convergence and we will end up with a small number of fairly similar solutions. In that case, providing at most a virtual ISA, and perhaps not even that, would be reasonable.


20 questions later, this first bout is complete, and a general feeling of increased wisdom dominates, as this has been a rather educative trek, at least for us. It is always a pleasure to see how the (brilliant) minds of those directly in the industry deal with one topic or another. We'll use this small closing bit to also mention that our wider outreach will produce follow-ups to this debate, with other people taking up the Beyond3D mic to tell us how they feel about the fate of the graphics APIs. Au revoir pour le moment, gentlemen!