In the fast-changing industry of PC graphics and games, many people are left wondering what actually goes on "behind the scenes". On one hand, we have innovative graphics chip companies like NVIDIA providing ever more features for developers to use. On the other hand, we have the one group that probably ultimately determines both the quality of games and the "usefulness" of new hardware - the developers. In other words, what do developers think of the new graphics features provided by companies like NVIDIA and Microsoft (via the DirectX API)? Which features do they use, and which do they ignore?
We took the chance to fire off some questions to three parties that, between them, cover most of the 3D gaming industry: a game developer, a benchmark developer and a chip-maker. We hoped to get interesting viewpoints from both sides of what matters most: what a developer (of games or benchmarks) thinks of 3D technology, and what a chip-maker thinks of the technology it (or anyone else) has introduced, both as applied to game and application programming.
We sent the same questions off to Croteam (the game developer), MadOnion (the benchmark developer) and NVIDIA (the chip-maker).
Without further ado, away we go!
Beyond3D : OpenGL and D3D seem to implement multitexturing in such a way that the programmer must be aware of the number of TMUs. So programming texture effects for a 2 TMU design like a GeForce is different from programming for a 3 TMU design like a Radeon. Currently 3D cards like a GeForce fail when you try to apply more than 2 layers to a texture, thus forcing multipass programming on the developer. Do you think we need a higher level so the developer can just specify the texture layers and blend modes and have the driver handle the rest?
Croteam : Call me old-fashioned, but I think the programmer must do that, not the API. Generally, I'm against a high-level API layer, because it can limit your creativity and it tends to be rather slow.
The API and hardware are here to help the engine do things faster, not to interfere with the engine's feature set.
There are examples where multitexturing just cannot be emulated with multiple passes, such as when you want two textures to be combined and the result to be alpha-blended with the background. In most cases it is mathematically impossible to recreate the same results without multiple texture units.
But fortunately DirectX8's pixel shaders are a move in a better direction. Developers just check the pixel shader version, write some assembly-like material code and the driver handles the rest. Hopefully.
But keeping tight control over pixel shader versions, and requiring hardware to support the whole feature set of a version, is absolutely necessary. Developers can keep track of 2-3 versions, and will be glad to do so rather than deal with hundreds of multi-texturing combinations.
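Croteam's example of an effect that cannot be split into passes can be checked with a little arithmetic. The sketch below is only an illustration under simplifying assumptions: colours are single-channel values between 0 and 1, the two textures are combined by multiplication, and the function names are made up for this example rather than taken from any API.

```python
# Single-channel colours in [0, 1]; all names here are illustrative only.

def single_pass(t1, t2, alpha, background):
    """Two texture units: combine t1 and t2 first, then alpha-blend the result."""
    combined = t1 * t2
    return combined * alpha + background * (1.0 - alpha)


def two_pass(t1, t2, alpha, background):
    """One texture unit: pass 1 alpha-blends t1, pass 2 modulates the frame buffer by t2."""
    after_pass1 = t1 * alpha + background * (1.0 - alpha)
    # The second pass can only see the frame buffer, so t2 also darkens
    # the part of the background that showed through in pass 1.
    return after_pass1 * t2


if __name__ == "__main__":
    t1, t2, alpha, background = 0.8, 0.5, 0.5, 0.6
    print("single pass:", single_pass(t1, t2, alpha, background))
    print("two passes: ", two_pass(t1, t2, alpha, background))
```

The two results differ whenever the surface is partly transparent, because the second pass has no way to leave the background contribution alone. With an alpha of 1 the background drops out and both versions agree, which fits the "in most cases" wording above.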
NVIDIA : TMU is an obsolete term, by the way. It's a leftover from when the computation (blend stages) was tied directly to the texture read ability (texture stages). This has no longer been true since the TNT architecture. Modern hardware has texture loading separated from a processor that allows calculation and blending of texture addresses and interpolated and evaluated colors. Very soon, there will be powerful pixel processors where the number of textures that can be read is independent of the number of instructions that can be part of the pixel program. GeForce3 is the first of these processors.
Most software developers do not wish the hardware developers to accept a high level shading model and handle multipass for them. The software developers want to program this themselves. Given that there are tens of millions of TNT2-class graphics cards out there as an installed base, virtually ALL mass-market titles need to support rendering to 2 texture hardware, with multipass.
Most physically based lighting models (such as BRDFs - bi-directional reflectance distribution functions) decompose cleanly into pairs of textures per pass. GeForce3, which supports 4 textures in a single pass, is a simple acceleration from that: every 2 passes can be collapsed into 1. For that reason, I believe that Radeon choosing to support 3 textures is an odd choice.
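NVIDIA's argument about pairs of textures can be put into numbers. The following is a rough model, not a description of any real driver: it assumes a lighting model that has already been decomposed into texture pairs, and that a pair cannot be split across passes. The function name is hypothetical.

```python
def passes_for_pairs(texture_pairs, textures_per_pass):
    """Passes needed when a shading model is made of indivisible texture pairs."""
    pairs_per_pass = textures_per_pass // 2
    if pairs_per_pass == 0:
        raise ValueError("this simple model needs at least 2 textures per pass")
    # Ceiling division using integers only.
    return -(-texture_pairs // pairs_per_pass)


if __name__ == "__main__":
    for units in (2, 3, 4):
        print(units, "textures per pass ->", passes_for_pairs(4, units), "passes")
```

Under these assumptions a third texture per pass buys nothing over two, while a fourth halves the pass count. A shading model that does not decompose into pairs could of course make use of three textures.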
Beyond3D : Transparency is another place where the developer has to be aware of what he does because of the hardware. Do you believe that it's better to push this render order into the API/Driver or should it remain in the domain of the application where the developer can use hacks/tricks to create the correct render order more quickly?
Croteam : I guess you already know my answer. :-) It is very difficult for the API to handle, but the engine could do that very effectively. And who knows - maybe hardware that can do that with multi-sampling buffers (with z-merging and stuff) is in the not so distant future of the retail market.
MadOnion : The only way of sorting transparent polygons "correctly" is to do per-pixel comparisons. This obviously can't be done in software. If the hardware took all the transparent objects in and sorted them, we would not be against it at all. But the developer needs to have control over which polygons are sorted and which are not, so that you can, for example, draw some sprites on screen for in-game on-screen displays.
Polygon sorting in software is always dead slow, and basically can't be done.
NVIDIA : The semantics of both OpenGL and Direct3D state that rendering order is important, and that the application should provide the order. If we want to change that behavior, it also changes the meaning of all of the blend modes: each frame buffer blend is applied to "what was rendered before". Choosing a different model where the fragments are sorted back-to-front by the hardware would require developers to rethink how they do lots of things, multipass shading, for example. I think that asking for this is more laziness than anything else - like, wouldn't it be nice if the hardware did all of the hard work? However, software developers haven't yet thought out the rest of the details about how this would actually work.
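The point that each blend is applied to "what was rendered before" can be illustrated with a small sketch. It assumes the common source-alpha blend (source times alpha plus destination times one minus alpha) and single-channel colours; the helper names are invented for this example.

```python
def blend_over(src, src_alpha, dst):
    """Standard alpha blend of a fragment onto what is already in the frame buffer."""
    return src * src_alpha + dst * (1.0 - src_alpha)


def render(background, fragments):
    """Apply (colour, alpha) fragments in submission order to one pixel."""
    colour = background
    for src, src_alpha in fragments:
        colour = blend_over(src, src_alpha, colour)
    return colour


if __name__ == "__main__":
    bright_glass = (1.0, 0.5)  # (colour, alpha)
    dark_glass = (0.0, 0.5)
    print("bright then dark:", render(0.2, [bright_glass, dark_glass]))
    print("dark then bright:", render(0.2, [dark_glass, bright_glass]))
```

Swapping the submission order of the two transparent fragments changes the final pixel, which is why someone, whether application or hardware, has to decide the order.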
Beyond3D : Which is better in your opinion? FSAA through Super-Sampling or rather through Multi-Sampling (e.g. a kind of sort-free edge anti-aliasing) with Anisotropic filtering?
Croteam : Multi-sampling! Much better. You have more control and better results. And yes, all the extra 'blurriness' you can easily solve with texture LOD biasing and anisotropic filtering. :-)
MadOnion : I think the two things that matter are FSAA image quality and performance penalties (additional video memory consumed and FPS). Multi-sampling seems to fare better in my opinion.
NVIDIA : The GeForce3 supports both multisampling and supersampling. Anisotropic filtering of textures is also supported, orthogonally from the mode of anti-aliasing selected. Multisampling is an optimization where a single textured and shaded value is calculated for all of the samples within a pixel. This should be very similar in quality to supersampling, since the texture filtering should correctly account for the area that is covered. Multisampling should be faster than supersampling, though, since it consumes less memory bandwidth. If more texture detail is wanted, though, supersampling is always available.
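The difference in shading work that NVIDIA describes can be sketched for a single pixel. This is a deliberately simplified model: a pixel is a list of sample positions with a coverage flag each, `shade` stands in for the whole texturing and shading calculation, and details of real hardware (per-sample depth, sample placement) are left out. All names are hypothetical.

```python
def resolve(samples):
    """Average the samples of one pixel down to the displayed colour."""
    return sum(samples) / len(samples)


def supersample_pixel(shade, sample_positions, covered, background):
    """Shade every covered sample separately. Returns (colour, shade evaluations)."""
    samples, evaluations = [], 0
    for position, inside in zip(sample_positions, covered):
        if inside:
            samples.append(shade(position))
            evaluations += 1
        else:
            samples.append(background)
    return resolve(samples), evaluations


def multisample_pixel(shade, pixel_centre, covered, background):
    """Shade once per pixel and reuse the value for every covered sample."""
    if not any(covered):
        return background, 0
    colour = shade(pixel_centre)
    samples = [colour if inside else background for inside in covered]
    return resolve(samples), 1


if __name__ == "__main__":
    positions = [0.125, 0.375, 0.625, 0.875]
    gradient = lambda x: x  # stand-in for a filtered texture lookup
    interior = [True, True, True, True]
    edge = [True, True, False, False]
    print("supersample, interior:", supersample_pixel(gradient, positions, interior, 0.0))
    print("multisample, interior:", multisample_pixel(gradient, 0.5, interior, 0.0))
    print("multisample, edge:    ", multisample_pixel(gradient, 0.5, edge, 0.0))
```

For a fully covered pixel with a smoothly varying shade, both approaches resolve to the same colour in this model, but multisampling evaluates the shade once instead of four times. On the edge pixel the coverage flags still soften the edge even though only one shade value was computed.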
Beyond3D : Is Hardware Occlusion Detection the way of the future? If it is, would you still believe this to be true if you require a special render order (front-to-back) to see a real boost? Is such a render order even possible given the way games/apps are creating the 3D scene? What about the conflict with sorting for state changes (textures)?
Croteam : This is a very controversial issue. I can say that this is a great feature, but there are lots of problems with it. It shouldn't interfere with the engine's renderer (i.e. the programmer shouldn't have to worry about how to render things in order to get some speed-up), it must eliminate the overhead of preparing a scene and, most of all, it must render the scene correctly and fast. Tough case, but I still believe in it. :-)
However, there's another, better way for hardware to help the engine do occlusion detection. Scene feedback! It would be very nice if an engine could render the simple polygons of an object's bounding box(es) into one rather small buffer and then get feedback on whether these polys were actually rendered. From that info, an engine could easily decide what portions of the scene are visible.
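The "scene feedback" idea Croteam describes can be mocked up in software to show what the engine would get back. This sketch makes strong simplifications that are not in the interview: objects are axis-aligned screen rectangles at a constant depth, the small buffer is a plain depth array, and the function names are invented for illustration.

```python
def make_depth_buffer(width, height, far=1.0):
    """A small depth buffer, cleared to the far plane."""
    return [[far] * width for _ in range(height)]


def draw_rect(depth, rect, z):
    """Draw an occluder: write depth where it is nearer than what is stored."""
    x0, y0, x1, y1 = rect
    for y in range(y0, y1):
        for x in range(x0, x1):
            if z < depth[y][x]:
                depth[y][x] = z


def visible_pixels(depth, rect, z):
    """Feedback test: count pixels of a bounding rect that would pass the depth test.

    Nothing is written, so the test does not disturb the buffer.
    """
    x0, y0, x1, y1 = rect
    return sum(1 for y in range(y0, y1) for x in range(x0, x1) if z < depth[y][x])


if __name__ == "__main__":
    depth = make_depth_buffer(16, 16)
    draw_rect(depth, (0, 0, 8, 16), 0.3)                  # a wall covering the left half
    print(visible_pixels(depth, (2, 2, 6, 6), 0.7))       # box fully behind the wall
    print(visible_pixels(depth, (6, 2, 10, 6), 0.7))      # box sticking out past the wall
```

An engine using feedback of this kind could skip the full geometry of any object whose bounding box reports zero visible pixels.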
MadOnion : HW Occlusion detection is required mainly to save memory bandwidth, and whatever I think, it will probably be more and more important in the future as memory BW doesn't grow as fast as 3D performance in general.
But if we can reduce overdraw in that stage, it's much more effective than reducing it in the last parts of the pipeline. Enter scene management in 3D engines. As 3D scenes get more complex, object counts grow radically. This means more time gets spent in the 3D engine's scene management (before anything gets rendered or passed to the hardware).
One way we have approached this problem is with portals, which also reduce overdraw greatly and thus lessen the need for HW occlusion detection. This makes the whole rendering process faster, as non-visible objects are not passed to the HW for transform and lighting.
The object rendering order is not a big issue with us at least.
NVIDIA : Hardware occlusion detection is definitely the wave of the future. Random order of rendering still benefits from this optimization, since rarely do objects arrive in purely back to front order, either. I don't think that there is any conflict with sorting for state changes, since I wouldn't sort triangles within objects anyway. I would recommend a rough front to back sorting, of objects or characters.
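The rough front-to-back ordering NVIDIA recommends works per object rather than per triangle. A minimal sketch, assuming each object is represented by a name and a centre point (a made-up representation for this example) and that distance from the camera to the centre is a good enough sort key:

```python
def squared_distance(a, b):
    """Squared distance between two 3D points; enough for ordering."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def sort_front_to_back(objects, camera):
    """Order whole objects by distance of their centre from the camera.

    Triangles inside an object keep their original order, so any
    per-object sorting by texture or other state is left untouched.
    """
    return sorted(objects, key=lambda obj: squared_distance(obj[1], camera))


if __name__ == "__main__":
    camera = (0.0, 0.0, 0.0)
    scene = [
        ("far tower", (0.0, 0.0, 90.0)),
        ("player", (1.0, 0.0, 5.0)),
        ("wall", (0.0, 2.0, 20.0)),
    ]
    for name, _centre in sort_front_to_back(scene, camera):
        print(name)
```

Sorting a handful of object centres is far cheaper than sorting polygons, and nearer objects drawn first give the hardware a chance to reject hidden pixels of the objects behind them.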
Beyond3D : What has been more important up to now - T&L or Per-Pixel Effects like EMBM, DOT3 and Register Combiners? What about the future: super-duper Pixel Shaders or rather Vertex Shaders? Maybe both?
Croteam : Yes, both! :-) Both of them improve scene quality a great deal. So, I can't really decide. Maybe T&L, just because per-pixel effects are here to simulate some stuff that would be very hard to do with lots of polys. They cannot do the real stuff. T&L, on the other hand, is for real - it doesn't simulate, you really have a higher poly count and you can do with it whatever you want. Don't get me wrong, I wouldn't replace per-pixel effects with a high-power T&L unit, since these features are very complementary.
MadOnion : Hardware T&L has definitely been the most important feature of 2000. It has offloaded the CPU to do other tasks in games and allowed for higher polygon counts, resulting in a better game play experience.
The spot of second most important feature could have gone to FSAA or Texture Compression. Both of them improve image quality significantly, but neither has so far been used widely. I believe we will see wider acceptance of these features this year. Albeit nice features, EMBM & DOT3 don't even come close to the others.
Vertex and Pixel shaders are going to be the hit of the future. When? I don't know, but I wouldn't hold my breath to see them widely used in games anytime soon.
First we will see apps using vertex shaders, then vertex shaders combined with pixel shaders. Pixel shaders as such won't give much benefit over traditional multi-texturing if they are not combined with the vertex shaders.
The transition to vertex shaders will not be easy. Although Intel and AMD have done a good job optimizing vertex shaders, it still means that on every DX7 class 3D accelerator (GF2, Radeon et al) vertex shader content will be transformed and lit with the CPU. Pixel shaders are even more difficult since there are no software fallbacks. If you don't have DX8 pixel shader compatible hardware (there is none in the shops yet), you can't run any content with pixel shaders.
For game developers, it's also a fairly big content (graphics) development & tech change, moving from fixed function pipeline to pixel shaders (but that's a long story!).
NVIDIA : I think that so far, we have seen more interesting content enabled by higher polygon counts made possible by hardware Transform and Lighting. In the future, as developers begin to take advantage of the programmability in the vertex processor and pixel shaders, we will see amazing stuff. I believe that developers will use the vertex processor to pre-calculate and setup values for interpolation to be used by the pixel shaders. Only when used together will you see the true power of each.