Displacement Mapping Q&A
With the announcement by Matrox of their new Parhelia chip, with its capability for hardware Displacement Mapping, there has been much interest in the feature and in how 3Dlabs' P10 will actually be able to process it. There have been some questions as to whether the P10 pipeline is really flexible enough to facilitate such a feature in hardware, or whether it would still use the CPU for some operations. I've thrown Neil a few questions with regards to Displacement Mapping. Our questions are in blue; his responses are in italics:
There has been some scepticism as to whether the P10 can truly process Displacement Mapping on-chip, rather than just receiving a software-generated vertex stream from the CPU; can you give us any indication of how the P10 chip handles Displacement Mapping?
The displacement lookup (and optionally the tessellation) is done by the texture subsystem and the results left in memory where they can be read just like a regular vertex buffer. On the second pass the vertex shader will pick up the displaced vertices, light them and then they get processed as normal. This is a good example of using the flexibility of the SIMD arrays for not just their default purpose.
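The two-pass scheme Neil describes can be illustrated with a minimal CPU-side sketch: pass one tessellates an input triangle and displaces each new vertex along its interpolated normal by a height sampled from the displacement map, leaving the result in memory as an ordinary vertex buffer for pass two (lighting in the vertex shader) to consume. This is purely illustrative, assuming uniform tessellation and a nearest-neighbour height lookup; the function names and structure are our own, not the P10's actual operation.

```python
def sample_height(disp_map, u, v):
    """Nearest-neighbour lookup into a 2D list of heights (u, v in 0..1).

    Illustrative stand-in for the texture subsystem's displacement lookup.
    """
    h = len(disp_map)
    w = len(disp_map[0])
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    return disp_map[y][x]

def displace_triangle(verts, normals, uvs, disp_map, levels, scale=1.0):
    """Uniformly tessellate one triangle and displace the new vertices.

    verts/normals/uvs are the three corner attributes. Returns a flat list
    of displaced positions -- the 'vertex buffer' the second pass would
    read back and light like any regular vertex stream.
    """
    out = []
    for i in range(levels + 1):
        for j in range(levels + 1 - i):
            # Barycentric weights for this tessellated vertex.
            a = i / levels
            b = j / levels
            c = 1.0 - a - b
            pos = tuple(verts[0][k] * a + verts[1][k] * b + verts[2][k] * c
                        for k in range(3))
            nrm = tuple(normals[0][k] * a + normals[1][k] * b + normals[2][k] * c
                        for k in range(3))
            u = uvs[0][0] * a + uvs[1][0] * b + uvs[2][0] * c
            v = uvs[0][1] * a + uvs[1][1] * b + uvs[2][1] * c
            # Offset the vertex along its interpolated normal.
            d = sample_height(disp_map, u, v) * scale
            out.append(tuple(pos[k] + nrm[k] * d for k in range(3)))
    return out
```

The key point of the answer is that nothing downstream knows or cares that the buffer was generated by the texture subsystem rather than submitted by the application, which is why the displaced vertices can be lit and processed "as normal" on the second pass.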
What type of driver overhead will there be?
If the tessellation and displacement of a single input patch or triangle only results in a few new vertices then the overhead will be quite high, but in the normal case, where you tessellate more finely and batch multiple input triangles or patches together, the overall driver overhead will be insignificant.
How will this be exposed? Outside of DX9, OpenGL 2.0 appears to have no provisions for it; will you be making some provision within OpenGL 2.0 or via an extension?
How to add Displacement Mapping functionality to OGL2 is still an active area of discussion within the ARB - we will solve this problem though - it is key.
Would there be any future IP issues over OpenGL support? Supposedly Matrox has licensed Displacement Mapping to Microsoft for inclusion in DirectX, and we could end up with another IP minefield similar to the one thrown up by nVIDIA and their Vertex/Fragment shader issues.
There is much prior art with regards to Displacement Mapping and so it is unlikely that Matrox would be able to claim any IP that would prevent independent implementations. The ARB has recently become a little more savvy with dealing with non-specific IP claims. To affect the creation of a specification, an IP claim must make it impossible to create ANY implementation of the specification that doesn't infringe that IP. In the fields of shaders and displacement maps there is enough public domain and prior art as to make this situation extremely unlikely - so the creation of the ARB specifications will continue - even if the implementers of the specification must take care not to infringe any IP (as always).
Editor's Comments
The P10 chip is clearly going to be one of the most flexible 3D graphics processors to date, and it appears evident that many other 3D vendors will be following a similar path in the months to come – if not exactly the same as 3Dlabs' P10, then in the form of more general programmability. However, the question remains of how well P10 will fare in the competitive consumer marketplace.
As it stands right now, P10 will not achieve DirectX9 compliance if floating point texture shading stages are a stipulation in DX9, though this may not be an issue for some time, as it's likely to be a while before game developers make sufficient use of the shader stage's functionality to require the higher dynamic range facilitated by floating point pipelines. However, it will be interesting to see which features of DX9/10 can be supported by way of the programmability of the chip – displacement mapping is one of the major new additions to DX9, thanks to Matrox, and it would seem that 3Dlabs have already worked on algorithms to support this feature via the programmability of the Vertex Processor array. What other features will be exposed via the API and through the chip's apparent flexibility remains to be seen.
A lot of P10's success in both of the markets it's aimed at rides on the levels of performance it can achieve. Generally speaking, you would tend to think that there is a trade-off of flexibility against performance that the architecture must incur, so it's a question of how big that trade-off will be. Of course, 3D performance is also greatly affected by staple elements, such as clock speed and bandwidth – with a 256-bit DDR bus, it would seem that P10 will have bandwidth in abundance. However, with 76 million transistors at .15µ, it would be difficult to imagine ultra-high-speed parts.
So, when can we expect to see P10 hit the shelves? Well, undoubtedly the 3Dlabs workstation variants will come first, probably quite soon in fact; however, it will probably take Creative a little while longer to bring out the consumer variants, since Creative will have to transition their current nVIDIA-based range to the new parts and, as 3Dlabs' primary focus has been the workstation market, may well need to establish a developer relations and game compatibility testing team. Although 3Dlabs say there may be some tweaks to the P10 architecture to tailor it to the desktop consumer environment, 3Dlabs/Creative will not be shifting it to .13µ, which would tend to indicate it has to arrive this year – particularly with nVIDIA known to be bringing a .13µ part to market this year, and, likely, ATi too. .15µ parts may begin to seem a little breathless by the end of the year for the high-end consumer space, even with high levels of bandwidth.
However, to me it's undeniable that 3Dlabs have brought us one of the more innovative, interesting, and potentially flexible architectures of the year so far. On paper it seems P10 will be able to achieve any feature currently touted by any announced 3D hardware for the consumer space, and may even have the capability to add more as APIs such as DirectX evolve and begin to expose more. As I said before, it's likely that other 3D vendors will follow a similar approach in creating more generalised programmable hardware rather than fixed-function designs, so in this sense 3Dlabs have got the jump on everyone, which should stand their software development teams in good stead for future hardware iterations. Ongoing development of the architecture should be easier as they have the general functionality already there, with perhaps only individual stages needing changes (such as the integer texture shader stage). Moreover, the parallelism of many of the processor arrays is handled by the compiler as well, which means that increasing performance over feature generations could be just a case of increasing the size of the arrays.
Our thanks to Neil Trevett from 3Dlabs for taking the time to discuss the P10 technology with us.
- Feel free to make comments relating to this article here.