So, Why Not Now for the PC?
The most immediate question that comes to mind is this: if all the graphics elements seen within the XBOX 360 are so good, why aren't we seeing them in the PC space yet?
Xenos's particular range of features is going into a closed box environment, so the API can be tailored to expose all of the features of the chip. In the PC space, however, graphics processors really need to be tailored to the capabilities of the current DirectX release. This is where Xenos has an issue: its features and capabilities are clearly beyond the current Shader Model 3.0 DirectX9 specification, yet it lacks features that are expected to be a requirement for WGF2.0.
WGF2.0 has requirements for virtualisation, and whilst Xenos has the luxury of being able to access the entire system memory, this is by virtue of the fact that it is the system memory controller. Part of the virtualisation requirements of WGF2.0 appears to be support for unlimited length shaders; Xenos has some hard coded limits here which, whilst large and defeatable through a couple of methods, probably wouldn't meet the WGF2.0 requirements. When we looked into WGF2.0 in our DirectX Next article there was, at that point, a suggested requirement for the graphics pipeline to have a full integer instruction set as well as the floating point pipelines; Xenos's ALUs, however, are purely float in operation.
The shader processing design is clearly very different from today's graphics processors, but PCs also have to cater to a much wider range of feature utilisation, as the evolution cycle for graphics is quicker there - some titles being released even now are very limited in their shader use, whilst others use shaders extensively. Xenos's design is likely to be most beneficial when the majority of the processing requirement falls on shaders rather than on the more fixed function elements of the pipeline. Arguably, though, that balance is already shifting, and if Xenos is actually as good at shader processing as it purports to be, it still raises the question of why ATI are looking towards a more traditional shader pipeline over the next 12-18 months instead of using this design, even though it has slightly greater capabilities than current PC APIs allow. Perhaps the answer is that this is such a big change that trialling it in a closed box environment makes sense: the hardware will stay the same for the next 3-5 years, so developers will have more time to tailor their code specifically to it, and ATI can use the experience gained to assist in the development of a PC architecture based on a similar processing methodology.
One area of PC graphics processing that is sure to benefit immensely from a unified architecture is the workstation market. PC graphics processors are primarily designed for desktop PCs, so their main target is gaming, which biases the workload much more towards the raster (rendering) pipeline than the geometry pipeline - current high end graphics processors have two to three times as much math logic dedicated to pixel shading as to vertex processing. Many workstation applications, such as CAD and CAM, put the onus heavily on geometry processing, as they often render very detailed geometric representations of objects that are frequently viewed in wireframe mode. Most workstation graphics processors sold, however, are derived directly from desktop processors, which isn't necessarily optimal as those are designed for pushing pixels. With a unified shader architecture the graphics processor is biased towards neither pixel nor vertex processing in terms of its ALU math capabilities and is much more versatile in its potential usage - workstation graphics that use such an architecture can suddenly find themselves with many times the geometry throughput, at more or less the same cost, as the utilisation is automatically balanced and spread across the entire array of ALUs available on the graphics processor.
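To put some rough numbers on the workstation argument, here is a minimal sketch of an idealised throughput model. It is not a description of Xenos or of any shipping part: the function names, the ALU counts (8 vertex plus 24 pixel, chosen only to match the "two to three times" ratio mentioned above) and the workload figures are all hypothetical, and the model ignores scheduling overhead, bandwidth and every fixed function stage.

```python
# Idealised comparison of a fixed vertex/pixel ALU split against a unified pool.
# All names and numbers are illustrative assumptions, not measured figures.

def fixed_split_time(vertex_work, pixel_work, vertex_alus, pixel_alus):
    """Time to finish a frame when each workload is confined to its own ALU pool.

    Work is expressed in abstract ALU-cycles; the slower pool sets the frame time.
    """
    return max(vertex_work / vertex_alus, pixel_work / pixel_alus)


def unified_time(vertex_work, pixel_work, total_alus):
    """Time to finish a frame when any ALU can take either kind of work."""
    return (vertex_work + pixel_work) / total_alus


def unified_speedup(vertex_work, pixel_work, vertex_alus, pixel_alus):
    """Ratio of fixed-split time to unified time for the same total ALU count."""
    fixed = fixed_split_time(vertex_work, pixel_work, vertex_alus, pixel_alus)
    unified = unified_time(vertex_work, pixel_work, vertex_alus + pixel_alus)
    return fixed / unified


if __name__ == "__main__":
    # Hypothetical 32-ALU part: 8 vertex + 24 pixel when split, 32 shared when unified.
    print("game-like frame (25% vertex):", unified_speedup(250, 750, 8, 24))
    print("CAD-like frame  (90% vertex):", unified_speedup(900, 100, 8, 24))
```

Under these assumptions the game-like frame, whose workload happens to match the fixed split exactly, gains nothing from unification, whilst the geometry heavy frame finishes about 3.6 times faster on the same number of ALUs. The real gain would depend on everything this model leaves out.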
However, possibly one of the most immediate applications for unified shaders (without WGF2.0 for Windows being here) is actually outside of the PC space, in mobile phone 3D graphics engines. ATI have yet to produce a fully shader capable "Imageon" graphics processor for the handheld markets and are not expected to until 2006. With the onus in the handset space on minimal power utilisation in a minimal die size, anything that mitigates wastage is going to be welcome, and with slightly less rigid specification targets to meet in the handheld arena, a unified shader architecture may be the ideal approach when, inevitably, ATI choose to create shader enabled handset parts.
Conclusion
Overall, Xenos represents some highly interesting design choices on many fronts, and it seems clear that ATI have attempted to come up with a very different architecture to, at the very least, target the specific needs of the XBOX 360 console platform. It will be very interesting to see the performance and quality of graphics it is able to produce once developers have had decent access to development kits based on the final hardware. We suspect, however, that it won't be until the second generation of XBOX 360 titles that developers seriously begin to scratch the surface of the processing capabilities of Xenos and the XBOX 360 as a whole, given that most of the first generation titles will not have been developed on the final hardware. That being said, much of the graphics architecture is transparent to the developer, and they shouldn't need to concern themselves much with the types of workloads they are requesting of the graphics processor, as this will all be handled automatically without stalling any part of the pipeline.
Apart from the interesting use of eDRAM in this design, which is clearly targeted towards the console environment (although from its operation even this could potentially be moved into the PC space if the driver forced a Z only pass, risky as that may be), the design of the ALU arrays, the texture processing and the threaded nature of the system are clearly a large departure from any of the shader architectures we've seen so far. Despite having a raw ALU quantity that exceeds any platform currently available, the primary key to the design is "efficiency" when processing shader programs: organising the workloads in a threaded manner in order to keep the available processing elements constantly active, interleaving latency bound dependent operations, and having a unified platform that is agnostic to whether it is processing Vertex or Pixel Shaders, so that one type of operation never stalls the other. The primary question here is exactly how "inefficient" current architectures are in relation to this one, which is difficult to answer because no hardware vendor is going to tell you their graphics processors are inefficient. All we can say at the moment is that Xenos's shader processing architecture is fundamentally and significantly different from current platforms, and ATI must have perceived an issue with current methodologies, otherwise they wouldn't have gone to these lengths to change the pipeline.
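The "efficiency" argument for threading can be illustrated with a standard back-of-envelope latency hiding model. This is a sketch under stated assumptions, not ATI's scheduler: the function names and the cycle counts are hypothetical, and each thread is assumed to alternate a fixed burst of ALU work with a fixed-latency fetch during which it cannot issue.

```python
import math

# Simple latency hiding model: each thread runs `alu_cycles` of math, then
# waits `fetch_latency` cycles on a dependent fetch. While one thread waits,
# the ALUs can run another thread. All figures here are illustrative assumptions.

def alu_utilisation(threads, alu_cycles, fetch_latency):
    """Fraction of time the ALUs are busy with `threads` interleaved threads."""
    if threads <= 0:
        return 0.0
    busy_per_period = threads * alu_cycles
    period = alu_cycles + fetch_latency
    return min(1.0, busy_per_period / period)


def threads_to_hide_latency(alu_cycles, fetch_latency):
    """Smallest thread count at which the model reports full utilisation."""
    return math.ceil((alu_cycles + fetch_latency) / alu_cycles)


if __name__ == "__main__":
    # Hypothetical shader: 4 cycles of math per 12-cycle fetch.
    for n in (1, 2, 4, 8):
        print(n, "threads ->", alu_utilisation(n, 4, 12))
    print("threads needed:", threads_to_hide_latency(4, 12))
```

In this model a single thread keeps the ALUs busy only a quarter of the time, and four interleaved threads are enough to hide the fetch entirely. Real fetch latencies vary and are typically far longer than the figure assumed here, which is why an architecture of this kind wants many threads in flight.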
In the future, with WGF2.0's unified shader language, it is hard to see this type of threaded shader architecture not making its way across to ATI's PC products.







