H.264 Playback

As the first part of our Avivo article explored, H.264 is a new encoding mechanism with the potential to provide both higher quality and better compression than mechanisms such as MPEG-2. This is achieved by increasing the complexity of the compression routines, which in turn places a heavier processing burden on decoding the video. With high definition optical formats such as HD-DVD and Blu-ray turning to H.264, it's viewed as increasingly important that PCs are capable of supporting the compression scheme at full frame rates, without dropping frames and without using too much power (important for the notebook PC segment). For these reasons there's been plenty of activity towards getting H.264 support onto graphics chips, offloading some of the CPU operations to allow for smooth playback.

Although the X1000 series have always been billed as having hardware support for H.264, at their initial release there was no software that could expose it. However, courtesy of the Catalyst 5.13 update and later driver releases, and of CyberLink's H.264 codec supporting ATI's hardware, it is now possible to get DXVA (Microsoft DirectX Video Acceleration) support for H.264 encoded content through Windows Media Player. Using a few video clips from Apple's QuickTime HD Gallery, we've compared the CPU utilisation (measured using Microsoft's Perfmon utility over a 1m 40s span of video) of software H.264 decoding in Apple's QuickTime player against boards from the X1000 series playing via Media Player with the CyberLink ATI accelerated codec.
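Each configuration in the results is summarised by three figures, which read as the minimum, maximum and average of the sampled CPU utilisation. As a minimal sketch of that summary, assuming the samples have been exported as a plain list of percentages (the `summarise` helper and the sample values are hypothetical, not data from this article):

```python
from statistics import mean


def summarise(samples):
    """Return (min, max, mean) of CPU utilisation samples, rounded to 2 d.p."""
    if not samples:
        raise ValueError("no samples to summarise")
    return (
        round(min(samples), 2),
        round(max(samples), 2),
        round(mean(samples), 2),
    )


# Hypothetical utilisation samples in percent; illustrative values only.
example = [0.0, 12.5, 26.56, 14.06, 15.63]
low, high, average = summarise(example)
print(f"min {low:.2f}  max {high:.2f}  avg {average:.2f}")
```

The minimum and maximum are sensitive to single spikes or idle moments, so the average is the more useful figure when comparing boards.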




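CPU utilisation (%), 480p clip

Player                       Decode       Min      Max      Avg
QuickTime                    Software     0.00     26.56    13.56
WMP10 + CyberLink Decoder    X1300 PRO    0.00     20.31    11.11
WMP10 + CyberLink Decoder    X1600 XT     3.13     18.75    11.97
WMP10 + CyberLink Decoder    X1800 XT     3.13     20.31    12.01
WMP10 + CyberLink Decoder    X1900 XT     0.00     18.75    11.79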

With 480p content the CPU utilisation isn't that high in the first place, so the reduction from hardware assisted H.264 decode is fairly small overall.




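CPU utilisation (%), 720p clip (Kingdom of Heaven)

Player                       Decode       Min      Max      Avg
QuickTime                    Software     1.56     48.44    29.19
WMP10 + CyberLink Decoder    X1300 PRO    4.69     46.88    27.70
WMP10 + CyberLink Decoder    X1600 XT     12.50    37.50    24.65
WMP10 + CyberLink Decoder    X1800 XT     12.50    39.06    26.58
WMP10 + CyberLink Decoder    X1900 XT     9.38     35.94    23.96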

With 720p content there are more pixels to process, so the workload is greater and CPU utilisation is higher in all cases. Again, with the Athlon 64 FX-53 processor in this system, the reduction in CPU utilisation from hardware assisted decoding isn't huge, although the X1900 does shave about 5 percentage points off the average utilisation.




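CPU utilisation (%), 720p trailer (Fantastic Four)

Player                       Decode       Min      Max      Avg
QuickTime                    Software     14.06    45.31    30.45
WMP10 + CyberLink Decoder    X1300 PRO    9.37     43.75    27.15
WMP10 + CyberLink Decoder    X1600 XT     7.81     37.50    24.76
WMP10 + CyberLink Decoder    X1800 XT     14.06    42.19    25.98
WMP10 + CyberLink Decoder    X1900 XT     15.63    35.94    24.55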

Overall the Fantastic Four 720p trailer is slightly more demanding than the Kingdom of Heaven clip, and in this instance the X1900 reduces the average CPU utilisation by nearly 6 percentage points.
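The savings quoted in the text can be checked directly from the average figures in the three sets of results. The sketch below computes, for each board, both the drop in percentage points and the saving relative to software decoding. The clip labels are our reading of which results belong to which clip, and the `reduction` helper is a name chosen purely for illustration.

```python
# Average CPU utilisation (%) per configuration, copied from the results above.
# The clip labels are inferred from the surrounding text.
AVERAGES = {
    "480p clip": {
        "Software": 13.56, "X1300 PRO": 11.11, "X1600 XT": 11.97,
        "X1800 XT": 12.01, "X1900 XT": 11.79,
    },
    "720p Kingdom of Heaven": {
        "Software": 29.19, "X1300 PRO": 27.70, "X1600 XT": 24.65,
        "X1800 XT": 26.58, "X1900 XT": 23.96,
    },
    "720p Fantastic Four": {
        "Software": 30.45, "X1300 PRO": 27.15, "X1600 XT": 24.76,
        "X1800 XT": 25.98, "X1900 XT": 24.55,
    },
}


def reduction(software_avg, accelerated_avg):
    """Return (percentage points saved, saving relative to software in %)."""
    points = software_avg - accelerated_avg
    relative = 100.0 * points / software_avg
    return round(points, 2), round(relative, 1)


for clip, rows in AVERAGES.items():
    baseline = rows["Software"]
    for board, avg in rows.items():
        if board == "Software":
            continue
        points, relative = reduction(baseline, avg)
        print(f"{clip:24s} {board:10s} -{points:.2f} points ({relative:.1f}% relative)")
```

Seen this way, the X1900's saving on the two 720p clips is a little over 5 points in absolute terms, but closer to a fifth of the software decoder's average load.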

All the boards here are able to provide a reduction in CPU utilisation, including the X1300 on 720p content. According to ATI, they have placed restrictions on what can be achieved on each class of board, with X1300s supposedly being good for 480p/576p content, X1600 for up to 720p content and X1800/X1900 providing acceleration all the way up to 1080i content; however, it's not quite as simple as that.

ATI's H.264 decoding solution relies on both custom logic and shader code processed over the pixel shader pipelines, so the performance can be slightly reliant on the graphics capabilities of the board in question - the fewer the pipelines the lower the decoding power, hence the lower the resolution, in theory. In practice it's not quite as clear cut as that, as things depend not only on the resolution but also on the bit rate at which the video is encoded - X1300, for instance, may be fine at decoding full HD resolution sources encoded at lower bit-rates. Also, given the performance here, it's evident that the decoding isn't highly optimised yet and we suspect that there is more performance to come for the entire line.
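To give a rough sense of how the workload scales with resolution alone, the sketch below compares pixels per frame across the formats mentioned. The frame sizes are assumptions for illustration (the exact dimensions of the clips are not stated here), and the comparison deliberately ignores bit rate, which as noted above also affects decoding cost.

```python
# Frame sizes (width, height) are assumed typical values, not taken from the
# article; the actual clips may differ.
FRAME_SIZES = {
    "480p": (720, 480),
    "576p": (720, 576),
    "720p": (1280, 720),
    "1080i": (1920, 1080),
}


def pixels_per_frame(fmt):
    """Number of pixels in one full frame of the given format."""
    width, height = FRAME_SIZES[fmt]
    return width * height


def relative_load(fmt, baseline="720p"):
    """Pixel count of `fmt` as a multiple of the baseline format's."""
    return pixels_per_frame(fmt) / pixels_per_frame(baseline)


for fmt in FRAME_SIZES:
    print(f"{fmt:6s} {pixels_per_frame(fmt):>9d} pixels  "
          f"{relative_load(fmt):.2f}x of 720p")
```

Under these assumptions a 1080-line frame carries 2.25 times the pixels of a 720p frame, which is the kind of gap the per-board resolution limits are meant to reflect.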

Looking back over the results for the graphics acceleration, a pattern begins to emerge, at least on the 720p clips: the X1600 and X1900 perform close to each other, and both show lower CPU utilisation than the X1300 and X1800. It could be that these differences are linked to their shader configurations, with X1600 and X1900 having more shader power per pipeline than X1300 and X1800. If so, could it also suggest that the higher end boards are running the video decoding over only a single quad?

For now, though, the H.264 codec for hardware acceleration on ATI's X1000 boards is available from CyberLink, with a 30-day free trial period and a $14.95 charge for continued use. At present this is the only decoder available; however, ATI are working with other software vendors and there should be other solutions available in the future.