Further Analysis
Trilinear Filtering
I've used Quake3's Demo001 timedemo benchmark to gather some further results with Trilinear filtering enabled. The following table shows the results:
It's clear to see that with Trilinear and no texture compression KYROII suffers just as badly as the original KYRO did; despite being able to process two Trilinear filtered pixels per clock, KYROII takes a performance hit of up to 35% for enabling Trilinear (without texture compression), albeit at a higher frame rate than the original KYRO! This harks back to the texture bandwidth limitation I've mentioned before; in an already texture bandwidth limited situation, Trilinear's need to fetch twice the texture samples isn't going to help.
However, we can see that KYRO's Trilinear performance is turned around once compression is enabled. With texture compression (in DXT1 mode) the texture bandwidth requirement drops to around one sixth, and hence much of the issue is erased. As we can see, with S3TC in 32bit the performance difference between Trilinear with compression and Bilinear is even translated into a gain, showing that there are some texture bandwidth limitations even with Bilinear filtering.
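As a rough check on the "one sixth" figure, the arithmetic can be sketched as below. The DXT1 block layout (64 bits per 4x4 texel block) is part of the S3TC format; the function names are purely illustrative, and which uncompressed baseline the figure refers to is my assumption. Against 24-bit RGB texels the ratio is 6:1, while against 32-bit texels it would be 8:1.

```python
# DXT1 stores each 4x4 texel block in 64 bits:
# two 16-bit RGB565 endpoint colours plus sixteen 2-bit indices.
DXT1_BLOCK_BITS = 2 * 16 + 16 * 2
TEXELS_PER_BLOCK = 4 * 4


def dxt1_bits_per_texel():
    """Average storage cost of one texel in a DXT1 texture."""
    return DXT1_BLOCK_BITS / TEXELS_PER_BLOCK


def compression_ratio(uncompressed_bits_per_texel):
    """How many times smaller DXT1 is than the given uncompressed format."""
    return uncompressed_bits_per_texel / dxt1_bits_per_texel()


if __name__ == "__main__":
    print("DXT1 bits per texel:", dxt1_bits_per_texel())
    print("Ratio vs 24-bit RGB:", compression_ratio(24))
    print("Ratio vs 32-bit RGBA:", compression_ratio(32))
```

This only counts texture storage per texel; the bandwidth actually saved in a game also depends on cache behaviour and on how much of the scene uses compressed textures.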
There has also been some confusion over the use of Trilinear filtering on KYRO-based chipsets, arising from Quake3's 'r_colormiplevels' option, which colours the mip map bands to show where they are and whether they are being blended. The following two images are from Quake3 with r_colormiplevels switched on, running on the KYROII Vivid!XS; the first is with just Trilinear filtering, and the second is with Trilinear filtering and compression enabled:
From the first image we can see that with Trilinear enabled and TC disabled, KYRO/KYROII is doing exactly what you would expect for Trilinear, with nice smooth mip map transitions; however, from the image with Trilinear and TC enabled it would be easy to believe that it is falling back to Bilinear filtering rather than doing Trilinear. In actual fact, with compression enabled KYRO/KYROII is neither falling back on Bilinear nor doing Trilinear filtering in the traditional sense; it has a special implementation that should be more or less equal in quality to traditional Trilinear, but without the performance degradation.
The following 3 images are from the same location in Quake3, but without the colouring for the mip map levels. The first is of normal Bilinear for reference, the second is of Trilinear without TC, and the third is Trilinear with TC:
It's clear to see from this image comparison that Trilinear filtering with TC enabled is doing more than mere Bilinear filtering; in my opinion it could even be argued that it is giving a slightly higher quality result than normal Trilinear filtering.
A traditional Trilinear implementation (as used without TC on KYRO/KYROII) takes four samples from the first mip map level and four samples from the next mip map level down, performs a bilinear filter on each set, and finally uses a linear stage to combine the two results.
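The traditional scheme described above can be sketched in a few lines of Python. This is a minimal software illustration, not a description of the KYRO hardware: it assumes single-channel textures stored as lists of rows, clamp-to-edge addressing and texel centres at half-integer coordinates, and all function names are hypothetical.

```python
import math


def texel(level, x, y):
    """Fetch one texel with clamp-to-edge addressing (an assumption)."""
    h, w = len(level), len(level[0])
    return level[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def bilinear(level, u, v):
    """Blend the four texels surrounding normalised coordinates (u, v)."""
    h, w = len(level), len(level[0])
    x, y = u * w - 0.5, v * h - 0.5        # texel centres sit at +0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = texel(level, x0, y0) * (1 - fx) + texel(level, x0 + 1, y0) * fx
    bot = texel(level, x0, y0 + 1) * (1 - fx) + texel(level, x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bot * fy


def trilinear(mips, u, v, lod):
    """Two bilinear filters (4 + 4 samples) plus one linear blend."""
    lod = min(max(lod, 0.0), len(mips) - 1)
    lo = int(math.floor(lod))
    hi = min(lo + 1, len(mips) - 1)        # next mip map level down
    frac = lod - lo
    return bilinear(mips[lo], u, v) * (1 - frac) + bilinear(mips[hi], u, v) * frac


if __name__ == "__main__":
    mips = [[[1.0, 1.0], [1.0, 1.0]], [[3.0]]]
    print(trilinear(mips, 0.5, 0.5, 0.5))
```

The two `bilinear` calls read from two different mip map levels, which is where the doubling of texture samples comes from.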
When TC is enabled on KYRO/KYROII, instead of taking four samples from the next mip map level down, it generates the equivalent of those samples on the fly by taking 16 texel samples from the same mip map level.
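To illustrate how 16 texels from a single mip map level can stand in for four samples from the level below, here is a hedged sketch. It assumes the lower level would have been built with a simple 2x2 box filter, so each "equivalent" lower-level sample is the average of one 2x2 quad and four of them cover a 4x4 footprint. How the KYRO hardware actually derives these samples is not stated here, so treat the box filter, the clamp addressing and every function name as my assumptions.

```python
import math


def clamp(i, n):
    return min(max(i, 0), n - 1)


def bilinear(level, u, v):
    """Ordinary bilinear filter on one stored level (4 texel reads)."""
    h, w = len(level), len(level[0])
    x, y = u * w - 0.5, v * h - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def t(ix, iy):
        return level[clamp(iy, h)][clamp(ix, w)]

    top = t(x0, y0) * (1 - fx) + t(x0 + 1, y0) * fx
    bot = t(x0, y0 + 1) * (1 - fx) + t(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bot * fy


def synth_lower_texel(level, i, j):
    """Average the 2x2 quad that lower-level texel (i, j) would cover.

    Assumes even, power-of-two dimensions of at least 2x2.
    """
    h, w = len(level), len(level[0])
    i, j = clamp(i, w // 2), clamp(j, h // 2)
    quad = [level[2 * j + dy][2 * i + dx] for dy in (0, 1) for dx in (0, 1)]
    return sum(quad) / 4.0


def bilinear_synth_lower(level, u, v):
    """Bilinear filter over four synthesised texels (16 texel reads)."""
    h, w = len(level), len(level[0])
    x, y = u * (w // 2) - 0.5, v * (h // 2) - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = (synth_lower_texel(level, x0, y0) * (1 - fx)
           + synth_lower_texel(level, x0 + 1, y0) * fx)
    bot = (synth_lower_texel(level, x0, y0 + 1) * (1 - fx)
           + synth_lower_texel(level, x0 + 1, y0 + 1) * fx)
    return top * (1 - fy) + bot * fy


def trilinear_single_level(level, u, v, frac):
    """Blend the stored level with its synthesised lower level."""
    return (bilinear(level, u, v) * (1 - frac)
            + bilinear_synth_lower(level, u, v) * frac)
```

Under the box-filter assumption, `bilinear_synth_lower` gives the same value as bilinear filtering a precomputed lower mip map level, while only ever reading from one stored level.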
Quake3's 'r_colormiplevels' looks at the mip map level used, and as KYRO is only using one level for each mip map band in Trilinear with TC, that is all the application can show; it doesn't, however, show the number of samples used. Another thing to note is that, as this scheme requires 16 texel samples to produce the lower level, it would be even more costly if the data wasn't in the cache, which is why it is only enabled with TC on, as it's far more likely that the data will be present in the cache.