SwiftShader 2.0: A DX9 Software Rasterizer that runs Crysis

Friday 04th April 2008, 12:00:00 AM, written by Arun

TransGaming has just released SwiftShader 2.0, a highly optimized software rasterizer that supports DX9 and Shader Model 2.0 and scales with multi-core processors. It can run (albeit slowly) many modern games and it makes a dual-core Penryn perform similarly to the GeForce FX5600/5700 in 3DMark05.

Nicolas Capens (aka Nick on our forums) is the creator and lead programmer behind SwiftShader - he started by writing the DX7/DX8 swShader many years ago, and eventually turned it into a commercial product in 2005 with the help of TransGaming and Gavriel State in particular.

A demo is now available on TransGaming's website, with which we ran 3DMark05 and obtained a score of ~400 on a stock Core 2 Duo E8400. That's still not mind-blowingly fast, but keep in mind that Direct3D's reference rasterizer would likely score in the single digits, and the SGX-based IGP in Intel's upcoming Silverthorne-based Menlow platform for UMPCs/MIDs is claimed to score only ~150. It would also be more than enough to run Vista's Aero interface smoothly.

Finally, we were told SwiftShader would run Crysis at mid-single-digit frame rates at the lowest settings on Intel quad-core systems. Definitely not very playable yet, but that should make it clear SwiftShader is perfectly usable for casual games. We look forward to seeing how SwiftShader evolves in the future and how it will perform on future high-end CPUs such as Intel's Nehalem and AMD's Shanghai - certainly it might be a fun way to benchmark CPUs once in a while.




Latest Thread Comments (183 total)
Posted by Scali on Friday, 10-Jul-09 07:12:36 UTC
Quoting Nick
For those interested: TransGaming Empowers O3D Developers With SwiftShader (http://www.transgaming.com/news/?id=122).
I suppose this means it's going to run on multiple OSes, including Linux and Google's Chrome OS?

Posted by zed on Monday, 13-Jul-09 00:39:33 UTC
excellent stuff, on a related note I'd never even heard of O3D until now

Posted by BRiT on Monday, 13-Jul-09 07:19:54 UTC
Congratulations! This is good news indeed.

Posted by HolySmoke on Monday, 13-Jul-09 21:09:06 UTC
Quoting Scali
Yea, that's what one would think. However, when I was playing around on my Intel X3100,
I noticed that Crysis ran slower in D3D10 mode than in D3D9 mode, even at the lowest settings.

So I conducted a small test on my own. I rendered the exact same scene with the exact same shaders in D3D9 and D3D10, and D3D9 was around 10% faster.
And I literally mean the exact same shaders. With the D3DX compiler you can compile the exact same sourcecode for D3D9 or D3D10.
The shaders were very trivial anyway, just per-pixel diffuse lighting. Nothing beyond SM2.0, although I compiled them for SM3.0 and SM4.0.

Makes me wonder where the extra overhead comes from in D3D10. Is it just poor Intel drivers, or does D3D10 really do something different?
One would think that D3D10 would be faster, because my code would theoretically work more efficiently in D3D10. I update all shader constants in one call, and I don't need BeginScene()/EndScene(), and things like that.

I've also tried it on my 8800GTS. The difference between D3D9 and D3D10 was minimal, but still the D3D9 was a smidge faster in Vista.
When running the D3D9 code on XP Pro or XP x64, it was faster than either D3D9 or D3D10 in Vista. I've only tried it in windowed mode so far, though... Perhaps the Vista desktop is a limiting factor in performance, I'll have to see what happens when I run both in fullscreen to eliminate that factor.
Crysis has a major confound when judging D3D10 performance in the form of its in-engine texture streamer. It's disabled at the lower two texture settings and kicks in at the higher two.

It is a major confound because DX10 already does a form of streaming of its own in addition to the engine-based one. Disabling texture streaming in the engine brings memory usage up to ~1.5 GB in DX9, while it remains at 1 GB in DX10 mode at the highest texture detail settings. The in-engine streamer also introduces artifacts, so it becomes an apples-to-oranges scenario.

The same memory usage behavior is true of all DX10 games I've tried, but with nowhere near the performance drop. Far Cry 2, for example, drops from ~700 MB to ~400 MB going from DX9 to DX10 while still managing to perform faster.

Posted by Scali on Tuesday, 14-Jul-09 10:36:33 UTC
Doesn't that have to do with the virtual videomemory system in D3D10 though?
From what I understood, in DX9 all texturememory is mapped into the virtual address space at all times. But with DX10 they aren't mapped into the address space at all unless you specifically Map() them...?

Posted by CouldntResist on Tuesday, 14-Jul-09 15:42:59 UTC
Can we stop with the video memory "virtualization" once and for all? It's been established as WDDM 2.x vaporware.

Posted by HolySmoke on Tuesday, 14-Jul-09 17:55:32 UTC
Quoting Scali
Doesn't that have to do with the virtual videomemory system in D3D10 though?
From what I understood, in DX9 all texturememory is mapped into the virtual address space at all times. But with DX10 they aren't mapped into the address space at all unless you specifically Map() them...?
That's pretty much what I meant. What I was getting at (poorly, now that I re-read the post) was that because the game features an in-engine streamer you may have to control for it.

From what I understand, this built-in streaming engine was introduced to prevent the game from crashing at the 32-bit limit in DX9 mode at the higher texture settings. While I've never experienced it myself, I know that some setups can't run the game at full detail with streaming disabled for that reason. But while it's a fine solution to a DX9 issue, it's wholly unnecessary when running in DX10 while still being enabled by default. So, ideally, you'd want to disable in-engine texture streaming manually during testing (edit: setting textures to low or medium achieves the same result).

Don't get me wrong, I've never managed to get Crysis to run faster under DX10 than DX9 and I think it's definitely an engine issue. But if you want to test the actual rendering section of the engine (especially since you're using custom shaders) then you'd want to make sure that the streaming portion is disabled, because it's a performance-affecting workaround for an altogether unrelated problem.

Posted by Davros on Tuesday, 14-Jul-09 19:23:59 UTC
wouldn't the Crysis devs know about streaming in DX10 and disable it in preference of their own streaming?
plus the 32bit limit in dx9 would surely still exist in dx10 (vista32)

Posted by Scali on Tuesday, 14-Jul-09 19:50:31 UTC
Quoting Davros
plus the 32bit limit in dx9 would surely still exist in dx10 (vista32)
Not if you only map memory when required.
As long as the CPU doesn't need to access the videomemory, there's no reason for it to take up address space on the CPU side.

All I know is that he has a point.
I have a PC with a Radeon X1900XTX, which reports WAY higher memory usage than when running Crysis on a GeForce 8800.
The Radeon goes over 1.5 GB, sometimes close to 2 GB, while the GeForce uses about 1 GB to 1.2 GB. I don't know what causes it, but the DX9 machine just uses way more memory than the DX10 one does, despite having higher detail.

Posted by Demirug on Tuesday, 14-Jul-09 20:19:03 UTC
How many times is this going to come up again?

The typical video memory window is 256 MB. It needs to be mapped to the address space of any application that uses Direct3D. Vista is somewhat smarter here as it can dynamically change the size of the mapped window. But this can also cause crashes if the window needs to grow and there is no address space left.

When it comes to resource allocation, Direct3D 9 and 10 behave differently.

Direct3D 10 is quite simple: every resource needs address space equal to its own size. This is even true for textures that are in video memory. The reason for this is that the virtual video memory manager must be able to swap the resource to system memory. For various reasons they are swapped to the process that owns the resource.

Direct3D 9 is more complicated. Before SP1, any managed resource needed twice the address space: once for the system memory copy and once for the real video resource. This was done for compatibility reasons. But with graphics cards that contain a large amount of video memory and only 2 GB of address space, you run into problems. Therefore there was a hotfix that became part of SP1. This hotfix tries to eliminate the address space requirement for the real resource when possible. But it still needs more address space than 10.

I don’t have the exact numbers for BattleForge here but it requires significantly less address space with Direct3D 10 compared to 9.
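
As a rough illustration of the accounting described in the post above, the following Python sketch compares how much of a 32-bit process's address space a set of resources would consume under the two rules. It is a simplified model, not a measurement: the 700 MB workload is hypothetical, the 256 MB video memory window is treated as fixed (the post notes that Vista can resize it), and the function names are invented for this example. The application's own code and heap are ignored.

```python
# Illustrative model only: the sizes and accounting rules below simplify the
# forum description and do not reflect measurements of any real driver.

MB = 1024 * 1024
USER_ADDRESS_SPACE = 2048 * MB   # 2 GB user address space of a 32-bit process
VIDEO_WINDOW = 256 * MB          # "typical" video memory window (assumed fixed)


def address_space_d3d9_pre_sp1(resource_sizes):
    """Each managed resource counts twice: system copy plus video resource."""
    return VIDEO_WINDOW + 2 * sum(resource_sizes)


def address_space_d3d10(resource_sizes):
    """Each resource counts once, at its own size."""
    return VIDEO_WINDOW + sum(resource_sizes)


# Hypothetical workload: 175 resources of 4 MB each (700 MB in total).
resources = [4 * MB] * 175

d3d9 = address_space_d3d9_pre_sp1(resources)
d3d10 = address_space_d3d10(resources)

print(f"D3D9 (pre-SP1): {d3d9 // MB} MB of {USER_ADDRESS_SPACE // MB} MB")
print(f"D3D10:          {d3d10 // MB} MB of {USER_ADDRESS_SPACE // MB} MB")
```

Under these assumptions the D3D9 figure exceeds the D3D10 figure by exactly the total resource size, which is the doubled address space cost that the SP1 hotfix tries to reduce.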

