SwiftShader 2.0: A DX9 Software Rasterizer that runs Crysis

Friday 04th April 2008, 12:00:00 AM, written by Arun

TransGaming has just released SwiftShader 2.0, a highly optimized software rasterizer that supports DX9 and Shader Model 2.0 and scales with multi-core processors. It can run (albeit slowly) many modern games and it makes a dual-core Penryn perform similarly to the GeForce FX5600/5700 in 3DMark05.

Nicolas Capens (aka Nick on our forums) is the creator and lead programmer behind SwiftShader - he started by writing the DX7/DX8 swShader many years ago, and eventually turned it into a commercial product in 2005 with the help of TransGaming and Gavriel State in particular.

A demo is now available on TransGaming's website, with which we ran 3DMark05 and obtained a score of ~400 on a stock Core 2 Duo E8400. That's still not mind-blowingly fast, but keep in mind Direct3D's reference rasterizer would likely score in the single digits and the SGX-based IGP in Intel's upcoming Silverthorne-based Menlow platform for UMPCs/MIDs is claimed to only score ~150. It would also be much more than enough to run Vista's Aero interface smoothly.

Finally, we were told SwiftShader would run Crysis at mid-single-digit frame rates at the lowest settings on Intel quad-core systems. Definitely not very playable yet, but that should make it clear SwiftShader is perfectly usable for casual games. We look forward to seeing how SwiftShader evolves and how it will perform on future high-end CPUs such as Intel's Nehalem and AMD's Shanghai; it could certainly be a fun way to benchmark CPUs once in a while.




Latest Thread Comments (171 total)
Posted by ninelven on Saturday, 13-Jun-09 12:25:12 UTC
Wow, you bump the over a year old thread just to make that comment. Congratulations you are the winnar!*
*of the first annual biggest asshole award on B3D.

Posted by Scali on Saturday, 13-Jun-09 12:26:54 UTC
Quoting Thorburn
Did you not just answer your own question? SwiftShader is useful for DX9 code, WARP for DX10/11...
Not really. As far as I know, the idea was to make SwiftShader support DX10. I recall Nick saying that he wanted to have the first SM4.0 implementation...
If that is still the goal, then SwiftShader would become a direct competitor.
Or has WARP moved the goalposts for SwiftShader now?

Quoting Thorburn
I'd imagine a typical DX10 title will have more complex shaders making them slower.
Depends on how you look at it.
If we take the usage of SwiftShader as a software solution for 'casual games', relieving the developer from worrying about hardware compatibility... then that doesn't hold.
WARP could do the same thing, except you would use DX10/DX11 instead of DX9. Arguably, more powerful shaders are actually better for a software renderer. You can render effects with more elegant algorithms, rather than just brute force and multipass.
In a few years, DX9 code may no longer be relevant, and even casual games might use DX10/11.

In other words: does this mark the end of SwiftShader? Was it already dead anyway? Or is SwiftShader moving forward, and will it compete with WARP?

Posted by Thorburn on Saturday, 13-Jun-09 12:59:17 UTC
Quoting ninelven
Wow, you bump the over a year old thread just to make that comment. Congratulations you are the winnar!
I'm not sure that really makes him an asshole, it was a valid observation and I assume SwiftShader is still being worked on in some capacity.

Posted by ninelven on Saturday, 13-Jun-09 18:20:08 UTC
Quoting Thorburn
I'm not sure that really makes him an asshole
I am.

Posted by Simon F on Monday, 15-Jun-09 09:17:21 UTC
Quote
I'm not sure that really makes him an ***
Quoting ninelven
I am.
I think we at least have a "Winn*a*r" of the "B3D foot in mouth contest". :roll:

Posted by Nick on Monday, 15-Jun-09 09:48:08 UTC
Has anyone succeeded at getting WARP to run Crysis? I keep getting an error about the D3D10ReflectShader entry point not being found. I assume that's either because it's an older beta build, or they use a slightly modified version of Crysis.

Posted by Nick on Monday, 15-Jun-09 10:00:04 UTC
Quoting Thorburn
I'd imagine a typical DX10 title will have more complex shaders making them slower.
The triple-A titles that push the envelope certainly do, but applications suited for software rendering are no more complex than when using Direct3D 9. They typically don't use anything beyond the capabilities of a Shader Model 2.0 card.

Crysis with all settings on low looks no different to me when running with DX9 or 10, and with the latter API only requires Shader Model 2.0.

Posted by Scali on Monday, 15-Jun-09 10:43:30 UTC
Quoting Nick
The triple-A titles that push the envelope certainly do, but applications suited for software rendering are no more complex than when using Direct3D 9. They typically don't use anything beyond the capabilities of a Shader Model 2.0 card.

Crysis with all settings on low looks no different to me when running with DX9 or 10, and with the latter API only requires Shader Model 2.0.
Yea, that's what one would think. However, when I was playing around on my Intel X3100, I noticed that Crysis ran slower in D3D10 mode than in D3D9 mode, even at the lowest settings.

So I conducted a small test on my own. I rendered the exact same scene with the exact same shaders in D3D9 and D3D10, and D3D9 was around 10% faster.
And I literally mean the exact same shaders. With the D3DX compiler you can compile the exact same source code for D3D9 or D3D10.
The shaders were very trivial anyway, just per-pixel diffuse lighting. Nothing beyond SM2.0, although I compiled them for SM3.0 and SM4.0.

Makes me wonder where the extra overhead comes from in D3D10. Is it just poor Intel drivers, or does D3D10 really do something different?
One would think that D3D10 would be faster, because my code would theoretically work more efficiently in D3D10. I update all shader constants in one call, and I don't need BeginScene()/EndScene(), and things like that.

I've also tried it on my 8800GTS. The difference between D3D9 and D3D10 was minimal, but still the D3D9 was a smidge faster in Vista.
When running the D3D9 code on XP Pro or XP x64, it was faster than either D3D9 or D3D10 in Vista. I've only tried it in windowed mode so far, though... Perhaps the Vista desktop is a limiting factor in performance, I'll have to see what happens when I run both in fullscreen to eliminate that factor.

Posted by rapso on Saturday, 20-Jun-09 15:27:14 UTC
One point is the driver overhead: Vista virtualizes all memory, so it also pushes the data to the drivers when and how it wants. That intermediate buffering overhead is what makes Vista in general slower than XP (and also makes it more difficult to do application-specific optimizations like drivers did for a lot of games on WinXP).

The main difference between D3D9 and D3D10 is the constant handling.

In D3D10 mode, the driver has to assume there are some new constants that have to be set, which means that at least the cache needs to be flushed. In the worst case it means that some shader optimizations are either done on a per-draw-call basis or not enabled at all.

In D3D9 mode, the driver gets the constants you set, probably via the command buffer. If you don't set any, it knows all data is up to date and all shaders can be kept.

Const buffers are supposed to save CPU overhead on the application side, but applications usually treat const buffers like simple constants, updating most of them frequently and rarely keeping constants around for the long term (material settings, for example). Additionally, if you want to change just one simple constant, you have to update them all, which adds more overhead than it saves.
Also, the drivers have to keep track of all constant settings; even if you change just one, the whole const buffer has to be pushed over the bus to video memory.
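
To make the cost argument above concrete, here is a minimal Python sketch that compares the bytes sent under two simplified models. It is an illustration only and not how any real driver works: the function names, the 64-register buffer size, and the assumption that a register-style update sends only the registers that were set are all hypothetical.

```python
FLOAT4_BYTES = 16  # one shader constant register holds four 32-bit floats


def bytes_per_register_updates(dirty_registers):
    """Register-style model (assumed): only the registers that were set are sent."""
    return len(set(dirty_registers)) * FLOAT4_BYTES


def bytes_whole_buffer_update(buffer_registers, dirty_registers):
    """Buffer-style model (assumed): any change re-sends the full constant buffer."""
    return buffer_registers * FLOAT4_BYTES if dirty_registers else 0


# Hypothetical shader with 64 float4 constants, of which one changes per draw.
one_dirty = [5]
print(bytes_per_register_updates(one_dirty))     # 16
print(bytes_whole_buffer_update(64, one_dirty))  # 1024
```

Under these assumptions a single changed constant costs 16 bytes in the register-style model and 1024 bytes in the whole-buffer model, which is the kind of gap the post is describing.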

Posted by Scali on Saturday, 20-Jun-09 16:06:15 UTC
Quoting rapso
One point is the driver overhead: Vista virtualizes all memory, so it also pushes the data to the drivers when and how it wants. That intermediate buffering overhead is what makes Vista in general slower than XP (and also makes it more difficult to do application-specific optimizations like drivers did for a lot of games on WinXP).
Well, D3D9 was faster even on Vista and Windows 7.

Quoting rapso
The main difference between D3D9 and D3D10 is the constant handling.

In D3D10 mode, the driver has to assume there are some new constants that have to be set, which means that at least the cache needs to be flushed. In the worst case it means that some shader optimizations are either done on a per-draw-call basis or not enabled at all.

In D3D9 mode, the driver gets the constants you set, probably via the command buffer. If you don't set any, it knows all data is up to date and all shaders can be kept.
Doesn't make sense to me.
I do update the constants all the time, at the very least I need to update the transform matrices for the object animation, and light positions and such.
With D3D9 I have to make a separate call for each constant that I update. With D3D10 instead, I just map the entire constant buffer in one go, put the new values in, and unmap it.
So in D3D10 I specifically tell the driver "I'm done with it, the constant buffer is up to date now", where with D3D9 it doesn't know what is going on exactly.
In my case I update all constants every frame anyway, because I used very simple shaders.

Also, what you're saying isn't entirely correct. You can have multiple constant buffers, and you should group constants into them by how frequently they are updated. All this should make D3D10 more efficient, when used properly. So you don't need to push "the whole constant buffer" over the bus, only the buffer you're updating at the time. Since you do the update in a single go, it should get maximum performance with a burst transfer over the bus.
However, in my case the constant buffers were very small. Only one matrix and a few float values. So bandwidth shouldn't be an issue anyway.
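
The effect of splitting constants across buffers by update frequency can be sketched with a toy Python model. Everything here is assumed for illustration: the `ConstantBuffer` class, the buffer names and sizes, and the workload of 10 frames with 100 objects each. It only counts bytes per map/write/unmap cycle and says nothing about real driver or bus behaviour.

```python
FLOAT4_BYTES = 16  # one shader constant register holds four 32-bit floats


class ConstantBuffer:
    """Hypothetical stand-in for a constant buffer; tracks upload traffic only."""

    def __init__(self, name, registers):
        self.name = name
        self.size_bytes = registers * FLOAT4_BYTES
        self.uploads = 0
        self.bytes_uploaded = 0

    def update(self):
        """Model one map/write/unmap cycle: the whole buffer is sent once."""
        self.uploads += 1
        self.bytes_uploaded += self.size_bytes


def simulate_split_buffers(frames, objects_per_frame):
    """Three buffers, each updated only as often as its contents change."""
    per_material = ConstantBuffer("per_material", 8)
    per_frame = ConstantBuffer("per_frame", 6)
    per_object = ConstantBuffer("per_object", 4)
    per_material.update()  # set once up front
    for _ in range(frames):
        per_frame.update()
        for _ in range(objects_per_frame):
            per_object.update()
    return [per_material, per_frame, per_object]


def single_buffer_bytes(frames, objects_per_frame):
    """Same workload with all 18 registers in one buffer, re-sent per object."""
    total_registers = 8 + 6 + 4
    return frames * objects_per_frame * total_registers * FLOAT4_BYTES


buffers = simulate_split_buffers(frames=10, objects_per_frame=100)
print(sum(b.bytes_uploaded for b in buffers))  # 65088
print(single_buffer_bytes(10, 100))            # 288000
```

In this model the split layout sends 65088 bytes against 288000 for a single combined buffer, because the rarely changing material and per-frame data are not re-sent with every object.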

I wonder if it may have something to do with thread safety. D3D9 isn't thread-safe by default, and I never used that flag to get a thread-safe instance. I don't think D3D10 has this option, so perhaps you always get a thread-safe instance by default, which would explain at least some of the extra overhead.

