So it's clear from Kirk's CUDA course that you have two execution personalities on the chip, a CUDA one and a graphics one, and that the chip configures itself for one or the other depending on what's running at any given time. And because you could be running a CUDA context and a graphics context on the chip at the same time, sharing the chip between them, is there any setup or teardown cost associated with switching between personalities like that when executing a new context?

It's similar to having two graphics applications running in the sense that there is a penalty for a context switch, but the chip works very hard to minimise that and it's certainly less than it's ever been on our hardware.

So an application programmer should just assume that the switch is effectively free and not worry about sharing the chip between a CUDA context and a graphics context?

Right, but you can benchmark it if you want and you'll see that it's low; it's not a big deal.
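
As a rough way of doing that benchmark yourself, here is a minimal sketch that times a trivial kernel with CUDA events; run it once on an idle desktop and once with a graphics application busy in another window, and compare the averages. The kernel, buffer size, and iteration count are arbitrary choices for illustration, not anything prescribed in the interview.

// Minimal sketch: average kernel turnaround time measured with CUDA events.
// Compare the number reported on an idle desktop against the number with a
// graphics workload running to see the cost of sharing the chip.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void busyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 1.0001f + 0.5f;
}

int main()
{
    const int n = 1 << 20;          // arbitrary buffer size
    const int iterations = 100;     // arbitrary launch count
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    float totalMs = 0.0f;
    for (int it = 0; it < iterations; ++it) {
        cudaEventRecord(start, 0);
        busyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        totalMs += ms;
    }
    printf("average kernel time: %.3f ms over %d launches\n",
           totalMs / iterations, iterations);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}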

So we know Vista's not a big deal right now as a CUDA host because your customers aren't requesting it, but given that Vista can reasonably be using the GPU all the time for the desktop interface on a host system, are there any issues you're aware of there with personality switching?

No, it's really just the same as always, where you just share the chip as you would with two graphics contexts and the desktop is just a GPU client.

It's just the same time-slicing method as always there with a CUDA context running?

Yeah, the GPU just time slices on its own while running and decides what it's doing there automatically; there's specific hardware support for that.

Is there any difference across operating systems in terms of sharing the GPU, say between Windows and Linux?

The code between Windows and Linux is pretty similar there, but we spend a bunch of time in code for each OS handling cases where the process can die. CTRL+C is easy, but there are a bunch of other ways a process can die on each! Compare that case to one where a debugger attaches, for example. So we've spent a lot of time on per-OS process management to make sure things are correct there. Outside of those obvious OS things there's not much that's different. My developers aren't tied to one OS, for example, so while they interact with both for regression testing, the CUDA differences are minimal enough that there's no set OS we make people use.
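
To make the CTRL+C point concrete, here is a small host-side sketch, assuming Linux and standard signal handling, of the orderly-exit path an application itself controls. The harder cases mentioned above, a crash, a kill, or a debugger detaching the process, never reach code like this, which is why the driver has to be able to reclaim the context and its memory on its own.

// Sketch of the "easy" CTRL+C case: the SIGINT handler only sets a flag,
// and the main loop does the orderly CUDA teardown. If the process dies any
// other way, none of this cleanup runs and the driver reclaims the resources.
#include <csignal>
#include <cstdio>
#include <cuda_runtime.h>

static volatile sig_atomic_t g_stop = 0;

static void handleSigint(int) { g_stop = 1; }   // no CUDA calls in the handler

int main()
{
    signal(SIGINT, handleSigint);

    float *buffer = nullptr;                    // hypothetical device allocation
    cudaMalloc(&buffer, 1 << 20);

    printf("running; press CTRL+C for the orderly-exit path\n");
    while (!g_stop) {
        cudaMemset(buffer, 0, 1 << 20);         // stand-in for real GPU work
        cudaDeviceSynchronize();
    }

    // Orderly teardown only happens on this path.
    cudaFree(buffer);
    cudaDeviceReset();
    return 0;
}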

This is maybe a cool question for some folks to hear answered, so how big is your team and how many guys do you have developing CUDA?

The actual size is always growing, and as for the numbers, well, let's just say I have a driver team and a compiler team, plus access to guys in the graphics driver and compiler teams elsewhere in the company, so the team size varies depending on what we're working on. We try to leverage work already done and pull in resources created for other efforts where they apply, and then there's our CUDA-specific side, which in some cases has added an entire level of complexity to parts of our systems. I also have guys working on our shipping libraries, and we have DevTech guys who work with ISVs on specific technical support issues. It's a pretty serious effort, and we're continually growing, so if you know people, get them to get in touch! I have a bunch of positions open and ready to fill; go look at the postings on nvidia.com.