Virtual Video Memory


The memory system on most consumer graphics processors is set up in a way that is very graphics-centric: pretty much all it sees are texture objects, vertices, triangles, and shaders. If you want to render some geometry with a texture map, but the texture is not currently located in video memory, that entire texture, with all of its mip-maps, must be loaded into video memory before the graphics processor can begin rendering. There are a number of problems with this method. For one, it is extremely wasteful: for a given frame you’re never going to need the entire texture object with every one of its mip-map levels. In fact, the mip-map level you need least often is the one that consumes the most memory and bandwidth: the first, full-resolution level. This is especially true for games set in very large, open environments, where the vast majority of textures on screen are so far away that only the lowest-resolution mip-map levels are required. All of this excess information wouldn’t be a problem if only one texture needed to be transferred to the graphics card, but when textures are constantly spilling over to AGP/system memory, things start getting aggravating for the user. Local video memory bandwidth on modern graphics cards is roughly ten times that of the AGP bus, which makes it glaringly obvious to the user when their card has run out of available video memory: the frame rate of whatever application they’re using starts fluctuating erratically.
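To put a rough number on how much of a texture the first mip-map level accounts for, here is a small worked sketch. The 1024x1024 size and 4 bytes per texel are assumptions chosen for illustration, and `mip_chain_sizes` is a hypothetical helper, not part of any graphics API.

```python
def mip_chain_sizes(width, height, bytes_per_texel=4):
    """Return the byte size of every mip level, from full resolution down to 1x1."""
    sizes = []
    while True:
        sizes.append(width * height * bytes_per_texel)
        if width == 1 and height == 1:
            break
        # Each successive level halves both dimensions (never going below 1).
        width = max(1, width // 2)
        height = max(1, height // 2)
    return sizes


# Assumed example: a 1024x1024 texture with 32-bit texels.
sizes = mip_chain_sizes(1024, 1024)
total_bytes = sum(sizes)
top_level_share = sizes[0] / total_bytes

print(f"levels: {len(sizes)}")
print(f"top level: {sizes[0]} bytes of {total_bytes} total")
print(f"top level share: {top_level_share:.1%}")
```

Under these assumptions the first level alone is about three quarters of the whole object, so a distant surface that only samples the small levels still pays for the large one when the texture is uploaded as a unit.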

One solution to this is to simply do all texturing directly from AGP memory. This would certainly remove the erratic performance of the above memory system, but only because everything for every frame would now go over the same slow bus. In a sense, this fixes the fluctuation by making everything equally slow, so clearly a better solution is needed.

With graphics processors becoming increasingly similar to their general-purpose brethren, perhaps a better solution could be obtained by emulating how CPUs handle memory management. And, indeed, very similar problems were encountered as general-purpose processors evolved, since not all programs can fit into CPU caches, and executing programs directly from system memory is far too slow. The solution back then was to shift the focus away from physical memory constraints and use virtual memory instead. With virtual memory, the programmer no longer has to worry so much about the exact amount of cache or system memory, and instead makes all memory allocations in a virtual address space, which is divided up into small pages (usually around 4KB in size). It’s then up to the implementation to manage each page and make sure the appropriate pages are in cache when they need to be, in system memory when they aren’t, or in a page file on the hard drive if there’s no other room.
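The page mechanism described above can be sketched in a few lines. This is a deliberately simplified model, not how any particular CPU or operating system implements it: the `PageTable` class, its dictionary-based storage, and the "next free frame" policy are all assumptions for illustration, and eviction to a page file is not modeled.

```python
PAGE_SIZE = 4096  # 4KB pages, as in the text


def split_address(virtual_address):
    """Split a virtual address into (page number, offset within the page)."""
    return divmod(virtual_address, PAGE_SIZE)


class PageTable:
    """Toy page table: loads a page into the next free frame on first touch."""

    def __init__(self):
        self.resident = {}  # page number -> physical frame number
        self.faults = 0     # how many pages had to be brought in

    def translate(self, virtual_address):
        page, offset = split_address(virtual_address)
        if page not in self.resident:
            # Page fault: only this one page is fetched, not the whole allocation.
            self.faults += 1
            self.resident[page] = len(self.resident)
        return self.resident[page] * PAGE_SIZE + offset


table = PageTable()
first = table.translate(8195)   # page 2, offset 3: faults and loads the page
second = table.translate(8200)  # same page: no new fault
print(first, second, table.faults)
```

The point of the sketch is that the program only ever deals in virtual addresses; which pages are physically resident is bookkeeping hidden behind `translate`.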

When extended to graphics processors, virtual video memory takes care of the stuttering performance problems nicely, since all textures, shaders, and so on are split up into small chunks that can be seamlessly transferred over the bus. A 4KB page, for example, equates to a 32x32 sub-texture of 32-bit texels, which is big enough that you probably won’t have to transfer many pages over the bus whenever a new texture becomes visible (i.e. it is unlikely that much more than a 32x32 texel region of the texture has been exposed for the current frame), but small enough that the transfer can be done with virtually no performance hit.
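The page arithmetic is easy to check, and the same tile size tells you which pages a newly exposed region touches. The sketch below assumes textures are stored as 32x32 tiles with one tile per page; `pages_for_region` is a hypothetical helper used only for illustration.

```python
TILE = 32                 # texels per tile edge
BYTES_PER_TEXEL = 4       # 32-bit texels
PAGE_BYTES = TILE * TILE * BYTES_PER_TEXEL  # 32 * 32 * 4 = 4096 bytes


def pages_for_region(x0, y0, x1, y1):
    """Return the (tile_x, tile_y) pages covering texels [x0, x1) x [y0, y1)."""
    return {
        (tx, ty)
        for ty in range(y0 // TILE, (y1 - 1) // TILE + 1)
        for tx in range(x0 // TILE, (x1 - 1) // TILE + 1)
    }


# A 32x32 region that straddles tile boundaries touches four pages...
exposed = pages_for_region(16, 16, 48, 48)
# ...while the full-resolution level of a 1024x1024 texture spans 1024 pages.
whole_level = pages_for_region(0, 0, 1024, 1024)

print(PAGE_BYTES, len(exposed), len(whole_level))
```

With these assumed sizes, exposing a small region costs a handful of 4KB transfers rather than the megabytes needed to upload the full level.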

Integer Instruction Set


While no longer having to deal with instruction limits is a big gain in programmability, a lot more is still needed if the goal is to make graphics processors more general-purpose. A major area that needs improvement is integer processing. Currently, almost everything done inside shaders is floating point (outside of static branching and the like), which is fine for most graphics operations, but becomes a real problem when you start doing dynamic branching or want to do a non-interpolable memory lookup, such as indexing a vertex buffer.

On current graphics processors, the only type of memory addressing you can do is a texture lookup, which uses floating-point values. If the address does not exactly align with a texel, either the nearest texel is taken (in the case of point sampling), or several texels are fetched and interpolated to obtain a value somewhere in between the closest texels. For textures this is fine, but it is clearly inadequate for general memory addressing, where contiguous blocks of memory may be completely unrelated to one another (and hence, interpolating between them is meaningless). Luckily, Microsoft is including an entire integer instruction set in the 4.0 shader model for just these types of problems.
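The difference between a filtered texture lookup and a plain integer fetch can be illustrated with a one-dimensional sketch. The functions below are hypothetical and simplified: texel centers are assumed to sit at integer coordinates and out-of-range addresses are clamped, which is not exactly how real texture units address texels.

```python
import math


def point_sample(texels, u):
    """Nearest-texel lookup at fractional coordinate u (in texel units)."""
    index = min(len(texels) - 1, max(0, math.floor(u + 0.5)))
    return texels[index]


def linear_sample(texels, u):
    """Blend the two texels on either side of u, weighted by distance."""
    lo = min(len(texels) - 1, max(0, math.floor(u)))
    hi = min(len(texels) - 1, lo + 1)
    t = u - math.floor(u)
    return texels[lo] * (1 - t) + texels[hi] * t


def integer_fetch(buffer, index):
    """Exact, unfiltered read: what general memory addressing needs."""
    return buffer[index]


# For image data, a blended value between neighbours is sensible.
brightness = [10.0, 20.0, 40.0]
blended = linear_sample(brightness, 1.5)

# For unrelated data (say, vertex indices), the blend is not a valid entry at all.
vertex_indices = [7, 300, 12]
nonsense = linear_sample(vertex_indices, 0.5)
exact = integer_fetch(vertex_indices, 1)

print(blended, nonsense, exact)
```

The blended brightness is a plausible in-between colour, whereas the "blended" vertex index refers to nothing in the buffer; only the integer fetch returns a value that was actually stored.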