Part One: Vertex Shader

Vertex shaders, like high-order surfaces, were introduced in DirectX 8.0 as versions 1.0 and 1.1. In DX9 and the newest generation of GPUs, such as R300 and NV30, the vertex shader (VS) is greatly enhanced. VS2.0 in DX9 not only supports all VS1.x instructions and increases the number of both instruction slots and constant registers, but also adds new instructions for static flow control. The highlights of VS3.0 include indexing, dynamic flow control, and texture lookup. The vertex processing units in the new GPUs, whether R300 or NV30, exceed DX9 VS2.0. We should praise both ATI and NVIDIA!
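To make the distinction between static and dynamic flow control concrete, here is a minimal sketch in Python rather than shader assembly. The function names and the toy "shading" arithmetic are hypothetical and only illustrate where the branch condition comes from: with static flow control it is a constant set by the application before the draw call, so every vertex takes the same path; with dynamic flow control it is computed per vertex inside the shader.

```python
def shade_static(vertices, use_extra_term):
    """Static flow control: the condition is a constant chosen by the
    application, so all vertices in the batch follow the same path."""
    out = []
    for x in vertices:
        value = x * 2.0
        if use_extra_term:      # same answer for every vertex
            value += 1.0
        out.append(value)
    return out


def shade_dynamic(vertices, threshold=0.5):
    """Dynamic flow control: the condition depends on per-vertex data,
    so different vertices may take different paths."""
    out = []
    for x in vertices:
        value = x * 2.0
        if x > threshold:       # decided separately for each vertex
            value += 1.0
        out.append(value)
    return out


if __name__ == "__main__":
    verts = [0.25, 0.75]
    print(shade_static(verts, True))   # both vertices get the extra term
    print(shade_dynamic(verts))        # only the second vertex does
```

In the static case the branch could in principle be resolved once per batch; in the dynamic case it has to be evaluated for each vertex.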

Vertex Processing Power

Now, let's look at the specifications of DX8 VS1.1, DX9 VS2.0, R300 VS, NV30 VS and DX9 VS3.0.

(Note: the information relating to DirectX 9 was taken from the leaked DX9 Beta2.1 specification. Because it is in beta form, some or many specifications are subject to change prior to the release of the final version. We are aware that Beta 3 is currently being tested and may contain specifications that differ considerably from those discussed in this article. The DirectX 9 specifications in this article should be used only as a guide and not as a definitive statement of how DX9 will appear when it is finally released - Ed.)

                                DX8 VS   DX9 VS   R300 VS  NV30 VS  DX9 VS
Version                         1.x      2.0      2.0      2.0+     3.0
Max Runtime Instruction Number  128      64k      64k      64k      64k
Instruction Slots               128      256      256      256      256
Call and Return                 -        Yes      Yes      Yes      Yes
Nested Subroutine               -        -        -        4        4
Nested Loop                     -        -        -        Yes*     4
Static Flow Control             -        Yes      Yes      Yes      Yes
Dynamic Flow Control            -        -        -        Yes      Yes
Per Channel Masking             -        -        -        Yes      Yes
Texture Lookup                  -        -        -        -        Yes

* It seems there is no nested loop limitation in NV30, because any float constant component can be used as a loop counter.

(The instruction slots listed in the above table are the minimum counts required to meet the specification; higher instruction counts can be exposed through the DX Caps mechanism. Current indications from DX9 Beta3 are that the minimum number of instruction slots for VS3.0 is 512 - Ed.)

Don't be alarmed by the 64k runtime instruction count: it is the result of looping. Both VS2.0 and VS3.0 have only 256 instruction slots, which is to say that the number of static instructions cannot exceed 256. In fact, game engines don't need such "long" loops, but DCC [Digital Content Creation] applications do.
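The relationship between static slots and executed instructions can be sketched with a little arithmetic. The following is only an illustration: the two limits come from the table above (with 64k read as 65,536, which is an assumption), the function names are hypothetical, and loop overhead instructions are ignored for simplicity.

```python
# Limits taken from the table above; reading "64k" as 65,536 is an assumption.
SLOT_LIMIT = 256
RUNTIME_LIMIT = 64 * 1024


def executed_instructions(setup_slots, body_slots, iterations):
    """Return (static_slots, executed) for a shader with a single loop.

    static_slots counts each instruction once, as stored in the program;
    executed counts the loop body once per iteration.
    """
    static_slots = setup_slots + body_slots
    executed = setup_slots + body_slots * iterations
    return static_slots, executed


def fits(setup_slots, body_slots, iterations):
    """Check both the static slot limit and the runtime limit."""
    static_slots, executed = executed_instructions(
        setup_slots, body_slots, iterations
    )
    return static_slots <= SLOT_LIMIT and executed <= RUNTIME_LIMIT


if __name__ == "__main__":
    # Hypothetical shader: 10 setup instructions, 200-instruction loop body,
    # repeated 250 times.
    print(executed_instructions(10, 200, 250))  # (210, 50010)
    print(fits(10, 200, 250))                   # True
```

A program of only 210 static instructions thus executes tens of thousands of instructions at run time, which is why the runtime figure is so much larger than the slot count.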

We also find that the R300 VS is almost identical to DX9 VS2.0, whereas the NV30 VS is much closer to DX9 VS3.0. Indeed, DX9 VS2.0 was designed around R300, and the NV30 VS goes beyond DX9 VS2.0.

Because it supports nested subroutines, dynamic flow control, and per-channel masking, NV30 shows its muscle here. In theory, NV30 can run more powerful vertex shading programs that produce complex visual effects while running faster. Why do we say so? Let’s continue our analysis.