OVER 9000! >:O

Halo 3's particle system used an uber-shader framework, which, while offering extensive control over particles, meant the cost per particle was quite high. An uber shader can be thought of as one phat shader with branched sub-programs that are enabled or disabled according to some fetched input value. As a result you end up with a zillion permutations of end results, which, while flexible and easy, can be an inefficient use of resources. The uber shader does imply heavy dynamic branching (and more memory consumed for the shader cache); all the thread scheduling going on in Xenos mitigates the issue to some degree, but if one can avoid it altogether, all the better.
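To make that concrete, here's a minimal sketch of the uber-shader idea, written as a CUDA kernel standing in for the Xenos shader model (all names and flag bits are invented): one fat program whose sub-features get switched on per particle by a fetched value.

```cuda
#include <cuda_runtime.h>

// Hypothetical uber-shader-style update: one big program, with sub-features
// enabled per particle by a fetched flags word. Threads in a warp that take
// different branches serialize, which is where the per-particle cost of
// heavy dynamic branching comes from.
__global__ void uberParticleUpdate(float3* pos, float3* vel,
                                   const unsigned* featureFlags,
                                   int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    unsigned flags = featureFlags[i];        // fetched value selects sub-programs

    if (flags & 0x1) vel[i].y -= 9.8f * dt;  // gravity sub-program
    if (flags & 0x2) {                       // drag sub-program
        vel[i].x *= 0.98f; vel[i].y *= 0.98f; vel[i].z *= 0.98f;
    }
    // ... dozens more optional sub-programs = a zillion permutations

    pos[i].x += vel[i].x * dt;
    pos[i].y += vel[i].y * dt;
    pos[i].z += vel[i].z * dt;
}
```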

So basically, in Bungie's new system, particles are numerous and small, with very short lifetimes and simple pixel shaders. Over 24K particles can be rendered per frame with pseudo-collisions (more on that soon) in under 0.3 ms on the GPU (<1% of frame time). Not bad for a six-year-old piece of... hardware! There is, of course, a CPU cost, but it amounts to telling the GPU what to render, i.e. minuscule. Everything from there is done on the GPU, so there are no traffic issues between the CPU and GPU and the like.

In trying to save space, Bungie distinguishes between static and dynamic particle states. The dynamic particle state is stored in three textures for a total of 28 bytes per particle, a significant reduction from Halo 3's 80 bytes per particle. A 4-component, 128-bit buffer (FP32 per component) stores particle position (XYZ) and age; the full precision is necessary because all particles live in world space. The second, 64-bit buffer (FP16 per component) stores the velocity and the delta age. Lastly, a 32-bit buffer (four INT8 components) stores a 2D rotation & scale (as an X,Y vector) for randomized particle appearance, an 8-bit value controlling the brightness of the monochromatic particle illumination, and an 8-bit reference to a particular particle's static data, or type.

The static data is stored in a library texture consisting of three rows: physics (drag, gravity, turbulence), collision (elasticity), and render properties feeding both vertex and pixel shaders (vertex shader: size, orientation/velocity/screen facing, motion blur; pixel shader: color tint, texture index). The 8-bit type reference selects a particular column, so one index fetches all three sets of data for a particle.
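To sanity-check the byte math, here's the dynamic state written out as plain structs (a sketch in CUDA/C style; the field names are my guesses, not Bungie's):

```cuda
#include <cuda_fp16.h>
#include <stdint.h>

// Sketch of the three dynamic-state textures as structs (names invented).
// In the real system each is a texture/render target the simulation updates.

struct ParticleStateA {     // 128-bit FP32x4 buffer: 16 bytes
    float x, y, z;          // world-space position (hence full FP32)
    float age;
};

struct ParticleStateB {     // 64-bit FP16x4 buffer: 8 bytes
    __half vx, vy, vz;      // velocity
    __half deltaAge;        // aging rate
};

struct ParticleStateC {     // 32-bit INT8x4 buffer: 4 bytes
    int8_t  rotScaleX;      // 2D rotation & scale encoded as an X,Y vector
    int8_t  rotScaleY;
    uint8_t brightness;     // monochromatic illumination intensity
    uint8_t typeIndex;      // column into the static "library" texture
};

// 16 + 8 + 4 = 28 bytes per particle, down from Halo 3's 80.
```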

For the particle collision, the depth buffer is sampled from the point of view of the camera and compared against the particle's own depth. If the particle is behind the stored depth, there is a collision. The main (small) problem is that there won't be collision with anything off-screen or at an oblique angle, but given that the particles have a short lifespan, it's not really a big deal to worry about unless you're some maniac fanboy pixel counter. Pseudo-Collision indeed!
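A sketch of that depth test, again in CUDA with invented names, assuming the depth buffer holds linear view-space depth and a standard perspective projection (where clip-space w equals view-space depth):

```cuda
#include <cuda_runtime.h>

// Minimal 4x4 row-major matrix for the sketch (CUDA has no built-in one).
struct Mat4 { float m[16]; };

__device__ float4 mulPoint(const Mat4& M, float3 p)
{
    return make_float4(
        M.m[0]*p.x  + M.m[1]*p.y  + M.m[2]*p.z  + M.m[3],
        M.m[4]*p.x  + M.m[5]*p.y  + M.m[6]*p.z  + M.m[7],
        M.m[8]*p.x  + M.m[9]*p.y  + M.m[10]*p.z + M.m[11],
        M.m[12]*p.x + M.m[13]*p.y + M.m[14]*p.z + M.m[15]);
}

// Pseudo-collision: project the particle with the camera's view-projection,
// sample the depth buffer at that pixel, and flag a collision if the
// particle sits behind the stored surface. Off-screen particles can never
// collide, which is the trade-off discussed above.
__device__ bool depthCollision(float3 worldPos, Mat4 viewProj,
                               const float* depthBuffer,  // view-space depth
                               int width, int height, float bias)
{
    float4 clip = mulPoint(viewProj, worldPos);
    if (clip.w <= 0.0f) return false;                       // behind the camera

    float2 ndc = make_float2(clip.x / clip.w, clip.y / clip.w);
    if (fabsf(ndc.x) > 1.0f || fabsf(ndc.y) > 1.0f)
        return false;                                       // off-screen

    int px = (int)((ndc.x * 0.5f + 0.5f) * (width  - 1));
    int py = (int)((0.5f - ndc.y * 0.5f) * (height - 1));
    float sceneDepth = depthBuffer[py * width + px];

    // clip.w is the particle's view-space depth for a standard projection.
    return clip.w > sceneDepth + bias;                      // behind the surface
}
```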

When a collision is detected, the screen-space normal buffer is sampled and the surface normal is compared to the particle velocity. Particles moving in the direction of the surface normal are eliminated so they don't bleed through the surface; incoming particles moving against the surface normal are bounced, i.e. reflected about the surface normal with the velocity damped by elasticity. Particles below a certain velocity are marked at rest. A resting particle no longer receives any physics update, so it just stops rolling around and eventually disappears, whether from its lifetime expiring, from falling through the ground, or from being destroyed because the particle buffer only keeps track of so many and newer particles take its place.
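And the response step in the same vein (names and the rest threshold are invented; the normal would come from the screen-space normal buffer, the elasticity from the particle type's column in the library texture):

```cuda
#include <cuda_runtime.h>

// Standard reflection about a normal: v - 2*dot(v,n)*n.
__device__ float3 reflect3(float3 v, float3 n)
{
    float d = v.x*n.x + v.y*n.y + v.z*n.z;
    return make_float3(v.x - 2.0f*d*n.x, v.y - 2.0f*d*n.y, v.z - 2.0f*d*n.z);
}

// Collision response (hypothetical names): kill particles moving along the
// surface normal (they'd be bleeding through from behind), bounce the rest
// with elasticity damping, and put slow particles to rest.
__device__ void respondToCollision(float3& vel, float3 surfaceNormal,
                                   float elasticity,
                                   bool& alive, bool& atRest)
{
    float vDotN = vel.x*surfaceNormal.x + vel.y*surfaceNormal.y
                + vel.z*surfaceNormal.z;

    if (vDotN > 0.0f) {        // moving with the normal: eliminate so it
        alive = false;         // doesn't bleed through the surface
        return;
    }

    // Moving against the normal: reflect, damped by the per-type elasticity.
    vel = reflect3(vel, surfaceNormal);
    vel.x *= elasticity; vel.y *= elasticity; vel.z *= elasticity;

    float speedSq = vel.x*vel.x + vel.y*vel.y + vel.z*vel.z;
    const float restThresholdSq = 0.01f;   // invented threshold
    if (speedSq < restThresholdSq)
        atRest = true;         // no more physics updates; dies by lifetime,
                               // falls through, or gets recycled
}
```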

One of the nifty features of using the screen-space normal buffer to determine the bounce is that particles are compared against the textured bump maps and not just the polygon surface, i.e. particles can bounce off pseudo-collisions with normal-mapped surfaces.