Larrabee's Rasterisation Focus Confirmed

Wednesday 23rd April 2008, 08:33:00 PM, written by TeamB3D

For many months, researchers and marketing fanatics at Intel have been heralding the upcoming 'raytracing revolution', claiming rasterisation has run out of steam. So it is refreshing to hear someone actually working on Larrabee flatly denying that raytracing will be the chip's main focus.

Tom Forsyth is currently a software engineer working for Intel on Larrabee. He previously worked at RAD Game Tools on Pixomatic (a software rasteriser) and Granny3D, as well as at MicroProse, 3Dlabs, and most notably Muckyfoot Productions (RIP). He is well respected throughout the industry for the high-quality insight into graphics programming techniques that he posts on his blog. Last Friday, though, his post's subject was quite different:

"I've been trying to keep quiet, but I need to get one thing very clear. Larrabee is going to render DirectX and OpenGL games through rasterisation, not through raytracing.

I'm not sure how the message got so muddled. I think in our quest to just keep our heads down and get on with it, we've possibly been a bit too quiet. So some comments about exciting new rendering tech got misinterpreted as our one and only plan. [...]
That has been the goal for the Larrabee team from day one, and it continues to be the primary focus of the hardware and software teams. [...]

There's no doubt Larrabee is going to be the world's most awesome raytracer. It's going to be the world's most awesome chip at a lot of heavy computing tasks - that's the joy of total programmability combined with serious number-crunching power. But that is cool stuff for those that want to play with wacky tech. We're not assuming everybody in the world will do this, we're not forcing anyone to do so, and we certainly can't just do it behind their backs and expect things to work - that would be absurd."

So, what does this actually mean for Larrabee, both technically and strategically? Look at it this way: Larrabee is a DX11 GPU with a design team that took both raytracing and GPGPU into consideration from the very start, while not forgetting that performance in DX10+-class games that assume a rasteriser will be the most important factor determining the architecture's mainstream success or failure.

There's a reason for our choice of phrasing: the exact same sentence would be just as accurate for NVIDIA's and AMD's architectures. Case in point: NVIDIA dedicated a huge amount of its Analyst Day 2008 to GPGPU, and clearly indicated its commitment to non-rasterised rendering in the 2009-2010 timeframe. We suspect the same is true for AMD.

The frequent implicit assumption that DX11 GPUs will basically be DX10 GPUs with a couple of quick changes and exposed tessellation is a weak one. Even if the programming model itself weren't changing significantly (it is, with the IHVs providing significant input into its direction), all current indications are that the architectures themselves will differ significantly from current offerings regardless, as the IHVs tackle the problem in front of them in the best way they know how, as they always have.

The industry gains new ideas and thinking, and algorithms and innovation on the software side mean target workloads change; there's nothing magical about reinventing yourself every couple of years. That's how the industry has always worked, and those that failed to do so are long gone.

Intel is certainly coming up with an unusual architecture with Larrabee by exploiting the x86 instruction set for MIMD processing on the same core as the SIMD vector unit. And trying to achieve leading performance with barely any fixed-function unit is certainly ambitious.

But fundamentally, the design principles and goals really aren't that different from those of the chips it will be competing with. It will likely be somewhat more flexible than the NVIDIA and AMD alternatives, not least by making approaches such as logarithmic rasterisation acceleration possible, but it should be clearly understood that the differences may in fact not be quite as substantial as many are currently predicting.

The point is that it's not about rasterisation versus raytracing, or even x86 versus proprietary ISAs.  It never was in the first place.  The raytracing focus of early messaging was merely a distraction for the curious, so Intel could make some noise.  Direct3D is the juggernaut, not the hardware.

"First, graphics that we have all come to know and love today, I have news for you. It's coming to an end. Our multi-decade old 3D graphics rendering architecture that's based on a rasterization approach is no longer scalable and suitable for the demands of the future."  That's why the message got so muddled, Tom.  And no offence, Pat, but history will prove you quite wrong.




Latest Thread Comments (254 total)
Posted by 3dilettante on Tuesday, 08-Jul-08 15:33:06 UTC
Just to update one of the Larrabee threads:

Unless Gelsinger's just making stuff up, a 32-core 45nm Larrabee at 2 GHz could be expected to produce 2 SP TFLOPs.

This appears to support the speculation that Larrabee's SP/DP ratio is 2:1, since Intel already proposed (although this data is much older) that Larrabee could do 1 TFLOP DP with 24 cores at 2.5 GHz.
The DP ratio is a far sight better than current GPUs.
The SP peak is a more problematic comparison, as GPUs will transition through one to one and a half process nodes in the meantime.
It seems clear that by late 2009/early 2010, peak numbers will be even less comparable, as we'll be comparing GPUs against a Larrabee possibly running some kind of partial emulation.

If other speculation that each core is roughly 10mm2 is true, we could also suppose that at a bare minimum, Larrabee will be at least 320mm2.
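The figures quoted in this thread can be sanity-checked with some back-of-the-envelope arithmetic, assuming the commonly reported (but unconfirmed) Larrabee configuration of a 16-lane single-precision vector unit per core with fused multiply-add, and double precision at half width:

```python
# Back-of-the-envelope check of the peak-FLOP and die-area figures above.
# Assumptions (not confirmed by Intel): 16 SP lanes per core, a fused
# multiply-add counted as 2 flops per lane per cycle, DP at half width.

def peak_tflops(cores, ghz, lanes=16, flops_per_lane=2):
    """Peak throughput in TFLOPs for a given core count and clock."""
    return cores * lanes * flops_per_lane * ghz / 1000.0

sp = peak_tflops(32, 2.0)            # Gelsinger's 32 cores at 2 GHz
dp = peak_tflops(24, 2.5, lanes=8)   # older slides: 24 cores at 2.5 GHz, DP
area_floor = 32 * 10                 # 32 cores x rumoured ~10mm2 per core

print(sp)          # 2.048 -> the quoted "2 SP TFLOPs"
print(dp)          # 0.96  -> roughly the older "1 TFLOP DP" figure
print(area_floor)  # 320   -> the bare-minimum 320mm2 estimate
```

On those assumptions the 2:1 SP/DP speculation falls out of simply halving the vector width, and the 320mm2 floor is just the rumoured per-core area times 32, excluding L2, TMUs and uncore.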

Wattage numbers will prove interesting, I think.
Power improvements on TSMC's processes are not expected to be that large, while Intel's rumored to have a worst-case draw of 300W on a 45nm Intel process.
GPUs may increase power draw when their FLOP counts reach that high, though just looking at the 4850's peak FLOPs/Watt numbers would put it in a better light compared to Larrabee.

The integer pipeline will hark back to the general outlines of the original Pentium, though the FP half of things will be radically expanded.

Posted by nAo on Tuesday, 08-Jul-08 15:38:02 UTC
While a 2 GHz clock makes perfect sense to me, it might end up being a conservative figure. BTW, does your 10mm2 estimate take TMUs and L2 into account as well?

Posted by 3dilettante on Tuesday, 08-Jul-08 15:42:57 UTC
Intel's older slides had a 2.5 GHz ceiling.

Posted by 3dilettante on Tuesday, 08-Jul-08 15:50:16 UTC
Quoting nAo
BTW, does your 10mm2 estimate take TMUs and L2 into account as well?
Going by the old Intel slides B3d showed a while back, no.
This is each core and its corresponding L1s.

I'm still not sure if the corresponding sector of L2 per core will count towards 10mm2 or not.

I've excluded any special-purpose hardware, memory controllers, and all the fun bits of the uncore that are so important for multi-core designs.

My bare minimum estimate is one that I expect to be exceeded by a good amount. If the core is only the core and L1s, I'd expect it to be exceeded by a very significant amount.

Posted by nAo on Tuesday, 08-Jul-08 16:26:11 UTC
Then I guess the 10mm2 figure must be quite old and based on a different process, given that L2, TMUs, the memory controller, etc. might account for 1.5-2 times the cores' area or even more.

Posted by 3dilettante on Tuesday, 08-Jul-08 16:32:07 UTC
Intel never gave an estimate of anything beyond the core size. In my interpretation, it never included anything but the execution core + L1s in the 10mm2 estimate.

Without knowing the number of other units and their relative areas, I couldn't estimate more than 32 times the estimated core area. Obviously, the non-core elements have an area >>0.

Posted by corysama on Wednesday, 06-Aug-08 02:51:51 UTC
Quoting Jawed

This is bloody tantalising:

But it's hidden and I can't find anything else on the topic :sad:

If that gets your attention, you'll probably enjoy this:

Posted by Simon F on Wednesday, 06-Aug-08 09:02:47 UTC
Quoting Jawed
But it's hidden and I can't find anything else on the topic :sad:
How is it hidden? It just appears to describe using polynomial evaluation with a small number of different polynomials for each function on a particular hardware architecture.
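For readers without access to the paper, the general technique being described (range-reduce the argument, pick one of a small table of short polynomials, evaluate with Horner's scheme) can be sketched as follows. The coefficients here are just truncated Taylor terms for sin(x), purely for illustration; a real implementation would use minimax-fitted coefficients per interval.

```python
import math

# Illustrative sketch of transcendental evaluation via a short
# polynomial on a range-reduced argument, using Horner's scheme.
# Coefficients are truncated Taylor terms for sin(x)/x in powers
# of x^2 (illustrative only; hardware would use minimax-fitted tables).
SIN_COEFFS = (1.0, -1.0 / 6, 1.0 / 120, -1.0 / 5040)

def poly_sin(x):
    """Approximate sin(x) for |x| <= pi/4 with a degree-7 odd polynomial."""
    x2 = x * x
    acc = SIN_COEFFS[-1]
    for c in reversed(SIN_COEFFS[:-1]):  # Horner's scheme in x^2
        acc = acc * x2 + c
    return x * acc

# Accurate to well under 1e-6 over the reduced range:
for x in (0.0, 0.3, math.pi / 4):
    assert abs(poly_sin(x) - math.sin(x)) < 1e-6
```

The per-function tables stay tiny because range reduction maps every input into one small interval, which is presumably why the approach suits a throughput-oriented design.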

Posted by Jawed on Wednesday, 06-Aug-08 10:48:05 UTC
Quoting Simon F
How is it hidden? It just appears to describe using polynomial evaluation with a small number of different polynomials for each function on a particular hardware architecture.
Plebs like me can't read the document: pay to view. Jawed

Posted by Jawed on Wednesday, 06-Aug-08 12:59:09 UTC
Quoting corysama
If that gets your attention, you'll probably enjoy this:
With this and the Imagine paper kindly forwarded to me, hopefully there's some clues on what Intel will be doing for transcendentals on Larrabee. Presumably there'll be a split between single-precision high-speed versions for graphics and something more refined for double precision. I'll look at these later. Thanks all. Jawed

