RWT Analyzes Bulldozer Benchmarks

Wednesday 30th March 2011, 10:29:00 PM, written by Alex G

Recently, benchmarks for AMD's eagerly awaited Bulldozer architecture have leaked online. So far, this has mostly created uncertainty about the performance of future products, rather than answering questions.

David Kanter of RealWorldTech, a longtime friend who is always eager to discuss CPU architecture and performance, takes a look at the test system and benchmarks and explains why it is difficult to estimate performance precisely. He then analyzes the benchmark results and draws several conclusions about Bulldozer's microarchitecture and performance, and what they may mean for future products.

Well, we certainly aren't going to spoil it for you, but we do encourage you to head over and check out the thorough analysis for yourself! Anyone remotely interested in Banana Dong (*ahem* B3D codename for Bulldozer) shan't be disappointed.




Latest Thread Comments (6 total)
Posted by Pete on Wednesday, 30-Mar-11 23:08:00 UTC
I can't help but read the article title as a reference to a certain triple rainbow (http://www.youtube.com/watch?v=OQSNhk5ICTI). Reading now!

Posted by dkanter on Thursday, 31-Mar-11 07:15:02 UTC
Quoting Pete:
"I can't help but read the article title as a reference to a certain triple rainbow (http://www.youtube.com/watch?v=OQSNhk5ICTI). Reading now!"

That was on purpose :P

David

Posted by 3dilettante on Thursday, 31-Mar-11 23:59:33 UTC
I've tried to imagine what a bad case would be for BD.

I suppose it would be code that didn't use FMA, shuffled a lot (cutting FP throughput in half), had two threads slamming the write pipe with scattered writes that didn't coalesce in the write coalescing cache, and potentially wasn't blocked optimally for the smaller L1.

Posted by entity279 on Friday, 01-Apr-11 06:44:27 UTC
But what does the FP shuffle actually do? Sorry, but I really am clueless on this one.

Posted by 3dilettante on Friday, 01-Apr-11 14:48:03 UTC
It moves values around within one or more SIMD registers.
The XBAR unit can go further in how it permutes vectors than AVX can, but it also takes up one of the two FP issue ports.
This could save instruction usage by having a permute move values around within and between vectors in a single operation, instead of having to use multiple less generic shuffles to achieve the same end.
That's in XOP, however, so it may be a very useful instruction that will not get used as much as it could.
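
To illustrate the point about one general permute versus several narrower shuffles, here is a toy model in Python rather than real SIMD code. Lanes are plain list elements, and the function names (shuffle, permute2, blend) are made up for this sketch; they are not actual AVX or XOP instructions and say nothing about real encodings or latencies.

```python
def shuffle(src, idx):
    """Single-source shuffle: each output lane picks one lane of src."""
    return [src[i] for i in idx]


def blend(a, b, mask):
    """Per-lane select: take the lane from b where mask is True, else from a."""
    return [y if m else x for x, y, m in zip(a, b, mask)]


def permute2(a, b, sel):
    """Two-source permute: selectors 0..n-1 pick from a, n..2n-1 pick from b."""
    both = a + b
    return [both[s] for s in sel]


a = [0, 1, 2, 3]
b = [10, 11, 12, 13]

# One general permute interleaves the low halves of a and b.
one_op = permute2(a, b, [0, 4, 1, 5])

# The same result built from narrower operations: two shuffles plus a blend.
multi_op = blend(
    shuffle(a, [0, 0, 1, 1]),
    shuffle(b, [0, 0, 1, 1]),
    [False, True, False, True],
)

print(one_op, multi_op)
```

In this model the single permute2 call does the work of three narrower operations, which is the instruction saving described above.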

Posted by entity279 on Friday, 01-Apr-11 15:29:26 UTC
Thank you :)

