AMD announces new GPGPU card, hints at RV670 specs

Thursday 08th November 2007, 01:01:00 AM, written by Tim

AMD has announced the RV670-based FireStream 9170 GPGPU processor as well as the FireStream SDK. Notable are 2 GB of RAM, a 775-800 MHz core clock, 500 GFLOP/s, double precision support, and Brook+ (based on Brook, obviously!) as the official high-level language. Let's look at these one at a time.

First, the card itself is an RV670 at 775-800 MHz with 320 shader processors coupled with 2 GB of GDDR3. We can be sure it's RV670 and not R600 because it has a TDP of 150W, which bodes well for the consumer versions of the RV670. The 500 GFLOP/s figure indicates that RV670's shader core will be very similar to R600's, which we covered in our R600 architecture overview. It's also clear that RV670 does support double precision at some level, but how fast it is and whether it will be available on consumer cards remain to be answered. 2 GB of GDDR3 isn't enough information to end the 256-bit versus 512-bit debate, but considering how small the chip should be if it is built on a smaller process (as well as the fact that R600's extra memory bandwidth didn't help performance in the vast majority of applications), it's most likely a 256-bit chip.
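As a rough check on the 500 GFLOP/s figure, the sketch below multiplies the shader processor count by the clock range quoted above. It assumes each shader processor retires one multiply-add (counted as 2 floating point operations) per clock, as on R600; that per-clock rate is an assumption here, not something AMD's announcement states, and the function name is purely illustrative.

```python
def peak_gflops(shader_processors: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical single-precision peak in GFLOP/s.

    flops_per_clock=2 assumes one multiply-add per shader processor per clock.
    """
    return shader_processors * flops_per_clock * clock_mhz / 1000.0


# Clock range and shader processor count as quoted in the article.
low = peak_gflops(320, 775)
high = peak_gflops(320, 800)
print(f"{low:.0f}-{high:.0f} GFLOP/s")  # prints "496-512 GFLOP/s"
```

Under that assumption the quoted clock range brackets the 500 GFLOP/s figure, which is consistent with a shader core laid out like R600's.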

The software side, though, is more interesting. AMD introduced an R580-based FireStream card last year with no software support beyond the ability to compile PS3.0 HLSL to CTM. With the 9170, AMD is introducing the FireStream SDK, which looks an awful lot like what we've seen of CUDA. Brook is now the officially supported language in the form of Brook+. What the + entails isn't yet known, but it likely raises the baseline for support past the D3D9 level that Brook required. This would allow some important functions to be supported, such as gather and scatter, that weren't possible in D3D9. There's also the Compute Abstraction Layer, which seems to be AMD's equivalent to PTX. It's an intermediate assembly language that should allow code to be optimized on the fly for whatever architecture is present, presumably with a JIT compiler in the driver. Finally, as previously thought, the AMD Core Math Library will be getting GPGPU support.
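For readers unfamiliar with the terms, here is a minimal sketch of what gather and scatter mean, written as plain Python over lists. It says nothing about Brook+ syntax or how the hardware implements either operation; the function names and sample data are illustrative only.

```python
def gather(src, indices):
    """Gather: each output element READS from a computed address."""
    return [src[i] for i in indices]


def scatter(values, indices, size, fill=0):
    """Scatter: each input element WRITES to a computed address."""
    out = [fill] * size
    for value, i in zip(values, indices):
        out[i] = value
    return out


data = [10, 20, 30, 40]
idx = [3, 0, 2, 1]
print(gather(data, idx))      # [40, 10, 30, 20]
print(scatter(data, idx, 4))  # [20, 40, 30, 10]
```

The difference that matters for a GPU is the direction of the indexed access: gather needs arbitrary reads, while scatter needs arbitrary writes, which a pipeline that can only write to a fixed output location cannot express directly.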

The FireStream 9170 should be available in 1Q 2008; hopefully, the SDK will be available sooner--we want to poke at it!



Latest Thread Comments (128 total)
Posted by Jawed on Thursday, 27-Mar-08 19:23:27 UTC
Quoting Farhan
Yeah, obviously you have to do that for an ADD. I was just talking about the MUL.
What I'm proposing is that the final stage for the MUL is a pipelined-add, for p1+p2. Hence the exponent adjustment and trading of significant bits in p2 against bits of p1. You queried this addition earlier saying it needs to be done at 54+27 bits precision, but I hope I've shown that treating it as a normal floating point add (de-normalising: shifting one operand and modifying the exponent) allows it to be performed with only 54 bits (for a 53 bit final result). Jawed

Posted by Jawed on Thursday, 27-Mar-08 19:33:59 UTC
Something different: any chance that modifying/widening the DP4 paths will provide the requisite stages? Jawed

Posted by Farhan on Thursday, 27-Mar-08 23:02:05 UTC
Quoting Jawed
What I'm proposing is that the final stage for the MUL is a pipelined-add, for p1+p2. Hence the exponent adjustment and trading of significant bits in p2 against bits of p1. You queried this addition earlier saying it needs to be done at 54+27 bits precision, but I hope I've shown that treating it as a normal floating point add (de-normalising: shifting one operand and modifying the exponent) allows it to be performed with only 54 bits (for a 53 bit final result). Jawed
Regardless of whether it's a pipelined add, you can't do that shifting thing for the MUL because that would be incorrect. The alignment for p1 and p2 is always fixed (they are not 2 completely independent FP numbers, think of them as having a shared exponent). The addition is always between the top 54 bits of p1 and the bottom 54 bits of p2, with the carry propagation having to go through all the way to the MSB of p2 (27 bits).

Posted by Jawed on Friday, 28-Mar-08 03:58:22 UTC
Quoting Farhan
Regardless of whether it's a pipelined add, you can't do that shifting thing for the MUL because that would be incorrect. The alignment for p1 and p2 is always fixed (they are not 2 completely independent FP numbers, think of them as having a shared exponent).
I've diagrammed a possible set of exponents:
Code:
---------
       Blo    27
       Alo    27
             ---
  w           55

       Bhi    53
       Alo    27
             ---
  z           81
             ---
Z+W         =====
  z           82   partial sum 1
            =====

       Blo    27
       Ahi    53
             ---
  y           81

       Bhi    53
       Ahi    53
             ---
             107
             ---
X+Y         =====
             108   partial sum 2
            =====

 p1    z      82
 p2        + 108
          =======
             109
          =======
---------
For the sake of clarity, both A and B have exponent 53. When split into hi and lo parts, the hi parts keep their exponent, 53, while the lo parts are normalised to exponent 27 (though it could be lower for either of them). I've then worked through the multiplications and additions, calculating the maximum value of each of the resulting exponents.

Doing this I think I've understood my mistake. When I said "the count of significant bits in p2 determines how many bits from p1 are used, i.e. 54-p2+27" that's wrong, it should be the difference in exponents as there's always 54 significant bits in p2.

---

My suggestion is the addition, p1+p2, is done on the final adder in the pipeline (in lanes X and Y). This adder is required to perform a DADD instruction, so in this case it is also used for p1+p2. Since DADD has to support two 53-bit operands by being a 54-bit adder, the addition of p1+p2, 27 bits + 54 bits, requires no extra hardware dedicated to MUL.

So, what I'm thinking is that a conventional single precision DP4 needs to perform a final ADD on 4 MULs. So the DP4 instruction requires a 4 operand adder. I'm wondering if this same adder can also support:
* DADD A, B
* DMUL p1, p2
* DMAD p1, p2, C
C comes from A*B+C. Does DP4 work like that, though?
Jawed
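The split multiply being debated here can be sketched with plain integers standing in for the 53-bit significands. This is only an illustration of the arithmetic, not a model of the RV670 pipeline: the 27-bit split point, the function name, and the mapping of w, z, y, x onto partial products are assumptions taken from the diagram above, and exponents, signs and rounding are ignored entirely.

```python
SPLIT = 27                   # assumed split point: the lo part holds 27 bits
MASK_LO = (1 << SPLIT) - 1


def split_multiply(a: int, b: int):
    """Multiply two 53-bit significands via two partial sums, p1 and p2."""
    assert 0 <= a < (1 << 53) and 0 <= b < (1 << 53)
    a_hi, a_lo = a >> SPLIT, a & MASK_LO
    b_hi, b_lo = b >> SPLIT, b & MASK_LO

    # Four partial products, named after the lanes in the diagram.
    w = b_lo * a_lo
    z = b_hi * a_lo
    y = b_lo * a_hi
    x = b_hi * a_hi

    p1 = w + (z << SPLIT)    # partial sum 1, equal to b * a_lo
    p2 = y + (x << SPLIT)    # partial sum 2, equal to b * a_hi

    # p2 always sits exactly SPLIT bits above p1: the alignment is fixed,
    # not derived from the operands the way it is in a floating point add.
    product = p1 + (p2 << SPLIT)
    return p1, p2, product


a = (1 << 53) - 1
b = (1 << 52) + 12345
p1, p2, product = split_multiply(a, b)
assert product == a * b
# The low SPLIT bits of p1 pass straight through to the result.
assert product & MASK_LO == p1 & MASK_LO
print(p1.bit_length(), p2.bit_length(), product.bit_length())
```

Because the shift applied to p2 is a constant, the final addition has to cover every bit where the two partial sums overlap plus the carry into the upper bits of p2, which is the point Farhan makes about the alignment being fixed.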

Posted by itaru on Sunday, 25-May-08 12:35:05 UTC
http://forums.amd.com/forum/messageview.cfm?catid=328&threadid=95565&enterthread=y
AMD Stream SDK v1.1-beta Now Available For Download

The AMD Stream Team is pleased to announce the availability of AMD Stream SDK v1.1-beta!

The installation files are available for immediate download from:
FTP Download Site For AMD Stream SDK v1.1-beta (ftp://streamcomputing:streamcomputing@ftp-developer.amd.com/AMD_Stream_SDK/v1.01.0-beta)

The AMD Stream Computing website will be updated in the next few days to reflect this new release.

With v1.1-beta comes:

- AMD FireStream 9170 support
- Linux support (RHEL 5.1 and SLES 10 SP1)
- Brook+ integer support
- Brook+ #line number support for easier .br file debugging
- Various bug fixes and runtime enhancements
- Preliminary Microsoft Visual Studio 2008 support


If you have any questions, please do not hesitate to post your question to the forum.

Sincerely,
AMD Stream Team

Posted by wingless on Saturday, 07-Jun-08 14:22:47 UTC
Quoting itaru
http://forums.amd.com/forum/messageview.cfm?catid=328&threadid=95565&enterthread=y
AMD Stream SDK v1.1-beta Now Available For Download
Awesome. I hope we see more ATI support in GPGPU before CUDA takes over the market.

Posted by Karoshi on Monday, 09-Jun-08 01:31:00 UTC
Quoting itaru
AMD Stream SDK v1.1-beta Now Available For Download
The AMD Stream Team is pleased to announce the availability of AMD Stream SDK v1.1-beta!
With v1.1-beta comes:
- AMD FireStream 9170 support
- Linux support (RHEL 5.1 and SLES 10 SP1)
- Brook+ integer support
- Brook+ #line number support for easier .br file debugging
- Various bug fixes and runtime enhancements
- Preliminary Microsoft Visual Studio 2008 support
If you have any questions, please do not hesitate to post your question to the forum.
Sincerely,
AMD Stream Team
Wishlist:
- Brook CUDA backend.
A quick search around here didn't find any references to this. I think I read a post suggesting CUDA on CTM or CAL a few days ago. Brook on CUDA seems easier.
Disclaimer: I know CUDA and AMD's Stream SDK only at the executive PDF level.
I see advantages to a Brook port to CUDA.

Posted by itaru on Monday, 16-Jun-08 08:50:23 UTC
http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~126593,00.html
AMD Stream Processor First to Break 1 Teraflop Barrier

—Next-generation AMD FireStream™ 9250 processor accelerates scientific
and engineering calculations, efficiently delivering supercomputer performance at
up to eight gigaflops-per-watt —

The AMD FireStream 9250 stream processor includes a second-generation
double-precision floating point hardware implementation delivering
more than 200 gigaflops, building on the capabilities of the earlier
AMD FireStream™ 9170, the industry’s first GP-GPU with double-precision floating point support.
The AMD FireStream 9250’s compact size makes it ideal for small 1U servers
as well as most desktop systems, workstations, and larger servers and
it features 1GB of GDDR3 memory, enabling developers to handle large, complex problems.

AMD is also working closely with world class application and solution providers
to ensure customers can achieve optimum performance results.
Stream computing application and solution providers include CAPS entreprise,
Mercury Computer Systems, RapidMind, RogueWave and VizExperts.
Mercury Computer Systems provides high-performance computing systems
and software designed for complex image, sensor, and signal processing applications.
Its algorithm team reports that it has achieved 174 GFLOPS performance for
large 1D complex single-precision floating point FFTs on the AMD FireStream 9250

Posted by MfA on Monday, 16-Jun-08 12:31:37 UTC
174 GFLOPs is incredibly fast (CUFFT did around 20 on the G80 last I looked).
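To put the quoted numbers side by side, here is a small back-of-the-envelope sketch. Every input comes from the press release or the comment above, with these assumptions: "1 teraflop" is taken as exactly 1000 GFLOPS, "more than 200 gigaflops" as a lower bound of 200, "up to eight gigaflops-per-watt" as exactly 8, and the G80 CUFFT figure as the recalled "around 20". The derived values are therefore rough estimates, not measurements.

```python
PEAK_SP_GFLOPS = 1000.0   # "1 teraflop", assumed to be exactly 1000 GFLOPS
PEAK_DP_GFLOPS = 200.0    # "more than 200 gigaflops", so a lower bound
GFLOPS_PER_WATT = 8.0     # "up to eight gigaflops-per-watt"
FFT_9250_GFLOPS = 174.0   # Mercury's reported 1D complex FFT result
FFT_G80_GFLOPS = 20.0     # recalled CUFFT figure, "around 20"

implied_watts = PEAK_SP_GFLOPS / GFLOPS_PER_WATT
dp_to_sp_ratio = PEAK_DP_GFLOPS / PEAK_SP_GFLOPS
fft_fraction_of_peak = FFT_9250_GFLOPS / PEAK_SP_GFLOPS
fft_speedup = FFT_9250_GFLOPS / FFT_G80_GFLOPS

print(f"implied board power: about {implied_watts:.0f} W")
print(f"double:single precision ratio: at least {dp_to_sp_ratio:.2f}")
print(f"FFT result as a share of peak: {fft_fraction_of_peak:.1%}")
print(f"FFT result vs. recalled G80 figure: about {fft_speedup:.1f}x")
```

The speedup in the last line is only as good as the recalled G80 number, so it should be read as an order-of-magnitude comparison.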

Posted by Anarchist4000 on Monday, 16-Jun-08 21:02:34 UTC
1 TFLOP,

