Cell vs G80

Blogged under Cell, Consoles, Industry News, Sony, PlayStation by Barry Minor on Wednesday 5 September 2007 at 6:13 pm

I recently ran across an interesting paper, Stackless KD-Tree Traversal for High Performance GPU Ray Tracing, which documented the strides made by GPU-based ray-tracing over the last decade and introduced a new way of mapping acceleration structure traversal to modern GPUs, namely Nvidia's new G80. The paper was authored by Philipp Slusallek's talented computer graphics group at Saarland University in Germany. Our own Cell iRT ray-tracer was based on papers written by Philipp's students, so we have great respect for their work. It was interesting to see the great lengths researchers will go to in order to harvest a fraction of the floating-point potential locked away in these black boxes.
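For readers who haven't met kd-tree traversal, here is a minimal Python sketch of kd-restart, one of the simpler stackless schemes. It is not the rope-based method the Saarland paper introduces, and it is not iRT code. Assumptions: spheres stand in for the bunny's triangles, the tree uses median splits cycling through x, y and z, and every name (`build`, `kd_restart`, and so on) is hypothetical. Instead of pushing far subtrees onto a stack, the ray segment is clipped to the first leaf it enters; if that leaf yields no hit, the segment start moves to the leaf's exit and the descent restarts from the root.

```python
import math
from dataclasses import dataclass

# Hypothetical minimal scene: spheres are the only primitive.
@dataclass
class Sphere:
    center: tuple  # (x, y, z)
    radius: float

@dataclass
class Leaf:
    prims: list

@dataclass
class Inner:
    axis: int      # 0 = x, 1 = y, 2 = z
    split: float   # plane position along `axis`
    left: object   # side with coordinates <= split
    right: object  # side with coordinates >= split

def hit_sphere(s, o, d, t_lo, t_hi):
    """Nearest ray/sphere intersection with t in [t_lo, t_hi], else None."""
    oc = [o[i] - s.center[i] for i in range(3)]
    a = sum(x * x for x in d)
    b = sum(oc[i] * d[i] for i in range(3))
    c = sum(x * x for x in oc) - s.radius ** 2
    disc = b * b - a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / a, (-b + root) / a):
        if t_lo <= t <= t_hi:
            return t
    return None

def build(prims, depth=0, max_depth=16, leaf_size=2):
    """Median split on sphere centers, cycling x, y, z."""
    if len(prims) <= leaf_size or depth >= max_depth:
        return Leaf(prims)
    axis = depth % 3
    split = sorted(s.center[axis] for s in prims)[len(prims) // 2]
    # Spheres whose bounds straddle the plane are stored on both sides.
    left = [s for s in prims if s.center[axis] - s.radius <= split]
    right = [s for s in prims if s.center[axis] + s.radius >= split]
    if len(left) == len(prims) and len(right) == len(prims):
        return Leaf(prims)  # the split separated nothing
    return Inner(axis, split, build(left, depth + 1, max_depth, leaf_size),
                 build(right, depth + 1, max_depth, leaf_size))

def scene_bounds(prims):
    lo = tuple(min(s.center[a] - s.radius for s in prims) for a in range(3))
    hi = tuple(max(s.center[a] + s.radius for s in prims) for a in range(3))
    return lo, hi

def ray_box(o, d, lo, hi):
    """Slab test: the (t_enter, t_exit) span inside the box, or None."""
    t0, t1 = 0.0, math.inf
    for a in range(3):
        if d[a] == 0:
            if not lo[a] <= o[a] <= hi[a]:
                return None
            continue
        ta, tb = (lo[a] - o[a]) / d[a], (hi[a] - o[a]) / d[a]
        t0, t1 = max(t0, min(ta, tb)), min(t1, max(ta, tb))
        if t0 > t1:
            return None
    return t0, t1

def kd_restart(root, bounds, o, d):
    """Stackless traversal: returns the nearest hit distance t, or None."""
    span = ray_box(o, d, *bounds)
    if span is None:
        return None
    t_min, t_exit = span
    while t_min < t_exit:
        node, t_max = root, t_exit  # every pass restarts at the root
        while isinstance(node, Inner):
            a = node.axis
            t_split = (node.split - o[a]) / d[a] if d[a] != 0 else math.inf
            near_left = o[a] < node.split or (o[a] == node.split and d[a] > 0)
            near, far = (node.left, node.right) if near_left else (node.right, node.left)
            if t_split >= t_max or t_split < 0:
                node = near                  # segment never crosses the plane
            elif t_split <= t_min:
                node = far                   # segment starts beyond the plane
            else:
                node, t_max = near, t_split  # clip to the near side
        best = None
        for s in node.prims:
            t = hit_sphere(s, o, d, t_min, t_max)
            if t is not None and (best is None or t < best):
                best = t
        if best is not None:
            return best   # leaves are visited front to back
        t_min = t_max     # empty leaf: advance past it and restart
    return None
```

The price of dropping the stack is the repeated descent from the root after every empty leaf; the paper's approach avoids those restarts by linking neighbouring leaves with ropes.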

From 10,000 feet, here's how the Cell processor stacks up against Nvidia's new G80 GPU:

 

Both parts are compared at the 90-nanometre process node.

As you can see, the G80 is twice as big, which is a good indication that it requires twice the power, and it produces twice the floating-point performance on paper. However, when we ran one of the benchmarks discussed in the paper, the Stanford Bunny, we found that the Cell processor combined with iRT delivers significantly better performance (we don't have access to the other datasets listed in the paper):

  

 

Left to Right:  

2.6 GHz AMD Opteron - Saarland Ray-tracer

Nvidia GeForce 8800 GTX - Saarland Ray-tracer

Sony PlayStation 3 (partial 3.2 GHz Cell processor running Linux) - IBM iRT

3.2 GHz Cell Processor - IBM iRT

IBM QS20 Blade (Two 3.2 GHz Cell Processors) - IBM iRT  

In fact, one Cell processor is four to five times faster than the G80 at ray-tracing the Stanford Bunny, and the Cell QS20 blade, which has comparable floating-point power on paper, is eight to eleven times faster. Both the G80 and Cell crush the AMD Opteron, arguably the most popular production rendering processor today, at ray-tracing. It's also interesting to note that secondary rays are less costly on Cell, and secondary rays are where ray-tracing becomes interesting. Primary ray casting is only interesting from an academic perspective; the real issue is secondary rays, and GPUs have traditionally had problems with these due to their incoherent nature. When you factor power into the equation it gets even more interesting, given that Cell is half the size of the G80 and delivers up to five times the ray-tracing performance.
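The point about secondary rays comes down to coherence. As a rough illustration (a sketch with a hypothetical metric and hypothetical names, not how either renderer measures anything), the code below compares the directions of primary rays through an 8x8 pixel tile of a pinhole camera with uniformly sampled hemisphere directions, a simple stand-in for diffuse bounce rays. The metric is the length of the mean unit direction: 1.0 means all rays are parallel, and uniform hemisphere sampling gives values near 0.5. It ignores ray origins, which also scatter for secondary rays.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def primary_dirs(x0, y0, tile=8, width=512, height=512, fov_deg=60.0):
    """Pinhole-camera ray directions for a tile x tile block of pixels."""
    scale = math.tan(math.radians(fov_deg) / 2)
    aspect = width / height
    dirs = []
    for py in range(y0, y0 + tile):
        for px in range(x0, x0 + tile):
            x = (2 * (px + 0.5) / width - 1) * scale * aspect
            y = (1 - 2 * (py + 0.5) / height) * scale
            dirs.append(normalize((x, y, -1.0)))  # camera looks down -z
    return dirs

def bounce_dirs(n, rng):
    """Directions uniformly distributed on the hemisphere around +z."""
    dirs = []
    for _ in range(n):
        z = rng.random()                  # uniform z gives uniform area
        phi = 2 * math.pi * rng.random()
        r = math.sqrt(1 - z * z)
        dirs.append((r * math.cos(phi), r * math.sin(phi), z))
    return dirs

def coherence(dirs):
    """Length of the mean unit direction: 1.0 means all rays are parallel."""
    mean = [sum(d[i] for d in dirs) / len(dirs) for i in range(3)]
    return math.sqrt(sum(m * m for m in mean))

rng = random.Random(0)
print(f"primary tile: {coherence(primary_dirs(200, 200)):.4f}")
print(f"bounce rays:  {coherence(bounce_dirs(256, rng)):.2f}")
```

Primary rays in a tile come out almost perfectly parallel, so neighbouring rays tend to walk the same tree nodes and can share work in packets; bounce rays scatter, which is one reason they are harder to batch on wide SIMD hardware.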

Things are starting to get interesting, and Intel is hot on the trail with its Larrabee part, which is said to be designed for ray-tracing.

Only time will tell…

19 Comments »

  1. Comment by Mak — September 27, 2007 @ 1:04 pm

    Just to be clear, these results are Cell ALONE, not even combined with the RSX. Surely adding the power of the RSX to the results would make the PS3 even better.

  2. Comment by Bob — September 27, 2007 @ 7:08 pm

    WHO CARES??!!? GAMES SELL CONSOLES

  3. Comment by JK — September 27, 2007 @ 8:52 pm

    I think a lot of people have the same opinion about articles such as this, or at least gamers do. Let’s stop reading articles about how powerful the Cell and PS3 are, and start seeing software that actually demonstrates it. Just my opinion, as we rapidly approach the 1-year anniversary of the PS3.

  4. Comment by Barry Minor — September 27, 2007 @ 11:33 pm

    Mak, yes, these results are for the Cell chip alone. In fact, the PS3 numbers were measured using 3/4 of one Cell chip (only 6 SPEs are available under Linux).
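The 3/4 figure follows from the PS3 exposing 6 of the chip's 8 SPEs under Linux. A hypothetical helper for normalising such measurements, assuming throughput scales linearly with SPE count (a simplification that ignores memory bandwidth and PPE work):

```python
def estimate_full_chip(measured, spes_used=6, spes_total=8):
    """Scale a throughput measured on `spes_used` SPEs up to `spes_total` SPEs.

    Hypothetical helper: assumes perfectly linear scaling with SPE count."""
    return measured * spes_total / spes_used

# `measured_rate` would be a rate (e.g. rays or frames per second) from the PS3 run:
#   estimate_full_chip(measured_rate)                 -> one full Cell (8 SPEs)
#   estimate_full_chip(measured_rate, spes_total=16)  -> QS20 blade (2 x 8 SPEs)
```

The jump from four to five times for one Cell to eight to eleven times for the two-Cell QS20 is roughly the 2x this linear model predicts.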

  5. Comment by Barry Minor — September 27, 2007 @ 11:51 pm

    Bob and JK, the Cell processor is applicable to many markets outside of the game industry. While these numbers may not seem relevant to current-generation games, they are very relevant to today’s digital media customers and tomorrow’s consoles. Don’t look at your feet; look at the horizon.

  6. Comment by No No — September 28, 2007 @ 12:39 am

    who cares, in actual games 8800 owns cell

  7. Comment by Stephen Schober — September 28, 2007 @ 2:50 am

    Ok,
    I would like to point out that the test configuration for the AMD platform is not a good one to use when comparing a G80 to the Cell. The 2.6GHz Opteron cannot unleash the full power of an 8800GTX. In my opinion it is a bad comparison because the 8800GTX should be paired with the quad-core AMD that launches later this year, or at the very least two dual-core Opterons, not a single-core, three-generation-old processor. So the test results are extremely biased.
    -Stephen Schober

  8. Comment by Barry Major — September 28, 2007 @ 4:42 am

    Ray tracing sounds interesting, but I’m not sure that the bunny test is really indicative of anything (no AI routines, animation, or any other objects). I’ve read that Intel is working on ray tracing as a means of replacing the GPU with the CPU, because the CPU is better at ray-tracing math. It is highly unlikely that we’ll see this departure from traditional GPU-oriented design within the next 5 years, but if it improves overall performance, PC, OS, and software design could change several years from now.
    I think the 6-SPE setup was intended to match the PS3, which reserves at least one of the SPEs for the OS (not available to games).

  9. Comment by jdklaffke — September 28, 2007 @ 9:32 am

    Killzone 2 will demonstrate the power quite nicely this February.

  10. Comment by chris — September 28, 2007 @ 10:31 am

    This is an unusable sample, because Linux limits the strength of the Cell.

  11. Comment by n1kko — September 28, 2007 @ 12:44 pm

    Barry, this article was posted on a user-generated gaming news site, www.n4g.com, which hosts an allotment of illiterate “fanboys”.

    I thoroughly enjoy your blog, keep up the great work!

  12. Comment by Barry Minor — September 28, 2007 @ 11:37 pm

    Stephen, this particular test runs completely out of local GPU memory, so the host processor isn’t involved. Changing the host clock and/or adding cores wouldn’t change the results.

  13. Comment by Barry Minor — September 28, 2007 @ 11:58 pm

    n1kko, thanks. I always enjoy posting real code producing real numbers, which helps cut through the usual marketing flimflam. It’s one thing to say you have a lot of flops, but showing them being used to solve a problem is always more meaningful.

  14. Comment by Barry Minor — September 29, 2007 @ 12:25 am

    Barry Major, ray tracing scales sub-linearly with scene complexity, so adding additional objects doesn’t change things much, and SPEs can be reserved or repurposed (with a simple DMA) for other tasks like AI, physics, etc. CPUs are becoming more specialized (Cell) and GPUs are becoming more general (G80), which gives programmers the performance and flexibility to explore visually superior software rendering techniques like ray tracing.
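Why ray tracing scales sub-linearly: with a balanced acceleration structure, one descent from root to leaf takes a number of steps proportional to the logarithm of the primitive count. A minimal sketch, assuming a perfectly balanced median-split tree and a hypothetical `leaf_size` of 4; real rays visit several leaves and real trees are not perfectly balanced, so this is the idealised case.

```python
def tree_depth(n, leaf_size=4):
    """Levels in a median-split binary tree over n primitives,
    i.e. the steps in one root-to-leaf descent."""
    depth = 0
    while n > leaf_size:
        n = (n + 1) // 2  # the larger half after a median split
        depth += 1
    return depth

for n in (1_000, 1_000_000):
    print(n, tree_depth(n))
```

With these parameters, a thousandfold increase in primitives, from 1,000 to 1,000,000, only takes the depth from 8 to 18 levels.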

  15. Comment by michael — October 1, 2007 @ 1:58 am

    I don’t understand why people continue to pursue the topic of Cell = PS3. This test puts a bullet through those fantastical debates about which processor is better just because one processor is in a given console. Processors are mainly used to move data, and the Cell proves that it can easily handle information and calculations with few errors.

  16. Comment by Greg Ruthenbeck — December 20, 2007 @ 8:11 pm

    The Cell clearly has a lot of potential. It’s great to see an actual example where a direct performance comparison has been attempted. I only say ‘attempted’ because any implementation will need to differ substantially in order to run on the different processors. Has the Cell code just been optimised better than the GeForce code?

    Even if better optimisations are the only reason why Cell has higher performance in this test, I get the impression that this is a strength of the Cell. Its relatively simple architecture makes it possible to write code that uses nearly every cycle (rather than requiring complex drivers, which introduce memory latencies that can be hard to optimise out of more complex architectures).

  17. Comment by Horus — February 8, 2008 @ 5:18 pm

    Hello mr. Minor,

    I have a question about the PS3. From what I understand, the Cell processor is extremely powerful and it is what gives the PS3 most of its power. 256 MB of DDR3 RAM for the GPU, however, sounds extremely limited for such a machine. Can Cell use its 256 MB of XDR memory for texturing and overcome this, or does the PS3 have other tricks for overcoming this? Even if RSX could use Cell’s 256 MB for textures, 512 MB would still be limited for such a powerful machine (unless XDR memory can do much higher-resolution textures than DDR3 memory).

    To sum it up, would 256 MB of DDR3 and 256 MB of XDR RAM be enough to produce the best graphics ever seen in games, which is what this console was designed for?

    I am asking this here because I couldn’t find an answer on the web, and I got the idea that Cell can do textures from what I read on this site.

    Thanks

  18. Comment by Barry Minor — April 1, 2008 @ 4:37 pm

    More is always better but $$ also factor into the equation. The Cell systems I work with, IBM Cell blades, have GBs of memory attached to each processor so I don’t have to deal with such limitations. ;-)

    From a PS3 perspective, techniques such as procedurally generated content (textures and geometry) can greatly reduce a game’s memory consumption.
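As a toy illustration of the procedural idea (a sketch with hypothetical names, not PS3 or IBM code): a checkerboard evaluated on demand from texture coordinates costs no texture memory, whereas an equivalent stored 1024x1024 RGBA texture costs 4 MiB before mipmaps. Real procedural textures such as noise or fractals follow the same pattern: compute the texel when it is sampled instead of reading it from memory.

```python
import math

def checker(u, v, tiles=8):
    """Procedural checkerboard: the texel value is computed on demand from
    (u, v) in [0, 1), so no image has to be stored in memory."""
    return 255 if (math.floor(u * tiles) + math.floor(v * tiles)) % 2 == 0 else 0

def stored_texture_bytes(width, height, bytes_per_texel=4):
    """Memory for an equivalent uncompressed texture, without mipmaps."""
    return width * height * bytes_per_texel

print(checker(0.05, 0.05), checker(0.2, 0.05))
print(stored_texture_bytes(1024, 1024) / 2**20, "MiB if stored")
```

The trade-off is compute for memory, which suits a chip with plenty of SPE cycles and a tight memory budget.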

  19. Pingback by SimBioSys Blog » Blog Archive » The fast and the furious: compare Cell/B.E., GPU and FPGA — May 3, 2008 @ 10:08 am

    […] falls far short of that, while the Cell/B.E. has demonstrated benchmark results (e.g. for real-time ray tracing application) far superior to that of the G80 GPU despite the theoretical throughput being lower than the […]

The postings on this site solely reflect the personal views of the authors and do not necessarily represent the views, positions, strategies or opinions of IBM or IBM management.
