Cell vs G80
I recently ran across an interesting paper, Stackless KD-Tree Traversal for High Performance GPU Ray Tracing, which documented the strides made by GPU-based ray-tracing over the last decade and introduced a new way of mapping acceleration structure traversal to modern GPUs, namely Nvidia's new G80. The paper was authored by Philipp Slusallek's talented computer graphics group at Saarland University in Germany. Our own Cell iRT ray-tracer was based on papers written by Philipp's students, so we have great respect for their work. It was interesting to see the great lengths researchers are willing to go to in order to harvest a fraction of the floating point potential locked away in these black boxes.
From 10,000 feet here's how the Cell processor stacks up to Nvidia's new G80 GPU:
Both parts are compared at 90-nanometre.
As you can see, the G80 is twice as big, which is a good indication that it requires twice the power, and it produces twice the floating point power on paper. However, when we ran one of the benchmarks discussed in the paper, the Stanford Bunny, we found that the Cell processor combined with iRT produces significantly better performance (we don't have access to the other datasets listed in the paper):
Left to Right:
2.6 GHz AMD Opteron - Saarland Ray-tracer
Nvidia GeForce 8800 GTX - Saarland Ray-tracer
Sony Playstation3 (partial 3.2 GHz Cell processor running Linux) - IBM iRT
3.2 GHz Cell Processor - IBM iRT
IBM QS20 Blade (Two 3.2 GHz Cell Processors) - IBM iRT
In fact, one Cell processor is four to five times faster at ray-tracing the Stanford Bunny than the G80, and the Cell QS20 blade, which has comparable floating point power on paper, is eight to eleven times faster. Both the G80 and Cell crush the AMD Opteron at ray-tracing, and the Opteron is arguably the most popular production rendering processor today. It's also interesting to note that secondary rays are less costly on Cell, which is where ray-tracing becomes interesting. Primary ray casting is only interesting from an academic perspective. The real issue is secondary rays, and GPUs have traditionally had problems with these due to their incoherent nature. When you factor power into the equation it gets even more interesting, given that Cell is half the size of the G80 and produces up to five times the ray-tracing performance.
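To make the size-versus-performance point concrete, here is a rough back-of-the-envelope sketch. It uses only the relative figures quoted above (the G80 being about twice the size of Cell, and one Cell being four to five times faster on the Stanford Bunny). The function and constant names are made up for illustration, and treating die size as a stand-in for power is the same rough indication used in the post, not a measurement.

```python
# Relative figures quoted in the post; no absolute die sizes or wattages are used.
G80_AREA_RELATIVE_TO_CELL = 2.0   # "the G80 is twice as big"
CELL_SPEEDUP_RANGE = (4.0, 5.0)   # one Cell is "four to five times faster"


def perf_per_area_advantage(speedup, area_ratio):
    """Speedup of chip A over chip B, scaled by how much larger chip B is."""
    return speedup * area_ratio


# Apply the same scaling to the low and high ends of the measured range.
low, high = (
    perf_per_area_advantage(s, G80_AREA_RELATIVE_TO_CELL)
    for s in CELL_SPEEDUP_RANGE
)

print(f"Cell per-area advantage over G80: {low:.0f}x to {high:.0f}x")
```

Under those assumptions, Cell delivers roughly eight to ten times the ray-tracing performance per unit of die area on this one benchmark.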
Things are starting to get interesting and Intel is hot on the trail with their Larrabee part which is said to be designed for ray-tracing.
Only time will tell…




Just to be clear, these results are Cell ALONE, not even combined with the RSX. Surely adding the power of the RSX to the results would make the PS3 even better.
WHO CARES??!!? GAMES SELL CONSOLES
I think a lot of people have the same opinion about articles such as this. Or at least gamers do. Let’s stop reading articles about how powerful the Cell and PS3 are, and start seeing software that actually demonstrates it. Just my opinion, as we rapidly approach the 1 year anniversary of the PS3.
Mak, yes, these results are for the Cell chip alone. In fact, the PS3 numbers were measured using 3/4 of one Cell chip (only 6 SPEs are available under Linux).
Bob and JK, the Cell processor is applicable to many markets outside of the game industry. While these numbers may not seem relevant to the current generation of games, they are very relevant to today’s digital media customers and tomorrow’s consoles. Don’t look at your feet; look at the horizon.
who cares, in actual games 8800 owns cell
Ok,
I would like to point out that the test configuration for the AMD platform is not a good one to use when comparing a G80 to the Cell. The 2.6 GHz Opteron cannot unleash the full power of an 8800 GTX. In my opinion it is a bad comparison because the 8800 GTX should be paired with the quad-core AMD that launches later this year, or at the very least two dual-core Opterons, not a single-core processor that is three generations old. So the test results are extremely biased.
-Stephen Schober
Ray tracing sounds interesting, but I’m not sure that the bunny test is really indicative of anything (no AI routines, animation, or any other objects). I’ve read that raytracing is being worked on by Intel as a means of replacing the GPU with the CPU, because the CPU is better at raytracing math. It is highly unlikely that we’ll see this departure from traditional GPU-oriented design within the next 5 years, but if it improves overall performance, PC, OS, and software design could change several years from now.
I think the 6 SPE setup was intended to equate to the PS3, which reserves at least 1 of the SPEs for the OS (not available to games).
killzone 2 will demonstrate the power quite nicely this february
this is a non-usable sample due to the fact that linux limits the strength of the cell
Barry, this article was posted on a user generated news gaming site www.n4g.com, which hosts an allotment of illiterate “fanboys”.
I thoroughly enjoy your blog, keep up the great work!
Stephen, This particular test runs completely out of local GPU memory so the host processor isn’t involved. Changing the host clock and/or adding cores wouldn’t change the results.
n1kko, thanks. I always enjoy posting real code producing real numbers, which helps cut through the usual marketing flimflam. It’s one thing to say you have a lot of flops, but showing them being used to solve a problem is always more meaningful.
Barry Major, ray-tracing scales sub-linearly with scene complexity, so adding additional objects doesn’t change things much, and SPEs can be reserved or repurposed (with a simple DMA) for other tasks like AI, physics, etc. CPUs are becoming more specialized (Cell) and GPUs are becoming more general (G80), which gives programmers the performance and flexibility to explore visually superior software rendering techniques like ray-tracing.
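As a rough illustration of what sub-linear scaling means here, the sketch below assumes an idealized, perfectly balanced binary acceleration tree with one triangle per leaf. That is a simplification for illustration only, not how iRT or the Saarland kd-tree is actually built, and `ideal_tree_depth` is a made-up helper name. In that idealized case the number of tree levels a ray walks through grows with the logarithm of the triangle count.

```python
import math


def ideal_tree_depth(num_triangles):
    """Depth of a perfectly balanced binary tree with one triangle per leaf."""
    return math.ceil(math.log2(num_triangles))


# Each step multiplies the scene size by 1000, yet the depth only grows by about 10.
for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} triangles -> about {ideal_tree_depth(n)} levels per ray")
```

Going from a thousand to a billion triangles only triples the idealized depth (10, 20, 30 levels), which is the intuition behind adding objects not changing things much. Real trees are not perfectly balanced, so treat this as a trend, not a prediction.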
I don’t understand why people continue to pursue the topic of Cell = PS3. This test is a bullet through those fantastical debates on which processor is better just because one processor is in a given console. Processors are mainly used to move data. The Cell proves that it can easily handle information and calculations with few errors.
The Cell clearly has a lot of potential. It’s great to see an actual example where a direct performance comparison has been attempted. I only say ‘attempted’ because any implementation will need to differ substantially in order to run on the different processors. Has the Cell code just been optimised better than the GeForce code?
Even if better optimisations are the only reason why Cell has higher performance in this test, I get the impression that that is a strength of the Cell. Its relatively simple architecture makes it possible to write code that uses nearly every cycle (rather than requiring complex drivers, which introduce memory latencies that can be hard to optimise out of more complex architectures).
Hello Mr. Minor,
I have a question about the PS3. From what I understand, the Cell processor is extremely powerful and is what gives the PS3 most of its power. However, 256 MB of GDDR3 RAM for the GPU sounds extremely limited for such a machine. Can Cell use its 256 MB of XDR memory for texturing to overcome this, or does the PS3 have other tricks for overcoming it? Even if the RSX could use Cell’s 256 MB for textures, 512 MB would also be limited for such a powerful machine (unless XDR memory can handle much higher resolution textures than GDDR3 memory).
To sum it up, would 256 MB of GDDR3 and 256 MB of XDR RAM be enough to produce the best graphics ever seen in games, which is what this console was designed for?
I am asking this here because I couldn’t find an answer on the web, and I got the idea that Cell can do textures from what I read on this site.
Thanks
More is always better but $$ also factor into the equation. The Cell systems I work with, IBM Cell blades, have GBs of memory attached to each processor so I don’t have to deal with such limitations.
From a PS3 perspective, techniques such as procedurally generated content (textures and geometry) can greatly reduce a game’s memory consumption.
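Here is a minimal sketch of why procedural content saves memory, using a checkerboard as a toy example. The function names and the 1024 x 1024 RGBA texture size are assumptions chosen for illustration and have nothing to do with any actual PS3 title or SDK. The point is that a stored texture costs memory for every texel, while a procedural one is computed on demand from the texture coordinates.

```python
def stored_texture_bytes(width, height, bytes_per_texel=4):
    """Memory needed to keep an uncompressed texture resident (4 bytes = RGBA8)."""
    return width * height * bytes_per_texel


def checker_texel(u, v, squares=8):
    """Compute an RGB colour for coordinates u, v in [0, 1) with no stored image."""
    on = (int(u * squares) + int(v * squares)) % 2 == 0
    return (255, 255, 255) if on else (0, 0, 0)


# A stored 1024 x 1024 RGBA texture occupies 4 MiB.
print(stored_texture_bytes(1024, 1024))

# The procedural version costs a few arithmetic operations per lookup instead.
print(checker_texel(0.05, 0.05))  # falls in a white square
print(checker_texel(0.20, 0.05))  # falls in the neighbouring black square
```

The trade is memory for compute, which suits a processor with plenty of spare floating point capacity.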
[…] falls far short of that, while the Cell/B.E. has demonstrated benchmark results (e.g. for real-time ray tracing application) far superior to that of the G80 GPU despite the theoretical throughput being lower than the […]