No more mysteries: Apple's G5 versus x86, Mac OS X versus Linux
by Johan De Gelas on June 3, 2005 7:48 AM EST - Posted in Mac
Workstation, yes; Server, no.
The G5 is a gigantic improvement over the previous CPU in the PowerMac, the G4e. The G5 is one of the widest superscalar CPUs ever built, and has all the characteristics that could give Apple the edge, especially now that the clock speed race between AMD and Intel is over. However, there is still a lot of work to be done.
First of all, the G5 needs lower-latency access to memory, because right now the integer performance of the G5 leaves a lot to be desired. The Opteron and Xeon have a better integer engine, and the Pentium 4/Xeon in particular has a better branch predictor too. The Opteron's memory subsystem runs circles around the G5's.
Secondly, it is clear that the G5's FP performance, despite its access to 32 architectural registers, needs good optimisation. Only one of our flops tests was "Altivectorized", which means that the GCC compiler needs to improve quite a bit before it can turn the many open source programs into super fast applications on the Mac. In contrast, the Intel compiler can vectorize all 8 tests.
AltiVec, or the Velocity Engine, can make the G5 shine in workstation applications. A good example is Lightwave, where the G5 takes on the best x86 competition in some situations and remains behind in others.
The future looks promising in the workstation market for Apple, as the G5 has a lot of unused potential and the increasing market share of the Power Mac should tempt developers to put a little more effort into Mac optimisation.
The server performance of the Apple platform is, however, catastrophic. When we asked Apple for a reaction, they told us that some database vendors, Sybase and Oracle, have found a way around the threading problems. We'll try Sybase later, but frankly, we are very sceptical. The whole "multi-threaded Mach microkernel trapped inside a monolithic FreeBSD cocoon with several threading wrappers and coarse-grained threading access to the kernel", with a "backwards compatibility" millstone around its neck, sounds like a bad fusion recipe for performance.
Workstation apps will hardly mind, but the performance of server applications depends greatly on the threading, signalling and locking engine. I am no operating system expert, but with the data that we have today, I think that a PowerPC-optimised Linux such as Yellow Dog is a better idea for the Xserve than Mac OS X Server.
References
Threading on OS X: http://developer.apple.com/technotes/tn/tn2028.html
Basics OS X: http://developer.apple.com/documentation/macosx/index.html
116 Comments
mongo lloyd - Tuesday, June 7, 2005 - link
At least the non-ECC RAM, that is.
mongo lloyd - Tuesday, June 7, 2005 - link
Any reason why you weren't using RAM with lower timings on the x86 processors? Shouldn't there at least have been a disclaimer?
jhagman - Tuesday, June 7, 2005 - link
OK, this clears it up, thanks.
One little thing still: what is the number you are giving in the ab results table? Is it requests per second or perhaps the transfer rate?
demuynckr - Tuesday, June 7, 2005 - link
jhagman: As I mentioned before, we used gcc 3.3.3 on Linux and the gcc 3.3 Mac compiler on Apple, because that was the standard one.
I did a second flops test with the gcc 4.0 compiler included on the Tiger CD, and the flops are much better when compiled with the -mcpu=g5 option, which did not seem to be available when using the gcc 3.3 Apple compiler.
As for ab, I used these settings:
ab -n 100000 -c x http://localhost/
with x set to each of the concurrency levels: 5, 20, 50, 100, 150.
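The sweep described above can be sketched as a small shell helper. This is only an illustration, not the script used for the article: it assumes ab's -c flag for concurrency, and the function name ab_sweep, the DRY_RUN switch and the default arguments are all hypothetical.

```shell
#!/bin/sh
# Sketch: run ApacheBench once per concurrency level quoted in the comment.
# ab_sweep, DRY_RUN and the defaults are illustrative names, not from the article.

ab_sweep() {
    url=${1:-http://localhost/}
    requests=${2:-100000}
    for c in 5 20 50 100 150; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would be executed.
            echo "ab -n $requests -c $c $url"
        else
            # -n: total number of requests, -c: concurrent requests.
            ab -n "$requests" -c "$c" "$url"
        fi
    done
}
```

Setting DRY_RUN=1 before calling ab_sweep prints the five command lines without touching the server, which is a convenient way to check the settings before a long run.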
spinportal - Monday, June 6, 2005 - link
I guess no one is arguing that the PPC can't keep pace with the current market; the question is rather whether OS X is able to do Big Iron computing. And if the rumors are true, where will you be able to get a PPC built once Apple drops IBM for Intel?
In a Usenet debate in 1992, Torvalds and Tanenbaum went at it over microkernels such as Mach versus the supposed death of Linux. It seems Linus' work will be seeing more light of day, and Mach will go the way of the dodo. Will Apple rewrite OS X for Intel x86/64? As far as practical business sense goes, that's like shooting oneself in the foot.
jhagman - Monday, June 6, 2005 - link
Could you please give the exact method of testing apache with ab? It is really hard to try to redo the tests when one does not know which methodology was used. The number of clients and the switches passed to ab would be appreciated.
Also, an answer to why Apple's newest gcc (4.0) was not used would be interesting, and did you _really_ use gcc 3.3.3 and not Apple's gcc?
Other than these omissions I found the article very interesting, thanks.
demuynckr - Monday, June 6, 2005 - link
Yes, I have read the article. I also personally compiled the microbenchmarks on Linux as well as on the PPC, and I can tell you I used gcc 3.3 on the Mac for all compilation needs :).
webflits - Monday, June 6, 2005 - link
demuynckr, did you read the article?
"So, before we start with application benchmarks, we performed a few micro benchmarks compiled on all platforms with the SAME gcc 3.3.3 compiler."
BTW, I ran the same tests using Apple's version of gcc 3.3.
As you can see, my 2.0GHz now beats the 2.5GHz on 5 of the 8 tests, and a 2.7GHz G5 would be on par with the Opteron 250 when you extrapolate the results.
Let's face it, Anandtech screwed up by using a crippled compiler for the G5 tests.
----------------------------
GCC 3.3/OSX 10.4.1/2.0GHz G5
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module       Error     RunTime      MFLOPS
                        (usec)
     1  4.0146e-13      0.0140    997.2971
     2 -1.4166e-13      0.0108    648.4622
     3  4.7184e-14      0.0089   1918.5122
     4 -1.2546e-13      0.0139   1076.8597
     5 -1.3800e-13      0.0312    928.9079
     6  3.2374e-13      0.0182   1596.1407
     7 -8.4583e-11      0.0348    344.3954
     8  3.4855e-13      0.0196   1527.6638
Iterations      = 512000000
NullTime (usec) = 0.0004
MFLOPS(1)       = 827.5658
MFLOPS(2)       = 673.7847
MFLOPS(3)       = 1037.6825
MFLOPS(4)       = 1501.7226
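For readers who want to redo the extrapolation mentioned above, here is a minimal sketch. It assumes plain linear scaling with clock speed, which is an assumption of this sketch and not necessarily the method used in the comment; it is probably optimistic, since memory latency does not improve with the core clock. The helper name scale_mflops is hypothetical.

```shell
#!/bin/sh
# Sketch: scale a measured MFLOPS figure linearly from one clock speed to another.
# Assumes performance is proportional to clock speed (an optimistic simplification).

scale_mflops() {
    # $1 = measured MFLOPS, $2 = measured clock in GHz, $3 = target clock in GHz
    awk -v m="$1" -v from_ghz="$2" -v to_ghz="$3" \
        'BEGIN { printf "%.1f\n", m * to_ghz / from_ghz }'
}

# The four composite scores from the 2.0GHz run above, scaled to 2.7GHz.
for m in 827.5658 673.7847 1037.6825 1501.7226; do
    scale_mflops "$m" 2.0 2.7
done
```

Under that assumption every score simply grows by a factor of 1.35, so MFLOPS(1) would land at roughly 1117.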
demuynckr - Monday, June 6, 2005 - link
Just to clear things up: on Linux, gcc 3.3.3 was used; on the Macintosh, gcc 3.3 was used (the one that was included with the OS).