No more mysteries: Apple's G5 versus x86, Mac OS X versus Linux

Author: Johan De Gelas

by Johan De Gelas on June 3, 2005 7:48 AM EST

Mac OS X versus Linux

Lmbench 2.04 provides a suite of microbenchmarks that measure bottlenecks at the Unix operating system and CPU level. That makes it well suited to testing the theory that Mac OS X might be the culprit behind the terrible server performance of the Apple platform.

Signals allow processes (and thus threads) to interrupt other processes. In a database system such as MySQL 4.x where so many processes/threads (60 in our MySQL screenshot) and many accesses to the kernel must be managed, signal handling is a critical performance factor.

Larry McVoy (SGI) and Carl Staelin (HP):

"Lmbench measures both signal installation and signal dispatching in two separate loops, within the context of one process. It measures signal handling by installing a signal handler and then repeatedly sending itself the signal."
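The two loops described in the quote can be sketched roughly as follows. This is not lmbench's code: it is a minimal Python illustration of the same idea, the function names `time_signal_install` and `time_signal_dispatch` are hypothetical, and the Python interpreter adds overhead of its own, so the absolute numbers are not comparable with the table in this article.

```python
import os
import signal
import time


def _restore(previous):
    # A handler that was not installed from Python is reported as None
    # and cannot be passed back to signal.signal(), so skip it.
    if previous is not None:
        signal.signal(signal.SIGUSR1, previous)


def time_signal_install(iterations=100_000):
    """Average time in microseconds to install a SIGUSR1 handler."""
    def handler(signum, frame):
        pass

    previous = signal.getsignal(signal.SIGUSR1)
    start = time.perf_counter()
    for _ in range(iterations):
        signal.signal(signal.SIGUSR1, handler)   # loop 1: installation only
    elapsed = time.perf_counter() - start
    _restore(previous)
    return elapsed / iterations * 1e6


def time_signal_dispatch(iterations=100_000):
    """Average time in microseconds to send SIGUSR1 to this process.

    Returns (microseconds per signal, number of handler invocations).
    """
    delivered = 0

    def handler(signum, frame):
        nonlocal delivered
        delivered += 1

    previous = signal.signal(signal.SIGUSR1, handler)
    pid = os.getpid()
    start = time.perf_counter()
    for _ in range(iterations):
        os.kill(pid, signal.SIGUSR1)             # loop 2: the process signals itself
    elapsed = time.perf_counter() - start
    _restore(previous)
    return elapsed / iterations * 1e6, delivered


if __name__ == "__main__":
    print(f"sig inst: {time_signal_install():.2f} us")
    dispatch_us, delivered = time_signal_dispatch()
    print(f"sig hndl: {dispatch_us:.2f} us ({delivered} handler calls)")
```

Both loops run inside one process, as in the quoted description, so no context switch to another process is measured. The sketch must run in the main thread of a Unix system, because Python only allows signal handlers to be installed there.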

Host	OS	MHz	null call	null I/O	stat	open clos	slct TCP	sig inst	sig hndl
Xeon 3.06 GHz	Linux 2.4	3056	0.42	0.63	4.47	5.58	18.2	0.68	2.33
G5 2.7 GHz	Darwin 8.1	2700	1.13	1.91	4.64	8.60	21.9	1.67	6.20
Xeon 3.6 GHz	Linux 2.6	3585	0.19	0.25	2.30	2.88	9.00	0.28	2.70
Opteron 850	Linux 2.6	2404	0.08	0.17	2.11	2.69	12.4	0.17	1.14

All numbers are expressed in microseconds, so lower is better. First of all, you can see that kernel 2.6 is in most cases a lot more efficient. Secondly, although this is not the most accurate benchmark, the message is clear: Darwin, the foundation of Mac OS X Server, handles signals the slowest. In some cases, Darwin is even several times slower.

As we increase the level of concurrency in our database test, many threads must be created. Unix process/thread creation is called "forking" because a copy of the calling process is made.

lmbench "fork" measures simple process creation by creating a process and immediately exiting the child process. The parent process waits for the child process to exit. The benchmark is intended to measure the overhead for creating a new thread of control, so it includes the fork and the exit time.

lmbench "exec" measures the time to create a completely new process, while "sh" measures the time to start a new process and run a little program via /bin/sh (complicated new process creation).
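The three process-creation measurements described above can be approximated with a short Python sketch. It is an illustration under assumptions, not lmbench itself: the helper names `time_fork_exit` and `time_command` are hypothetical, the `true` utility stands in for lmbench's small test program, and `subprocess.run` adds interpreter overhead on top of the raw system calls.

```python
import os
import subprocess
import time


def time_fork_exit(iterations=200):
    """Average microseconds for fork + immediate child exit + parent wait."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)          # child: exit immediately, skip all cleanup
        os.waitpid(pid, 0)       # parent: wait for the child to exit
    return (time.perf_counter() - start) / iterations * 1e6


def time_command(argv, iterations=50):
    """Average microseconds to start argv as a new process and wait for it."""
    start = time.perf_counter()
    for _ in range(iterations):
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) / iterations * 1e6


if __name__ == "__main__":
    print(f"fork: {time_fork_exit():.0f} us")
    # "exec": start a trivial program directly (assumes `true` is on PATH).
    print(f"exec: {time_command(['true']):.0f} us")
    # "sh": start the same program through /bin/sh, which adds shell startup.
    print(f"sh:   {time_command(['/bin/sh', '-c', 'true']):.0f} us")
```

As in the description above, the fork figure includes both the fork and the exit, because the parent only stops the clock after the child has been reaped.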

Host	OS	MHz	fork proc	exec proc	sh proc
Xeon 3.06 GHz	Linux	3056	163	544	3021
G5 2.7 GHz	Darwin	2700	659	2308	4960
Xeon 3.6 GHz	Linux	3585	158	467	2688
Opteron 850	Linux	2404	125	471	2393

Mac OS X is incredibly slow, between 2 and 5(!) times slower, in creating new threads, as it doesn't use kernel threads and has to go through extra layers (wrappers). No need to continue our search: the G5 might not be the fastest integer CPU on earth, but its database performance is completely crippled by an asthmatic operating system that needs up to 5 times more time to handle and create threads.
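The slowdown factors can be recomputed from the process-creation table. The sketch below simply divides the G5 latencies by those of a chosen baseline machine; the dictionary layout and the `slowdown` helper are illustrative names, and the numbers are copied from the table.

```python
# Process-creation latencies in microseconds, copied from the table above.
LATENCY_US = {
    "Xeon 3.06 GHz": {"fork": 163, "exec": 544, "sh": 3021},
    "G5 2.7 GHz": {"fork": 659, "exec": 2308, "sh": 4960},
    "Xeon 3.6 GHz": {"fork": 158, "exec": 467, "sh": 2688},
    "Opteron 850": {"fork": 125, "exec": 471, "sh": 2393},
}


def slowdown(machine, baseline):
    """Return how many times slower `machine` is than `baseline`, per test."""
    return {
        test: LATENCY_US[machine][test] / LATENCY_US[baseline][test]
        for test in LATENCY_US[machine]
    }


if __name__ == "__main__":
    for baseline in ("Xeon 3.06 GHz", "Xeon 3.6 GHz", "Opteron 850"):
        ratios = slowdown("G5 2.7 GHz", baseline)
        line = ", ".join(f"{test} {ratio:.2f}x" for test, ratio in ratios.items())
        print(f"G5 vs {baseline}: {line}")
```

Against the Opteron 850, for example, the G5 under Darwin comes out about 5.27 times slower on fork, 4.90 times slower on exec and 2.07 times slower on sh.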

Mac OS X: beautiful but… Workstation, yes; Server, no.

Comments

mongo lloyd - Tuesday, June 7, 2005 - link
At least the non-ECC RAM, that is.
mongo lloyd - Tuesday, June 7, 2005 - link
Any reason for why you weren't using RAM with lower timings on the x86 processors? Shouldn't there at least have been a disclaimer?
jhagman - Tuesday, June 7, 2005 - link
OK, this clears it up, thanks.

One little thing still, what is the number you are giving in the ab results table? Is it requests per second or perhaps the transfer rate?
demuynckr - Tuesday, June 7, 2005 - link
jhagman:
As I mentioned before, we used gcc 3.3.3 for all Linux systems, and the gcc 3.3 Mac compiler on Apple, because that was the standard one.
I did a second flops test with the gcc 4.0 compiler included on the Tiger CD, and the flops are much better when compiled with the -mcpu=g5 option, which did not seem available when using the gcc 3.3 Apple compiler.
As for ab, I used these settings:
ab -n 100000 -c x http://localhost/

x for the various concurrencies: 5,20,50,100,150.
spinportal - Monday, June 6, 2005 - link
Guess there's no one arguing that the PPC is not keeping pace with the current market, but rather whether OS/X is able to do Big Iron computing. And if rumors be true, where will you be able to get a PPC built once Apple drops IBM for Intel?
In a Usenet debate in '92, Torvalds and Tanenbaum go roasting Mach microkernel vs. the death of Linux. Seems Linus' work will be seeing more light of day, and Mach go the way of the dodo. Will Apple rewrite OS/X for Intel x86/64? As far as practical business sense, that's like shooting oneself in the foot.
jhagman - Monday, June 6, 2005 - link
Could you please give the exact method of testing apache with ab? It is really hard to try to redo the tests when one does not know which methodology was used. The amount of clients and switches of ab would be appreciated.

Also an answer to why Apple's newest gcc (4.0) was not used would be an interesting one and did you _really_ use gcc 3.3.3 and not Apple's gcc?

Other than these omissions I found the article very interesting, thanks.
demuynckr - Monday, June 6, 2005 - link
Yes I have read the article, I also personally compiled the microbenchmarks on linux as well as on the PPC, and I can tell you I used gcc 3.3 on Mac for all compilation needs :).
webflits - Monday, June 6, 2005 - link
demuynckr, did you read the article?

"So, before we start with application benchmarks, we performed a few micro benchmarks compiled on all platforms with the SAME gcc 3.3.3 compiler. "

BTW I ran the same tests using Apple's version of gcc 3.3
As you can see my 2.0Ghz now beats the 2.5Ghz on 5 of the 8 tests, and a 2.7Ghz G5 would be on par with the Opteron 250 when you extrapolate the results.

Let's face it, Anandtech screwed up by using a crippled compiler for the G5 tests.

----------------------------
GCC 3.3/OSX 10.4.1/2.0GHz G5

FLOPS C Program (Double Precision), V2.0 18 Dec 1992

Module Error RunTime MFLOPS
(usec)
1 4.0146e-13 0.0140 997.2971
2 -1.4166e-13 0.0108 648.4622
3 4.7184e-14 0.0089 1918.5122
4 -1.2546e-13 0.0139 1076.8597
5 -1.3800e-13 0.0312 928.9079
6 3.2374e-13 0.0182 1596.1407
7 -8.4583e-11 0.0348 344.3954
8 3.4855e-13 0.0196 1527.6638

Iterations = 512000000
NullTime (usec) = 0.0004
MFLOPS(1) = 827.5658
MFLOPS(2) = 673.7847
MFLOPS(3) = 1037.6825
MFLOPS(4) = 1501.7226
demuynckr - Monday, June 6, 2005 - link
Just to clear things up: on linux the gcc 3.3.3 was used, on macintosh gcc 3.3 was used (the one that was included with the OS).
