Mac OS X Achilles Heel
It is clear that profiling MySQL on the kernel is the only way that we are going to be able to pin-point why exactly MySQL is so slow on Mac OS X. So, why did I state that I believe the threading engine in Mac OS X to be rather slow? Well, I admit that I should have made it more clear in the article that I didn't have rock-solid evidence. However, my suspicion is based on more than speculation.First of all, notice that the Mac OS X performance is decent with a concurrency of one, or one simulated user. It still performs well when a second user is simulated, as the second CPU can kick in and push performance higher. Let us check the scaling, by putting the numbers of our MySQL graphic into a table.
Concurrency | Dual G5 2,5 GHz Tiger | Scaling (Concurrency one=100%) | Dual G5 2,5 GHz Linux 2.6 | Scaling (Concurrency one=100%) | Dual Opteron 2.4Ghz | Scaling (Concurrency one=100%) |
1 | 192 | 100% | 228 | 100% | 290 | 100% |
2 | 274 | 143% | 349 | 153% | 438 | 151% |
5 | 113 | 59% | 411 | 180% | 543 | 187% |
10 | 62 | 32% | 443 | 194% | 629 | 217% |
20 | 50 | 26% | 439 | 193% | 670 | 231% |
35 | 50 | 26% | 422 | 185% | 650 | 224% |
50 | 47 | 25% | 414 | 182% | 669 | 230% |
The performance at concurrency 1 and 2 is mediocre, but not really bad. Notice that the scaling of Mac OS X from one to two is not fantastic, but is almost as good as the Linux machines. Once we worked with 5 concurrent users, however, performance collapses on Mac OS X: we get only 60% of the performance at concurrency one. With Linux, both CPUs are not stressed at a concurrency of two, and increasing the load makes the CPUs work harder.
The G5 (Linux) achieves its peak quicker as it is a bit slower in this integer intensive task than the Opteron. However, it is important to remark that while performance begins to decline very slowly as we increase the number of users, there is no collapse! At a concurrency of 50, we still have 80% more performance than at a concurrency of one, showing that Linux handles the extra load of the extra threads very well. On Mac OS X, performance has plummeted to one quarter of our initial performance, showing that the threads are creating an additional overhead somehow.
Secondly, it is a fact that our benchmark is not disk limited. In that case, it is well documented that MySQL performance depends on the threading performance of the OS. A few examples:
MySQL Reference Manual for version 5.0.3-alpha:More:
"MySQL is very dependent on the thread package used. So when choosing a good platform for MySQL, the thread package is very important"
"The capability of the kernel and the thread library to run many threads that acquire and release a mutex over a short critical region frequently without excessive context switches. If the implementation of pthread_mutex_lock() is too anxious to yield CPU time, this will hurt MySQL tremendously. If this issue is not taken care of, adding extra CPUs will actually make MySQL slower"Darwin (6.x and older) used to be quite a bit slower when it came to context switches, but our own LMBench testing shows that the latest Darwin 8.0 performs context switches just as/nearly as fast as Linux kernel 2.6. So, a possible explanation might be that more context switches happen, but we still have to find a method to measure this. Suggestions are welcome....
From the MySQL site:
"As a multithreaded server, MySQL is most efficient on an operating system that has a well implemented threading system"Thirdly, we have the Lmbench benchmarks, which are not conclusive, but point in the same direction. Even the high latency for the TCP measurements (see above) on Mac OS X might indicate relatively poor threading performance. MySQL has a TCP/IP connection thread, which handles all connection requests. This thread creates a new dedicated thread to handle the authentication and SQL query processing for each connection.
The split funnel suspect
The last suspect is the locking system. In Panther, only two threads could lock into the kernel to execute code of the kernel. One thread could lock into the networking part, while the other into the rest of the kernel services.In Tiger, the locking is finer. Although Apple's documents indicate that it is still rather coarse grained, it is clear that more than two locks into the kernel can exist at the same time. In the case of MySQL, this should be a very important improvement, but we didn't see any improvement at all when performing the tests on both Panther and Tiger. This is speculation, but based on our data, we are tempted to hypothesize that the new locking system isn't really working right now, and that Tiger continues to behave like Panther.
Does it affect you?
What does this all mean? Whether or not you skipped the technical part, you probably want to know how it affects your Apple (server) experience.It is clear that if you plan to run MySQL on Apple hardware, it is better to install YDL Linux than to use OS X. If you need excellent read performance, the maximum performance of your server will be up to 8 times better. If your server is only going to serve a limited number of users, YDL Linux will allow you to run with a less expensive system.
If the usage pattern of your server is more OLTP, Transaction processing oriented, we give you the same advice. Our quick tests with InnoDB show the same kind of behavior and we have noticed very slow file system performance. At this point, we do not have enough data to be conclusive. We noticed, for example, that importing data in our database (via the ">" command) took up to 8 times longer.
47 Comments
View All Comments
Lori - Friday, September 2, 2005 - link
http://en.wikipedia.org/wiki/Microkernel">http://en.wikipedia.org/wiki/MicrokernelMacOS X uses a modified microkernel (a monolithic / microkernel hybrid). The idea was to cut down IPC costs by putting servers that would be IPC heavy directly into the kernel. However, there has recently been a lot of work in the microkernel world to reduce this IPC cost and bring its speed near that of a monolithic kernel.
L4Ka::Pistachio is an example of this:
http://www.l4ka.org/">http://www.l4ka.org/
leviat - Thursday, September 1, 2005 - link
If the problem is indeed in the thread creation portion of the OS, it would be interesting to see how a single threaded webserver fairs. I would love to see a benchmark test of Lighttpd (www.lighttpd.org) to see a comparison of how that runs on Darwin vs linux-ppc.Another interesting test would be to see MySQL can be configured to precreate the handler threads. This might allow us to see how it handles the context-switching between the multiple threads and allow for it to compete.
Anyways, great article!
JohanAnandtech - Friday, September 2, 2005 - link
What exactly to do you mean by single threaded? Because Apache 1.3 works with processes, and is thus single-threaded per user.MySQL can make use of a Thread cache, we played with it but it didn't give any substantial boost. I don't see how the software would be able to precreate all threads as it has close down and open connections. If you got some insight, please share :-).
Context switching is quite fast on the G5 OS X, give or take a few percentages compared to Linux x86 or G5 Linux, as we tested with lmbench.
Lori - Friday, September 2, 2005 - link
Actually there are more than one way to handle multiple connections in a server application.To give you some examples...
1. Multi process
2. Multi thread
3. Some hybrid of the two
You can see combinations of these types all provided by Apache 2's MPMs. (perchild, prefork, threadpool, worker, leader.. etc)
4. Asynchronus multiplexing.
Your program becomes its own schedular. You can do all your processing within a single thread. Also read up on non blocking i/o. I am actually surprised apache does not have a MPM to handle this type of connection multiplexing but I also read its harder to get OS support.
Letsee... links... umm... ahh...:
http://www.kegel.com/c10k.html">http://www.kegel.com/c10k.html
Avalon - Thursday, September 1, 2005 - link
Seems like once you remove the G5 from OSX, it's a very capable chip.jamawass - Thursday, September 1, 2005 - link
Great article, in response to the previous post Anand has posted tons of server articles on x86 systems so Apple is fair game here. Secondly Apple servers are based on OSX in the market, corporations want to know the real world performance not the desktop feel. Also Johan's speculation on Apple's move to Intel raises some troubling questions for Apple execs.karlreading - Thursday, September 1, 2005 - link
a lot of people commenting on how apple have mad a wrong dicision turning to intel.possibly, but IMHO, and, if im not mistaken, didnt the opteron dominate all the tests.
so in my mind whilst its true for people to doubt apple for going intel, x86 on the whole is still a very viable option if you go the AMD route.
yes i know people will say AMD dont hae the capacity, but amd powered macs should be how x86 macs are done.
karlos
karlreading - Thursday, September 1, 2005 - link
also worth noting is that they say the FP poerformance is as good as the fastest x86 chip. well scuse me, but isnt that a 2.7ghz g5 part there testing there? thats the fastest g5 currently avalible isnt it? well then why not test the opteron 254. thats the fastest x86 chip, running 2.8ghz, rather than the 850/250 2.4ghz part tested? that would put some lead against the g5 and also, 2.8ghz is a lot closer than 2.4ghz is to the 2.7ghz g5's core speed. if were trying to be fair.if we was being really picky we would be stating duakl core opteron as the fastest x86, but i digress....
karlos
JohanAnandtech - Friday, September 2, 2005 - link
You are right about the recentely introduced 2.8 GHz Opteron. Well, to be really accurate, at the time of the introduction of the 2.7 GHz G5, a 2.6 Ghz opteron was available.Anyway, It was not my intention to be "accurate", it was more a general impression. Give or take a few percent, the G5 can compete FP wise :-).
Pannenkoek - Thursday, September 1, 2005 - link
It's a matter of scalability, SMP support and not so much of how fast some system calls are executed as the reason for the bad performance I would think. Linux is the most used OS for superclusters these days, Mac OS remains a desktop OS. It's no wonder that it performs poorly as a serious server on a multiprocessor/core system. It would have been interesting to see how Windows would have faired (on the x86 of course), if we are testing OSes in this way.However, MySQL benchmarks say little about desktop performance, Anandtech's audience consists of desktop users and the reason people love or hate Mac OS is its desktop. Nevertheless, almost a great article. It should have been if the autor could have resisted the temptation of too much speculation, instead of honest benchmark numbers.