MySQL Bugs: #13578: The longer mysqld runs, the slower everything is

Bug #13578	The longer mysqld runs, the slower everything is
Submitted:	28 Sep 2005 17:10	Modified:	4 Sep 2006 7:08
Reporter:	Shawn Hogan	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server	Severity:	S2 (Serious)
Version:	5.0.21, 4.1.18	OS:	MacOS (Mac OS X Server 10.4.4)
Assigned to:		CPU Architecture:	Any

Description:
This has been an ongoing issue that a few different people have reported:

http://forums.mysql.com/read.php?24,43796,43796#msg-43796
http://forums.mysql.com/read.php?24,39066,43474#msg-43474

In my case, mysqld is the ONLY process/service running (besides the core OS stuff). Basically the longer the mysqld process runs, the slower it is (with no overall change in queries or qps). CPU usage slowly rises as well (the attached image shows almost 4 days of CPU usage on the server [red arrows are when the mysqld process was restarted, and at no point was the server itself restarted]). Immediately following the restart of the mysqld process, the CPU load drops to practically nothing and query times are back to being smoking fast.

I've set my long_query_time to 5, and at this point I can ALWAYS get to 2M total queries (which takes 5-6 hours in my case) before it starts logging slow queries. At that point all different types of queries are logged as "slow", even the simplest select of a single row on a small (properly indexed) table. The longer the mysqld process, the slower it gets. If I were to let it run for a 5 days or so, a simple query that takes 0.01 seconds normally will take more than 60 seconds.

There is no disk/virtual memory thrashing going on with the server at any point of mysqld being slow.

I've been through a couple MySQL consultants who were have confirmed it does appear to be some sort of issue with mysqld itself.

I'm going to attach the output of variables and status before immediately before a mysqld process restart (some of the values at that time were absurdly high when I was trying to throw memory at things when I was troubleshooting it).

How to repeat:
Run MySQL on Mac OS X.

Suggested fix:
Restart the mysqld process after it runs ~4M total queries.

4 days of CPU usage (arrows show where mysqld process was restarted)

Attachment: mysql_cpu_usage.gif (image/gif, text), 72.62 KiB.

Output of SHOW STATUS and SHOW VARIABLES immediately before a needed restart

Attachment: mysql_probs.txt (text/plain), 29.08 KiB.

Also, more than happy to give any MySQL developer a shell account to the server for anything.

Thank you for a problem report. I've read the forum threads you mentioned. It was interesting. Did you noted that at the second one a real problem (cartesian join) was pointed out?

So, please, give us more information on how to repeat this situation. The results of SHOW PROCESSLIST commands (may be, several of them) just when you have this hang and just after the restart may help, for the beginning.

You may ask developers to login remotely and try to solve you problem, but this is an option of our Gold support, as far as I remember.

It's not so much a hang of the copying to a tmp table (although that happens too).  Rather every query and every query type get slower as time goes on (for example, normally it takes 0.01 seconds to truncate a HEAP table.  If the server runs for 24 hours, it can take 10 seconds, and just gets slower the longer it runs).  For example, I can take the mysql server off the network (no mysql clients making any queries) after it's been running for awhile and every type of query is slow with that query being the only query going on.  The only way to solve it is to restart the mysqld process (not the server).

But I will attach a couple odd looking processlists I've randomly run across trying solve this issue.

Shows almost every mysqld thread hung on opening/closing files

Attachment: processlist1.txt (text/plain), 154.10 KiB.

All query threads with a NULL state

Attachment: processlist2.txt (text/plain), 148.22 KiB.

No state on any query thread except for one that's doing an INSERT into a tiny HEAP table

Attachment: processlist3.txt (text/plain), 23.46 KiB.

I was able to pinpoint the problem (at least partially).  It's the query_cache.  My query cache was set to 256MB, and I noticed that all databases would hang whenever the query cache needed to be purged of a certain table's data (on an update of a table).  With 256MB allocated for the query cache, sometimes the garbage collection of the query cache would be as "fast" as 10 seconds (see proccesslist3.txt) and other times it would take more than 60 seconds (see processlist1.txt).

The smaller I set the query_cache_size, the better the overall performance (currently I have it running at 2MB).  Performance still isn't where it is when mysqld is first started, but at least performance is good enough that it doesn't HAVE to be restarted.

I'm thinking of dropping the key_buffer_size from 256MB to 2MB also and see if I get the rest of the missing performance.

Either way, at the very least it definitely seems like a problem with the query_cache (when queries are flushed out) on Mac OS X Server.

Thank you for additional information. Large query_cache will not help you in case of really defferent queries or queries with some functions like I saw in the processlists you posted. See http://dev.mysql.com/doc/mysql/en/query-cache-how.html for details. So, you may even get better performance switching it off at all. (It takes time to manage it and to look it through, just to not found anything useful. Alhtough, your query chache related status variables does not looked to bad for me...) Please, try and inform about the results.

key_buffersize is a different story, so I do not think it should be decreased.

I changed severity, because your server can really work without restarts now.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

I ended up setting the server up to text message my cell phone whenever mysqld starts spinning out of control (which these days is about 36 hours between mysqld restarts).

After watching it closely for the last month, I think I've finally figure out what the cause is.  As time goes on, mysqld's memory get so segmented that queries take longer and longer as it has to start accessing thousands of different memory segments for each query.

This is what top is showing currently for the process (restarted <5 hours ago):

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
25879 mysqld      16.5% 80:45.94  >>   329  1097   202M+ 2.33M   195M+  376M+

So there are more than 100 threads (not sure how many), 329 Mach ports and 1,097 different memory segments.  When I had to restart mysqld early this morning, there were more than 100 threads (not sure how to see how many when it goes over the allowed columns), more than 1,000 Mach ports and 9,600 memory segments (this was after about 20 hours of running).

The machine is a fully loaded top of the line (as of this writing) Xserve with 5GB of RAM and no other processes running (beyond normal system processes).  It's not virtual memory/disk thrashing or anything (mysqld is using ~200MB) and the machine is showing more than 4GB of inactive physical memory.

If this is something that can't be fixed (because it's a core operating system issue), it would be good to know, that way we can stop using Mac OS X as a MySQL server (it's terribly impractical to have to restart your db server daemon every 24-48 hours.

As it may be an OS issue, can you, please, try to upgrade your server to Tiger (Mac OS X 10.4.3)? Also, please, try 4.1.16 available for some time already.

I've upgraded it to OS X Server 10.4.4 and MySQL 4.1.16, and I'm still seeing the same problem.  For the time being I've written a script that checks the slow queries every 15 minutes and restarts the mysqld process once it's more than 500 slow queries (with slow query time of 5 seconds).  I never get a single slow query for the first 3-4 hours of mysqld running, then it starts snowballing and gets slower and slower (the auto-restart usually happens around 24 hours of uptime).

Please, try to use newer version, 4.1.18 (4.1.16 had some problem with UPDATE and DELETE optimization that can lead to wastage of resources) and inform about the results. Your my.cnf may be useful also - we can try to allocate more memory to MySQL and check for any difference. Do you have any queries with large [NOT] IN conditions (hundrends or thousands of values to check against)?

4.1.18 seems to be no different... still requiring that the mysqld process be shutdown about every 24 hours as it's performance degrades due to more and more unique memory segments it's using.

For the most part, the queries running on the DB server are very simple (1 record returned, no joins, hitting index properly [no table scans], etc.)  As far as memory allocation goes in my.cnf, the more memory I take AWAY from things, the better it runs (I guess it doesn't need to allocate and deallocate as much memory so it doesn't get fragmented as quickly).

Anyway, may I take a look at your my.cnf? Do you use query cache?

When I first realized it's a memory fragmentation issue, I ended up lowering the query cache down to 4MB (from 256MB).  The more memory allocated for things in the my.cnf file, the quicker the memory gets fragmented to the point mysqld is unusable.

Also, 99.5% of the queries have no gain from the query cache, so those ones all use the SQL_NO_CACHE directive in the SELECT statements.

[mysqld]
skip-external-locking
log-slow-queries
long_query_time=5
max_heap_table_size=16M
ft_min_word_len=3
max_connections = 1000
key_buffer_size = 256M
myisam_sort_buffer_size = 128M
join_buffer_size = 262144
read_buffer_size = 262144
sort_buffer_size = 2M
table_cache = 1800
thread_cache_size = 512
connect_timeout = 10
max_tmp_tables = 64
tmp_table_size = 32M
max_allowed_packet = 4M
read_rnd_buffer_size = 524288
max_connect_errors = 10
thread_concurrency = 4
query_cache_limit = 2M
query_cache_size = 4M
query_cache_type = 1
query_prealloc_size = 16384
query_alloc_block_size = 16384

Here's something I ran across randomly, and sounds like maybe the same problem?

http://www.linuxjournal.com/article/7451

<i>Of note, Memcached uses a slab allocator for memory allocation. Early versions of Memcached used the malloc from glibc and ended up falling on their faces after about a week, eating up a lot of CPU space due to address space fragmentation. A slab allocator allocates only large chunks of memory, slicing them up into little chunks for particular classes of items, then maintaining freelists for each class whenever an object is freed.</i>

Yes, it can be the same problem, but as glibc is used by large MySQL instances on other platfroms (Linux) and we had not identified exactly same problem there (yet), I am not sure.

As new version, 4.1.19, released, I suggest you to try it (64-bit version!, http://dev.mysql.com/get/Downloads/MySQL-4.1/mysql-max-4.1.19-apple-darwin8.2.1-powerpc-64...).  Please, inform about differences (if any).

I've actually upgraded to 5.0.21 and have been experiencing the same problems (the mysqld process needs to be restarted every 4 or 5 hours).  I'm currently thinking about putting together a workaround that basically runs 2 copies of mysqld and alternating restarting each process every hour or so (that way one process is always available, although on a different port).  Not a "good" solution, but at this point I have no choice but to restart the mysqld process multiple times per day (via a monitoring script).

Have you tried with 64-bit version of 5.0.21?

Yep, that's what I'm using.

Please, try to use a newer version, 5.0.24, just released, and inform about the results.

We are experiencing the same symptom on Solaris 10 OS running 4-AMD 64bit processors with 10GB real memory. MySQL version is 5.0.18 64 bit. The server is dedicated to MySQL and performs data warehouse/data mart queries. Performance is excellent immediately after a reboot, but deteriorates rapidly.  

"Flush tables" command restores performance to almost normal, but within 1-2 hours it is crawling again. Queries that run in 10 seconds after reboot run for 10-15 minutes when performance degrades. The system has plenty of RAM and is not swapping. Iostat shows some activity, but generally the disks are not very busy. All tables are MyISAM, query_cache_type=2 (Demand) to bypass query cache unless requested because queries are 99+% unique.

Comment added to this bug report because of note on May 12th relative to memory allocation and lack of evidence of this problem on other platforms.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".