Bug #868 | on linux with NPTL, mysqld hangs under high load | ||
---|---|---|---|
Submitted: | 17 Jul 2003 13:44 | Modified: | 30 Nov 2006 12:56 |
Reporter: | elaine forbes | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 3.23.54-log, current RH/mysql rpms | OS: | Linux (Redhat 9.0, Lunar linux) |
Assigned to: | | CPU Architecture: | Any |
[17 Jul 2003 13:44]
elaine forbes
[21 Jul 2003 5:52]
Alexander Keremidarski
Please provide as much detail as possible so we can repeat this problem. Did you try the same test with the RedHat hack which is supposed to turn off NPTL? export LD_ASSUME_KERNEL=2.2.5; mysqld_safe &
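Before trying the workaround above, it can help to confirm which threading library the host actually uses. The following is a minimal Python sketch, not part of the original report. It assumes a glibc-based Linux system where `os.confstr` exposes `CS_GNU_LIBPTHREAD_VERSION`; the helper names `thread_library` and `env_without_nptl` are hypothetical.

```python
import os


def thread_library():
    """Return glibc's threading library string (for example 'NPTL 2.4'), or None.

    Assumption: glibc exposes CS_GNU_LIBPTHREAD_VERSION through confstr.
    Other libcs or platforms may not, in which case None is returned.
    """
    try:
        return os.confstr("CS_GNU_LIBPTHREAD_VERSION")
    except (AttributeError, ValueError, OSError):
        return None


def env_without_nptl(base_env=None, kernel="2.2.5"):
    """Copy an environment and set LD_ASSUME_KERNEL as suggested in the thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_ASSUME_KERNEL"] = kernel
    return env


if __name__ == "__main__":
    print("thread library:", thread_library())
    # The resulting dict could be passed as env= to subprocess.Popen(["mysqld_safe"]).
    print("LD_ASSUME_KERNEL:", env_without_nptl()["LD_ASSUME_KERNEL"])
```

Whether `LD_ASSUME_KERNEL` still has any effect depends on the glibc build, so the printed library string is worth checking both with and without the variable set.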
[11 Sep 2003 7:41]
elaine forbes
Apologies for the delay in getting back to you on this. I've not had the time to reboot this box into Red Hat; however, I'm sure that your suggested workaround of: export LD_ASSUME_KERNEL=2.2.5; mysqld_safe & would work, as the issue replicated more or less exactly on a 2.5 kernel with NPTL. I would *like* to be running/testing MySQL fully in an NPTL-enabled environment, but thus far I've not had much success building MySQL from source against NPTL headers and libraries. MySQL (binary) does run a good bit faster on NPTL, and I assume that once it's compiled specifically to use NPTL the performance gain will be better. I see you've marked this as 'reproduced', so unless you ask I'm not going to attach the PHP+Apache+MySQL configuration in which I found the problem.
[21 Mar 2004 8:45]
[ name withheld ]
Seems we got a similar problem here. MySQL randomly hangs on an SMP system (dual Xeon) with Fedora Core 1. AFAIK this also features NPTL threads, since it's the successor of RedHat 9. The hangs are not reproducible, however, and also occur in off-load times. Here the MySQL version is 4.0.17. PS: MySQL also can't be shut down cleanly. It doesn't respond to connects or a clean shutdown. Only killing it helps :-(
[17 May 2004 23:34]
Steve Meyers
Our experience seems to agree with what has been posted. Specifically, we did not have the problem when running 4.0.17 on RH 7.3. We upgraded to Fedora Core 1, and MySQL 4.0.18. We started having the problem approximately every one and a half weeks. Whenever the hang happens, after we kill it, we end up with database corruption. Fortunately, we use replication, and have always been able to recover. The problem has only ever happened on master or slave servers. One interesting side note is that if you strace the right process, the system will recover. However, we have still had database corruption when we did this. We currently have a spare replicated server live for the express purpose of recovering from this specific failure quickly. We would be glad to leave it running next time we experience this issue, to let someone have a look at it. One last thing - we have experienced it both under our heaviest load, and under (relatively) light load.
[24 Jun 2004 21:01]
[ name withheld ]
Could this be the same problem as http://www.blackdown.org/java-linux/java-linux@java.blackdown.org/java-linux-msg00089.html ?
[14 Feb 2005 22:54]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[29 May 2006 18:06]
Valeriy Kravchuk
All reporters: Does anybody still have similar problems with 2.6.x kernels, modern versions of glibc/NPTL and latest versions of MySQL server (3.23.58, 4.0.27 or newer)?
[29 Jun 2006 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[30 Oct 2006 17:52]
jocelyn fournier
Hi, I'm experiencing exactly the same issue on an x86-64 server on Suse 10.1 + Glibc 2.4 (NPTL). Under high load / QPS, all the queries accumulate in the processlist with a NULL state, and only a few are stuck in the update/end state. The problem has been reproduced with 5.0.26 and 5.1.11-beta. Regards, Jocelyn
[30 Oct 2006 17:59]
jocelyn fournier
Here is a show full processlist when the server is stuck:

Id | User | Host | db | Command | Time | State | Info
---|---|---|---|---|---|---|---
1 | event_scheduler | localhost | NULL | Connect | 28735 | Suspended | NULL
409 | slave | 192.168.222.5:40909 | NULL | Binlog Dump | 25050 | Has sent all binlog to slave; waiting for binlog to be updated | NULL
[...] | | | | | | | LEFT OUTER JOIN connectors c ON c.id=i.connectorid WHERE c.language='de' AND i.status<9 ORDER BY i.status DESC,i.created DESC LIMIT 100
22019 | wikalsql | 192.168.222.1:18035 | wikal | Query | 615 | NULL | SELECT labelid,label FROM labels WHERE language="fr" AND groupid=21
22364 | wikalsql | 192.168.222.1:18390 | wikal | Execute | 610 | end | UPDATE thesaurus SET name='Política - Partidos politicos - PSOE - Manuel Marí',language='ES',description='',keywords='in_title \\"Manuel Marín González\\"\\r\\nin_title \\"Manuel Marín\\"',industrial='',person='',global=1,created='2006-10-30 14:31:52',createdby='Wikio',modified='2006-10-30 17:17:52',modifiedby='phermouet' WHERE id=58324
22607 | wikalsql | 192.168.222.1:18741 | wikal | Query | 613 | NULL | SELECT labelid,label FROM labels WHERE language="fr" AND groupid=21
22635 | wikalsql | 192.168.222.1:18764 | wikal | Query | 615 | NULL | SELECT labelid,label FROM labels WHERE language="fr" AND groupid=21
22636 | wikalsql | 192.168.222.1:18766 | wikal | Query | 617 | NULL | SELECT labelid,label FROM labels WHERE language="fr" AND groupid=6
22652 | wikalsql | 192.168.222.64:19765 | wikal | Query | 617 | NULL | SELECT id, lastCapture FROM packages_totreat where status=0 ORDER BY priority ASC, dateCreated ASC LIMIT 4
22735 | wikalsql | 192.168.222.1:18968 | wikal | Query | 615 | NULL | SELECT id FROM blacklist WHERE bltype=4 AND mask='192.168.222.2'
22739 | wikalsql | 192.168.222.1:18974 | wikal | Query | 617 | NULL | SELECT id FROM blacklist WHERE bltype=4 AND mask='192.168.222.2'
22740 | wikalsql | 192.168.222.1:18979 | wikal | Query | 616 | NULL | SELECT labelid,label FROM labels WHERE language="fr" AND groupid=21
22742 | wikalsql | 192.168.222.1:18980 | wikal | Query | 614 | NULL | SELECT id FROM blacklist WHERE bltype=4 AND mask='192.168.222.2'
22829 | wikalsql | 192.168.222.6:11652 | wikal | Query | 617 | NULL | SELECT id FROM blacklist WHERE bltype=1 AND mask="http://www.crunkrockradio.com"
22831 | wikalsql | 192.168.222.6:5836 | wikal | Query | 614 | NULL | SELECT id FROM blacklist WHERE bltype=1 AND mask="http://www.lernzeit.de"
22902 | wikalsql | 192.168.222.6:4054 | wikal | Execute | 560 | end | UPDATE servers_infos set value='314' where id=6 and param='nbConnectorsCaptured(300000ms)'
22939 | wikalsql | 192.168.222.1:19292 | wikal | Query | 615 | NULL | SELECT labelid,label FROM labels WHERE language="fr" AND groupid=23
22940 | wikalsql | 192.168.222.1:19293 | wikal | Query | 617 | update | INSERT INTO infos_misc (infoid,defcateg) VALUES (7655554,4093)
23001 | wikalsql | 192.168.222.64:9414 | wikal | Query | 617 | NULL | SELECT categ FROM packages WHERE id=65890
23141 | wikalsql | 192.168.222.6:29052 | wikal | Query | 617 | NULL | SELECT id FROM blacklist WHERE bltype=1 AND mask="http://www.wfp.org"
23197 | wikalsql | 192.168.222.6:25226 | wikal | Query | 617 | NULL | SELECT id FROM blacklist WHERE bltype=1 AND mask="http://www.conservative.ca"
23205 | wikalsql | 192.168.222.6:18383 | wikal | Query | 616 | NULL | SELECT id FROM blacklist WHERE bltype=1 AND mask="http://paris-photographie.com"
23209 | wikalsql | 192.168.222.6:9266 | wikal | Query | 617 | NULL | SELECT id FROM blacklist WHERE bltype=1 AND mask="http://googlesystem.blogspot.com"
23279 | wikalsql | 192.168.222.1:19799 | wikal | Query | 615 | NULL | SELECT labelid,label FROM labels WHERE language="fr" AND groupid=21
23281 | wikalsql | 192.168.222.1:19802 | wikal | Query | 616 | NULL | SELECT id FROM blacklist WHERE bltype=4 AND mask='192.168.222.2'
23282 | wikalsql | 192.168.222.1:19803 | wikal | Query | 615 | NULL | SELECT id FROM blacklist WHERE bltype=4 AND mask='192.168.222.2'
23287 | wikalsql | 192.168.222.6:20136 | wikal | Query | 617 | NULL | SELECT id FROM blacklist WHERE bltype=1 AND mask="http://feeds.feedburner.com"
23296 | wikalsql | 192.168.222.64:10980 | wikal | Query | 616 | NULL | SELECT categ FROM packages WHERE id=41724
23298 | wikalsql | 192.168.222.64:20514 | wikal | Query | 617 | NULL | SELECT categ FROM packages WHERE id=41014
23309 | wikalsql | 192.168.222.6:17609 | wikal | Query | 617 | NULL | SELECT lastcapture FROM connectors_stats WHERE connectorid=22648
23312 | wikalsql | 192.168.222.6:16215 | wikal | Query | | |
23561 | root | localhost | NULL | Query | 0 | NULL | show full processlist

No clue about how to reproduce it, this seems to occur randomly under high load. Thanks, Jocelyn
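A listing like the one above is easier to compare across hangs when it is reduced to a count of long-running statements per state. The following is a minimal Python sketch, not something posted in the thread. It assumes the listing was captured as tab-separated text with a header row (as the `mysql` client prints in batch mode); the function name `summarize_states` and the 60-second threshold are hypothetical choices. The sample rows are copied from the listing above.

```python
from collections import Counter

# A few rows from the report, re-typed as tab-separated text (assumed format).
SAMPLE = "\n".join([
    "Id\tUser\tHost\tdb\tCommand\tTime\tState\tInfo",
    "22019\twikalsql\t192.168.222.1:18035\twikal\tQuery\t615\tNULL\t"
    "SELECT labelid,label FROM labels WHERE language=\"fr\" AND groupid=21",
    "22607\twikalsql\t192.168.222.1:18741\twikal\tQuery\t613\tNULL\t"
    "SELECT labelid,label FROM labels WHERE language=\"fr\" AND groupid=21",
    "22902\twikalsql\t192.168.222.6:4054\twikal\tExecute\t560\tend\t"
    "UPDATE servers_infos set value='314' where id=6 "
    "and param='nbConnectorsCaptured(300000ms)'",
    "22940\twikalsql\t192.168.222.1:19293\twikal\tQuery\t617\tupdate\t"
    "INSERT INTO infos_misc (infoid,defcateg) VALUES (7655554,4093)",
    "23561\troot\tlocalhost\tNULL\tQuery\t0\tNULL\tshow full processlist",
])


def summarize_states(text, min_seconds=60):
    """Count Query/Execute threads running at least min_seconds, per State."""
    lines = text.splitlines()
    header = lines[0].split("\t")
    counts = Counter()
    for line in lines[1:]:
        # Split into at most len(header) fields so tabs inside Info survive.
        fields = line.split("\t", len(header) - 1)
        if len(fields) < len(header):
            continue  # skip blank or truncated rows
        row = dict(zip(header, fields))
        if row["Command"] in ("Query", "Execute") and int(row["Time"]) >= min_seconds:
            counts[row["State"]] += 1
    return counts


if __name__ == "__main__":
    for state, n in summarize_states(SAMPLE).most_common():
        print(state, n)
```

On the sample rows this groups the stuck statements into the NULL, end, and update states, which matches the pattern described in the report.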
[30 Oct 2006 18:31]
MySQL Verification Team
Jocelyn, We very much need a repeatable test case for this. Can you use sysbench, mysqlslap or similar tools in order to create one? We would be very grateful if that could be done. Sinisa Milivojevic
[31 Oct 2006 9:27]
jocelyn fournier
Hi Sinisa, I failed to reproduce the problem with sysbench with 300 parallel threads running on a table with 1M rows. I'll try to see if I can modify sysbench to run the queries used by the application. (Ideally it would be great if sysbench were able to parse a MySQL log file to generate random queries based on what it has read in the log.) We'll also try to replay the binary log up to the failing point, but since it doesn't replay SELECTs and doesn't run in parallel threads, I think it will not fail. Thanks, Jocelyn
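The idea of replaying logged SELECT statements in parallel can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an existing sysbench feature: it assumes each logged statement sits on a single line containing the word `Query` followed by the SQL text, and `run_query` is a hypothetical callable that the reader would back with a real MySQL client, ideally with one connection per worker thread.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Assumed log shape: "<id> Query<whitespace><sql>" on one line per statement.
SELECT_RE = re.compile(r"\bQuery\s+(SELECT\b.*)$", re.IGNORECASE)


def extract_selects(log_lines):
    """Return the SELECT statements found in the given log lines."""
    selects = []
    for line in log_lines:
        match = SELECT_RE.search(line.rstrip("\n"))
        if match:
            selects.append(match.group(1))
    return selects


def replay(queries, run_query, threads=300):
    """Run every query through run_query using a pool of worker threads.

    run_query is a placeholder: in real use it would send the SQL to the
    server under test. Results come back in the order of `queries`.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(run_query, queries))


if __name__ == "__main__":
    log = [
        '22019 Query\tSELECT labelid,label FROM labels WHERE language="fr" AND groupid=21',
        "22940 Query\tINSERT INTO infos_misc (infoid,defcateg) VALUES (7655554,4093)",
        "23001 Query\tSELECT categ FROM packages WHERE id=65890",
    ]
    # len stands in for a real executor so the sketch runs without a server.
    print(replay(extract_selects(log), len, threads=4))
```

A thread pool only approximates the original concurrency pattern; timing between statements is lost, so a replay like this may still fail to trigger the hang.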
[1 Dec 2006 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".