| Bug #868 | on linux with NPTL, mysqld hangs under high load | ||
|---|---|---|---|
| Submitted: | 17 Jul 2003 15:44 | Modified: | 30 Nov 2006 13:56 |
| Reporter: | elaine forbes | ||
| Status: | No Feedback | ||
| Category: | Server | Severity: | S2 (Serious) |
| Version: | 3.23.54-log, current RH/mysql rpms | OS: | Linux (Redhat 9.0, Lunar linux) |
| Assigned to: | | Target Version: | |
[17 Jul 2003 15:44]
elaine forbes
[21 Jul 2003 7:52]
Alexander Keremidarski
Please provide as many details as possible so that we can repeat this problem. Did you try the same test with the RedHat hack that is supposed to turn off NPTL? export LD_ASSUME_KERNEL=2.2.5; mysqld_safe &
[11 Sep 2003 9:41]
elaine forbes
Apologies for the delay in getting back to you on this. I've not had the time to reboot this box to redhat; however, I'm sure that your suggested workaround of: export LD_ASSUME_KERNEL=2.2.5; mysqld_safe & would work, as the issue replicated more or less exactly on a 2.5 kernel with NPTL. I would *like* to be running/testing mysql fully in an NPTL-enabled environment; however, thus far I've not had much success building mysql from source against NPTL headers and libraries. Mysql (binary) does run a good bit faster on NPTL, and I assume that once it's compiled to specifically use NPTL the performance gain will be better. I see you've marked this as 'reproduced', so unless you ask I'm not going to attach the php+apache+mysql configuration in which I found the problem.
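The workaround quoted above amounts to starting the server with `LD_ASSUME_KERNEL` set in its environment. A minimal Python sketch of that idea follows. The helper names `threading_library` and `env_without_nptl` are hypothetical, and the sketch assumes a glibc old enough to still ship LinuxThreads and honor the variable. `os.confstr("CS_GNU_LIBPTHREAD_VERSION")` is glibc-specific, so the function returns `None` wherever that name is unavailable.

```python
import os


def threading_library():
    """Return the glibc threading library string, or None if it cannot be queried."""
    try:
        return os.confstr("CS_GNU_LIBPTHREAD_VERSION")
    except (ValueError, OSError, AttributeError):
        # The configuration name is glibc-specific; other platforms do not define it.
        return None


def env_without_nptl(base_env, kernel="2.2.5"):
    """Return a copy of base_env with LD_ASSUME_KERNEL set; base_env is not modified."""
    env = dict(base_env)
    env["LD_ASSUME_KERNEL"] = kernel
    return env


if __name__ == "__main__":
    print("threading library:", threading_library())
    env = env_without_nptl(os.environ)
    print("would start mysqld_safe with LD_ASSUME_KERNEL =", env["LD_ASSUME_KERNEL"])
    # To launch for real (assumes mysqld_safe is on PATH):
    # import subprocess; subprocess.Popen(["mysqld_safe"], env=env)
```

Building the environment as a copy keeps the setting local to the server process instead of exporting it for the whole shell session.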
[21 Mar 2004 9:45]
[ name withheld ]
It seems we have a similar problem here. MySQL randomly hangs on an SMP system (dual Xeon) with Fedora Core 1. AFAIK this also features NPTL threads, since it's the successor of RedHat 9. The hangs are not reproducible, however, and also occur at off-peak times. The MySQL version here is 4.0.17. PS: MySQL also can't be shut down cleanly. It doesn't respond to connection attempts or to a clean shutdown request. Only killing it helps :-(
[18 May 2004 1:34]
Steve Meyers
Our experience seems to agree with what has been posted. Specifically, we did not have the problem when running 4.0.17 on RH 7.3. We upgraded to Fedora Core 1 and MySQL 4.0.18, and started having the problem approximately every one and a half weeks. Whenever the hang happens, after we kill the server, we end up with database corruption. Fortunately, we use replication, and have always been able to recover. The problem has only ever happened on master or slave servers. One interesting side note is that if you strace the right process, the system will recover. However, we have still had database corruption when we did this. We currently keep a spare replicated server live for the express purpose of recovering from this specific failure quickly. We would be glad to leave the hung server running the next time we experience this issue, to let someone have a look at it. One last thing: we have experienced the hang both under our heaviest load and under (relatively) light load.
[24 Jun 2004 23:01]
[ name withheld ]
Could this be the same problem as http://www.blackdown.org/java-linux/java-linux@java.blackdown.org/java-linux-msg00089.html ?
[14 Feb 2005 23:54]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[29 May 2006 20:06]
Valeriy Kravchuk
All reporters: Does anybody still have similar problems with 2.6.x kernels, modern versions of glibc/NPTL and latest versions of MySQL server (3.23.58, 4.0.27 or newer)?
[30 Jun 2006 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[30 Oct 2006 18:52]
jocelyn fournier
Hi, I'm experiencing exactly the same issue on an x86-64 server with Suse 10.1 + Glibc 2.4 (NPTL). Under high load / QPS, all the queries accumulate in the processlist with a NULL state, and only a few are stuck in the update/end state. The problem has been reproduced with 5.0.26 and 5.1.11-beta. Regards, Jocelyn
[30 Oct 2006 18:59]
jocelyn fournier
Here is the output of show full processlist while the server is stuck:
Id User Host db Command Time State Info
1 event_scheduler localhost NULL Connect 28735
Suspended NULL
409 slave 192.168.222.5:40909 NULL Binlog Dump 25050
Has sent all binlog to slave;
waiting for binlog to be updated NULL
[...]
LEFT OUTER JOIN connectors c ON c.id=i.connectorid WHERE c.language='de' AND
i.status<9 ORDER BY i.status DESC,i.created DESC LIMIT 100
22019 wikalsql 192.168.222.1:18035 wikal Query 615
NULL SELECT labelid,label FROM
labels WHERE language="fr" AND groupid=21
22364 wikalsql 192.168.222.1:18390 wikal Execute 610
end UPDATE thesaurus SET
name='Política - Partidos politicos - PSOE - Manuel
Marí',language='ES',description='',keywords='in_title \\"Manuel Marín
González\\"\\r\\nin_title \\"Manuel
Marín\\"',industrial='',person='',global=1,created='2006-10-30
14:31:52',createdby='Wikio',modified='2006-10-30 17:17:52',modifiedby='phermouet'
WHERE id=58324
22607 wikalsql 192.168.222.1:18741 wikal Query 613
NULL SELECT labelid,label FROM
labels WHERE language="fr" AND groupid=21
22635 wikalsql 192.168.222.1:18764 wikal Query 615
NULL SELECT labelid,label FROM
labels WHERE language="fr" AND groupid=21
22636 wikalsql 192.168.222.1:18766 wikal Query 617
NULL SELECT labelid,label FROM
labels WHERE language="fr" AND groupid=6
22652 wikalsql 192.168.222.64:19765 wikal Query 617
NULL SELECT id, lastCapture FROM
packages_totreat where status=0 ORDER BY priority ASC, dateCreated ASC LIMIT 4
22735 wikalsql 192.168.222.1:18968 wikal Query 615
NULL SELECT id FROM blacklist
WHERE bltype=4 AND mask='192.168.222.2'
22739 wikalsql 192.168.222.1:18974 wikal Query 617
NULL SELECT id FROM blacklist
WHERE bltype=4 AND mask='192.168.222.2'
22740 wikalsql 192.168.222.1:18979 wikal Query 616
NULL SELECT labelid,label FROM
labels WHERE language="fr" AND groupid=21
22742 wikalsql 192.168.222.1:18980 wikal Query 614
NULL SELECT id FROM blacklist
WHERE bltype=4 AND mask='192.168.222.2'
22829 wikalsql 192.168.222.6:11652 wikal Query 617
NULL SELECT id FROM blacklist
WHERE bltype=1 AND mask="http://www.crunkrockradio.com"
22831 wikalsql 192.168.222.6:5836 wikal Query 614
NULL SELECT id FROM blacklist
WHERE bltype=1 AND mask="http://www.lernzeit.de"
22902 wikalsql 192.168.222.6:4054 wikal Execute 560
end UPDATE servers_infos set
value='314' where id=6 and param='nbConnectorsCaptured(300000ms)'
22939 wikalsql 192.168.222.1:19292 wikal Query 615
NULL SELECT labelid,label FROM
labels WHERE language="fr" AND groupid=23
22940 wikalsql 192.168.222.1:19293 wikal Query 617
update INSERT INTO infos_misc
(infoid,defcateg) VALUES (7655554,4093)
23001 wikalsql 192.168.222.64:9414 wikal Query 617
NULL SELECT categ FROM packages
WHERE id=65890
23141 wikalsql 192.168.222.6:29052 wikal Query 617
NULL SELECT id FROM blacklist
WHERE bltype=1 AND mask="http://www.wfp.org"
23197 wikalsql 192.168.222.6:25226 wikal Query 617
NULL SELECT id FROM blacklist
WHERE bltype=1 AND mask="http://www.conservative.ca"
23205 wikalsql 192.168.222.6:18383 wikal Query 616
NULL SELECT id FROM blacklist
WHERE bltype=1 AND mask="http://paris-photographie.com"
23209 wikalsql 192.168.222.6:9266 wikal Query 617
NULL SELECT id FROM blacklist
WHERE bltype=1 AND mask="http://googlesystem.blogspot.com"
23279 wikalsql 192.168.222.1:19799 wikal Query 615
NULL SELECT labelid,label FROM
labels WHERE language="fr" AND groupid=21
23281 wikalsql 192.168.222.1:19802 wikal Query 616
NULL SELECT id FROM blacklist
WHERE bltype=4 AND mask='192.168.222.2'
23282 wikalsql 192.168.222.1:19803 wikal Query 615
NULL SELECT id FROM blacklist
WHERE bltype=4 AND mask='192.168.222.2'
23287 wikalsql 192.168.222.6:20136 wikal Query 617
NULL SELECT id FROM blacklist
WHERE bltype=1 AND mask="http://feeds.feedburner.com"
23296 wikalsql 192.168.222.64:10980 wikal Query 616
NULL SELECT categ FROM packages
WHERE id=41724
23298 wikalsql 192.168.222.64:20514 wikal Query 617
NULL SELECT categ FROM packages
WHERE id=41014
23309 wikalsql 192.168.222.6:17609 wikal Query 617
NULL SELECT lastcapture FROM
connectors_stats WHERE connectorid=22648
23312 wikalsql 192.168.222.6:16215 wikal Query
23561 root localhost NULL Query 0 NULL
show full processlist
I have no clue how to reproduce it; it seems to occur randomly under high load.
Thanks,
Jocelyn
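A listing like the one above is easier to read once the long-running threads are grouped by State. The following Python sketch shows one way to do that. It assumes the rows arrive as tuples in the column order of the listing header, with `None` standing for SQL NULL; the function name `summarize_stuck` and the 60-second threshold are illustrative choices, not part of MySQL.

```python
from collections import Counter

# Column order of SHOW FULL PROCESSLIST as shown in the listing header.
COLUMNS = ("Id", "User", "Host", "db", "Command", "Time", "State", "Info")


def summarize_stuck(rows, min_seconds=60):
    """Count Query/Execute threads by State among rows running for at least min_seconds."""
    counts = Counter()
    for row in rows:
        record = dict(zip(COLUMNS, row))
        long_running = (record["Time"] or 0) >= min_seconds
        if record["Command"] in ("Query", "Execute") and long_running:
            counts[record["State"]] += 1
    return counts


# A few rows copied from the listing (None stands for SQL NULL).
sample = [
    (22019, "wikalsql", "192.168.222.1:18035", "wikal", "Query", 615, None,
     'SELECT labelid,label FROM labels WHERE language="fr" AND groupid=21'),
    (22902, "wikalsql", "192.168.222.6:4054", "wikal", "Execute", 560, "end",
     "UPDATE servers_infos set value='314' where id=6 "
     "and param='nbConnectorsCaptured(300000ms)'"),
    (22940, "wikalsql", "192.168.222.1:19293", "wikal", "Query", 617, "update",
     "INSERT INTO infos_misc (infoid,defcateg) VALUES (7655554,4093)"),
    (23001, "wikalsql", "192.168.222.64:9414", "wikal", "Query", 617, None,
     "SELECT categ FROM packages WHERE id=65890"),
    (23561, "root", "localhost", None, "Query", 0, None, "show full processlist"),
]

for state, count in summarize_stuck(sample).most_common():
    print(state, count)
```

On these five rows the summary separates the threads waiting with a NULL state from the few in `end` or `update`, which is the pattern described in the report.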
[30 Oct 2006 19:31]
Sinisa Milivojevic
Jocelyn, We very much need a repeatable test case for this. Can you use sysbench, mysqlslap or similar tools to create one? We would be very grateful if that could be done. Sinisa Milivojevic
[31 Oct 2006 10:27]
jocelyn fournier
Hi Sinisa, I failed to reproduce the problem with sysbench, using 300 parallel threads running on a table with 1M rows. I'll try to see whether I can modify sysbench to run the queries used by the application. (Ideally, sysbench would be able to parse a mysql log file and generate random queries based on what it has read in the log.) We'll also try to replay the binary log up to the failing point, but since it doesn't replay SELECTs and doesn't run in parallel threads, I don't think it will fail. Thanks, Jocelyn
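The idea of replaying application queries from parallel threads can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `replay_in_parallel` is a hypothetical helper, the queries are assumed to have been extracted from a log already, and `run_query` is a caller-supplied callable standing in for a real client call (a real run would typically use one connection per thread).

```python
import random
from concurrent.futures import ThreadPoolExecutor


def replay_in_parallel(queries, run_query, threads=8, iterations=100, seed=None):
    """Run randomly chosen queries through run_query from several threads.

    Returns the number of calls that raised an exception.
    """
    rng = random.Random(seed)
    # Choose the workload up front so worker threads share no RNG state.
    workload = [rng.choice(queries) for _ in range(iterations)]

    def attempt(sql):
        try:
            run_query(sql)
            return 0
        except Exception:
            return 1

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(attempt, workload))


if __name__ == "__main__":
    selects = [
        'SELECT labelid,label FROM labels WHERE language="fr" AND groupid=21',
        "SELECT id FROM blacklist WHERE bltype=4 AND mask='192.168.222.2'",
    ]
    # Stand-in for a real client call; it does nothing and never fails.
    failures = replay_in_parallel(selects, run_query=lambda sql: None,
                                  threads=4, iterations=20)
    print("failed calls:", failures)
```

Unlike a binary-log replay, a driver of this shape includes SELECT statements and issues them concurrently, which is closer to the load under which the hang was observed.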
[1 Dec 2006 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
