Bug #8555 | INSERT queries hang indefinitely on AMD64 | ||
---|---|---|---|
Submitted: | 16 Feb 2005 19:29 | Modified: | 19 Jul 2005 0:16 |
Reporter: | Michel Buijsman | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S1 (Critical) |
Version: | 4.1.10-x86-64 | OS: | Linux (Debian Linux 3.1 pure64) |
Assigned to: | Sergei Golubchik | CPU Architecture: | Any |
[16 Feb 2005 19:29]
Michel Buijsman
[18 Feb 2005 21:02]
Jorge del Conde
Hi Can you please attatch your perl test script to this issue, as well as the show-create statements of your tables so that we can try and reproduce this problem ? Thanks
[19 Feb 2005 21:33]
MySQL Verification Team
HI! We truly require a repeatable test case in order to be able to verify and / or fix this bug.
[21 Feb 2005 14:25]
Michel Buijsman
mysqldump of 2 tables incl data
Attachment: dump.sql.gz (application/x-gzip, text), 161.62 KiB.
[21 Feb 2005 14:26]
Michel Buijsman
Test script
Attachment: stresstest.pl (text/plain), 1.08 KiB.
[21 Feb 2005 14:26]
Michel Buijsman
Attached my test script, and a gzipped mysqldump of the test case including some data. Sometimes just running the test script will hang mysql all by itself, but most of the time it needs a few runs of "repair table testing" before it hangs.
[23 Feb 2005 15:21]
MySQL Verification Team
Hi, Thank you for the report, but I can't reproduce it. Can you upload your my.cnf file?
[23 Feb 2005 16:25]
Michel Buijsman
This is it. I've been toying with a few things, but this is the one I used with the test case I uploaded here. [mysqld] server-id = 10 #master-host = x.x.x.x #master-port = x #master-user = x #master-password = x #master-connect-retry = 30 #report-host = x.x.x.x #replicate-do-db = x relay-log = /var/lib/mysql/relay.log log-error=/var/log/mysql.error user=mysql socket=/tmp/mysql.sock skip-locking set-variable = key_buffer=2G set-variable = table_cache=1024 set-variable = sort_buffer=16M set-variable = read_buffer=16M set-variable = max_allowed_packet=10M set-variable = thread_cache=64 set-variable = thread_stack=512K set-variable = tmp_table_size=16M set-variable = interactive_timeout=600 set-variable = wait_timeout=600 set-variable = max_connections=1024 set-variable = query_cache_type=2 set-variable = query_cache_size=100M set-variable = join_buffer_size=8M set-variable = thread_concurrency=4 set-variable = myisam_sort_buffer_size=64M
[23 Feb 2005 18:48]
Sander Hollaar
The two systems are identical. Here are the specs: Tyan Transport GX28 2x AMD Opteron 250 4x Seagate Cheetah 15K.3 36GB 8x 1GB ECC Kernel: 2.6.11-rc2 64bit only OS: Debian pure64 gcc-3.4 (up to date) /boot is raid1, ext2 / is raid10, reiserfs What hw/sw specs do you guys use when you try to reproduce? Next thing we will try is run all this on one disk ext2 (no raid, no reiserfs), and if that doesn't work, we'll try kernel 2.4 if possible.
[11 Mar 2005 22:02]
Joe Stump
Has any progress been made on this? I have a dual 2.4GHz Opteron sitting idle until this comes online. Is there a fix available to supported or licensed users? Has anyone found a possible work around? Does running it within a debootstrap in i386 mode fix this problem (just a stab, it's next on my list of things to try). Please email me at joe at joestump dot net if anyone has made progress on this.
[11 Mar 2005 22:19]
Michel Buijsman
We're now running in 32 bit mode on the amd64, and that seems to work fine. One of the servers is now running as a replication slave with live data and it's actually replicating which is more than I got when running a 64 bit OS. I've done some head-to-head speed tests (2 identical servers, one with a 64 bit install and one with a regular 32 bit debian, both running mysql 4.1.10 with identical configs) and the 32 bit version is about 10% slower.
[24 Mar 2005 5:03]
Jorge del Conde
Hi, I was still unable to reproduce these results in our AMD64 platform using 4.1.11 from bk. Is your raid software based or hardware based ?
[9 Apr 2005 22:33]
Frederik Schüler
Hello, > I've tried 4.0.23, 4.1.9, 4.1.10 (all as x86-64 binary, also some of them > compiled) on two identical dual opterons with 8 gig ram running Debian 3.1 > pure64 gcc3.4. Tried kernels 2.6.11-rc1-mm1 and 2.6.11-rc2. The servers are/were > running as a replication master/slave pair. I have a couple of questions: 1. Are you aware the gcc-3.4/gcc-4.0 based debian.amd64 port is highly experimental and __not recommended__ for any kind of production system, considering it is compiled with an unreleased compiler version known to produce buggy code? 2. Why did you not send a mail to the debian-amd64@lists.debian.org about this topic? 3. Which are the corresponding bug numbers in the debian bug tracking system against the concerned mysql versions? I did not find any. 4. Can you give pure64 (the gcc-3.3 based one) a try, and check if the bug is present there too? Kind regards Frederik Schueler
[11 Apr 2005 14:31]
Michel Buijsman
> 1. Are you aware the gcc-3.4/gcc-4.0 based debian.amd64 port is highly > experimental and __not recommended__ for any kind of production system, > considering it is compiled with an unreleased compiler version known to produce > buggy code? I wasn't, no. > 2. Why did you not send a mail to the debian-amd64@lists.debian.org about this > topic? Since the OS itself was rock solid and it was mysql that was acting up in strange ways, the thought didn't really cross my mind that it might be an operating system bug. > 3. Which are the corresponding bug numbers in the debian bug tracking system > against the concerned mysql versions? I did not find any. There aren't any. I'm not using a debian package for mysql. > 4. Can you give pure64 (the gcc-3.3 based one) a try, and check if the bug is > present there too? I'd love to, but the servers in question were customer machines and I was already past the deadline on that project. So they've gone into production, running a regular 32 bit debian install...
[23 Apr 2005 22:15]
Stefan Praszalowicz
I seem to have this bug too :| The machine is a DELL 2850: Dual Xeon EM64T w/ hyperthreading, 4Go RAM, hw raid (megaraid 2) Linux version: 2.6.11.7 (compiled with gcc 3.4) Debian version: Sarge pure64 (gcc 3.3) Tested with mysql 4.0.24 AMD64, both gcc and icc. I encountered the bug while attempting to set this host as a slave, so I have a full binary DB copy with which I can reproduce the bug. The bug strikes when I start the replication thread, and the server is otherwise idle: there is no concurent access to any table, only the slave thread is running. The bug seems to strike on two tables from what I could see, ie. sometimes it would lock on a first table, sometimes past that one and lock on another. I checked and repaire the tables with various myisamchk options. 'ps' makes it look like the mysql processes aren't doing anything (apart from the relay thread picking up binlog from the master from time to time), ie processes are Sleeping. I'm available for more testing / reproducing as required. Stefan
[25 Apr 2005 19:37]
Frederik Schüler
hello, I had the opportunity to run a long-time stress test of mysql-4.1.11 as packaged in debian/sid the past weekend: 10 clients (4G dual xeons with debian-amd64 sarge) running the sql-bench suite against a quad opteron with 16G ram (debian-pure64 sarge too). The suite run approx 3 days (friday morning to monday morning), each run took between 200 and 215 minutes per client node, and there where no failed tests. The database run a total of 497M queries. We did some testing with replication too, having the quad opteron replicate to a dual opteron and a dual xeon while being treated by 10 concurrent sql-bench runs, without experiencing problems here. A long-time run of this test is still to be done. Kind regards Frederik Schueler
[27 Apr 2005 6:24]
stephane
Just to let you know that it also happens with another distrib! Precisely on a DELL PowerEdge 8500 (dual xeon64, no raid just ATA) with Mandrake 10.2 for x86_64 and MySQL-4.1.11.x86_64. All requests (including Select, Create temporary table, ...) were hanging, as if all databases were "locked", and so on until I kill-9 mysqld (since I failed with mysqladmin)... BTW, we did *not* encounter the problem with Mandrake 10.1 for x86-64 and MySQL-4.0.20.x86-64, but the tables were rebuild from latin1 to utf-8 for 4.1.11 We have various Insert queries from JDBC and PHP, it happens randomly but at least 1/week and up to 2/day! I will try the i586 version in a few days hopping it is better...
[2 Jun 2005 19:10]
Hartmut Holzgraefe
I have been able to verify this on a dual cpu debian system (at least i think its the same issue), on this system the easiest way to reproduce it is to run "make test", this will hang after the "archive" test as the "auto_increment" test never returns i was able to track it down to myisam/mi_write.c:107 (4.1.12 source): rw_wrlock(&share->key_root_lock[i]); this call never returned this machine had NTPL as thread library only, LinuxThreads were not installed replacing the mysqld binary build on the machine itself with one build on another macheine and statically linked against LinuxThreads made it pass the auto_increment test just fine so this is a mysqld<->NTPL problem that shows up on AMD64 (maybe on multi-CPU systems only) with NTPL. given that the mysql codebase passes the auto_increment test on a lot of systems with different pthread implementations i think its NTPL that is to blame, not mysql, but i'm not 100% sure about it
[3 Jun 2005 5:50]
stephane
I've replaced Mysql x86_64 version by the i586 version and there is no hanging anymore (working fine since 1 month), so if it is a NTPL problem then it only happens with the 64bit mysql binary...
[16 Jun 2005 2:04]
Stewart Smith
This looks like the glibc bug that JimW blogged about and that I mentioned in Bug#7254. As for other distros, my bet is they too have this glibc bug.
[19 Jul 2005 0:16]
Jim Winstead
This is a bug with glibc for x86_64 on Debian, Ubuntu, and almost certainly the other distributions. You can track the Ubuntu bug here: https://bugzilla.ubuntu.com/show_bug.cgi?id=11730 The Debian bug is here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=314408 You can verify the bug using the MySQL test suite, you don't have to rely on it to crop up in your own application. The auto_increment test in the test suite is the first one that will hang due to this issue.
[1 Aug 2006 14:17]
Moritz Mertinkat
Well, it might be glibc problem, but it seems to me, that it also has something to do with the database. The problem is connected to myisam tables - no problems occur with innodb tables. However the locks on myisam INSERTs and UPDATEs only occur when the MYI file is corrupted. I had no problem with 4.1.10 or 4.1.11 (don't remember that exactly); the trouble started with an update to mysql 4.1.15 on Debian Linux AMD64 (Linux xxx 2.6.15-1-em64t-p4-smp #2 SMP Tue Mar 7 21:06:43 UTC 2006 x86_64 GNU/Linux). I'm not sure whether these myisam tables were broken before I upgraded, but afterwards I fixed them nearly twice a day for not running into LOCKING troubles. Any ideas? Thank you, Maurice.
[11 Jan 2007 16:09]
morten kallesoee
I have to agree with Maurice, there is also something mysql specific in this issue. We are currently running with version 4.1.11 from the debian package system (64bit bin) and have no problems with that version, but when we try to use the newste version 4.1.22 from mysql.com we have those insert that just hangs, We are running with a replication setup aswell, since we are running production on the system, its hard to get the time needed for testing. Although my next shot will be to upgrade glibc on both machines. maybe try the 32bit version of the mysql server. Morten
[30 May 2007 21:44]
Jonathan Schatz
We're seeing this issue as well, and we've narrowed it down somewhat. We're running on a 8 x Opteron system, but we're running in 32 bit mode (inside of a vm). Changing some search code from "INSTR" to "MATCH AGAINST" triggers hangs on INSERT, SELECT, etc. Like everyone else, this is something that we cannot reproduce in testing, but it exhibits itself in production within 12-24 hours. Once this occurs all new and open db handles become unresponsive. The only solution is to kill -9 all the mysqld processes. This happened both with 4.0.21 and 5.0.27. We're using myisam tables. I haven't tried disabling concurrent inserts yet. Is there any update on this issue, or any other bug I should be following?
[30 May 2007 21:55]
[ name withheld ]
While we used Debian 3.1 and MySQL/4 from the backports, the bug occured. Upgrading libc6 to a newer version from testing fixed the problem (only update libc6 not the rest of the system!). For Debian 4.0, the problem has gone completely. I think that this bug of course has to do something "with the database" - the newer MySQL versions probably make use of libc functions which cause the bug to happen while the older ones don't.