Bug #10472 | InnoDB deadlock in srv0srv.c causing server crash and restart | ||
---|---|---|---|
Submitted: | 9 May 2005 12:13 | Modified: | 13 Feb 2007 16:54 |
Reporter: | N.C. Rozemeijer | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S1 (Critical) |
Version: | 4.1.9-max-log | OS: | Linux (Red Hat Linux release 9, Windows XP SP2) |
Assigned to: | Heikki Tuuri | CPU Architecture: | Any |
[9 May 2005 12:13]
N.C. Rozemeijer
[10 May 2005 9:52]
Marko Mäkelä
Not enough information was provided for us to be able to handle this bug. Please re-read the instructions at http://bugs.mysql.com/how-to-report.php If you can provide more information, feel free to add it to this bug and change the status back to 'Open'. Thank you for your interest in MySQL. Additional info: The crash in the InnoDB master thread is intentional. If someone has been waiting for a lock for over 10 minutes, something is wrong, and the situation is corrected by restarting the server. You should find the cause of this bug. Is the server heavily loaded during the crash? Are several queries trying to update the same rows (perhaps via cascaded foreign key operations)? CHECK TABLE is known to cause this, but there the timeout has been increased by 2 hours.
[10 May 2005 12:48]
N.C. Rozemeijer
I have seen this problem several times and in each of these cases I could not find queries that would lock the same table or the same row. Most cases there is a SELECT running on one table and an INSERT on another table that is not related in any way (not even through cascaded foreign keys). Even if two queries would lock the same table, I would expect the dead-lock detection in InnoDB to rollback one of the queries and return a timeout after waiting for innodb_lock_wait_timeout. Or am I wrong there? I have also not been able to trace the issue back to heavy system loads. Most occurrences that I have witnessed, there was hardly any load on the system. What puzzles me most is that the crash always happens in the same threadID. I'm not sure how the threadID is determined (i.e. is this assigned by the O/S or thread library or is it an internal ID that InnoDB or MySQL administrates?). This ThreadID seems to be the constant factor in case of this problem, not the queries that are running at the time.
[10 May 2005 14:02]
Marko Mäkelä
Yes, it's always the master thread that crashes: it is an intentional crash when the master thread notices that a lock wait has lasted for over ten minutes. It is not necessarily caused by a row lock or a table lock. The lock waiting code is also used for some InnoDB internal mutexes related to memory-based data structures, such as the data dictionary. The bug could be caused by InnoDB forgetting to release a mutex. Database related locks should be automatically released at transaction commit or rollback. Have you perhaps forgot a transaction open for a long time? Can you send us the full lock monitor output? Also, could you enable innodb_status_file and send a copy of the file when it crashes next time? Could you perhaps also enable query logging?
[11 May 2005 6:17]
N.C. Rozemeijer
* Have attached the full (unedited) error log with monitor info. * Have restarted the server with innodb_status_file enabled; will send the status file next time the server crashes. * I'm reluctant to start with full query logging since the server is running at about 20 to 30 queries per second average and I don't think I have sufficient disk space to log all these queries for several days. * We are running almost all transactions with autocommit=1. Actually we never set autocommit=0. There are only a very few transactions of a couple of statements and for these we always use begin...commit or begin...rollback. I therefore think that it is unlikely that there are (unintentionally) very long and / or large transactions. I have never noticed anything like this in our logs. Also, most processes that are accessing the database are short-life, i.e. couple of seconds to a few minutes. We are also running a couple of daemons, but these are 'polling' and connect and disconnect to the server for each poll cycle. Having a transaction stuck there (i.e. missing a commit or rollback for a long time) seems unlikely too.
[24 May 2005 13:03]
N.C. Rozemeijer
Contents of the innodb_status logfile when a crash occurs: ===================================== 050523 7:46:21 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 16 seconds ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 105906, signal count 104408 Mutex spin waits 41474103, rounds 28658527, OS waits 38158 RW-shared spins 57613, OS waits 15404; RW-excl spins 30984, OS waits 2031 ------------ TRANSACTIONS ------------ Trx id counter 0 25018760 Purge done for trx's n:o < 0 25018380 undo n:o < 0 0 History list length 16 Total number of lock structs in row lock hash table 0 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 0 25018759, not started, process no 19040, OS thread id 1105022256 MySQL thread id 1568809, query id 16482525 localhost tmcfdb ---TRANSACTION 0 25018590, not started, process no 19040, OS thread id 1104821552 MySQL thread id 1568807, query id 16481827 localhost tmcfdb ---TRANSACTION 0 25018360, not started, process no 19040, OS thread id 1474338096 MySQL thread id 1568804, query id 16480966 localhost tmcfdb ---TRANSACTION 0 25018176, not started, process no 19040, OS thread id 1106705712 MySQL thread id 1568800, query id 16480211 localhost tmcfdb ---TRANSACTION 0 25018612, not started, process no 19040, OS thread id 1106906416 MySQL thread id 1568415, query id 16482043 localhost tmcfdb -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) I/O thread 1 state: waiting for i/o request (log thread) I/O thread 2 state: waiting for i/o request (read thread) I/O thread 3 state: waiting for i/o request (write thread) ev set Pending normal aio reads: 0, aio writes: 17, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 11197 OS file reads, 2549614 OS file writes, 112551 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 1.19 writes/s, 0.19 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf for space 0: size 1, free list len 0, seg size 2, is empty Ibuf for space 0: size 1, free list len 0, seg size 2, 567 inserts, 568 merged recs, 408 merges Hash table size 1106407, used cells 303047, node heap has 348 buffer(s) 12322.23 hash searches/s, 374.91 non-hash searches/s --- LOG --- Log sequence number 0 1734031831 Log flushed up to 0 1734031831 Last checkpoint at 0 1734029649 0 pending log writes, 0 pending chkp writes 2075905 log i/o's done, 1.12 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ---------------------- Total memory allocated 303985821; in additional pool allocated 2292608 Buffer pool size 16384 Free buffers 976 Database pages 15060 Modified db pages 17 Pending reads 0 Pending writes: LRU 0, flush list 17, single page 0 Pages read 12344, created 2716, written 596445 0.00 reads/s, 0.00 creates/s, 0.00 writes/s Buffer pool hit rate 1000 / 1000 -------------- ROW OPERATIONS -------------- 0 queries inside InnoDB, 0 queries in queue Main thread process no. 19040, id 1487379760, state: flushing buffer pool pages Number of rows inserted 380029, updated 1893624, deleted 0, read 1682634038 1.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 11368.79 reads/s ---------------------------- END OF INNODB MONITOR OUTPUT ============================
[1 Jun 2005 16:07]
Heikki Tuuri
Hi! Looks like file reads or writes end up in a pending state. This might be memory corruption, bad hardware, an OS bug, or an unknown InnoDB bug. I am putting this bug report to 'Can't repeat' state until we get more similar reports. Regards, Heikki FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) I/O thread 1 state: waiting for i/o request (log thread) I/O thread 2 state: waiting for i/o request (read thread) ev set I/O thread 3 state: waiting for i/o request (write thread) Pending normal aio reads: 57, aio writes: 0, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 983 OS file reads, 23 OS file writes, 19 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s Pending normal aio reads: 0, aio writes: 0, ibuf aio reads: 0, log i/o's: 1, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 3529 OS file reads, 757225 OS file writes, 29313 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s Pending normal aio reads: 1, aio writes: 0, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 3591 OS file reads, 6750 OS file writes, 23 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
[24 Jun 2005 4:21]
Eric Hodel
I am seeing this same bug using MySQL 4.1.12 on FreeBSD 5.4 with libthr. I believe libpthread manifests the same bug but with much more annoying symptoms (140k context switches/sec as it continually tries to find a runnable thread).
[27 Jun 2005 13:36]
Heikki Tuuri
Hi! Please post a few outputs of the InnoDB monitor during the hang. Regards, Heikki
[27 Jul 2005 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[10 Aug 2005 9:29]
Tung Teck Lee
had the same problem on redhat linux 9 kernel 2.4.20-30-smp mysql version: 4.0.24-max one clue: just before this problem happened (multiple times, within 1-2 days intervals), I enabled query cache. trying to remove query cache and see if it goes away. Below are info from db error file and innodb status file. Note that the status file is abt 14 mins before the err file dump... I thought the status file is updated every 15 seconds? ********** db error file extract.. InnoDB: ###### Diagnostic info printed to the standard error stream InnoDB: Warning: a long semaphore wait: --Thread 1500592944 has waited at ../../innobase/trx/../include/trx0sys.ic line 101 for 786.00 seconds the semaphore: X-lock on RW-latch at 0x41bda560 created in file buf0buf.c line 436 a writer (thread id 1500592944) has reserved it in mode wait exclusive number of readers 1, waiters flag 1 Last time read locked in file buf0flu.c line 532 Last time write locked in file ../../innobase/trx/../include/trx0sys.ic line 101 InnoDB: Warning: a long semaphore wait: --Thread 1317522224 has waited at trx0trx.c line 608 for 786.00 seconds the semaphore: Mutex at 0x43136168 created file srv0srv.c line 765, lock var 1 waiters flag 1 InnoDB: Warning: a long semaphore wait: --Thread 1501596464 has waited at trx0trx.c line 608 for 785.00 seconds the semaphore: Mutex at 0x43136168 created file srv0srv.c line 765, lock var 1 waiters flag 1 ... similar lines for a while InnoDB: ###### Diagnostic info printed to the standard error stream InnoDB: Error: semaphore wait has lasted > 600 seconds InnoDB: We intentionally crash the server, because it appears to be hung. 050810 22:50:31InnoDB: Assertion failure in thread 1329539888 in file srv0srv.c line 1795 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/mysql/en/Forcing_recovery.html InnoDB: about forcing recovery. mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=134217728 read_buffer_size=1044480 max_used_connections=170 max_connections=400 threads_connected=96 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 948668 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd=(nil) Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... Cannot determine thread, fp=0x4f3f2734, backtrace may not be correct. Stack range sanity check OK, backtrace follows: 0x80daa90 0x400498f8 (nil) 0x420df147 New value of fp=(nil) failed sanity check, terminating stack trace! ****** inndo status file output (just b4 it crashed) ******* ===================================== 050810 22:36:55 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 16 seconds ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 9243, signal count 8689 Mutex spin waits 12591, rounds 104741, OS waits 4425 RW-shared spins 9275, OS waits 4637; RW-excl spins 119, OS waits 116 ------------ TRANSACTIONS ------------ Trx id counter 0 973758973 Purge done for trx's n:o < 0 973737703 undo n:o < 0 0 Total number of lock structs in row lock hash table 0 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 0 973758837, not started, process no 3292, OS thread id 1479580464 MySQL thread id 12185, query id 2892184 [ip details removed] ---TRANSACTION 0 973758807, not started, process no 3292, OS thread id 1478777648 MySQL thread id 12186, query id 2892149 [ip details removed] ---TRANSACTION 0 973758805, not started, process no 3292, OS thread id 1495374640 MySQL thread id 12184, query id 2892245 [ip details removed] .... many more similar items .... -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) I/O thread 1 state: waiting for i/o request (log thread) I/O thread 2 state: waiting for i/o request (read thread) I/O thread 3 state: waiting for i/o request (write thread) ev set Pending normal aio reads: 0, aio writes: 1, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 6727 OS file reads, 21228 OS file writes, 19031 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf for space 0: size 1, free list len 17, seg size 19, 248 inserts, 248 merged recs, 248 merges Hash table size 553253, used cells 281252, node heap has 386 buffer(s) 1.81 hash searches/s, 6.25 non-hash searches/s --- LOG --- Log sequence number 0 1048848988 Log flushed up to 0 1048848988 Last checkpoint at 0 1048848978 0 pending log writes, 0 pending chkp writes 9689 log i/o's done, 0.00 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ---------------------- Total memory allocated 179169764; in additional pool allocated 1965440 Buffer pool size 8192 Free buffers 865 Database pages 6941 Modified db pages 1 Pending reads 0 Pending writes: LRU 0, flush list 1, single page 0 Pages read 6936, created 5, written 7044 0.00 reads/s, 0.00 creates/s, 0.00 writes/s Buffer pool hit rate 1000 / 1000 -------------- ROW OPERATIONS -------------- 0 queries inside InnoDB, 0 queries in queue Main thread process no. 3292, id 1337932592, state: flushing buffer pool pages Number of rows inserted 131, updated 49, deleted 121, read 611929 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 2.00 reads/s ---------------------------- END OF INNODB MONITOR OUTPUT ============================
[15 Sep 2006 15:07]
Aleksandar Ristovski
I have the same problem, and the log file looks very much like the other logs in this bug report. We run our usual SQL queries but at some point mysql hung. It did not crash right away, but it was just stuck at a simple SQL involving inserting into a table; after 10 hours I connected to the server to see the status and approx. 10 minutes later it crashed. Final part of the log file: InnoDB: ###### Diagnostic info printed to the standard error stream InnoDB: Warning: a long semaphore wait: --Thread 1306241840 has waited at ../include/btr0btr.ic line 28 for 787.00 seconds the semaphore: X-lock on RW-latch at 0x402b6748 created in file buf0buf.c line 469 a writer (thread id 1306241840) has reserved it in mode wait exclusive number of readers 1, waiters flag 1 Last time read locked in file buf0flu.c line 562 Last time write locked in file ../include/btr0btr.ic line 28 InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info: InnoDB: Pending preads 0, pwrites 0 ===================================== 060915 10:44:21 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 32 seconds ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 18, signal count 17 --Thread 1306241840 has waited at ../include/btr0btr.ic line 28 for 802.00 seconds the semaphore: X-lock on RW-latch at 0x402b6748 created in file buf0buf.c line 469 a writer (thread id 1306241840) has reserved it in mode wait exclusive number of readers 1, waiters flag 1 Last time read locked in file buf0flu.c line 562 Last time write locked in file ../include/btr0btr.ic line 28 Mutex spin waits 17, rounds 180, OS waits 9 RW-shared spins 16, OS waits 8; RW-excl spins 1, OS waits 1 ------------ TRANSACTIONS ------------ Trx id counter 0 1101 Purge done for trx's n:o < 0 0 undo n:o < 0 0 History list length 119 Total number of lock structs in row lock hash table 410 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 0 1100, not started, process no 16377, OS thread id 1306643248 MySQL thread id 17, query id 666 aristovski01.klocwork.com 10.0.1.45 root ---TRANSACTION 0 0, not started, process no 16377, OS thread id 1306241840 updating table statistics mysql tables in use 1, locked 1 MySQL thread id 14, query id 592 aristovski01.klocwork.com 10.0.1.45 checking SHOW TABLE STATUS ---TRANSACTION 0 0, not started, process no 16377, OS thread id 1107139376 MySQL thread id 12, query id 589 aristovski01.klocwork.com 10.0.1.45 checking ---TRANSACTION 0 0, not started, process no 16377, OS thread id 1106938672 MySQL thread id 10, query id 543 aristovski01.klocwork.com 10.0.1.45 checking ---TRANSACTION 0 1097, ACTIVE 56772 sec, process no 16377, OS thread id 1105156912 inserting, thread declared inside InnoDB 371 mysql tables in use 1, locked 1 415 lock struct(s), heap size 44352, undo log entries 36146 MySQL thread id 9, query id 455 devlx90c1 10.0.0.145 Sending data insert into problem_arg ( upid, bld_id, arg_no, arg_info) select pdt.upid, @build_id, pat.arg_no, pat.arg_info from CEMSR1400_nortel__shared.problem_dat as pdt, CEMSR1400_nortel__shared.problem_arg_dat as pat where pat.prob_id = pdt.prob_id Trx read view will not see trx with id >= 0 1098, sees < 0 1098 -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) I/O thread 1 state: waiting for i/o request (log thread) I/O thread 2 state: waiting for i/o request (read thread) I/O thread 3 state: waiting for i/o request (write thread) ev set Pending normal aio reads: 0, aio writes: 128, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 0 OS file reads, 586 OS file writes, 585 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf for space 0: size 1, free list len 0, seg size 2, is empty Ibuf for space 0: size 1, free list len 0, seg size 2, 0 inserts, 0 merged recs, 0 merges Hash table size 553253, used cells 21218, node heap has 22 buffer(s) 0.00 hash searches/s, 0.00 non-hash searches/s --- LOG --- Log sequence number 0 6440496 Log flushed up to 0 6440496 Last checkpoint at 0 43634 0 pending log writes, 0 pending chkp writes 202 log i/o's done, 0.00 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ---------------------- Total memory allocated 148886906; in additional pool allocated 1048576 Buffer pool size 8192 Free buffers 7444 Database pages 724 Modified db pages 561 Pending reads 0 Pending writes: LRU 0, flush list 129, single page 0 Pages read 0, created 877, written 192 0.00 reads/s, 0.00 creates/s, 0.00 writes/s Buffer pool hit rate 1000 / 1000 -------------- ROW OPERATIONS -------------- 1 queries inside InnoDB, 0 queries in queue Main thread process no. 16377, id 1100888880, state: waiting for server activity Number of rows inserted 37485, updated 0, deleted 0, read 10445 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s ---------------------------- END OF INNODB MONITOR OUTPUT ============================ InnoDB: ###### Diagnostic info printed to the standard error stream InnoDB: Error: semaphore wait has lasted > 600 seconds InnoDB: We intentionally crash the server, because it appears to be hung. 060915 10:44:36InnoDB: Assertion failure in thread 1096670000 in file srv0srv.c line 1873 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/mysql/en/Forcing_recovery.html InnoDB: about forcing recovery. mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=33554432 read_buffer_size=520192 max_used_connections=6 max_connections=100 threads_connected=6 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 288367 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd=(nil) Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... Cannot determine thread, fp=0x415dd724, backtrace may not be correct. Stack range sanity check OK, backtrace follows: 0x8135b13 0x4005f8f8 (nil) 0x420df147 New value of fp=(nil) failed sanity check, terminating stack trace! Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved stack trace is much more helpful in diagnosing the problem, so please do resolve it The manual page at http://www.mysql.com/doc/en/Crashing.html contains information that should help you find out what is causing the crash.
[18 Sep 2006 6:19]
N.C. Rozemeijer
Upgrading our O/S to Red Hat Enterprise 4 has solved this issue for us. It looks like a compatibility issue between MySQL and RH9.
[18 Sep 2006 12:48]
Heikki Tuuri
Aleksandar, what Linux distro you were running? Regards, Heikki
[19 Sep 2006 14:26]
Aleksandar Ristovski
Heikki, the crash happened on RH9 and today it happened on Windows XP SP2 (exactly the same symptomps). mysql version is 4.1.10
[22 Sep 2006 13:46]
Heikki Tuuri
Aleksandar, below is the hang you reported from Windows: >===================================== >> 060918 17:36:36 INNODB MONITOR OUTPUT >> ===================================== >> Per second averages calculated from the last 16 seconds >> ---------- >> SEMAPHORES >> ---------- >> OS WAIT ARRAY INFO: reservation count 623, signal count 615 >> --Thread 2856 has waited at <cut>mysqldev\build\mysql-4.1.10-build\mysql-4.1.10\innobase\buf\buf0flu.c line 390 for 811.00 seconds the semaphore: >> Mutex at 02FB6C80 created file <cut>mysql-4.1.10-build\mysql-4.1.10\innobase\trx\trx0sys.c line 103, lock var 1 >> waiters flag 1 >> Mutex spin waits 505, rounds 2777, OS waits 58 >> RW-shared spins 802, OS waits 401; RW-excl spins 152, OS waits 151 but the hang on Linux happened on a different semaphore. My guess is that the Linux hangs are due to some bug in the Linux distro. The Windows hang is on this semaphore: trx0sys.c: /******************************************************************** Creates or initialializes the doublewrite buffer at a database start. */ static void trx_doublewrite_init( /*=================*/ byte* doublewrite) /* in: pointer to the doublewrite buf header on trx sys page */ { trx_doublewrite = mem_alloc(sizeof(trx_doublewrite_t)); /* Since we now start to use the doublewrite buffer, no need to call fsync() after every write to a data file */ #ifdef UNIV_DO_FLUSH os_do_not_call_flush_at_each_write = TRUE; #endif /* UNIV_DO_FLUSH */ mutex_create(&(trx_doublewrite->mutex)); I have never seen a report of a hang on this particular semaphore. This still remains a mystery. You may try an upgrade to the latest 4.1.xx. Regards, Heikki
[5 Oct 2006 19:55]
Connie Carver
I am currently having this problem on Fedora Core 1
[8 Nov 2006 14:42]
Heikki Tuuri
Putting the status to 'Can't repeat' until we get more reports of the hang on trx_doublewrite->mutex. I could not find any obvious bug from buf0flu.c in 4.1.16. The mutex is reserved during calls to fil I/O and flush. An operating system hang in those I/O operations would explain the hang, but Windows has usually given an OS error if it fails, it has not hung. Thus, this remains a mystery. The hang on Linux may be one of the many OS problems people have reported over years.
[13 Feb 2007 16:54]
Heikki Tuuri
Putting the status again to 'Can't repeat' because there is no common factor in the various hang reports that have accumulated here. Some of the problems may be due to OS bugs or faulty hardware, or to InnoDB table corruption.
[15 Jun 2007 7:03]
kris dba
Hi even we are getting the same errors with mysql 4.1.21 on RHEL4. its happening once in every 2 days.. please find the error log report.. key_buffer_size=402653184 read_buffer_size=2093056 max_used_connections=134 max_connections=250 threads_connected=19 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 1416214 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd=0xa4d74f08 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... Cannot determine thread, fp=0xa4f91bf4, backtrace may not be correct. Stack range sanity check OK, backtrace follows: 0x812b880 0x400397c8 (nil) 0x8126c83 0x815b222 0x8181169 0x8142d30 0x81445dd 0x813dcab 0x813d946 0x813d11f 0x40034332 0x420de867 New value of fp=(nil) failed sanity check, terminating stack trace! Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved stack trace is much more helpful in diagnosing the problem, so please do resolve it Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x9bfda70 = UPDATE Test_table TC, test_cc_ad_info_txt EX SET EX.test_click_no=UC.no, EX.temp_no=NULL WHERE UC.temp_no IS NOT NULL AND UC.temp_no=EX.temp_no thd->thread_id=1878119 The manual page at http://www.mysql.com/doc/en/Crashing.html contains information that should help you find out what is causing the crash. Number of processes running now: 0 070615 01:07:54 mysqld restarted 070615 1:08:01 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... 070615 1:08:02 InnoDB: Starting log scan based on checkpoint at InnoDB: log sequence number 119 2107948288. InnoDB: Doing recovery: scanned up to log sequence number 119 2107949309 070615 1:08:03 InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percents: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 InnoDB: Apply batch completed InnoDB: In a MySQL replication slave the last master binlog file InnoDB: position 0 299139144, file name server3-bin.000176 InnoDB: Last MySQL binlog file position 0 89345055, file name ./server-bin.000799 070615 1:08:04 InnoDB: Flushing modified pages from the buffer pool... 070615 1:08:04 InnoDB: Started; log sequence number 119 2107949309 070615 1:08:04 [Warning] mysql.user table is not updated to new password format; Disabling new password usage until mysql_fix_privilege_tables is run /usr/sbin/mysqld: ready for connections. please let us know how to proceed. Regards, krishna
[15 Jun 2007 7:10]
kris dba
Hi, even we are getting the same errors with mysql 4.1.21 on RHEL4. its happening once in every 2 days.. please find the error log report.. key_buffer_size=402653184 read_buffer_size=2093056 max_used_connections=134 max_connections=250 threads_connected=19 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 1416214 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd=0xa4d74f08 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... Cannot determine thread, fp=0xa4f91bf4, backtrace may not be correct. Stack range sanity check OK, backtrace follows: 0x812b880 0x400397c8 (nil) 0x8126c83 0x815b222 0x8181169 0x8142d30 0x81445dd 0x813dcab 0x813d946 0x813d11f 0x40034332 0x420de867 New value of fp=(nil) failed sanity check, terminating stack trace! Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved stack trace is much more helpful in diagnosing the problem, so please do resolve it Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x9bfda70 = UPDATE Test_table TC, test_cc_ad_info_txt EX SET EX.test_click_no=UC.no, EX.temp_no=NULL WHERE UC.temp_no IS NOT NULL AND UC.temp_no=EX.temp_no thd->thread_id=1878119 The manual page at http://www.mysql.com/doc/en/Crashing.html contains information that should help you find out what is causing the crash. Number of processes running now: 0 070615 01:07:54 mysqld restarted 070615 1:08:01 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... 070615 1:08:02 InnoDB: Starting log scan based on checkpoint at InnoDB: log sequence number 119 2107948288. InnoDB: Doing recovery: scanned up to log sequence number 119 2107949309 070615 1:08:03 InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percents: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 InnoDB: Apply batch completed InnoDB: In a MySQL replication slave the last master binlog file InnoDB: position 0 299139144, file name server3-bin.000176 InnoDB: Last MySQL binlog file position 0 89345055, file name ./server-bin.000799 070615 1:08:04 InnoDB: Flushing modified pages from the buffer pool... 070615 1:08:04 InnoDB: Started; log sequence number 119 2107949309 070615 1:08:04 [Warning] mysql.user table is not updated to new password format; Disabling new password usage until mysql_fix_privilege_tables is run /usr/sbin/mysqld: ready for connections. please let us know how to proceed. Regards, krishna
[15 Jun 2007 7:21]
Tung Teck Lee
Just to share on our experience, the issue went away when we moved to mysql 5.x, and added more memory to the system. Unfortunately, we are not conclusive which of the above changes fixed the issue.