| Bug #3559 | Replication crash when slave under high load | ||
|---|---|---|---|
| Submitted: | 25 Apr 2004 21:53 | Modified: | 15 May 2004 12:44 | 
| Reporter: | Dave OReilly | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S2 (Serious) | 
| Version: | 4.0.18-log | OS: | Linux (Redhat linux 9) | 
| Assigned to: | Heikki Tuuri | CPU Architecture: | Any | 
   [25 Apr 2004 23:39]
   Heikki Tuuri        
  Dave, please show the complete .err log so that we see what has stuck and where. Looks like a log write never completes. Usually, these file write hangs are bugs in the OS or its drivers. Regards, Heikki
   [26 Apr 2004 14:03]
   Dave OReilly        
  Complete error log
Attachment: mysql.slave.err (text/plain), 167.84 KiB.
   [30 Apr 2004 2:11]
   Dave OReilly        
  Complete log file has been attached, regards, daveor
   [15 May 2004 12:44]
   Heikki Tuuri        
  Hi! This looks like the OS is stuck doing a pwrite() to an InnoDB log file. The situation stays like below for an extended time. This is probably a bug in the OS/drivers/hardware, not in InnoDB. Regards, Heikki -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) I/O thread 1 state: doing file i/o (log thread) I/O thread 2 state: waiting for i/o request (read thread) I/O thread 3 state: waiting for i/o request (write thread) Pending normal aio reads: 0, aio writes: 0, ibuf aio reads: 0, log i/o's: 1, sync i/o's: 0 Pending flushes (fsync) log: 1; buffer pool: 0 70 OS file reads, 846 OS file writes, 812 OS fsyncs 0 pending preads, 1 pending pwrites 0.00 reads/s, 0 avg bytes/read, 0.87 writes/s, 0.87 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf for space 0: size 1, free list len 0, seg size 2, 0 inserts, 0 merged recs, 0 merges Hash table size 34679, used cells 1, node heap has 1 buffer(s) 0.87 hash searches/s, 0.00 non-hash searches/s --- LOG --- Log sequence number 0 1998465 Log flushed up to 0 1998403 Last checkpoint at 0 1934642 1 pending log writes, 1 pending chkp writes 793 log i/o's done, 0.87 log i/o's/second


Description: I have been testing replication and came up against the following issue: 1) Set up replication between master and slave 2) Start a process that inserts 1,000,000 rows into an Innodb table on the master 3) Start a process on the slave to push the load high (while(1);). Mysql appears to crash with the following errors in the log (on the slave): InnoDB: Warning: a long semaphore wait: --Thread 2963024816 has waited at log0log.c line 1876 for 625.00 seconds the semaphore: S-lock on RW-latch at b51d19a0 created in file log0log.c line 749 a writer (thread id 2963024816) has reserved it in mode exclusive number of readers 0, waiters flag 1 Last time read locked in file log0log.c line 1876 Last time write locked in file log0log.c line 1678 InnoDB: Error: semaphore wait has lasted > 600 seconds InnoDB: We intentionally crash the server, because it appears to be hung. 040426 14:46:11 InnoDB: Assertion failure in thread 2973514672 in file sync0arr.c line 934 InnoDB: Failing assertion: 0 InnoDB: We intentionally generate a memory trap. InnoDB: Send a detailed bug report to mysql@lists.mysql.com mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=402653184 read_buffer_size=2093056 max_used_connections=0 max_connections=100 threads_connected=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 802415 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd=(nil) Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... Cannot determine thread, fp=0xb13c355c, backtrace may not be correct. Stack range sanity check OK, backtrace follows: 0x80f7893 0xb756ae48 (nil) 0xb7564dac 0xb7424a8a New value of fp=(nil) failed sanity check, terminating stack trace! Please read http://www.mysql.com/doc/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved stack trace is much more helpful in diagnosing the problem, so please do resolve it The manual page at http://www.mysql.com/doc/en/Crashing.html contains information that should help you find out what is causing the crash. Number of processes running now: 0 040426 14:46:11 mysqld restarted 040426 14:46:11 InnoDB: Database was not shut down normally. InnoDB: Starting recovery from log files... InnoDB: Starting log scan based on checkpoint at InnoDB: log sequence number 0 1964179 InnoDB: Doing recovery: scanned up to log sequence number 0 2007620 040426 14:46:11 InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percents: 98 99 InnoDB: Apply batch completed InnoDB: In a MySQL replication slave the last master binlog file InnoDB: position 0 159031633, file name jimbo-bin.009 InnoDB: Last MySQL binlog file position 0 362, file name ./wonko-bin.022 040426 14:46:12 InnoDB: Flushing modified pages from the buffer pool... 040426 14:46:13 InnoDB: Started 040426 14:46:13 Slave I/O thread: connected to master 'repl@10.xx.xx.xx:3306', replication started in log 'jimbo-bin.009' at position 195395122 /usr/local/libexec/mysqld: ready for connections. Version: '4.0.18-log' socket: '/tmp/mysql.sock' port: 3306 ERROR: 1062 Duplicate entry '908' for key 1 040426 14:46:13 Slave: Error 'Duplicate entry '908' for key 1' on query 'insert into davetest1(name) values('hello')'. Default database: 'repmaster', Error_code: 1062 040426 14:46:13 Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'jimbo-bin.009' position 159031522 How to repeat: see above Suggested fix: don't know