Bug #3559 Replication crash when slave under high load
Submitted: 25 Apr 2004 21:53 Modified: 15 May 2004 12:44
Reporter: Dave OReilly Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:4.0.18-log OS:Linux (Redhat linux 9)
Assigned to: Heikki Tuuri CPU Architecture:Any

[25 Apr 2004 21:53] Dave OReilly
Description:
I have been testing replication and came up against the following issue:

1) Set up replication between master and slave
2) Start a process that inserts 1,000,000 rows into an Innodb table on the master
3) Start a process on the slave to push the load high (while(1);).

Mysql appears to crash with the following errors in the log (on the slave):

InnoDB: Warning: a long semaphore wait:
--Thread 2963024816 has waited at log0log.c line 1876 for 625.00 seconds the semaphore:
S-lock on RW-latch at b51d19a0 created in file log0log.c line 749
a writer (thread id 2963024816) has reserved it in mode exclusive
number of readers 0, waiters flag 1
Last time read locked in file log0log.c line 1876
Last time write locked in file log0log.c line 1678
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
040426 14:46:11  InnoDB: Assertion failure in thread 2973514672 in file sync0arr.c line 934
InnoDB: Failing assertion: 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Send a detailed bug report to mysql@lists.mysql.com
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=402653184
read_buffer_size=2093056
max_used_connections=0
max_connections=100
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 802415 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
thd=(nil)
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xb13c355c, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x80f7893
0xb756ae48
(nil)
0xb7564dac
0xb7424a8a
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://www.mysql.com/doc/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.

Number of processes running now: 0
040426 14:46:11  mysqld restarted
040426 14:46:11  InnoDB: Database was not shut down normally.
InnoDB: Starting recovery from log files...
InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 0 1964179
InnoDB: Doing recovery: scanned up to log sequence number 0 2007620
040426 14:46:11  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 98 99
InnoDB: Apply batch completed
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 159031633, file name jimbo-bin.009
InnoDB: Last MySQL binlog file position 0 362, file name ./wonko-bin.022
040426 14:46:12  InnoDB: Flushing modified pages from the buffer pool...
040426 14:46:13  InnoDB: Started
040426 14:46:13  Slave I/O thread: connected to master 'repl@10.xx.xx.xx:3306',  replication started in log 'jimbo-bin.009' at position 195395122
/usr/local/libexec/mysqld: ready for connections.
Version: '4.0.18-log'  socket: '/tmp/mysql.sock'  port: 3306
ERROR: 1062  Duplicate entry '908' for key 1
040426 14:46:13  Slave: Error 'Duplicate entry '908' for key 1' on query 'insert into davetest1(name) values('hello')'. Default database: 'repmaster', Error_code: 1062
040426 14:46:13  Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'jimbo-bin.009' position 159031522

How to repeat:
see above

Suggested fix:
don't know
[25 Apr 2004 23:39] Heikki Tuuri
Dave,

please show the complete .err log so that we see what has stuck and where. Looks like a log write never completes.

Usually, these file write hangs are bugs in the OS or its drivers.

Regards,

Heikki
[26 Apr 2004 14:03] Dave OReilly
Complete error log

Attachment: mysql.slave.err (text/plain), 167.84 KiB.

[30 Apr 2004 2:11] Dave OReilly
Complete log file has been attached, 
regards, 
daveor
[15 May 2004 12:44] Heikki Tuuri
Hi!

This looks like the OS is stuck doing a pwrite() to an InnoDB log file. The situation stays like below for an extended time. This is probably a bug in the OS/drivers/hardware, not in InnoDB.

Regards,

Heikki

--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: doing file i/o (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
 ibuf aio reads: 0, log i/o's: 1, sync i/o's: 0
Pending flushes (fsync) log: 1; buffer pool: 0
70 OS file reads, 846 OS file writes, 812 OS fsyncs
0 pending preads, 1 pending pwrites
0.00 reads/s, 0 avg bytes/read, 0.87 writes/s, 0.87 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf for space 0: size 1, free list len 0, seg size 2,
0 inserts, 0 merged recs, 0 merges
Hash table size 34679, used cells 1, node heap has 1 buffer(s)
0.87 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 0 1998465
Log flushed up to   0 1998403
Last checkpoint at  0 1934642
1 pending log writes, 1 pending chkp writes
793 log i/o's done, 0.87 log i/o's/second