Bug #64300 master crashes with sigfault as slave connects
Submitted: 11 Feb 2012 12:57 Modified: 26 Jun 2012 10:41
Reporter: Marius Schwarz Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.5.19 OS:Linux (FC15 32 bit)
Assigned to: CPU Architecture:Any
Tags: binlog_dump, crash, mysql-bin, slave_server

[11 Feb 2012 12:57] Marius Schwarz
Description:

This caused it somehow:

The Master was moved in the datacenter so it was shutdown normaly and was down for a few hours. It started up normaly afaik.

The normal connections to the database via mysql.sock worked fine. The application took care of the constant unnoticed restarts of the server, so 
the webserver did not notice any problems. In the aftermath , the users noted, that the app was running slow from time to time. 

This timeframe was the crash recovery restart, which happend every minute, caused by the slave doing his retry every 60 seconds. 

The first line of the log is the important one :

120211 12:36:50 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)

the mysql-bin file isn't that big, so there could not be a position 16278:

-rw-rw---- mysql/mysql    2415 2012-02-08 10:36 mysql/mysql-bin.000004

The bin file itself looked pretty ok, "mysqlbinlog" did not find any problem with it and i did not see one either.

I believe some function trusts the request position more than the actual binfile. 

I FIXED IT with resyncing the master to the slave and resetting the replication from scratch. 

+++++++++++++++++++++ LOG FILE ++++++++++++++++++++++++++++++

120211 12:36:50 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120211 12:36:50 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=24
max_threads=400
thread_count=3
connection_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 881510 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0xa280a20
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0xa67822cc thread_stack 0x30000
/usr/libexec/mysqld(my_print_stacktrace+0x42)[0x83db342]
/usr/libexec/mysqld(handle_segfault+0x564)[0x8142a54]
[0x47d400]
/usr/libexec/mysqld(_Z17mysql_binlog_sendP3THDPcyt+0x611)[0x81cf191]
/usr/libexec/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xcdc)[0x81be61c]
/usr/libexec/mysqld(_Z10do_commandP3THD+0xdd)[0x81bf59d]
/usr/libexec/mysqld(_Z24do_handle_one_connectionP3THD+0x140)[0x8254fd0]
/usr/libexec/mysqld(handle_one_connection+0x50)[0x8255090]
/lib/libpthread.so.0(+0x6a2e)[0x4f4a2e]
/lib/libc.so.6(clone+0x5e)[0x5e189e]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query ((nil)): is an invalid pointer
Connection ID (thread ID): 97
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

120211 12:36:50 mysqld_safe Number of processes running now: 0
120211 12:36:50 mysqld_safe mysqld restarted
120211 12:36:50 [Note] Plugin 'FEDERATED' is disabled.
120211 12:36:50 InnoDB: The InnoDB memory heap is disabled
120211 12:36:50 InnoDB: Mutexes and rw_locks use GCC atomic builtins
120211 12:36:50 InnoDB: Compressed tables use zlib 1.2.5
120211 12:36:50 InnoDB: Using Linux native AIO
120211 12:36:50 InnoDB: Initializing buffer pool, size = 128.0M
120211 12:36:50 InnoDB: Completed initialization of buffer pool
120211 12:36:50 InnoDB: highest supported file format is Barracuda.
120211 12:36:50  InnoDB: Waiting for the background threads to start
120211 12:36:51 InnoDB: 1.1.8 started; log sequence number 1650713
120211 12:36:51 [Note] Recovering after a crash using mysql-bin
120211 12:36:51 [Note] Starting crash recovery...
120211 12:36:51 [Note] Crash recovery finished.
120211 12:36:51 [Note] Event Scheduler: Loaded 0 events
120211 12:36:51 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.19-log'  socket: '/opt/root/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

+++++++++++++++++++++++++++++++++++++++++++

normal activities before the server move : 

120207 21:50:02 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000001, 17373797)
120207 23:44:42 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000002, 89664)
120208  0:45:42 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000003, 107)

and from here it failes every minute : 

120208 10:36:10 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:37:10 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:38:10 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:39:19 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:40:19 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:41:19 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:42:19 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:43:19 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:44:19 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:45:20 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:46:20 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:47:20 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:48:20 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:49:20 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:50:20 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:51:20 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:52:20 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:53:20 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
120208 10:54:21 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000004, 16278)
1

How to repeat:
I whish I knew.

if you wanne have the mysql-bin.000004 file , tell me where you want it.

Suggested fix:
Add a sanitycheck to the master binlog_dump() function.
[11 Feb 2012 13:08] Valeriy Kravchuk
Looks like a duplicate of Bug #64035. Please, check.
[11 Feb 2012 14:58] Marius Schwarz
the produced error is different, so i would tend to say, it's another function call, but caused by the same underlying problem: no sanity check. 

As soon as 5.5.21 is available for fedora, i could check it.
[11 Feb 2012 16:12] Valeriy Kravchuk
Please, share the results of your check with 5.5.21 when available.
[12 Mar 2012 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[29 Mar 2012 13:34] Adam Bardonski
Hallo Hr. Schwarz, 

falls Sie der Marius Schwarz von Power-Netz sind, so würde ich mich über Ihre neuen Kontaktdaten freuen. VG, Adam Bardonski, stream24
[29 Mar 2012 21:08] Adam Bardonski
Hallo Hr. Schwarz, 

mit Bezug auf unser Telefonat: Ihre Email ist noch nicht angekommen, bitte noch einmal an info [@] stream24 [punkt] de versuchen.
[26 May 2012 8:50] puja pachchigar
120525 11:58:54 [Note] Start binlog_dump to slave_server(1), pos(mysql-bin.000447, 19326968)
120525 11:58:57 [Note] Start binlog_dump to slave_server(7), pos(mysql-bin.000447, 19326968)
*** glibc detected *** /usr/sbin/mysqld: malloc(): memory corruption: 0x00007f75ec192380 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x750c6)[0x7f78f93ea0c6]
/lib64/libc.so.6(+0x78808)[0x7f78f93ed808]
/lib64/libc.so.6(__libc_malloc+0x6d)[0x7f78f93ee48d]
/usr/sbin/mysqld(my_malloc+0x32)[0x79eae2]
/usr/sbin/mysqld(alloc_root+0xcb)[0x7986cb]
/usr/sbin/mysqld(_Z10sql_callocm+0x18)[0x605fd8]
/usr/sbin/mysqld(_ZN12Item_func_spC1EP23Name_resolution_contextP7sp_nameR4ListI4ItemE+0x82)[0x6c7a72]
/usr/sbin/mysqld(_ZN14Create_sp_func6createEP3THD19st_mysql_lex_stringS2_bP4ListI4ItemE+0x14b)[0x6b08db]
/usr/sbin/mysqld(_Z10MYSQLparsePv+0xe6c0)[0x64e400]
/usr/sbin/mysqld(_Z9parse_sqlP3THDP12Parser_stateP19Object_creation_ctx+0x94)[0x57e4f4]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x79)[0x588ae9]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj+0xce8)[0x722d18]
/usr/sbin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x12d)[0x5263ed]
/usr/sbin/mysqld[0x52b1b2]
/usr/sbin/mysqld(handle_slave_sql+0x971)[0x52c5c1]
/lib64/libpthread.so.0(+0x77f1)[0x7f78fa1e57f1]
/lib64/libc.so.6(clone+0x6d)[0x7f78f945a70d]
======= Memory map: ========
00400000-00d05000 r-xp 00000000 fd:00 20449843                           /usr/sbin/mysqld
00f05000-0101c000 rw-p 00905000 fd:00 20449843                           /usr/sbin/mysqld
0101c000-01048000 rw-p 00000000 00:00 0
01c21000-152f2000 rw-p 00000000 00:00 0                                  [heap]
7f7458000000-7f7459c0a000 rw-p 00000000 00:00 0
7f7459c0a000-7f745c000000 ---p 00000000 00:00 0
7f7460000000-7f746180f000 rw-p 00000000 00:00 0
7f746180f000-7f7464000000 ---p 00000000 00:00 0
7f7464000000-7f746580f000 rw-p 00000000 00:00 0
7f746580f000-7f7468000000 ---p 00000000 00:00 0
7f7468000000-7f7469428000 rw-p 00000000 00:00 0
7f7469428000-7f746c000000 ---p 00000000 00:00 0
7f746c000000-7f746dc0a000 rw-p 00000000 00:00 0

06:30:00 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

key_buffer_size=8388608
read_buffer_size=4194304
max_used_connections=99
max_threads=500
thread_count=94
connection_count=94
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4109895 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f75ec000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f760de24838 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7a30b5]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x67c204]
/lib64/libpthread.so.0(+0xf4a0)[0x7f78fa1ed4a0]
/lib64/libc.so.6(gsignal+0x35)[0x7f78f93a7885]
/lib64/libc.so.6(abort+0x175)[0x7f78f93a9065]
/lib64/libc.so.6(+0x6f7a7)[0x7f78f93e47a7]
/lib64/libc.so.6(+0x750c6)[0x7f78f93ea0c6]
/lib64/libc.so.6(+0x78808)[0x7f78f93ed808]
/lib64/libc.so.6(__libc_malloc+0x6d)[0x7f78f93ee48d]
/usr/sbin/mysqld(my_malloc+0x32)[0x79eae2]
/usr/sbin/mysqld(alloc_root+0xcb)[0x7986cb]
/usr/sbin/mysqld(_Z10sql_callocm+0x18)[0x605fd8]
/usr/sbin/mysqld(_ZN12Item_func_spC1EP23Name_resolution_contextP7sp_nameR4ListI4ItemE+0x82)[0x6c7a72]
/usr/sbin/mysqld(_ZN14Create_sp_func6createEP3THD19st_mysql_lex_stringS2_bP4ListI4ItemE+0x14b)[0x6b08db]
/usr/sbin/mysqld(_Z10MYSQLparsePv+0xe6c0)[0x64e400]
/usr/sbin/mysqld(_Z9parse_sqlP3THDP12Parser_stateP19Object_creation_ctx+0x94)[0x57e4f4]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x79)[0x588ae9]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj+0xce8)[0x722d18]
/usr/sbin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x12d)[0x5263ed]
/usr/sbin/mysqld[0x52b1b2]
/usr/sbin/mysqld(handle_slave_sql+0x971)[0x52c5c1]
/lib64/libpthread.so.0(+0x77f1)[0x7f78fa1e57f1]
/lib64/libc.so.6(clone+0x6d)[0x7f78f945a70d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f75ec192190): is an invalid pointer
Connection ID (thread ID): 2
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
120525 12:00:00 mysqld_safe Number of processes running now: 0
120525 12:00:00 mysqld_safe mysqld restarted
120525 12:00:00 InnoDB: The InnoDB memory heap is disabled
120525 12:00:00 InnoDB: Mutexes and rw_locks use GCC atomic builtins
120525 12:00:00 InnoDB: Compressed tables use zlib 1.2.3
120525 12:00:00 InnoDB: Using Linux native AIO
120525 12:00:00 InnoDB: Initializing buffer pool, size = 10.0G
120525 12:00:01 InnoDB: Completed initialization of buffer pool
120525 12:00:01 InnoDB: highest supported file format is Barracuda.
InnoDB: Log scan progressed past the checkpoint lsn 1211375317464
120525 12:00:01  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 1211377058147
120525 12:00:01  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
InnoDB: Apply batch completed
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 5132519, file name mysql-bin.031309
InnoDB: Last MySQL binlog file position 0 2153318, file name /db/mysql-bin-log/mysql-bin.000448
120525 12:00:02  InnoDB: Waiting for the background threads to start
120525 12:00:03 InnoDB: 1.1.8 started; log sequence number 1211377058147
120525 12:00:03 [Note] Recovering after a crash using /db/mysql-bin-log/mysql-bin
120525 12:00:03 [Note] Starting crash recovery...
120525 12:00:03 [Note] Crash recovery finished.
120525 12:00:03 [Note] Slave I/O thread: connected to master 'replication@192.168.3.4:3306',replication started in log 'mysql-bin.064178' at position 5858411
120525 12:00:03 [Note] Event Scheduler: Loaded 0 events
120525 12:00:03 [Note] /usr/sbin/mysqld: ready for connections.
[26 May 2012 10:41] Valeriy Kravchuk
We still need to know if this is repeatable with recent 5.5.x versions.
[27 Jun 2012 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".