Bug #58063 rpl_server_uuid segfaults on win
Submitted: 8 Nov 2010 20:59 Modified: 30 Nov 2010 22:17
Reporter: Andrei Elkin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.6-next-mr-wl2540 OS:Any
Assigned to: Luis Soares CPU Architecture:Any
Tags: wl2540

[8 Nov 2010 20:59] Andrei Elkin
Description:
The failure happens sporadically on Win builds of checksum-capable server.
The latter has specifics in that slave and master negotiate the checksum
capabilities that adds a piece of code in get_master_version_and_clock().
As logs show, slave can't find just have been set value of its session
@master_binlog_checksum to get back NULL as the value instead.
That results in a segfault as the following lines witness:

https://central.sun.net/http://pb2.norway.sun.com/?action=archive_download&archive_id=2498...

rpl.rpl_server_uuid 'mix' [ fail ]
        Test ended at 2010-11-08 14:10:11

CURRENT_TEST: rpl.rpl_server_uuid
mysqltest: In included file ".\include\stop_slave.inc": At line 18: query 'STOP SLAVE' failed: 2013: Lost connection to MySQL server during query

The result from queries just before the failure was:
< snip >
# equal to slave's UUID unless --replicate-same-server-id # option is set.
-----------------------------------------------------------------------------
START SLAVE IO_THREAD;

# slave2 is running with --replicate-same-server-id option
CHANGE MASTER TO MASTER_HOST= '127.0.0.1',
MASTER_PORT= MASTER_PORT,
MASTER_USER= 'root',
MASTER_LOG_FILE='master-bin.000001';
include/start_slave.inc
include/stop_slave.inc

# Case 4: # When executing 'CHANGE MASTER ...', master's UUID and server_id will be # cleared if master_host or/and master_port are changed. # Master's UUID and server_id will not be cleared if both master_port # and master_host are not changed.
-----------------------------------------------------------------------------
START SLAVE IO_THREAD;

More results from queries before failure can be found in H:\pb2\test\sb_2-2496646-1289217987.85\mysql-5.6.99-m5-win-x86-test\mysql-test\var-rpl_binlog_checksum\log\rpl_server_uuid.log

Server [mysqld.2 - pid: 6192, winpid: 6192, exit: 1280] failed during test run
Server log from this test:
101108 16:09:36 [Note] Plugin 'FEDERATED' is disabled.
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use Windows interlocked functions
InnoDB: Compressed tables use zlib 1.2.3
101108 14:09:36  InnoDB: highest supported file format is Barracuda.
101108 14:09:36 InnoDB 1.2.0 started; log sequence number 1595675
101108 16:09:36 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 6ff1e8c6-eb39-11df-ab62-001e6849abdd.
101108 16:09:36 [Note] Event Scheduler: Loaded 0 events
101108 16:09:36 [Note] H:/pb2/test/sb_2-2496646-1289217987.85/mysql-5.6.99-m5-win-x86-test/sql/Debug/mysqld.exe: ready for connections.
Version: '5.6.99-m5-debug-log'  socket: ''  port: 16611  Source distribution
101108 16:09:38 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='16610', master_log_file='', master_log_pos='4'.
101108 16:09:38 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log '.\slave-relay-bin.000001' position: 4
101108 16:09:38 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16610',replication started in log 'FIRST' at position 4
101108 16:09:38 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
101108 16:09:38 [Note] Slave I/O thread killed while reading event
101108 16:09:38 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 2414
101108 16:09:38 [Note] Error reading relay log event: slave SQL thread was killed
101108 16:09:38 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 2414, relay log '.\slave-relay-bin.000003' position: 2572
101108 16:09:38 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16610',replication started in log 'master-bin.000001' at position 2414
101108 16:09:38 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
101108 16:09:38 [Note] Slave I/O thread killed while reading event
101108 16:09:38 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 2414
101108 16:09:38 [Note] Error reading relay log event: slave SQL thread was killed
101108 16:09:49 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16610',replication started in log 'master-bin.000001' at position 2414
101108 16:09:49 [ERROR] Slave I/O: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server UUIDs; these UUIDs must be different for replication to work. Error_code: 1593
101108 16:09:49 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 2414
101108 16:10:10 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16610',replication started in log 'master-bin.000001' at position 2414
101108 16:10:10 - mysqld got exception 0xc0000005 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=2
max_threads=151
thread_count=2
connection_count=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60173 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x5096678
...

How to repeat:
See the description; check PB2 failures with the test.

Suggested fix:
Using this frame to leave the rest of the log that can hint on how to fix.

Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
00E4842D    mysqld.exe!find_type()[typelib.c:91]
00C7E7D3    mysqld.exe!get_master_version_and_clock()[rpl_slave.cc:1746]
00C7C7D9    mysqld.exe!handle_slave_io()[rpl_slave.cc:3231]
00E0F3EB    mysqld.exe!pfs_spawn_thread()[pfs.cc:1078]
00E26988    mysqld.exe!pthread_start()[my_winthread.c:61]
00F01FB3    mysqld.exe!_callthreadstartex()[threadex.c:348]
00F01F54    mysqld.exe!_threadstartex()[threadex.c:331]
778CD0E9    kernel32.dll!BaseThreadInitThunk()
77DC19BB    ntdll.dll!RtlInitializeExceptionChain()
77DC198E    ntdll.dll!RtlInitializeExceptionChain()
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 00000000=(null)
thd->thread_id=9
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
Minidump written to H:\pb2\test\sb_2-2496646-1289217987.85\mysql-5.6.99-m5-win-x86-test\mysql-test\var-rpl_binlog_checksum\mysqld.2\data\mysqld.dmp

 - saving 'H:/pb2/test/sb_2-2496646-1289217987.85/mysql-5.6.99-m5-win-x86-test/mysql-test/var-rpl_binlog_checksum/log/rpl.rpl_server_uuid-mix/' to 'H:/pb2/test/sb_2-2496646-1289217987.85/mysql-5.6.99-m5-win-x86-test/mysql-test/var-rpl_binlog_checksum/log/rpl.rpl_server_uuid-mix/'
 - found 'mysqld.dmp' (0/5)

Trying 'cdb' to get a backtrace
OS debug symbols will be downloaded and stored in C:\cdb_symbols.
You can control the location of symbol cache with _NT_SYMBOL_PATH
environment variable. Please refer to Microsoft KB article
http://support.microsoft.com/kb/311503  for details about _NT_SYMBOL_PATH
-------------------------------------------------------------------------
Output from cdb follows. Faulting thread is printed twice,with and without function parameters
Search for STACK_TEXT to see the stack trace of 
the faulting thread. Callstacks of other threads are printed after it.

Microsoft (R) Windows Debugger Version 6.11.0001.402 X86
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [H:\pb2\test\sb_2-2496646-1289217987.85\mysql-5.6.99-m5-win-x86-test\mysql-test\var-rpl_binlog_checksum\log\rpl.rpl_server_uuid-mix\mysqld.2\data\mysqld.dmp]
User Mini Dump File: Only registers, stack and portions of memory are available

Symbol search path is: C:\Windows\System32;H:\pb2\test\sb_2-2496646-1289217987.85\mysql-5.6.99-m5-win-x86-test\sql\Debug;.;srv*C:\cdb_symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: C:\Windows\System32;H:\pb2\test\sb_2-2496646-1289217987.85\mysql-5.6.99-m5-win-x86-test\sql\Debug;.
Windows Server 2008/Windows Vista Version 6002 (Service Pack 2) MP (8 procs) Free x86 compatible
Product: Server, suite: Enterprise TerminalServer SingleUserTS
Machine Name:
Debug session time: Mon Nov  8 14:10:10.000 2010 (GMT+1)
System Uptime: not available
Process Uptime: 0 days 0:00:34.000
..............................
Tmysqld!find_type:
         movsx   edx,byte ptr [ecx]         ds:0023:00000000=??
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mysqld!find_type [g:\pb2\build\sb_0-2496415-1289215914.56\mysql-5.6.99-m5-win-x86\mysys\typelib.c @ 91]
char * x = 0x00000000 ""
struct st_typelib * typelib = 0x011c9c0c
unsigned int full_name = 1
int pos = 0
char * j = 0x0108df54 "NONE"
int find = 0
struct _db_stack_frame_ _db_stack_frame_ = struct _db_stack_frame_
char * i = 0x00000000 ""
int findpos = 0
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mysqld!get_master_version_and_clock [g:\pb2\build\sb_0-2496415-1289215914.56\mysql-5.6.99-m5-win-x86\sql\rpl_slave.cc @ 1746]
char [54] query = char [54] "SET @master_binlog_checksum= @@global.binlog_checksum"
int rc = 0
struct st_mysql * mysql = 0x05083178
class Master_info * mi = 0x0502b768
struct st_mysql_res * master_res = 0x04fd3220
struct _db_stack_frame_ _db_stack_frame_ = struct _db_stack_frame_
int err_code = 0
char * errmsg = 0x00000000 ""
char ** master_row = 0x05102640
char [1024] err_buff = char [1024] "???"
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mysqld!handle_slave_io [g:\pb2\build\sb_0-2496415-1289215914.56\mysql-5.6.99-m5-win-x86\sql\rpl_slave.cc @ 3231]
void * arg = 0x0502b768
class Master_info * mi = 0x0502b768
char [22] llbuff = char [22] "2414"
bool suppress_warnings = true
unsigned int retry_count_event = 0
int ret = -858993460
struct _db_stack_frame_ _db_stack_frame_ = struct _db_stack_frame_
class Relay_log_info * rli = 0x04febfd8
struct st_mysql * mysql = 0x05083178
unsigned int retry_count = 0
unsigned int retry_count_reg = 0
class THD * thd = 0x05096678
unsigned int retry_count_dump = 0
[8 Nov 2010 21:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/123156

3194 Andrei Elkin	2010-11-08
      Bug #58063  	rpl_server_uuid segfaults on win
      
      This is an intermediate patch to get more details out of the failure.
     @ sql/log_event.cc
        compilation warning is fixed.
     @ sql/rpl_slave.cc
        Printing out more info in case Win run of rpl_server_id discovers unexpected NULL
        as the value of @master_binlog_checksum.
[9 Nov 2010 6:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/123169

3195 Andrei Elkin	2010-11-09
      Bug #58063
      
      The 2nd commit disables rpl_delayed_slave due to
      Bug@57514  rpl_delayed_slave fails sporadically in pb
[11 Nov 2010 2:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/123512

3196 Luis Soares	2010-11-11
      BUG#58063: rpl_server_uuid segfaults on win
      
      While connecting to the master the slave thread sets a session
      variable (@master_binlog_checksum). Then, later, it queries the
      master for this variable content. However, in this test, the
      master is restarted a few times, which can cause the slave to
      reconnect between the time it sets and the time it queries the 
      session var @master_binlog_checksum on the master. 
      
      Ultimately, this can cause the slave to receive NULL when
      SELECTing the var after a master restart, causing the slave to 
      crash.
      
      We fix this by checking whether the value is NULL or not before
      taking any action based on the var value.
      
      Other changes in this cset:
      
      1. Fix for rpl_checksum_cache failure in valgrind
      
         The test takes too long when using RBR and times out because
         the slave takes too long time to replay the row events for a
         table without a primary key. We fix it by adding a PK to the
         table, which makes the slave to find the rows to delete.
      
      2. rpl_corruption fails with errors in valgrind
      
         This is a bug uncovered while testing WL#2540. Details are
         available in BUG#58129. We workaround it by replacing the
         offending part of the test (ie, do not use TEXT type and use
         VARCHAR instead).
[11 Nov 2010 23:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/123672

3198 Luis Soares	2010-11-11
      BUG#58063: rpl_server_uuid segfaults on win
      
      Follow up patch removing unused thread_id variable.
[11 Nov 2010 23:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/123673

3198 Luis Soares	2010-11-11
      BUG#58063: rpl_server_uuid segfaults on win
      
      Follow up patch removing unused thread_id variable.
      
      Fixed also in this cset a couple more compile warnings.
[29 Nov 2010 11:12] Bugs System
Pushed into mysql-trunk 5.6.1-m5 (revid:alexander.nozdrin@oracle.com-20101129111021-874if2qsp0i8d5ze) (version source revid:alexander.nozdrin@oracle.com-20101129111021-874if2qsp0i8d5ze) (merge vers: 5.6.1-m5) (pib:23)
[30 Nov 2010 22:17] Jon Stephens
Issue does not appear in a release; no changelog entry required. Closed.