MySQL Bugs: #65152: MySQL can't start if relay logs are removed and skip_slave

Bug #65152	MySQL can't start if relay logs are removed and skip_slave_start = 0
Submitted:	30 Apr 2012 5:39	Modified:	4 Jun 2012 10:21
Reporter:	Joffrey MICHAIE	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.6.5m8	OS:	Linux (RHEL 6.2 64bits)
Assigned to:		CPU Architecture:	Any
Tags:	regression, remove relay logs, skip_slave_start, start failure

Description:
If you remove the relay-logs, or change the relay-log parameter, while a replication is enabled, mysql fails to restart (and loops if you use mysqld_safe) with the following error :

120430  1:30:04 InnoDB: The InnoDB memory heap is disabled
120430  1:30:04 InnoDB: Mutexes and rw_locks use GCC atomic builtins
120430  1:30:04 InnoDB: Compressed tables use zlib 1.2.3
120430  1:30:04 InnoDB: Using Linux native AIO
120430  1:30:04 InnoDB: CPU supports crc32 instructions
120430  1:30:04 InnoDB: Initializing buffer pool, size = 64.0M
120430  1:30:04 InnoDB: Completed initialization of buffer pool
120430  1:30:04 InnoDB: highest supported file format is Barracuda.
120430  1:30:04 InnoDB: 128 rollback segment(s) are active.
120430  1:30:04 InnoDB: Waiting for the background threads to start
120430  1:30:04 InnoDB: 1.2.5 started; log sequence number 13152792
120430  1:30:04 [Note] Recovering after a crash using mysql-bin
120430  1:30:04 [Note] Starting crash recovery...
120430  1:30:04 [Note] Crash recovery finished.
120430  1:30:04 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
05:30:04 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=0
max_threads=100
thread_count=0
connection_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 47920 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f0df8000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f0e1458a810 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8920a5]
120430  1:30:04 [Warning] Storing MySQL user name or password information in the master.info repository is not secure and is therefore not recommended. Please see the MySQL Manual for more about this issue and possible alternatives.
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x644214]
120430  1:30:04 [ERROR] Slave I/O: error connecting to master 'replic@10.52.250.245:3306' - retry-time: 60  retries: 1, Error_code: 2003
/lib64/libpthread.so.0(+0xf4a0)[0x7f0e17a534a0]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG12find_log_posEP11st_log_infoPKcb+0x153)[0x844f33]
/usr/sbin/mysqld(_ZN14Relay_log_info18init_relay_log_posEPKcybPS1_b+0x191)[0x86c811]
120430  1:30:04 [Note] Event Scheduler: Loaded 0 events
/usr/sbin/mysqld(handle_slave_sql+0x7b9)[0x863ee9]
/lib64/libpthread.so.0(+0x77f1)[0x7f0e17a4b7f1]
/lib64/libc.so.6(clone+0x6d)[0x7f0e167b5ccd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 1
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
120430  1:30:04 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.5-m8-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)
120430 01:30:04 mysqld_safe Number of processes running now: 0
120430 01:30:04 mysqld_safe mysqld restarted

How to repeat:
1) Enable replication between 2x MySQL 5.6 servers (config file and commands present in bug http://bugs.mysql.com/bug.php?id=65151)

2) Stop MySQL
[root@nowhere ~]# mysqladmin shutdown

3) Remove relaylogs
[root@nowhere ~]# rm -rf /var/lib/mysql/*relay*

4) Start MySQL
[root@ip-10-54-79-221 ~]# /etc/init.d/mysql start
Starting MySQL....                                         [  OK  ]

5) Try to connect to mysql ...
[root@nowhere ~]# mysql
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

6) Stop the safe_mysqld looping script (dangerous command)
[root@nowhere ~]# killall -9 mysqld_safe

7) Add skip_slave_start in [mysqld] section of /etc/mysql/my.cnf

8) Start MySQL
/etc/init.d/mysql start
9) Start Replication
mysql> START SLAVE;

Note : I didn't try with converting the mysql.*repl* table to InnoDB (as it will be in 5.6.6), but don't think it will help

Suggested fix:
changing the relay-log parameter (or removing relay-log files) should not make mysql crash if skip_slave_start is not set

Thank you for the report.

Verified as described.

Backtrace in my case is a bit different:

mysqld: /home/ssmirnov/blade12/src/mysql-trunk/sql/rpl_slave.cc:5156: void* handle_slave_sql(void*): Assertion `rli->inited' failed.
09:09:16 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
o
...
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/home/ssmirnov/blade12/src/mysql-trunk/sql/mysqld(my_print_stacktrace+0x35)[0xabd669]
/home/ssmirnov/blade12/src/mysql-trunk/sql/mysqld(handle_fatal_signal+0x412)[0x755eee]
/lib64/libpthread.so.0[0x3429e0dd40]
/lib64/libc.so.6(gsignal+0x35)[0x3429230015]
/lib64/libc.so.6(abort+0x110)[0x3429231980]
/lib64/libc.so.6(__assert_fail+0xf6)[0x3429229726]
/home/ssmirnov/blade12/src/mysql-trunk/sql/mysqld(handle_slave_sql+0xcf)[0xa763d0]
/home/ssmirnov/blade12/src/mysql-trunk/sql/mysqld(pfs_spawn_thread+0x16b)[0xc95568]
/lib64/libpthread.so.0[0x3429e061b5]
/lib64/libc.so.6(clone+0x6d)[0x34292cd39d]

Problem is not repeatable with version 5.5

Fixed in 5.6. Documented as follows in the MySQL 5.6.6 changelog:

      If the relay logs were removed after the server was stopped, without
      stopping replication first, the server could not be started correctly.

Closed.