MySQL Bugs: #69902: slave asserts after 5.6 upgrade

Bug #69902	slave asserts after 5.6 upgrade
Submitted:	1 Aug 2013 18:37	Modified:	21 Oct 2013 15:48
Reporter:	George Lorch	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S1 (Critical)
Version:	5.6.12, 5.6.13	OS:	Linux
Assigned to:		CPU Architecture:	Any
Tags:	debug build

Description:
In place upgrade of slave from MySQL community 5.5.32 to 5.6.12 fails with assertion:

[Warning] Neither -relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '-relay-log=mysql-relay-bin' to avoid this problem.
mysqld: /home/glorch/dev/ps-5.6/Percona-Server/sql/rpl_rli.cc:2133: virtual bool Relay_log_info::read_info(Rpl_info_handler*): Assertion `lines < LINES_IN_RELAY_LOG_INFO_WITH_ID || (lines >= LINES_IN_RELAY_LOG_INFO_WITH_ID && internal_id == 1)' failed.
...
stack_bottom = 0 thread_stack 0x40000
mysqld(my_print_stacktrace+0x35)[0xab4878]
mysqld(handle_fatal_signal+0x42e)[0x72cfea]
/lib64/libpthread.so.0[0x360dc0f500]
/lib64/libc.so.6(gsignal+0x35)[0x360d8328a5]
/lib64/libc.so.6(abort+0x175)[0x360d834085]
/lib64/libc.so.6[0x360d82ba1e]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x360d82bae0]
mysqld(_ZN14Relay_log_info9read_infoEP16Rpl_info_handler+0x349)[0xa82eab]
mysqld(_ZN14Relay_log_info13rli_init_infoEv+0x77c)[0xa824e4]
mysqld(_Z16global_init_infoP11Master_infobi+0x178)[0xa62f37]
mysqld(_Z10init_slavev+0x10e)[0xa62a25]
mysqld(_Z11mysqld_mainiPPc+0x9a9)[0x62814c]
mysqld(main+0x20)[0x61d4d4]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x360d81ecdd]
mysqld[0x61d3f9]

How to repeat:
Using the upgrade procedure outlined here https://dev.mysql.com/doc/refman/5.6/en/replication-upgrade.htm

- create master and slave server instances with all default options except for master with log_bin=a, start both servers
- execute 'FLUSH TABLE WITH READ LOCK; SHOW MASTER STATUS' on master then execute 'CHANGE MASTER TO ...' and 'START SLAVE' on slave
- load sakila database on master and verify that it is fully replicated to slave
- shutdown master and slave servers
- upgrade/change slave binaries to 5.6.12
- start slave with skip-grant-tables and skip-slave-start
- run mysql_upgrade against slave, shutdown slave
- start slave with skip-slave-start (no skip-grant-tables)
- slave asserts on startup as described above

Hello George,

Thank you for the bug report.
I'm able to repeat the reported behavior only with the "debug" build where as release build didn't assert.

Could you try with Oracle/MySQL builds and see if it is repeatable? Please let us know.

Thanks,
Umesh

How to repeat:

- create master and slave server instances with all default options except for master with log_bin=a, start both servers
- execute 'FLUSH TABLE WITH READ LOCK; SHOW MASTER STATUS' on master then execute 'CHANGE MASTER TO ...' and 'START SLAVE' on slave
- load sakila database on master and verify that it is fully replicated to slave
- shutdown master and slave servers
- upgrade/change slave binaries to 5.6.12
- start slave with skip-grant-tables and skip-slave-start
- run mysql_upgrade against slave, shutdown slave
- start slave with skip-slave-start (no skip-grant-tables)
- slave asserts on startup as described above
 
 ^^ I observed that "slave" asserts only when *debug* build is used and didn't assert for release build.
 
 
// Started slave with release build.. It doesn't assert (5.6.13)
 
 [root@cluster-repo mysql-5.6.13]# bin/mysqld --defaults-file=./my_slave.cnf --user=mysql --skip-slave-start &
 [1] 19731
 [root@cluster-repo mysql-5.6.13]# 2013-08-18 15:38:10 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
 2013-08-18 15:38:10 19731 [Note] Plugin 'FEDERATED' is disabled.
 2013-08-18 15:38:10 19731 [Note] InnoDB: The InnoDB memory heap is disabled
 2013-08-18 15:38:10 19731 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
 2013-08-18 15:38:10 19731 [Note] InnoDB: Compressed tables use zlib 1.2.3
 2013-08-18 15:38:10 19731 [Note] InnoDB: Using Linux native AIO
 2013-08-18 15:38:10 19731 [Note] InnoDB: Not using CPU crc32 instructions
 2013-08-18 15:38:10 19731 [Note] InnoDB: Initializing buffer pool, size = 128.0M
 2013-08-18 15:38:10 19731 [Note] InnoDB: Completed initialization of buffer pool
 2013-08-18 15:38:10 19731 [Note] InnoDB: Highest supported file format is Barracuda.
 2013-08-18 15:38:10 19731 [Note] InnoDB: The log sequence numbers 8564353 and 8564353 in ibdata files do not match the log sequence number 8691617 in the ib_logfiles!
 2013-08-18 15:38:10 19731 [Note] InnoDB: Database was not shutdown normally!
 2013-08-18 15:38:10 19731 [Note] InnoDB: Starting crash recovery.
 2013-08-18 15:38:10 19731 [Note] InnoDB: Reading tablespace information from the .ibd files...
 2013-08-18 15:38:10 19731 [Note] InnoDB: Restoring possible half-written data pages
 2013-08-18 15:38:10 19731 [Note] InnoDB: from the doublewrite buffer...
 2013-08-18 15:38:10 19731 [Note] InnoDB: 128 rollback segment(s) are active.
 2013-08-18 15:38:10 19731 [Note] InnoDB: Waiting for purge to start
 2013-08-18 15:38:10 19731 [Note] InnoDB: 5.6.13 started; log sequence number 8691617
 2013-08-18 15:38:10 19731 [Note] Server hostname (bind-address): '*'; port: 3307
 2013-08-18 15:38:10 19731 [Note] IPv6 is available.
 2013-08-18 15:38:10 19731 [Note]   - '::' resolves to '::';
 2013-08-18 15:38:10 19731 [Note] Server socket created on IP: '::'.
 2013-08-18 15:38:10 19731 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=cluster-repo-relay-bin' to avoid this problem.
 2013-08-18 15:38:10 19731 [Note] Event Scheduler: Loaded 0 events
 2013-08-18 15:38:10 19731 [Note] bin/mysqld: ready for connections.
 Version: '5.6.13'  socket: '/tmp/slave/mysql.sock'  port: 3307  MySQL Community Server (GPL)
 
 [root@cluster-repo mysql-5.6.13]# ps aux|grep mysqld
 mysql    19731  8.9  7.3 972252 451604 pts/3   Sl   15:38   0:00 bin/mysqld --defaults-file=./my_slave.cnf --user=mysql --skip-slave-start
 root     19757  0.0  0.0 103244   836 pts/3    S+   15:38   0:00 grep mysqld
 [root@cluster-repo mysql-5.6.13]# kill -9 19731
 

// Started slave with debug build.. it asserts (5.6.13)

[root@cluster-repo mysql-5.6.13]# bin/mysqld-debug --defaults-file=./my_slave.cnf --user=mysql --skip-slave-start &
[2] 19759
[root@cluster-repo mysql-5.6.13]# 2013-08-18 15:39:15 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).

..
..
2013-08-18 15:39:16 19759 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=cluster-repo-relay-bin' to avoid this problem.
mysqld-debug: /export/home/pb2/build/sb_0-9714019-1373474833.36/mysql-5.6.13/sql/rpl_rli.cc:2140: virtual bool Relay_log_info::read_info(Rpl_info_handler*): Assertion `lines < LINES_IN_RELAY_LOG_INFO_WITH_ID || (lines >= LINES_IN_RELAY_LOG_INFO_WITH_ID && internal_id == 1)' failed.
10:09:16 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=0
max_threads=151
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68229 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
bin/mysqld-debug(my_print_stacktrace+0x32)[0xa8fb70]
bin/mysqld-debug(handle_fatal_signal+0x47a)[0x70295a]
/lib64/libpthread.so.0[0x3b2560f500]
/lib64/libc.so.6(gsignal+0x35)[0x3b24e328a5]
/lib64/libc.so.6(abort+0x175)[0x3b24e34085]
/lib64/libc.so.6[0x3b24e2ba1e]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x3b24e2bae0]
bin/mysqld-debug(_ZN14Relay_log_info9read_infoEP16Rpl_info_handler+0x307)[0xa567c1]
bin/mysqld-debug(_ZN14Relay_log_info13rli_init_infoEv+0x7bd)[0xa5ad33]
bin/mysqld-debug(_Z16global_init_infoP11Master_infobi+0x16c)[0xa4ed54]
bin/mysqld-debug(_Z10init_slavev+0x101)[0xa52469]
bin/mysqld-debug(_Z11mysqld_mainiPPc+0x969)[0x5fa70b]
bin/mysqld-debug(main+0x1b)[0x5eb1a3]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3b24e1ecdd]
bin/mysqld-debug(__gxx_personality_v0+0x321)[0x5eb0d9]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

[2]+  Exit 1                  bin/mysqld-debug --defaults-file=./my_slave.cnf --user=mysql --skip-slave-start

// Started slave with debug build.. it asserts (5.6.12)

2013-08-18 15:06:56 19085 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=cluster-repo-relay-bin' to avoid this problem.
mysqld-debug: /export/home/pb2/build/sb_0-9211521-1368487198.59/mysql-5.6.12/sql/rpl_rli.cc:2133: virtual bool Relay_log_info::read_info(Rpl_info_handler*): Assertion `lines < LINES_IN_RELAY_LOG_INFO_WITH_ID || (lines >= LINES_IN_RELAY_LOG_INFO_WITH_ID && internal_id == 1)' failed.
09:36:56 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=0
max_threads=151
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68226 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
bin/mysqld-debug(my_print_stacktrace+0x32)[0xa8a218]
bin/mysqld-debug(handle_fatal_signal+0x47a)[0x6ff20a]
/lib64/libpthread.so.0[0x3b2560f500]
/lib64/libc.so.6(gsignal+0x35)[0x3b24e328a5]
/lib64/libc.so.6(abort+0x175)[0x3b24e34085]
/lib64/libc.so.6[0x3b24e2ba1e]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x3b24e2bae0]
bin/mysqld-debug(_ZN14Relay_log_info9read_infoEP16Rpl_info_handler+0x307)[0xa51251]
bin/mysqld-debug(_ZN14Relay_log_info13rli_init_infoEv+0x7bd)[0xa557b1]
bin/mysqld-debug(_Z16global_init_infoP11Master_infobi+0x16c)[0xa499f0]
bin/mysqld-debug(_Z10init_slavev+0x101)[0xa4cf45]
bin/mysqld-debug(_Z11mysqld_mainiPPc+0x96b)[0x5f716d]
bin/mysqld-debug(main+0x1b)[0x5e7c33]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3b24e1ecdd]
bin/mysqld-debug(__gxx_personality_v0+0x309)[0x5e7b69]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Being an assertion failure it sure does only bail out in debug builds ...

So the real question is: either the assert check is bogus, in that case it should be removed, or it is indeed pointing out some serious inconsistency, then the situation that leads to the assertion failure needs to be fixed 

Suggesting the non-debug build as a workaround might be ok if the assert condition is considered non-serious, but there is no sense in suggesting to try to reproduce the error on a non-debug build as all assert() checks are simply not compiled into these ...

Fixed in 5.6.15 (and merged to 5.7.3). No changelog entry required. Closed.

5.6$ bzr log -r 5544
------------------------------------------------------------
revno: 5544
committer: Venkatesh Duggirala<venkatesh.duggirala@oracle.com>
branch nick: mysql-5.6
timestamp: Mon 2013-10-21 11:20:23 +0530
message:
  Bug #17324516 SLAVE ASSERTS AFTER 5.6 UPGRADE
  
  Problem:
  In place upgrade of slave from mysql-5.5 to mysql-5.6.12
  fails with assertion (lines >= LINES_IN_RELAY_LOG_INFO_WITH_ID
  && internal_id == 1).
  
  Analysis: mysql-5.5's relay-log.info format is different
  from mysql-5.6's format. mysql-5.5 version does not have
  some information like the number of lines in the file(1st line),
  sql_delay (6th line), number of parallel workers which is
  used when MTS is enabled (7th line), internal_id which is also
  used when MTS is enabled (8th line).
  
  While a mysql-5.6 slave server is coming up, it parses
  relay-log.info file (if already available) and regenerates
  a new content if the format is old (i.e., mysql-5.5's format).
  In the process of regeneration, a local variable temp_internal_id
  with a default value 0 is declared. If the file content is
  new format, then the value was read into temp_internal_id and
  Rpl_info::internal_id is assigned with this local temp_internal_id.
  If the file content is old format, then the default value '0'
  was assigned to Rpl_info::internal_id which is wrong value.
  
  Fix: Initialize temp_internal_id with Rpl_info::internal_id
  which is having the default value '1'.