Description:
The background I/O thread invokes the bug that occurs when the ibuffer cursor is reset.Here's the details.
slave related configuration
innodb_buffer_pool_instances: 8
innodb_buffer_pool_size: 8.0G
innodb_file_per_table: ON
innodb_log_file_size: 128.0M
innodb_log_files_in_group: 2
innodb_flush_log_at_trx_commit: 1
sync_binlog: 1
mysql> show variables like '%slave_para%';
+------------------------+---------------+
| Variable_name | Value |
+------------------------+---------------+
| slave_parallel_type | LOGICAL_CLOCK |
| slave_parallel_workers | 16 |
+------------------------+---------------+
slave error_log
2018-03-21T03:09:55.027096+08:00 1 [Note] Multi-threaded slave: Coordinator has waited 312251 times hitting slave_pending_jobs_size_max; current event size = 8176.
2018-03-21T03:09:55.030636+08:00 1 [Note] Multi-threaded slave: Coordinator has waited 312261 times hitting slave_pending_jobs_size_max; current event size = 8176.
2018-03-21T03:09:55.034202+08:00 1 [Note] Multi-threaded slave: Coordinator has waited 312271 times hitting slave_pending_jobs_size_max; current event size = 8176.
2018-03-21T03:10:03.687119+08:00 0 [ERROR] InnoDB: ibuf cursor restoration fails!. ibuf record inserted to page 2464:1799
2018-03-21T03:10:03.687154+08:00 0 [ERROR] InnoDB: Submit a detailed bug report to http://bugs.mysql.com
PHYSICAL RECORD: n_fields 6; 1-byte offsets; info bits 0
0: len 4; hex 0000098a; asc ;;
1: len 1; hex 00; asc ;;
2: len 4; hex 00079cd7; asc ;;
3: len 16; hex 000001010dfe000300210c0f00608021; asc ! ` !;;
4: len 1; hex 31; asc 1;;
5: len 30; hex 363630313230313830323232303035313332353338343338363234373131; asc 660120180222005132538438624711; (total 31 bytes);
PHYSICAL RECORD: n_fields 6; 1-byte offsets; info bits 0
0: len 4; hex 0000098a; asc ;;
1: len 1; hex 00; asc ;;
2: len 4; hex 00079cd7; asc ;;
3: len 16; hex 000001010dfe000300210c0f00608021; asc ! ` !;;
4: len 1; hex 31; asc 1;;
5: len 30; hex 363630313230313830323232303035313332353338343338363234373131; asc 660120180222005132538438624711; (total 31 bytes);
DATA TUPLE: 3 fields;
0: len 4; hex 000009a0; asc ;;
1: len 1; hex 00; asc ;;
2: len 4; hex 00000707; asc ;;
PHYSICAL RECORD: n_fields 7; 1-byte offsets; info bits 32
0: len 4; hex 000009a0; asc ;;
1: len 1; hex 00; asc ;;
2: len 4; hex 00000707; asc ;;
3: len 22; hex 01ca01010c0f000c0021860a00030000860800088000; asc ! ;;
4: len 4; hex 4c494c49; asc LILI;;
5: len 3; hex 8fc474; asc t;;
6: len 8; hex 8000000000350b6e; asc 5 n;;
2018-03-21T03:10:03.702262+08:00 0 [ERROR] [FATAL] InnoDB: Failed to restore ibuf position.
2018-03-21 03:10:03 0x7f4d239f7700 InnoDB: Assertion failure in thread 139969286862592 in file ut0ut.cc line 916
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
19:10:03 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
key_buffer_size=16777216
read_buffer_size=2097152
max_used_connections=79
max_threads=3000
thread_count=30
connection_count=12
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 12344415 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x35)[0xf1dff5]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x4a4)[0x79d3b4]
/lib64/libpthread.so.0[0x358c00f7e0]
/lib64/libc.so.6(gsignal+0x35)[0x358bc325e5]
/lib64/libc.so.6(abort+0x175)[0x358bc33dc5]
/usr/local/mysql/bin/mysqld[0x1159d65]
/usr/local/mysql/bin/mysqld(_ZN2ib5fatalD1Ev+0xb3)[0x115e8b3]
/usr/local/mysql/bin/mysqld[0x102b4d1]
/usr/local/mysql/bin/mysqld[0x102f531]
/usr/local/mysql/bin/mysqld(_Z29ibuf_merge_or_delete_for_pageP11buf_block_tRK9page_id_tPK11page_size_tm+0xb09)[0x1033b29]
/usr/local/mysql/bin/mysqld(_Z20buf_page_io_completeP10buf_page_tb+0x4e1)[0x11a59a1]
/usr/local/mysql/bin/mysqld(_Z12fil_aio_waitm+0x15b)[0x1200afb]
/usr/local/mysql/bin/mysqld(io_handler_thread+0xc8)[0x110db48]
/lib64/libpthread.so.0[0x358c007aa1]
/lib64/libc.so.6(clone+0x6d)[0x358bce8aad]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
How to repeat:
The test environment
slave
Centos 7.4
mysql5.7.19
mysql> show variables like '%slave_para%';
+------------------------+---------------+
| Variable_name | Value |
+------------------------+---------------+
| slave _parallel_ type | LOGICAL _CLOCK |
| slave_parallel _workers | 8 |
+------------------------+---------------+
select count(*) from sbtest1;
500000
master perform
UPDATE sbtest1 SET c=REPEAT(UUID(),2) where id<100000;
slave error_log
Note] Multi-threaded slave: Coordinator has waited 208341 times hitting slave_pending_jobs_size_max; current event size = 8044
Note] Multi-threaded slave: Coordinator has waited 208351 times hitting slave_pending_jobs_size_max; current event size = 8044
Note] Multi-threaded slave: Coordinator has waited 208361 times hitting slave_pending_jobs_size_max; current event size = 8044
Note] Multi-threaded slave: Coordinator has waited 208371 times hitting slave_pending_jobs_size_max; current event size = 8044
There are a lot of logs in similar production environments from the library, but there is no simulation of automatic restart from the library.
Suggested fix:
Parallel replication from the library should not occur automatically restart, although the master has a large transaction.