Bug #83900 mysql server is crashed by binlog dump thread
Submitted: 21 Nov 2016 8:04 Modified: 28 Jul 2017 7:08
Reporter: h h Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.7.12-log OS:CentOS (6.5)
Assigned to: CPU Architecture:Any

[21 Nov 2016 8:04] h h
Description:
it happened several time ,i think it is a bug belongs to multi-thread replication

the info in error log is
2016-11-21T14:30:49.781131+08:00 3233373 [Note] Aborted connection 3233373 to db: 'tqdw' user: 'bi_rw_all' host: '10.162.51.135' (Got an error readi
ng communication packets)
2016-11-21T14:30:49.781192+08:00 3233354 [Note] Aborted connection 3233354 to db: 'tqdw' user: 'bi_rw_all' host: '10.162.51.135' (Got an error readi
ng communication packets)
2016-11-21T14:32:35.682284+08:00 2 [Note] Multi-threaded slave statistics for channel '': seconds elapsed = 136; events assigned = 180213761; worker
 queues filled over overrun level = 26410731; waited due a Worker queue full = 427957; waited due the total size = 8578121; waited at clock conflict
s = 0 waited (count) when Workers occupied = 78465919 waited when Workers occupied = 0
2016-11-21T14:35:21.651942+08:00 2 [Note] Multi-threaded slave statistics for channel '': seconds elapsed = 166; events assigned = 180216833; worker
 queues filled over overrun level = 26410731; waited due a Worker queue full = 427957; waited due the total size = 8578121; waited at clock conflict
s = 0 waited (count) when Workers occupied = 78465919 waited when Workers occupied = 0
2016-11-21T14:38:04.750920+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4038ms. The settings might not be optimal. (flushed=24 and
 evicted=0, during the time.)
2016-11-21T14:38:37.802900+08:00 2 [Note] Multi-threaded slave statistics for channel '': seconds elapsed = 196; events assigned = 180221953; worker
 queues filled over overrun level = 26410731; waited due a Worker queue full = 427957; waited due the total size = 8578121; waited at clock conflict
s = 0 waited (count) when Workers occupied = 78465919 waited when Workers occupied = 0
2016-11-21T14:38:42.716638+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4062ms. The settings might not be optimal. (flushed=8 and
evicted=0, during the time.)
2016-11-21T14:40:41.639514+08:00 2 [Note] Multi-threaded slave statistics for channel '': seconds elapsed = 124; events assigned = 180231169; worker
 queues filled over overrun level = 26410731; waited due a Worker queue full = 427957; waited due the total size = 8578121; waited at clock conflict
s = 0 waited (count) when Workers occupied = 78465919 waited when Workers occupied = 0
2016-11-21T14:41:16.463106+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4016ms. The settings might not be optimal. (flushed=413 an
d evicted=0, during the time.)
2016-11-21T14:41:41.896990+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4647ms. The settings might not be optimal. (flushed=216 an
d evicted=0, during the time.)
2016-11-21T14:42:32.148825+08:00 6 [Warning] Slave SQL for channel '': Worker 3 failed executing transaction 'ANONYMOUS' at master log mysql-bin.000
589, end_log_pos 353308146; Could not execute Update_rows event on table venus.venus_order_goods; Lock wait timeout exceeded; try restarting transac
tion, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql-bin.000589, end_log_pos 353308146, Error_code: 1205
2016-11-21T14:42:49.418231+08:00 3234439 [Note] Aborted connection 3234439 to db: 'unconnected' user: 'zabbix_agent' host: '127.0.0.1' (Got an error
 reading communication packets)
2016-11-21T14:42:49.418342+08:00 3234440 [Note] Aborted connection 3234440 to db: 'unconnected' user: 'zabbix_agent' host: '127.0.0.1' (Got an error
 reading communication packets)
2016-11-21T14:42:49.418344+08:00 3234416 [Note] Aborted connection 3234416 to db: 'unconnected' user: 'zabbix_agent' host: '127.0.0.1' (Got an error
 reading communication packets)
2016-11-21T14:42:49.418412+08:00 3234415 [Note] Aborted connection 3234415 to db: 'unconnected' user: 'zabbix_agent' host: '127.0.0.1' (Got an error reading communication packets)

06:42:50 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=52428800
read_buffer_size=524288
max_used_connections=126
max_threads=3000
thread_count=21
connection_count=15
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3163207 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f64000008c0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f646ee18e28 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0xf3b0b5]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x7cc734]
/lib64/libpthread.so.0[0x353320f710]
/lib64/libc.so.6[0x3532f3357f]
/usr/sbin/mysqld(_ZN14Relay_log_info29add_channel_to_relay_log_nameEPcjPKc+0x46)[0xf0d926]
/usr/sbin/mysqld(_ZN14Relay_log_info24relay_log_number_to_nameEjPc+0x3d)[0xf0da0d]
/usr/sbin/mysqld(_ZN12Slave_worker21read_and_apply_eventsEjyjy+0x64)[0xf179f4]
/usr/sbin/mysqld(_ZN12Slave_worker17retry_transactionEjyjy+0x1b1)[0xf18011]
/usr/sbin/mysqld(_Z27slave_worker_exec_job_groupP12Slave_workerP14Relay_log_info+0x1f9)[0xf1a2d9]
/usr/sbin/mysqld(handle_slave_worker+0x363)[0xf066f3]
/usr/sbin/mysqld(pfs_spawn_thread+0x174)[0x12436d4]
/lib64/libpthread.so.0[0x35332079d1]
/lib64/libc.so.6(clone+0x6d)[0x3532ee88fd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 6
Status: NOT_KILLED

How to repeat:
i do not know how to repeat
[19 Dec 2016 14:53] MySQL Verification Team
Thank you for taking the time to report a problem.  Unfortunately you are not using a current version of the product you reported a problem with, 5.7.12 is old version and since then many bugs fixed i.e the problem might already be fixed. Please download a new version from http://www.mysql.com/downloads/

If you are able to reproduce the bug with one of the latest versions, please change the version on this bug report to the version you tested and change the status back to "Open".  Again, thank you for your continued support of MySQL.
[26 Jul 2017 6:17] Wang Wei
MySQL Version:	5.7.19-log
2017-07-26T10:05:59.844124+08:00 415180 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2017-07-26T10:05:59.847051+08:00 415180 [Note] Slave I/O thread for channel '': connected to master 'replicater@172.20.129.81:3358',replication started in log 'mysql-bin.000003' at position 278724291
2017-07-26T10:05:59.860621+08:00 415181 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'mysql-bin.000003' at position 278724291, relay log './relay-log.000005' position: 278724504
2017-07-26T11:08:29.034086+08:00 415182 [Warning] 'user' entry 'root@localhost' ignored in --skip-name-resolve mode.
2017-07-26T11:08:29.034124+08:00 415182 [Warning] 'user' entry 'backup@localhost' ignored in --skip-name-resolve mode.
2017-07-26T11:08:29.034254+08:00 415182 [Warning] 'user' entry 'jd_mha_manager@localhost' ignored in --skip-name-resolve mode.
2017-07-26T11:08:29.034320+08:00 415182 [Warning] 'user' entry 'mysql.session@localhost' ignored in --skip-name-resolve mode.
2017-07-26T11:08:29.034721+08:00 415182 [Warning] 'db' entry 'sys mysql.sys@localhost' ignored in --skip-name-resolve mode.
2017-07-26T11:08:29.034740+08:00 415182 [Warning] 'db' entry 'performance_schema mysql.session@localhost' ignored in --skip-name-resolve mode.
2017-07-26T11:08:29.034756+08:00 415182 [Warning] 'proxies_priv' entry '@ root@localhost' ignored in --skip-name-resolve mode.
2017-07-26T11:08:29.034892+08:00 415182 [Warning] 'tables_priv' entry 'user mysql.session@localhost' ignored in --skip-name-resolve mode.
2017-07-26T11:08:29.034906+08:00 415182 [Warning] 'tables_priv' entry 'sys_config mysql.sys@localhost' ignored in --skip-name-resolve mode.
2017-07-26T11:08:29.771172+08:00 418967 [Note] Start binlog_dump to master_thread_id(418967) slave_server(1000), pos(mysql-bin.000003, 291716667)
03:08:29 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=67108864
read_buffer_size=262144
max_used_connections=2
max_threads=2000
thread_count=8
connection_count=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 860223 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f42d8012230
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f430c0b1e28 thread_stack 0x40000
/export/servers/mysql/bin/mysqld(my_print_stacktrace+0x35)[0xf33f95]
/export/servers/mysql/bin/mysqld(handle_fatal_signal+0x4a4)[0x7bb5f4]
/lib64/libpthread.so.0[0x3bb040f710]
/export/servers/mysql/bin/mysqld(find_type+0x38)[0xf37d68]
/export/servers/mysql/bin/mysqld(_ZN13Binlog_sender17init_checksum_algEv+0x99)[0xeeac59]
/export/servers/mysql/bin/mysqld(_ZN13Binlog_sender4initEv+0x1f7)[0xeeaf87]
/export/servers/mysql/bin/mysqld(_ZN13Binlog_sender3runEv+0x2b)[0xeec5bb]
/export/servers/mysql/bin/mysqld(_Z17mysql_binlog_sendP3THDPcyP8Gtid_setj+0x32)[0xee6f62]
/export/servers/mysql/bin/mysqld(_Z15com_binlog_dumpP3THDPcm+0xfc)[0xee8a8c]
/export/servers/mysql/bin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0xb5b)[0xd06f9b]
/export/servers/mysql/bin/mysqld(_Z10do_commandP3THD+0x194)[0xd084b4]
/export/servers/mysql/bin/mysqld(handle_connection+0x29c)[0xdd828c]
/export/servers/mysql/bin/mysqld(pfs_spawn_thread+0x171)[0x123f811]
/lib64/libpthread.so.0[0x3bb04079d1]
/lib64/libc.so.6(clone+0x6d)[0x3bb00e88fd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 418967
Status: NOT_KILLED
[26 Jul 2017 6:19] Wang Wei
2017-07-25T21:52:13.628226+08:00 66 [Note] Multi-threaded slave statistics for channel '': seconds elapsed = 125; events assigned = 818177; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 6472281900 waited (count) when Workers occupied = 0 waited when Workers occupied = 0
2017-07-25T21:56:41.444267+08:00 66 [Note] Multi-threaded slave statistics for channel '': seconds elapsed = 268; events assigned = 819201; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 6484416400 waited (count) when Workers occupied = 0 waited when Workers occupied = 0
2017-07-25T21:58:41.476851+08:00 66 [Note] Multi-threaded slave statistics for channel '': seconds elapsed = 120; events assigned = 854017; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 6729978200 waited (count) when Workers occupied = 0 waited when Workers occupied = 0
2017-07-25T22:00:46.617634+08:00 66 [Note] Multi-threaded slave statistics for channel '': seconds elapsed = 125; events assigned = 880641; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 6913160600 waited (count) when Workers occupied = 0 waited when Workers occupied = 0
2017-07-25T22:02:46.298771+08:00 66 [Note] Multi-threaded slave statistics for channel '': seconds elapsed = 120; events assigned = 910337; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 7014508100 waited (count) when Workers occupied = 0 waited when Workers occupied = 0
2017-07-26T10:05:59.842178+08:00 66 [Note] Error reading relay log event for channel '': slave SQL thread was killed
2017-07-26T10:05:59.842939+08:00 65 [Note] Slave I/O thread killed while reading event for channel ''
2017-07-26T10:05:59.842964+08:00 65 [Note] Slave I/O thread exiting for channel '', read up to log 'mysql-bin.000003', position 278724291
[27 Jul 2017 1:26] h h
it seems that Oracle dose not  pay enough attention to this bug which is hard to reproduce :(
[28 Jul 2017 7:08] Libing Song
client or slave sets @master_binlog_checksum to tell binlog sender thread if it supports checksum. when the variable is set to null(SET @master_binlog_checksump=null), server will be crashed
[20 Oct 2019 10:08] Arie Skliarouk
Still happens from time to time on 5.7.24-0ubuntu0.16.04.1 as shipped with ubuntu 16.04
On a production db with 10k+ queries/sec with many transactions.

How can I help you to reproduce the problem?