Bug #90365 mysqld hangs / slave io thread hang in system lock
Submitted: 10 Apr 2018 10:07   Modified: 10 Apr 2018 11:53
Reporter: Georgi Iovchev       Email Updates:
Status: Can't repeat           Impact on me: None
Category: MySQL Server         Severity: S1 (Critical)
Version: 5.6.27                OS: CentOS (7.4)
Assigned to:                   CPU Architecture: Any (3.10.0-693.17.1.el7.x86_64)

[10 Apr 2018 10:07] Georgi Iovchev
Description:
We have an issue in our production environment with MySQL 5.6.27 running on CentOS 7. At random intervals the mysqld process hangs and becomes unresponsive. The issue happens on different servers in our environment. The last time it happened on a non-critical, delayed, slave-only machine with no application connections, only monitoring.

The servers are VMware virtual machine guests running CentOS 7 x64 and MySQL 5.6.27 Community Edition.
When mysqld is in that state I can connect to the server and select from information_schema and performance_schema, but if I query any other database the session hangs. Executing SHOW ENGINE INNODB STATUS or SHOW SLAVE STATUS also hangs the session.
There is nothing in the error log file.
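
To illustrate, a session opened while the server is in this state behaves roughly as follows (the exact statements are a sketch; db1 and ApiActivityLog are the names visible in the process list below):

-- Still responsive:
SELECT * FROM information_schema.processlist;
-- Hang indefinitely (any query touching a user database, plus the engine/slave status commands):
SELECT 1 FROM db1.ApiActivityLog LIMIT 1;
SHOW ENGINE INNODB STATUS;
SHOW SLAVE STATUS;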

Process list looks like this:
ID	USER	HOST	DB	COMMAND	TIME	STATE	INFO
1	system user		NULL	Connect	2501013	System lock	NULL
2	system user		db1	Connect	326681	updating	UPDATE ApiActivityLog SET responseTimestamp = '2018-04-06 15:51:04.009', status = 'SUCCESS', responseBody = ... WHERE (id = 468622878) 
216036	dashboard	192.168.122.121:52394	NULL	Query	283455	init	SHOW SLAVE STATUS
216035	dashboard	192.168.122.121:52392	NULL	Query	283456	init	SHOW SLAVE STATUS
216037	sc_monitor_user	192.168.114.17:43960	NULL	Query	283422	init	SHOW SLAVE STATUS
216039	sc_monitor_user	192.168.114.17:44514	NULL	Query	283356	init	SHOW SLAVE STATUS
...
239303	root	localhost	NULL	Killed	834	init	show engine innodb status
239345	sc_monitor_user	192.168.114.17:45458	NULL	Killed	0	login	NULL
239346	dashboard	192.168.122.121:59290	NULL	Killed	0	login	NULL
239347	dashboard	192.168.122.121:59332	NULL	Killed	0	login	NULL
...
239407	root	localhost	NULL	Query	0	executing	select * from information_schema.processlist

My guess is that this is due to the slave IO thread hanging in the "System lock" state.
Looking at the file timestamps, I see that the last modification time of the ibdata file, the InnoDB logs, the binlogs and the relay logs is the time when mysqld hung.
It looks like all mysqld I/O activity suddenly stopped.

After submitting the bug I will attach files with all the information gathered: the full process list, a gdb backtrace, lsof output and the results of some performance_schema queries.
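
The performance_schema and information_schema snapshots in those attachments come from plain SELECTs along these lines (the exact column lists may have differed):

SELECT * FROM information_schema.processlist;
SELECT * FROM performance_schema.threads;
SELECT * FROM performance_schema.file_instances;
SELECT * FROM performance_schema.events_statements_current;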

How to repeat:
-

Suggested fix:
-
[10 Apr 2018 10:09] Georgi Iovchev
processlist

Attachment: processlist.txt (text/plain), 8.12 KiB.

[10 Apr 2018 10:11] Georgi Iovchev
performance_schema.threads

Attachment: threads.txt (text/plain), 8.33 KiB.

[10 Apr 2018 10:12] Georgi Iovchev
performance_schema.file_instances

Attachment: file_instances.txt (text/plain), 51.01 KiB.

[10 Apr 2018 10:12] Georgi Iovchev
performance_schema.events_statements_current

Attachment: events_statements_current.txt (text/plain), 12.89 KiB.

[10 Apr 2018 10:14] Georgi Iovchev
lsof

Attachment: lsof.txt (text/plain), 43.25 KiB.

[10 Apr 2018 10:14] Georgi Iovchev
gdb backtrace threads

Attachment: gdb_bt.txt (text/plain), 640.62 KiB.

[10 Apr 2018 10:23] MySQL Verification Team
Thank you for taking the time to report a problem.  Unfortunately you
are not using a current version of the product you reported a problem
with (current version is 5.6.39) -- the problem might already be fixed. Please download a new version from http://www.mysql.com/downloads/.

Also, there is no test case provided in the bug report and hence there
is nothing we can verify here.  If you are able to reproduce the bug
with one of the latest versions, please attach the exact reproducible
test case and change the version on this bug report to the version you
tested and change the status back to "Open".  Again, thank you for your
continued support of MySQL.
[10 Apr 2018 11:53] Georgi Iovchev
The problem cannot be reproduced - it happens at random, once a month or two.
I have already upgraded some of the instances, but I cannot be sure whether this fixed the issue because, as I said, it is absolutely random.