Bug #79815 InnoDB freeze. Mutex deadlocking.
Submitted: 30 Dec 2015 20:20
Modified: 1 Feb 2016 11:17
Reporter: Alejandro Martinez
Status: Duplicate
Category: MySQL Server: Storage Engines
Severity: S1 (Critical)
Version: mysql-community-server-5.6.28-1.fc22.x86
OS: Linux (Linux db-shard3 4.0.4-301.fc22.x86_64 #1 SMP Thu May 21 13:10:33 UTC 2015 x86_64 x86_64 x86_64 GNU/L)
CPU Architecture: Any
Tags: deadlock, freeze, innodb, mutex

[30 Dec 2015 20:20] Alejandro Martinez
Description:
We've had this issue several times on several of our shards, sometimes with many days in between and sometimes more than once in the same day. We don't yet know how to reproduce it.

The setup is AWS with EBS storage (SSD EBS volumes in an LVM RAID).

Under normal operation, we do 500 to 2000 ops/s on up to 50 databases, mostly batch REPLACE queries. Some cron jobs also run batch processing (mostly SELECTs and more REPLACEs). Some tables use ROW_FORMAT=DYNAMIC and others ROW_FORMAT=COMPRESSED.

The issue doesn't seem to correlate with any of those batches running. When it happens, the CPU becomes idle (0% user, 0% I/O wait). Attached are the two stack traces for the freezes we suffered today.

When it happens, mysqld still responds to a shutdown request and writes to the log that a shutdown was requested, but InnoDB doesn't write anything to the tablespace. After a kill -KILL, it starts up immediately and InnoDB recovery works correctly, so we are ruling out disk problems.

Unfortunately, we didn't run SHOW ENGINE INNODB STATUS, SHOW STATUS, or SHOW PROCESSLIST when it happened, but we will run them and attach their output the next time it happens.
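When that output is captured, the SEMAPHORES section of SHOW ENGINE INNODB STATUS is where InnoDB lists threads stuck waiting on a mutex or RW-latch. As a minimal sketch (not part of the original report), the Python below extracts those waits from the saved status text and counts waiting threads per source location, which makes a pile-up on one latch easy to spot. The matched line format is an assumption based on 5.6-era output, and the sample text and function names are hypothetical.

```python
import re
from collections import Counter
from typing import List, NamedTuple

# Assumed line format in the SEMAPHORES section (MySQL 5.6 era);
# adjust the pattern if your server prints it differently.
WAIT_RE = re.compile(
    r"--Thread (\d+) has waited at (\S+) line (\d+) for ([\d.]+) seconds the semaphore:"
)


class SemaphoreWait(NamedTuple):
    thread_id: int
    source_file: str
    line: int
    seconds: float


def parse_semaphore_waits(status_text: str) -> List[SemaphoreWait]:
    """Extract every '--Thread ... has waited at ...' entry from the status text."""
    return [
        SemaphoreWait(int(m.group(1)), m.group(2), int(m.group(3)), float(m.group(4)))
        for m in WAIT_RE.finditer(status_text)
    ]


def hottest_wait_sites(waits: List[SemaphoreWait], top: int = 5):
    """Count waiting threads per source location, most common first."""
    return Counter(f"{w.source_file}:{w.line}" for w in waits).most_common(top)


if __name__ == "__main__":
    # Hypothetical excerpt, not taken from this bug report.
    sample = """
--Thread 140001 has waited at btr0cur.cc line 545 for 950.00 seconds the semaphore:
--Thread 140002 has waited at btr0cur.cc line 545 for 949.00 seconds the semaphore:
--Thread 140003 has waited at buf0flu.cc line 1234 for 12.00 seconds the semaphore:
"""
    waits = parse_semaphore_waits(sample)
    print(hottest_wait_sites(waits))  # [('btr0cur.cc:545', 2), ('buf0flu.cc:1234', 1)]
```

Many threads waiting for hundreds of seconds at the same location would point at a single latch that is never released, which is the pattern a mutex deadlock like this one tends to leave behind.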

The only similar bug I found was http://bugs.mysql.com/bug.php?id=79185. However, while the stack traces provided by other users in that thread match each other, they don't match ours, so this looks like something different.

How to repeat:
Not known yet.

Suggested fix:
Don't know.
[30 Dec 2015 20:32] Alejandro Martinez
threads

Attachment: Screen Shot 2015-12-30 at 12.37.50 PM.png (image/png, text), 59.71 KiB.

[30 Dec 2015 20:32] Alejandro Martinez
CPU usage

Attachment: Screen Shot 2015-12-30 at 12.38.16 PM.png (image/png, text), 129.43 KiB.

[30 Dec 2015 20:33] Alejandro Martinez
my.cnf

Attachment: my.cnf (application/octet-stream, text), 2.84 KiB.

[30 Dec 2015 20:36] Alejandro Martinez
pmp for trace1

Attachment: trace1.out (application/octet-stream, text), 8.55 KiB.

[30 Dec 2015 20:36] Alejandro Martinez
pmp for trace2

Attachment: trace2.out (application/octet-stream, text), 8.79 KiB.

[1 Feb 2016 11:17] Shaohua Wang
The base bug is bug#79185.

This bug is fixed as of the 5.5.49, 5.6.30, 5.7.12, and 5.8.0 releases.