Bug #76402 High CPU usage issue/stuck queries with 5.7.6-m16
Submitted: 20 Mar 2015 4:43 Modified: 21 Mar 2015 6:48
Reporter: Roel Van de Paar (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:5.7.6-m16 OS:Any
Assigned to: CPU Architecture:Any

[20 Mar 2015 4:43] Roel Van de Paar
Description:
htop output:

  Load average: 3832.70 3832.31 3828.13
  1  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     
  5  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
  2  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     
  6  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
  3  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     
  7  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
  4  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     
  8  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]

mysql client status:

mysql> show full processlist\G
*************************** 1. row ***************************
     Id: 7
   User: root
   Host: localhost
     db: test
Command: Query
   Time: 448525515   <------------------------------- (!)
  State: query end
   Info: INSERT INTO t1 VALUES ('aa'),
('bb')
*************************** 2. row ***************************
     Id: 9
   User: root
   Host: localhost
     db: mysql
Command: Connect
   Time: 11231       <------------------------------- normal, few hours 
  State: Opening tables
   Info: SET @id:=(SELECT id FROM INFORMATION_SCHEMA.PROCESSLIST WHERE ID<>CONNECTION_ID() AND STATE<>'killed' AND TIME>90 ORDER BY TIME DESC LIMIT 1)
[... more of the same (it's just some periodical checks getting stuck due to query above)...]

How to repeat:
Not sure yet. Researching. 5-7.6-m16 seems to have some serious problems with locking/semaphores, most notably on shutdown (ref bug #76346). Issues may be related.
[20 Mar 2015 7:12] Roel Van de Paar
This completely floored a high-end 8 thread i7 machine running nothing else then 10x mysqld with one client thread each (+sheduler thread for checkups, shown above)
[20 Mar 2015 7:31] Roel Van de Paar
mysql> show processlist;
+----+-----------------+-----------+------+---------+------------+-----------------------------+----------------
| Id | User            | Host      | db   | Command | Time       | State                       | Info           
+----+-----------------+-----------+------+---------+------------+-----------------------------+----------------
|  1 | event_scheduler | localhost | NULL | Daemon  |         13 | Waiting for next activation | NULL                    |
|  7 | root            | localhost | test | Query   | 1426835205 | creating table              | CREATE TABLE mt1 (a tinyint not null primary key, b tinyint not null, key (b)) |
| 24 | root            | localhost | NULL | Query   |          0 | starting                    | show processlist
+----+-----------------+-----------+------+---------+------------+-----------------------------+-----------------
3 rows in set (0.00 sec)

(Server has been up for few minutes only)
[20 Mar 2015 7:32] Roel Van de Paar
Even so, the count is steadily increasing

mysql> show processlist;
[...]
|  7 | root            | localhost | test | Query   | 1426835274 | creating table              | CREATE TABLE mt1 (a tinyint not null primary key, b tinyint
[...]
mysql> show processlist;
[...]
|  7 | root            | localhost | test | Query   | 1426835277 | creating table              | CREATE TABLE mt1 (a tinyint not null primary key, b tinyint
[...]
[20 Mar 2015 7:33] Roel Van de Paar
Also seeing things like this

[roel@localhost ~]$ ps -ef | grep mysqld
roel      2279  2149  0 18:14 pts/0    00:00:06 [mysqld] <defunct>
[roel@localhost ~]$ kill -9 2279
[roel@localhost ~]$ ps -ef | grep mysqld
roel      2279  2149  0 18:14 pts/0    00:00:06 [mysqld] <defunct>
[20 Mar 2015 7:38] Shane Bester
you sure you're not setting the session timestamp in  your tests?
[20 Mar 2015 7:43] Roel Van de Paar
|  3 | root            | localhost | NULL | Query   |   -1 | starting                    | show processlist |

|  7 | root            | localhost | test | Query   | 386513139 | User sleep                  | SELECT SLEEP(30) |
[20 Mar 2015 7:44] Roel Van de Paar
Shane, ouch good point. But then why the high cpu utilization? Let me filter timestamps from the sql.
[20 Mar 2015 7:49] Roel Van de Paar
Yep, so even though SET TIMESTAMP seems to be causing the spurious times (bug 73999) the SQL still takes exceptionally long while maxing out the CPU (this server can normally handle 40-50 mysqld's quite okay, but now there is only 10)

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+-----------------------------+--------------------------------+
| Id | User            | Host      | db   | Command | Time | State                       | Info                           |
+----+-----------------+-----------+------+---------+------+-----------------------------+--------------------------------+
|  1 | event_scheduler | localhost | NULL | Daemon  |   17 | Waiting for next activation | NULL                           |
|  7 | root            | localhost | test | Query   |  166 | creating table              | CREATE TABLE t1 (a mediumtext) |
| 17 | root            | localhost | NULL | Query   |    0 | starting                    | show processlist               |
+----+-----------------+-----------+------+---------+------+-----------------------------+--------------------------------+
3 rows in set (0.00 sec)

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+-----------------------------+--------------------------------+
| Id | User            | Host      | db   | Command | Time | State                       | Info                           |
+----+-----------------+-----------+------+---------+------+-----------------------------+--------------------------------+
|  1 | event_scheduler | localhost | NULL | Daemon  |   20 | Waiting for next activation | NULL                           |
|  7 | root            | localhost | test | Query   |  169 | creating table              | CREATE TABLE t1 (a mediumtext) |
| 17 | root            | localhost | NULL | Query   |    0 | starting                    | show processlist               |
+----+-----------------+-----------+------+---------+------+-----------------------------+--------------------------------+
3 rows in set (0.00 sec)

[...]

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+-----------------------------+--------------------------------+
| Id | User            | Host      | db   | Command | Time | State                       | Info                           |
+----+-----------------+-----------+------+---------+------+-----------------------------+--------------------------------+
|  1 | event_scheduler | localhost | NULL | Daemon  |    8 | Waiting for next activation | NULL                           |
|  7 | root            | localhost | test | Query   |  357 | creating table              | CREATE TABLE t1 (a mediumtext) |
| 17 | root            | localhost | NULL | Query   |    0 | starting                    | show processlist               |
+----+-----------------+-----------+------+---------+------+-----------------------------+--------------------------------+
3 rows in set (0.00 sec)
[20 Mar 2015 7:55] Shane Bester
if create table takes > 5 mins, I can only think there is some hardware/system issue.   Can upload PMP/gdb stack traces for this instance?
[20 Mar 2015 8:03] Roel Van de Paar
Yes, was thinking the same (gdb)

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+--------------------
| Id | User            | Host      | db   | Command | Time | State     | Info
+----+-----------------+-----------+------+---------+------+--------------------
|  1 | event_scheduler | localhost | NULL | Daemon  |    8 | Waiting for next activation | NULL  
|  7 | root            | localhost | test | Query   |  187 | creating table              | CREATE TABLE t928 (c1 VARCHAR(10)) |
| 17 | root            | localhost | NULL | Query   |    0 | starting                    | show processlist 
+----+-----------------+-----------+------+---------+------+--------------------
3 rows in set (0.00 sec)

<break-in-with-gdb>
mysql> show processlist;
<hangs, correct>
[20 Mar 2015 8:04] Roel Van de Paar
Second break-in (gdb_full.txt) was at this point;

|  7 | root            | localhost | test | Query   |  437 | creating table              | CREATE TABLE t928 (c1 VARCHAR(10)) |
[20 Mar 2015 8:05] Roel Van de Paar
gdb_std.txt

Attachment: gdb_std.txt (text/plain), 30.96 KiB.

[20 Mar 2015 8:05] Roel Van de Paar
gdb_full.txt

Attachment: gdb_full.txt (text/plain), 87.34 KiB.

[20 Mar 2015 8:07] Roel Van de Paar
This is consistently reproducible and tests are running on /dev/shm tmpfs
[20 Mar 2015 8:13] Shane Bester
I hope you're also excluding the variables from random values?
innodb_spin_wait_delay
innodb_thread_sleep_delay
[20 Mar 2015 8:14] Shane Bester
and innodb_sync_spin_loops!
[20 Mar 2015 8:19] Roel Van de Paar
If extra tests were added for this, they may have made it into our testing. Checking.
[20 Mar 2015 8:40] Roel Van de Paar
Bingo, looks like this was the cause for the high cpu issue - a few options (not given as --mysqld but rather as sql) were in the testing sql (innodb_spin_wait_delay and innodb_sync_spin_loops).

Bug 76346 does not look yet resolved by this, but I also notice innodb_adaptive_max_sleep_delay which may be the cause. Checking further & closing this one.
[20 Mar 2015 8:42] Roel Van de Paar
Still one thing odd here is the "[mysqld] <defunct>" - any change that may have caused this? Sure enough, if the parent process (screen/shell session) is removed, the processes are terminated, but this is new/not previously seen behavior.
[20 Mar 2015 9:00] Roel Van de Paar
Just reopening this; these options were previously in our test SQL also, and they never played up (5.6 or 5.7). 

To confirm I tested the same testcase used for this bug earlier against 5.7.5-m15 and the issue does NOT show there. Thus, some difference exists between 5.7.5-m15 and 5.7.6-m16 and the issue may or may not be a problem in 5.7.6-m16.
[20 Mar 2015 9:02] Roel Van de Paar
I am also seeing a fair few of these;

mysqld: /bzr/mysql-5.7.6-m16_dbg/sql/unireg.cc:263: bool mysql_create_frm(THD*, const char*, const char*, const char*, HA_CREATE_INFO*, List<Create_field>&, uint, KEY*, handler*): Assertion `thd->mdl_context.owns_equal_or_stronger_lock( MDL_key::TABLESPACE, "", create_info->tablespace, MDL_INTENTION_EXCLUSIVE)' failed.
08:46:30 UTC - mysqld got signal 6 ;

Maybe I'll create a testcase for that one first, if it's not a know issue (Google search gave nothing)
[20 Mar 2015 9:18] Roel Van de Paar
(Removing innodb_adaptive_max_sleep_delay and other options here did also not stop Bug 76346 from occurring, though frequency is seemingly less, TBC)
[21 Mar 2015 6:45] Roel Van de Paar
mdl_context.owns_equal_or_stronger_lock bug; took ages to get it right. see bug #76424. Does not look related.
[21 Mar 2015 6:48] Roel Van de Paar
This issue turned into too much discussion. The problem still exists, but more testing is needed. It may be that an earlier version did not handle spin etc. options correctly (and now it does), or there may be a problem with the spin options in this release. I'll close this one and if needed I'll log a new one. Shane, if you want to have a run with spin etc. options that would be good too. Please post any updates you see from your end. Thanks for your help as always.