Bug #41804 purge stored procedure cache causes mysterious hang for many minutes
Submitted: 31 Dec 2008 9:43 Modified: 8 Feb 15:43
Reporter: Shane Bester
Status: Verified
Category:Server: SP Severity:S3 (Non-critical)
Version:5.1.30 OS:Microsoft Windows
Assigned to: Target Version:
Tags: hang, stored procedure cache
Triage: Triaged: D3 (Medium)

[31 Dec 2008 9:43] Shane Bester
Description:
On a moderately loaded system with many stored routines purging the stored program cache
can take a long time.  During this time, queries appear to be hung, and there's no
indication what is happening:

Client program was doing 80qps before purge.  During purge:
0121191 successful queries.  0009234 failed queries (0.000000 QPS).
0121191 successful queries.  0009234 failed queries (0.000000 QPS).
0121191 successful queries.  0009234 failed queries (0.000000 QPS).
0121193 successful queries.  0009234 failed queries (0.166667 QPS).
0121193 successful queries.  0009234 failed queries (0.000000 QPS).
<cut>

Query   |  277 | NULL      | CALL ` ...
Query   |  277 | NULL      | CALL ` ...
Query   |  131 | NULL      | CALL ` ...
Query   |  279 | NULL      | CALL ` ...
Query   |  191 | NULL      | CALL ` ...
Query   |  225 | NULL      | CALL ` ...
Query   |  223 | NULL      | CALL ` ...
Query   |  277 | NULL      | CALL ` ...
Query   |  279 | NULL      | CALL ` ...
Query   |  277 | NULL      | CALL ` ...

In this case it took a full 10 minutes to purge the cache.  After the purge is complete,
you'd notice memory consumption of mysqld is much lower.

How to repeat:
create any 1000 stored routines.
call them all in multiple threads continuously.

after a few minutes, in a client session, purge the cache by running:
"drop procedure if exists p1;create procedure p1()begin end;"

now watch processlist and check if queries are hanging with NULL status.

Suggested fix:
Implement a faster way to purge SPs from the cache.  Query cache and innodb adaptive hash
had same problems in the passed - not scaling to larger numbers of items during purge...
[31 Dec 2008 10:14] Shane Bester
another symptom of this hang is when client connections quit (and they had called many
stored routines).  they look like this in processlist:

+------+---------+------+-------+------------------+
| db   | Command | Time | State | Info             |
+------+---------+------+-------+------------------+
| test | Killed  |   41 | NULL  | NULL             |
| test | Killed  |   41 | NULL  | NULL             |
| test | Killed  |   36 | NULL  | NULL             |
| test | Killed  |   41 | NULL  | NULL             |
| test | Killed  |   41 | NULL  | NULL             |
| test | Killed  |   41 | NULL  | NULL             |
| test | Killed  |   41 | NULL  | NULL             |
| test | Killed  |   37 | NULL  | NULL             |
| test | Killed  |   38 | NULL  | NULL             |
| test | Killed  |   41 | NULL  | NULL             |
| test | Query   |    0 | NULL  | show processlist |
[8 Feb 15:35] Shane Bester
testcase

Attachment: bug41804.sql (application/unknown, text), 263.59 KiB.

[8 Feb 15:43] Shane Bester
I attached a testcase that shows the problem.  Paste it into mysql client.
It drop/create 3000 SPs.   Then it calls them all once, to populate the SP cache.

Then, it drops p1, which now has to invalidate entire cache.  So, here's the last piece
of output:

mysql> select now();
+---------------------+
| now()               |
+---------------------+
| 2009-02-08 16:33:19 |
+---------------------+
1 row in set (0.03 sec)

mysql> drop procedure if exists `p0`;
Query OK, 0 rows affected (0.03 sec)

mysql> select now();
+---------------------+
| now()               |
+---------------------+
| 2009-02-08 16:33:19 |
+---------------------+
1 row in set (1 min 17.75 sec)

mysql> create procedure `p0`() select now();
Query OK, 0 rows affected (0.00 sec)

See how "select now()" took > 1 minute to execute !  Show processlist is *not* helpful:

mysql> show processlist;
+----+------+------+------+---------+------+-------+------------------+
| Id | User | Host | db   | Command | Time | State | Info             |
+----+------+------+------+---------+------+-------+------------------+
| 30 | root |      | test | Query   |    0 | NULL  | show processlist |
| 67 | root |      | test | Query   |   34 | NULL  | select now()     |
+----+------+------+------+---------+------+-------+------------------+
2 rows in set (0.02 sec)

If you don't see a problem or have super-fast machine, try with 20000 SPs.
So, either sp cache should be optimized a bit, or some useful diagnostic info can be
printed somewhere ...

This means that on a busy system with many SP's, simply creating/dropping an SP can cause
problems, due to all queries freezing for a while.

Last of all, full debug binary shows the problem many times worse than release binary.