Bug #1088 SET GLOBAL key_buffer_size=N w/DELAYED INSERT
Submitted: 18 Aug 2003 21:05 Modified: 25 Aug 2003 7:03
Reporter: Nick Gaugler Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:4.0.14 OS:Linux (Linux 2.4.x)
Assigned to: Michael Widenius CPU Architecture:Any

[18 Aug 2003 21:05] Nick Gaugler
Description:
Earlier today I attempted to increase my key_buffer_size on my db server and instead of modifying the value, MySQL went into an infinite loop and locked all of my threads.  After looking at output from PS, vmstat and show processlist I determined this happened this occurred when a delayed_insert thread was attempting to process inserts.  I feel this bug is critical because anyone who uses it while this is happening will have to force shut down their mysqld with a -9 signal and can receive corrupted tables.  On my live server I received a corrupted table that was being modified by the Delayed_insert thread, in my test environments I have not received a corrupted table.

How to repeat:
Example show processlist:
| 269001 | DELAYED |               | mem    | Delayed_insert | 228   | upgrading lock     |                                                                                                      
| 274756 | root    | localhost     | mysql  | Query          | 225   | NULL               | set global key_buffer_size=100663296                                                                 |

Now because this situation only happens when you are running select statements, you have to have a select which is holding a lock on the table while the Delayed_insert attempts to obtain the lock.

I have created a reproducible scenario, although sometimes its hard to reproduce.  I have only tested this on Linux 2.4.x kernels with MySQL 4.0.14, standard and debug.  Below is an example on how to recreate:

Table:
CREATE TABLE `t1` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `c1` varchar(25) NOT NULL default '',
  PRIMARY KEY  (`id`),
  KEY `c1` (`c1`)
) TYPE=MyISAM

test.pl:
#!/usr/bin/perl

use DBI;
use strict;

my $sql = DBI->connect('DBI:mysql:test:localhost','root','');

for (my $i = 0; $i <= 50000000; $i++) {
        my $c;
        if ($i % 2) {
                $c = 'abc';
        } elsif ($i % 3) {
                $c = 'bac';
        } elsif ($i % 5) {
                $c = 'asdf';
        } elsif ($i % 7) {
                $c = 'asdfasd';
        }
        $sql->do("INSERT DELAYED INTO t1 values ('','$c')");
}

test2.pl:
#!/usr/bin/perl

use DBI;
use strict;

my $sql = DBI->connect('DBI:mysql:test:localhost','root','');

while(1) {
        $sql->do("SELECT * FROM t1 LEFT JOIN t1 as t2 USING(c1) LIMIT 50");
}

How to recreate:

./test.pl &; ./test2.pl &

Once you get them start just log into MySQLd and continuously run the command over and over again to modify the key_buffer_size: 

set global key_buffer_size=100663296;

You may have to run it multiple times, it is hard to recreate although on my live server it happened I believe on the first or second time I changed it. I was trying to find the right value.  You may have to CTRL Z your mysql client and kill the two perl scripts if they get too far ahead; you will see what's going on with show processlist if the table handler is too far behind.

I have also uploaded a mysqld.trace file from when I created this.  This file is fairly long, if you look at the end you will see where it starts to loop.  I cut off the end of it to keep is shortened. It will infinitely loop over the following lines if you let it run:

T@8195 : | | | | enter: file: -1  blocks_used: 29  blocks_changed: 4
T@8195 : | | | <flush_key_blocks_int
T@8195 : | | | >flush_key_blocks_int

http://mybox.ngworld.net/~nickg/mysqld.trace.gz

Suggested fix:
Investigate the function that handles key_buffer_cache as well as the delayed_insert thread (any any other possible areas that would result in the same situation) and fix the proper locks so it does not recreate.
[20 Aug 2003 4:33] Alexander Keremidarski
Nick,
First of all thank you for excelent test case.

I was unable to repeat the problem even whith much higher load. Still I think we must investigate your mysqld.trace carefully.

Can you confirm that both standard and debug mysqld you have tested are our binariies? This is how I read "I have only tested this on Linux 2.4.x kernels with MySQL 4.0.14, standard and debug"

Can you also describe what exactly in SHOW PROCESSLIST lets you think MySQL went into endless loop? (no comment about mysqld.trace yet)

I spent lot of time to test it and even with terrible slowdown I was able to achieve MySQL was still able to finish all tasks - SET, INSERT DELAYED and SELECTs

Initially I tested it exactly as you suggested. I tried various values for key_buffer_size and there were nothign wrong everytime.

Then I made sure that table gets fragmented in order to prevent concurrent INSERT+SELECT. Just delete several rows. Still no luck.

At the end I have started 10 copies of each script. test.pl and test2.pl
This resulted to 20 threads 10 running INSERTs and 10 SELECTs plus DELAYED thread.

At certain amount of memory allocated MySQL slows down dramatically, but it is quite understandable with regard of RAM available on machine. I have 512Mb RAM so when I allocated 400Mb to key_buffer_size SET statement "finishes" after about 4 minutes, but still everything with INSERTs and SELECTs works Ok. Finally I set key_buffer_size to 900 Mb which ofcourse knocked down my machine. It took me more than hour and a half before I was able to check if INSERTs are doing well, but even then they did!

Meanwhile I saw some quite strange values like load average 25.67 or following vmstat output (note cs):

   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 3  1  1 530104  14412   5340  44360    4   11    16    74   27    76 27  4 69
 5  0  1 530104  14792   5420  44332  268  356   416  1348 4908 64331 84 16  0
 4  0  0 530104  14772   5516  44308   24   36    36   240 1412  8663 87 13  0
 4  2  1 530100  14808   5656  44400   36   40    52   432 2818 28823 83 17  0
 6  0  0 530104   6432   5656  44288    4   12     4    32  254  1178 87 13  0
11  0  0 530104  10736   5656  44288    0    0     0     0  101  5470 75 25  0

But mysqld survived even this test.
Exactly as you said "its hard to reproduce"

I tested it under RedHat 9 2.4.20-19.9, but I don't think if this bug exists it should be kernel dependant.

Any more information you can provide will be helpfull.
[20 Aug 2003 4:55] Nick Gaugler
Yes, I can confirm these are the binary downloads of 4.0.14 and the 4.0.14-debug.  bin/mysqlbug shows the following:

VERSION="4.0.14-standard"
COMPILATION_COMMENT="Official MySQL-standard binary"

VERSION="4.0.14-debug"
COMPILATION_COMMENT="Official MySQL-debug binary"

My OS varied only a little, on the standard server, the live server, it was a RedHat 7.3 initial install, however I immediately upgraded it to a kernel.org Linux 2.4.20 kernel.  The server I had available not running MySQL to be able to install the debug version was also a RedHat 7.3 install and it is running the default 2.4.18-5smp kernel.  Both of these kernels are SMP, I forgot to note that earlier.

I tried to recreate it just a few minutes ago and I could not get it to recreate, it's complicated to do.  The key is to get the Delayed_insert thread to be in the "upgrading lock" status while you are attempting to change the variable.  If you can think of a better way to make this more happen, it may be easier than my test.pl and test2.pl scripts.

When this happens show processlist shows the values I previously posted, the disk activity is 0 (since nothing else will occur) and the CPU usage is pinned at 100%, since the thread that is attempting to change the value is infinitely looping.  I do not think that the amount of size you set the Key_buffer to matters since all MySQL appears to do is reset what you currently have and chaneg the internal limitation.  I tried several times with increasing my current value or decreasing it to see if MySQL would just "grow" the buffer pool or what, but it never seemed to do this.  I will try and do some more investigating and see if I can find some more stuff.  I do not know why it took your system 4 minutes to an hour and a half.  Possibly mine would have finished had I left it?  The odd thing is first, MySQL does not create the buffer the first time, and second, I was able to run the set command many times in a row without problems before I was able to "catch" it at the right spot where it locks.  Please let me know if there is anything else I can do.
[20 Aug 2003 5:07] Nick Gaugler
I just made it happen on the debug daemon/server.  Below are the results of show processlist.  This caught the Delayed_insert at a different state then all of my previous tests:

+----+---------+-----------+------+----------------+------+--------------------+----+---------+-----------+------+----------------+------+----------------------------+---------------------------------------------------------+
| Id | User    | Host      | db   | Command        | Time | State                      | Info                                                    |
+----+---------+-----------+------+----------------+------+----------------------------+---------------------------------------------------------+
| 15 | root    | localhost | test | Query          | 554  | NULL                       | set global key_buffer_size=500000                       |
| 16 | root    | localhost | test | Query          | 554  | waiting for handler insert | INSERT DELAYED INTO t1 values ('','bac')                |
| 17 | DELAYED |           | test | Delayed_insert | 554  | insert                     |                                                         |
| 18 | root    | localhost | test | Query          | 554  | Locked                     | SELECT * FROM t1 LEFT JOIN t1 as t2 USING(c1) LIMIT 500 |
| 19 | root    | localhost | NULL | Query          | 0    | NULL                       | show processlist                                        |
+----+---------+-----------+------+----------------+------+----------------------------+---------------------------------------------------------+
[20 Aug 2003 5:14] Nick Gaugler
I forgot, the first live server was not SMP.  Below is a 4.0.12-standard install, binary from MySQL.com, that recreated instantly.  Notice that there is another query stuck in this example from a completely different database.  It's stuck in the statistics entry:

+--------+---------+----------------+---------+----------------+-------+----------------------------+--------------------------------------------------------------+
| Id     | User    | Host           | db      | Command        | Time  | State                      | Info                                                         |
+--------+---------+----------------+---------+----------------+-------+----------------------------+--------------------------------------------------------------+
|     12 | DELAYED | localhost:5952 | proteus | Delayed_insert | 11423 | Waiting for INSERT         |                                                              |
| 147997 | bitchx  | localhost      | misc    | Query          | 58    | statistics                 | SELECT BotRoomID FROM botroom WHERE (BotRoomName = '#mysql') |
| 163306 | root    | localhost      | test    | Query          | 68    | NULL                       | set global key_buffer_size=8388608                           |
| 163309 | root    | localhost      | test    | Query          | 68    | Locked                     | SELECT * FROM t1 LEFT JOIN t1 as t2 USING(c1) LIMIT 500      |
| 163310 | root    | localhost      | test    | Query          | 67    | waiting for handler insert | INSERT DELAYED INTO t1 values ('','abc')                     |
| 163311 | DELAYED | localhost      | test    | Delayed_insert | 68    | insert                     |                                                              |
| 163313 | root    | localhost      | NULL    | Query          | 0     | NULL                       | show processlist                                             |
+--------+---------+----------------+---------+----------------+-------+----------------------------+--------------------------------------------------------------+
[20 Aug 2003 6:27] Alexander Keremidarski
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Fix will be included in 4.0.15
[22 Aug 2003 5:24] Alexander Keremidarski
Thanks to Nick I was able to repeat deadlock and acknowlege it is not fixed yet as I thought before.
[25 Aug 2003 7:03] Michael Widenius
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

The bug was that the code that flushed the key cache before doing a resize didn't work in all cases if the key cache was in active use.
(In other words, this can also happen with normal inserts, not just INSERT DELAYED).

This bug should not affect other things than when you set key_buffer_size from the SQL prompt.

Fix will appear in 4.0.15 and 4.1.1

Regards,
Monty