Bug #17332 | changing key_buffer_size on a running server can crash under load | ||
---|---|---|---|
Submitted: | 12 Feb 2006 21:38 | Modified: | 18 Dec 2009 20:28 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: MyISAM storage engine | Severity: | S2 (Serious) |
Version: | 4.1BK,5.0.18, 5.0.19-bk,5.0.21-bk | OS: | Any (Linux,Windows) |
Assigned to: | Ingo Strüwing | CPU Architecture: | Any |
Tags: | crash, key_buffer_size, load index, myisam |
[12 Feb 2006 21:38]
Shane Bester
[12 Feb 2006 21:39]
MySQL Verification Team
insert statements
Attachment: thread1.zip (application/zip, text), 4.24 KiB.
[12 Feb 2006 21:42]
MySQL Verification Team
set key_buffer_size
Attachment: thread2.zip (application/zip, text), 190.47 KiB.
[12 Feb 2006 22:43]
MySQL Verification Team
proper backtrace, using 5.0.19-debug on Windows.
Attachment: win_stack_trace.txt (text/plain), 1.46 KiB.
[17 Feb 2006 15:02]
Ingo Strüwing
It took me three attempts with four threads to repeat it. This means that I have to do a lot of tests until I can feel positive about a fix. It is a keycache locking problem. When it crashed for me, two threads were flushing the cache. One for its inserts, one for the cache resizing. Both had a list of modified blocks (probably the same list) and tried to write them out and free each block. When freeing they assumed that every block from the list is not freed yet... I need to learn more about the keycache architecture until I can propose a reasonable fix.
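The failure mode described above (two flushers holding the same list of modified blocks, each assuming no block has been freed yet) can be modelled in a few lines. This is only a toy Python sketch, not the server's C code: `ToyKeyCache`, `Block`, and `run_two_flushers` are hypothetical names, and the guard shown (re-checking the block state under a lock before freeing) is one generic way to make the free step idempotent, not necessarily what the eventual patch does.

```python
import threading


class Block:
    """Toy stand-in for a key cache block (hypothetical, not the C struct)."""

    def __init__(self, block_id):
        self.block_id = block_id
        self.freed = False


class ToyKeyCache:
    def __init__(self, n_blocks):
        self.lock = threading.Lock()
        self.changed = {i: Block(i) for i in range(n_blocks)}
        self.free_calls = 0

    def snapshot_changed(self):
        # Each flusher copies the list of modified blocks before it starts.
        with self.lock:
            return list(self.changed.values())

    def free_block(self, block):
        """Free a block unless another flusher already did so.

        Returns True only if this call actually freed the block.
        """
        with self.lock:
            if block.freed:
                return False
            block.freed = True
            del self.changed[block.block_id]
            self.free_calls += 1
            return True


def flush(cache, snapshot, freed_by_me):
    for block in snapshot:
        if cache.free_block(block):
            freed_by_me.append(block.block_id)


def run_two_flushers(n_blocks=500):
    cache = ToyKeyCache(n_blocks)
    # Both flushers capture the same blocks before either frees anything.
    snap_a = cache.snapshot_changed()
    snap_b = cache.snapshot_changed()
    freed_a, freed_b = [], []
    threads = [
        threading.Thread(target=flush, args=(cache, snap_a, freed_a)),
        threading.Thread(target=flush, args=(cache, snap_b, freed_b)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache, freed_a, freed_b
```

Without the `block.freed` check, the second flusher to reach a block would try to remove it again, which corresponds to the double free seen in the crash.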
[6 Mar 2006 14:08]
Ingo Strüwing
After fixing a couple of crash paths I now have infinite loops. Need more time.
[14 Mar 2006 12:46]
Ingo Strüwing
I need more time to fix this satisfactorily. When I started I could quickly repeat a crash. I saw almost instantly what happened and found a way to fix it. But then it crashed at another place. After fixing about four crash paths, I decided to rework the key cache locking more fundamentally. After this I was no longer able to produce any crashes. But now I saw index corruptions, though only after about 100 of the proposed tests on a 4-CPU machine with 4 threads competing on the key cache. This is pretty good, but I'm not satisfied yet. Unfortunately, I don't know if the corruptions result from my changes, or if they were present before, just masked by the crashes. Finding the corruptions is much more complicated than finding the crashes. While the crashes leave a stack backtrace behind, which points directly to the place in code where a problem exists, the corruption is only detected after a command is complete. There is no hint as to where in the code it was corrupted. I plan to instrument the code with a lot of counters, hoping that I would see if one or more counters were increased before the corruption, but not increased for other commands. But this takes some time.
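The counter idea in the comment above can be sketched as follows. This is an illustrative Python model only; the counter names and the helpers `hit`, `run_and_diff`, and `suspicious_paths` are hypothetical and do not correspond to anything in the server source. Each instrumented code path bumps a named counter, each command is bracketed by a snapshot, and the paths of interest are those that were hit by every corrupting command but by none of the clean ones.

```python
from collections import Counter

# Global counters, one per instrumented code path (names are made up).
counters = Counter()


def hit(name):
    """Call this at an instrumented point in the code."""
    counters[name] += 1


def run_and_diff(command):
    """Run one command and return (result, counters that changed)."""
    before = Counter(counters)
    result = command()
    delta = {
        name: counters[name] - before[name]
        for name in counters
        if counters[name] != before[name]
    }
    return result, delta


def suspicious_paths(bad_deltas, good_deltas):
    """Paths hit by every corrupting command and by no clean command."""
    if not bad_deltas:
        return set()
    in_all_bad = set.intersection(*(set(d) for d in bad_deltas))
    in_any_good = set().union(*(set(d) for d in good_deltas))
    return in_all_bad - in_any_good
```

The sketch assumes that a command can be classified as clean or corrupting after it completes, for example by checking the index once it has finished.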
[5 Apr 2006 6:59]
Ingo Strüwing
Other, higher priority tasks are stepping in. And I still have no clue what could cause the index corruptions. It is difficult to track down due to the high load required to repeat them.
[20 Apr 2006 21:28]
MySQL Verification Team
5.0.21-bk, crash when INSERT and key_cache_block_size changes.
Attachment: key_cache_block_size.stack.txt (text/plain), 2.09 KiB.
[20 Apr 2006 21:34]
MySQL Verification Team
Ingo, I have a C load testing program to generate these crashes within ~1 minute. Attached is another crash, but instead of using key_buffer_size, I run the following to provoke it:
50 threads: SET GLOBAL key_cache_block_size={rand(4096,1048576)}
10 threads: INSERT INTO <table> with primary key.
Crash looks similar to the others reported.
!unlink_block Line 1011 + 0xb C
!free_block Line 2226 + 0xd C
!flush_cached_blocks Line 2303 + 0xd C
!flush_key_blocks_int Line 2456 + 0x1f C
!flush_all_key_blocks Line 2580 + 0x15 C
!resize_key_cache) Line 515 + 0x9 C
!ha_resize_key_cache Line 2259 + 0x32 C++
!sys_var_key_cache_long::update Line 2434 + 0x9 C++
!set_var::update Line 3101 + 0x1b C++
!sql_set_variables Line 2986 + 0xf C++
!mysql_execute_command Line 3530 + 0x10 C++
!mysql_parse Line 5709 + 0x9 C++
!dispatch_command Line 1719 + 0x1d C++
!do_command Line 1515 + 0x31 C++
!handle_one_connection Line 1158 + 0x9 C++
!pthread_start Line 63 + 0x7 C
!_threadstart Line 196 + 0xd C
kernel32.dll!_BaseThreadStart@8() + 0x52
[20 Apr 2006 21:35]
MySQL Verification Team
The above crash is on today's 5.0.21-bk built on Windows.
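The mixed workload described two comments up (50 threads changing key_cache_block_size to a random value between 4096 and 1048576, 10 threads inserting) could be driven by a harness along these lines. This is a hedged Python sketch, not the C load tester mentioned in the report: `run_load` and its parameters are hypothetical, the `execute` callable is supplied by the caller (it is assumed to send one SQL string to the server and to be safe to call from several threads), and the INSERT form is borrowed from a later comment in this report.

```python
import random
import threading


def resize_statement(rng):
    # Random block size in the range quoted in the report.
    return "SET GLOBAL key_cache_block_size=%d" % rng.randint(4096, 1048576)


def insert_statement(table):
    # Relies on an auto_increment primary key, as in the report's test table.
    return "INSERT INTO %s VALUES (),(),()" % table


def run_load(execute, resize_threads=50, insert_threads=10,
             iterations=100, table="test", seed=0):
    """Run resizers and inserters concurrently until all have finished."""

    def resizer(worker_seed):
        rng = random.Random(worker_seed)
        for _ in range(iterations):
            execute(resize_statement(rng))

    def inserter():
        for _ in range(iterations):
            execute(insert_statement(table))

    threads = [
        threading.Thread(target=resizer, args=(seed + i,))
        for i in range(resize_threads)
    ]
    threads += [threading.Thread(target=inserter) for _ in range(insert_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Against a real server, each worker would normally need its own connection; passing a recording function such as `list.append` as `execute` is enough to dry-run the harness without a server.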
[26 Apr 2006 10:44]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/5557
[26 Apr 2006 10:47]
Ingo Strüwing
Back to "In progress". The patch is not yet complete. It is just a preview for testing. Shane, if you have time, please apply the patch to a clean 5.0 tree and test with it. Regards, Ingo
[10 May 2006 10:45]
Ingo Strüwing
I detected an unrelated bug (Bug #19604) that I need to fix before I can continue testing.
[1 Jun 2006 13:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/7163
[23 Jun 2006 1:01]
MySQL Verification Team
multithreaded C program. see top of file for compile details
Attachment: bug20540.c (text/x-csrc), 5.37 KiB.
[23 Jun 2006 1:03]
MySQL Verification Team
Ingo, sorry for the delay. Compile the attached C program and run it like this to kill the server:
create table test(id int not null auto_increment primary key);
Then:
./bug20540 200 60 10 1 "insert into test values (),(),();set global key_buffer_size=2048675;set global key_buffer_size=1048576;insert into test values (),(),()" 192.168.250.3 3306 test root
The crash happened within 1 minute on my box, over ethernet.
[26 Jun 2006 10:51]
Ingo Strüwing
The last patch had a problem with FLUSH_IGNORE_CHANGED.
[26 Jun 2006 10:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/8239
[18 Jul 2006 15:10]
Ingo Strüwing
I will split the fix in smaller pieces for easier review.
[1 Sep 2006 10:35]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/11242
ChangeSet@1.2208, 2006-09-01 12:34:55+02:00, istruewing@chilla.local +2 -0
  Bug#17332 - changing key_buffer_size on a running server can crash under load
  Resizing a key cache while it was in heavy use could crash the server. There were several race conditions. I reworked some of the algorithms to fix the race conditions.
[4 Oct 2006 18:07]
RIchard Murphy
I am experiencing what would seem to be the same problem using Ver 14.7 Distrib 4.1.20, for pc-linux-gnu
[1 Nov 2006 15:54]
Ingo Strüwing
It is quite possible that this bug exists in 4.1 too. I'll check if we should fix it there too.
[1 Nov 2006 16:38]
MySQL Verification Team
yes, 4.1.22 is also affected..
Attachment: 4.1_bk.txt (plain/text, text), 1010 bytes.
[2 Nov 2006 14:12]
Ingo Strüwing
In a meeting with Monty, Sanja, and Ingo we came to the following conclusions:
- Ingo's last patch solves a couple of critical problems and needs to be added to 5.0 and 5.1. [Comment by Ingo: and probably 4.1, but this has not been decided yet.]
- Sanja and Monty have done a full review of every single line of Ingo's last patch and only found one problem (which Ingo will correct).
- We don't think that Igor's proposed solution would create a smaller patch. What is certain is that it would be partly the same as Ingo's patch, but less efficient.
- Monty, Sanja, and Ingo did an extra review of all code that is not part of the new resize code to ensure that there are no new bugs in normal operation (i.e., when we don't do a resize). (As resize is already broken, it's not as critical if there would be new bugs in this code.)
- Monty found some small cleanup things in the patch that Ingo has promised to fix (mostly to move some repeated code into functions to make the code easier to manage).
- Some of the key cache description in Ingo's email will be moved to internal docs, some will be moved into mf_keycache.c.
- Ingo will spend 3-6 hours on running more tests with code coverage in an effort to ensure that as many as possible of the changed lines are tested.
To ensure we don't break 5.0 (especially as we may not get extensive community testing on this), we came up with the following plan:
- Ingo will add the patch in 5.1. If we don't get any bug reports for the new code in the key cache within one month after the next 5.1 release, he will also apply the code in the 5.0 tree. (The alternative would be to disable resize of key cache in 5.0 and 5.1.)
[2 Nov 2006 14:23]
Ingo Strüwing
For the purpose of a better test coverage I changed the test script. This revealed deadlocks in the key cache code. I have no fix for it yet. But I guess it will require changes that need another review.
[14 Nov 2006 15:30]
Ingo Strüwing
After almost two weeks of fixing one small synchronization problem after the other and eating all the CPU I could get, I was thrown off the test machine. So I stopped working on this bug for now.
[30 Nov 2006 8:47]
Ingo Strüwing
I do now have test machines for it.
[8 Dec 2006 13:57]
Ingo Strüwing
My current test script. Run on an installed version (BASEDIR, DATADIR).
Attachment: bug17332-8.sh (application/x-sh, text), 24.24 KiB.
[11 Jan 2007 15:12]
Scott Wilson
Hi, I wasn't able to get this test script to work under freebsd (didn't have time to try very hard), but I did replicate the changing key_buffer_size causing server crash on both 5.0.24 and 5.0.32 under freebsd 6.1 64-bit. -scott
[31 Jan 2007 17:45]
Ingo Strüwing
My current test script. Run on an installed version (BASEDIR, DATADIR).
Attachment: bug17332-9.sh (application/x-sh, text), 14.30 KiB.
[31 Jan 2007 17:48]
Ingo Strüwing
Detailed description for today's changeset and some general key cache information.
Attachment: keycache-changes.txt (text/plain), 63.38 KiB.
[31 Jan 2007 17:49]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/19110
ChangeSet@1.2406, 2007-01-31 18:49:07+01:00, istruewing@chilla.local +7 -0
  Bug#17332 - changing key_buffer_size on a running server can crash under load
  Resizing a key cache while it was in heavy use could crash the server. There were several race conditions. I reworked some of the algorithms to fix the race conditions.
  No test case. Repeating the crashes requires heavy concurrent load on the key cache. A test script is attached to the bug report. More explanations to the changes are contained in a text file attached to the bug report.
[27 Feb 2007 13:39]
Ingo Strüwing
I ran the above mentioned program (bug20540.c) successfully for two hours each on a Pentium 4, a Dual Xeon, and a Quad Itanium (all three with Linux). The latter two could not create 200 threads so I used 100 there.
[27 Feb 2007 19:42]
Ingo Strüwing
I repeated also tests with the original streams from thread1.zip and thread2.zip.
[22 Mar 2007 8:35]
Ingo Strüwing
Again I am having difficulties with the test machines. This will defer the completion of the tests.
[22 Mar 2007 12:13]
MySQL Verification Team
Ingo, the sql statement "load index into cache `idx`" also crashes if run concurrently. Just noting so you can test it too with your fixes. Uploading stack trace now.
[22 Mar 2007 12:15]
MySQL Verification Team
stack trace and variables
Attachment: load_index_crash.txt (text/plain), 2.93 KiB.
[23 Mar 2007 10:53]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/22752
ChangeSet@1.2496, 2007-03-23 11:52:45+01:00, istruewing@chilla.local +2 -0
  Bug#17332 - changing key_buffer_size on a running server can crash under load
  After review fixes
[23 Mar 2007 11:17]
Ingo Strüwing
Shane, thank you. I did already notice a couple of problems with LOAD INDEX during my tests. I believe I have fixed them all.
[10 May 2007 15:22]
Ingo Strüwing
Please remember. This will first go to 5.1 only. If it doesn't cause problems for some time, it will go to 5.0 too.
[14 May 2007 9:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/26580
ChangeSet@1.2516, 2007-05-14 11:33:47+02:00, istruewing@chilla.local +1 -0
  Bug#17332 - changing key_buffer_size on a running server can crash under load
  Post-post-review fixes.
  Fixed a typo == -> =
  Optimized normal flush at end of statement (FLUSH_KEEP), but let other flush types be stringent.
  Added comments.
  Fixed debugging.
[14 May 2007 9:53]
Ingo Strüwing
Queued to 5.1-engines. Note that it is intended to push it also to 5.0 after a probationary period.
Docs Team: This patch lifts a restriction of the LOAD INDEX INTO CACHE statement. Only with the IGNORE LEAVES modifier do we need to have the same block size on all indexes in a table. In other words, we can now load the indexes of a table even if they have different block sizes, but only if we load the leaves too. You can change the first sentence of the last paragraph in http://dev.mysql.com/doc/refman/5.1/en/load-index.html to: LOAD INDEX INTO CACHE ... IGNORE LEAVES fails unless all indexes in a table have the same block size.
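The documentation change requested above boils down to a small rule, which can be written out as a predicate. This is a Python restatement of the rule as worded in the comment, not server code; both function names are hypothetical.

```python
def load_index_allowed_before_fix(index_block_sizes):
    """Old rule: every index of the table must use the same block size."""
    return len(set(index_block_sizes)) <= 1


def load_index_allowed(index_block_sizes, ignore_leaves):
    """Rule after the patch: only IGNORE LEAVES needs uniform block sizes."""
    if not ignore_leaves:
        return True
    return len(set(index_block_sizes)) <= 1
```

For a table whose indexes use block sizes 1024 and 2048, the old rule rejects the statement outright, while the new rule rejects it only when IGNORE LEAVES is given.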
[14 May 2007 18:52]
James Day
The restriction of the LOAD INDEX INTO CACHE statement is in bug #3705. I'm also adding a note there about the partial removal of the restriction.
[24 May 2007 7:05]
Bugs System
Pushed into 5.1.19-beta
[24 May 2007 15:02]
MySQL Verification Team
I tried the fix in 5.1.19-BK and couldn't crash the server, even when running 5000 'set global key_buffer_size' per second in 50 threads, all that in combination with huge inserts into many tables. Fix looks good for now!
[27 May 2007 18:48]
Paul DuBois
Noted in 5.1.19 changelog. Changing the size of a key buffer that is under heavy use could cause a server crash. The fix partially removes the limitation that LOAD INDEX INTO CACHE fails unless all indexes in a table have the same block size. Now the statement fails only if IGNORE LEAVES is specified. Resetting report to Patch Queued pending push of fix into 5.0.x.
[23 Jan 2008 15:41]
Paul DuBois
Closing this report. No changelog entry for 5.0; the patch may go into 5.0 after 5.1 has been GA for a while.
[31 Dec 2008 6:32]
MySQL Verification Team
I think a safer patch for 5.0 would be to simply disallow the dynamic changing of key_buffer_size.
[3 Aug 2009 12:38]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/79878
2793 Ingo Struewing 2009-08-03
  Bug#17332 - changing key_buffer_size on a running server can crash under load
  Not to be pushed. Internal commit to document progress.
  This commit proposes a backport of MyISAM keycache changes from 5.1/5.4 to 5.0. It is complete in so far as code differences between the versions are minimal, it compiles and passes the test suite. However, exhaustive stress testing is yet required. The bug report contains a couple of test files and descriptions that can be used to set up the tests.
@ include/keycache.h
  Bug#17332 - changing key_buffer_size on a running server can crash under load
  Added KEY_CACHE components in_resize and waiting_for_resize_cnt.
@ myisam/mi_preload.c
  Bug#17332 - changing key_buffer_size on a running server can crash under load
  Added code to allow LOAD INDEX to load indexes of different block size.
@ mysys/mf_keycache.c
  Bug#17332 - changing key_buffer_size on a running server can crash under load
  Changed resize_key_cache() to not disable the key cache after the flush phase.
  Changed queue handling to use standard functions.
  Wake all threads waiting on resize_queue. We can now have read/write threads waiting there (see below).
  Combined add_to_queue() and the wait loops that were always following it to the new function wait_on_queue().
  Combined release_queue() and the condition that was always preceding it to the new function release_whole_queue().
  Added code to flag and respect the exceptional situation BLOCK_IN_EVICTION.
  Rewrote the resize branch of find_key_block().
  Added code to the eviction handling in find_key_block() to catch more exceptional cases.
  Changed key_cache_read(), key_cache_insert() and key_cache_write() so that they lock keycache->cache_lock whenever the key cache is initialized. Checking for a disabled cache and incrementing and decrementing the "resize counter" is always done within the lock. Locking and unlocking as well as counting the "resize counter" is now done once outside the loop. All three functions can now handle a NULL return from find_key_block. This happens in the flush phase of a resize and demands direct file I/O. Care is taken for secondary requests (PAGE_WAIT_TO_BE_READ) to wait in any case. Moved block status changes behind the copying of buffer data.
  key_cache_insert() does now read the block if the caller did supply less data than a full cache block.
  key_cache_write() does now take care of parallel running flushes (BLOCK_FOR_UPDATE, BLOCK_IN_FLUSHWRITE).
  Changed free_block() to un-initialize block variables in the correct order and respect an exceptional BLOCK_IN_EVICTION state.
  Changed flushing to take care for parallel running writes.
  Changed flushing to avoid freeing blocks in eviction.
  Changed flushing to consider that parallel writes can move blocks from the file_blocks hash to the changed_blocks hash.
  Changed flushing to take care for other parallel flushes.
  Changed flushing to assure that it ends with everything flushed.
  Optimized normal flush at end of statement (FLUSH_KEEP), but let other flush types be stringent.
  Added some comments and debugging statements.
@ mysys/my_static.c
  Bug#17332 - changing key_buffer_size on a running server can crash under load
  Removed an unused global variable.
@ sql/ha_myisam.cc
  Bug#17332 - changing key_buffer_size on a running server can crash under load
  Moved an automatic (stack) variable to the scope where it is used.
@ sql/sql_table.cc
  Bug#17332 - changing key_buffer_size on a running server can crash under load
  Changed TL_READ to TL_READ_NO_INSERT in mysql_preload_keys.
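The interplay of the resize flag, the "resize counter", and direct file I/O described in the mf_keycache.c notes above can be illustrated with a toy model. This is a loose Python sketch under several assumptions, not a port of the patch: `ToyResizableCache`, `ops_in_flight`, `direct_reads`, and the `during_flush` hook are hypothetical names, a single `threading.Condition` stands in for both the cache lock and the wait queues, and only one resizer at a time is assumed.

```python
import threading


class ToyResizableCache:
    def __init__(self, backing, capacity):
        self.cond = threading.Condition()  # stand-in for lock plus queues
        self.in_resize = False
        self.ops_in_flight = 0   # direct I/O operations the resizer waits for
        self.direct_reads = 0    # bookkeeping for the demonstration only
        self.capacity = capacity
        self.blocks = {}
        self.backing = backing   # dict standing in for the index file

    def read(self, key):
        with self.cond:
            if not self.in_resize:
                # Normal operation: serve from the cache, filling it if room.
                if key not in self.blocks and len(self.blocks) < self.capacity:
                    self.blocks[key] = self.backing[key]
                return self.blocks.get(key, self.backing[key])
            # Resize in progress: no cache block is handed out, so register
            # a direct I/O operation while still holding the lock.
            self.ops_in_flight += 1
            self.direct_reads += 1
        try:
            return self.backing[key]  # direct file I/O, lock not held
        finally:
            with self.cond:
                self.ops_in_flight -= 1
                self.cond.notify_all()

    def resize(self, new_capacity, during_flush=None):
        with self.cond:
            self.in_resize = True
            self.blocks.clear()  # stand-in for flushing all blocks
        if during_flush is not None:
            during_flush()       # hook to observe the flush phase
        with self.cond:
            # Wait until every registered direct I/O has completed.
            while self.ops_in_flight:
                self.cond.wait()
            self.capacity = new_capacity
            self.in_resize = False
            self.cond.notify_all()
```

A read issued while `in_resize` is set bypasses the cache and is counted; the resizer does not finish until that count has dropped back to zero.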
[3 Aug 2009 12:49]
Ingo Strüwing
Back to verified so that someone else can take over. The patch above is not sufficiently tested yet. Please see the revision comment for details.
[13 Aug 2009 12:35]
MySQL Verification Team
Just for the record, changing global key_cache_block_size is not safe in 5.0 either, because it causes a resize/flush.
005F014A mysqld.exe!flush_all_key_blocks()[mf_keycache.c:2573]
005F0288 mysqld.exe!resize_key_cache()[mf_keycache.c:516]
0044CDAD mysqld.exe!ha_resize_key_cache()[handler.cc:2448]
004DC0CA mysqld.exe!sys_var_key_cache_long::update()[set_var.cc:2668]
004D948C mysqld.exe!set_var::update()[set_var.cc:3434]
004DC18E mysqld.exe!sql_set_variables()[set_var.cc:3319]
0053D40E mysqld.exe!mysql_execute_command()[sql_parse.cc:4124]
00541621 mysqld.exe!mysql_parse()[sql_parse.cc:6441]
0054262E mysqld.exe!dispatch_command()[sql_parse.cc:1963]
00543916 mysqld.exe!do_command()[sql_parse.cc:1646]
00543C18 mysqld.exe!handle_one_connection()[sql_parse.cc:1234]
005F538B mysqld.exe!pthread_start()[my_winthread.c:85]
006E080F mysqld.exe!_threadstart()[thread.c:196]
774FD0E9 kernel32.dll!BaseThreadInitThunk()
773819BB ntdll.dll!RtlInitializeExceptionChain()
7738198E ntdll.dll!RtlInitializeExceptionChain()
Trying to get some variables. Some pointers may be invalid and cause the dump to abort...
thd->query at 025C2AB8=set global key_cache_block_size=abs(-4676
thd->thread_id=659
5.1.37 had no such problems.
[14 Sep 2009 7:53]
Ingo Strüwing
The work is close to being pushed to the bugteam trees. The patch from 2009-08-03 passed all the tests I have access to. I merged it locally to 5.1 and pe. A test failure forces me to investigate, though it is with high probability unrelated to my patch. I hope that I will be able to push this week, or next week at the latest. However, since this push will go to the bugteam trees, I have no idea how long it will take from there into a released version. Another small risk could come from the fact that I plan to repeat the above mentioned tests on the merged 5.1 and pe trees.
[15 Sep 2009 9:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/83241 3591 Ingo Struewing 2009-09-15 [merge] Bug#17332 - changing key_buffer_size on a running server can crash under load Merge from 5.1, after backport from 5.1/5.4 to 5.0.
[16 Sep 2009 10:55]
Ingo Strüwing
Queued to mysql-5.0-bugteam, mysql-5.1-bugteam, and mysql-pe. Documentation note: This was a backport from 5.1/5.4 to 5.0. No functional change happened to 5.4. 5.1 did just inherit the fixes for: Bug#44068 - RESTORE can disable the MyISAM Key Cache Bug#40944 - Backup: crash after myisampack
[28 Sep 2009 7:37]
Ingo Strüwing
Hi Matt, I wonder why you bug Lars. I forwarded you an email which explains that the patch is in the bugteam trees. AFAIK it is not Lars' responsibility to push the bugteam trees into a release build. Regards Ingo
[30 Sep 2009 8:17]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20090929093622-1mooerbh12e97zux) (version source revid:ingo.struewing@sun.com-20090915092532-1mg4gxh6g96tjd57) (merge vers: 5.4.4-alpha) (pib:11)
[30 Sep 2009 8:20]
Bugs System
Pushed into 5.4.5-beta (revid:alik@sun.com-20090925094254-tjl9eajkzwzgthoe) (version source revid:alik@sun.com-20090918152344-nl5pzeugpejb2sth) (merge vers: 5.4.3-beta) (pib:11)
[1 Oct 2009 18:05]
Paul DuBois
Setting report to NDI pending push into 5.0.x, 5.1.x.
[6 Oct 2009 8:57]
Bugs System
Pushed into 5.0.87 (revid:joro@sun.com-20091006073202-rj21ggvo2gw032ks) (version source revid:kristofer.pettersson@sun.com-20090929151855-gvpblm4dnnubypdv) (merge vers: 5.0.87) (pib:11)
[6 Oct 2009 9:01]
Bugs System
Pushed into 5.1.40 (revid:joro@sun.com-20091006073316-lea2cpijh9r6on7c) (version source revid:ingo.struewing@sun.com-20090909151313-7eomhwe40vlgvtkd) (merge vers: 5.1.39) (pib:11)
[7 Oct 2009 1:04]
Paul DuBois
Noted in 5.0.87 changelog. Ignoring push into 5.1.40 because bug was already pushed into 5.1.19.
[20 Nov 2009 11:17]
Jonathan Delizy
Can you tell us in which 5.1 it has been fixed ? 5.1.19 or 5.1.40 ?
[20 Nov 2009 14:53]
Ingo Strüwing
5.1.19. The push to 5.1.40 was just a required upmerge from the backport to 5.0.
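For anyone checking servers against this report, the answer above can be captured in a small helper. This is a Python sketch with hypothetical names; the version numbers are the ones stated in this thread (5.0.87 and 5.1.19), and other release series are deliberately not covered.

```python
# First release in each series that contains the fix, per this bug report.
FIXED_IN = {(5, 0): (5, 0, 87), (5, 1): (5, 1, 19)}


def parse_version(text):
    """Parse '5.1.40' or '5.0.87-log' into a tuple of three ints."""
    core = text.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def has_keycache_resize_fix(version_text):
    """True or False for the 5.0 and 5.1 series, None for any other series."""
    version = parse_version(version_text)
    fixed = FIXED_IN.get(version[:2])
    if fixed is None:
        return None
    return version >= fixed
```

A `None` result means only that the helper has no data for that series, not that the series is unaffected.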
[18 Dec 2009 10:39]
Bugs System
Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20091218102229-64tk47xonu3dv6r6) (version source revid:jonas@mysql.com-20091218095730-26gwjidfsdw45dto) (merge vers: 5.1.41-ndb-7.1.0) (pib:15)
[18 Dec 2009 10:55]
Bugs System
Pushed into 5.1.41-ndb-6.2.19 (revid:jonas@mysql.com-20091218100224-vtzr0fahhsuhjsmt) (version source revid:jonas@mysql.com-20091217101452-qwzyaig50w74xmye) (merge vers: 5.1.41-ndb-6.2.19) (pib:15)
[18 Dec 2009 11:10]
Bugs System
Pushed into 5.1.41-ndb-6.3.31 (revid:jonas@mysql.com-20091218100616-75d9tek96o6ob6k0) (version source revid:jonas@mysql.com-20091217154335-290no45qdins5bwo) (merge vers: 5.1.41-ndb-6.3.31) (pib:15)
[18 Dec 2009 11:24]
Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20091218101303-ga32mrnr15jsa606) (version source revid:jonas@mysql.com-20091218064304-ezreonykd9f4kelk) (merge vers: 5.1.41-ndb-7.0.11) (pib:15)