| Bug #17332 | changing key_buffer_size on a running server can crash under load | ||
|---|---|---|---|
| Submitted: | 12 Feb 2006 22:38 | Modified: | 23 Jan 2008 16:41 |
| Reporter: | Shane Bester | ||
| Status: | Closed | ||
| Category: | Server: MyISAM | Severity: | S2 (Serious) |
| Version: | 4.1BK,5.0.18, 5.0.19-bk,5.0.21-bk | OS: | Linux (Linux,Windows) |
| Assigned to: | Ingo Strüwing | Target Version: | |
| Tags: | myisam, load index, key_buffer_size, crash | ||
| Triage: | D1 (Critical) | ||
[12 Feb 2006 22:38]
Shane Bester
[12 Feb 2006 22:39]
Shane Bester
insert statements
Attachment: thread1.zip (application/zip, text), 4.24 KiB.
[12 Feb 2006 22:42]
Shane Bester
set key_buffer_size
Attachment: thread2.zip (application/zip, text), 190.47 KiB.
[12 Feb 2006 23:43]
Shane Bester
proper backtrace, using 5.0.19-debug on Windows.
Attachment: win_stack_trace.txt (text/plain), 1.46 KiB.
[17 Feb 2006 16:02]
Ingo Strüwing
It took me three attempts with four threads to repeat it. This means that I have to do a lot of tests until I can feel positive about a fix. It is a keycache locking problem. When it crashed for me, two threads were flushing the cache. One for its inserts, one for the cache resizing. Both had a list of modified blocks (probably the same list) and tried to write them out and free each block. When freeing they assumed that every block from the list is not freed yet... I need to learn more about the keycache architecture until I can propose a reasonable fix.
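The failure mode described in this comment can be illustrated with a small model. This is only a sketch of the general pattern, not code from mf_keycache.c: the names `Block`, `ToyCache`, and `flush` are hypothetical, and a single mutex stands in for the real key cache locking. Two flushers each take a snapshot of the same dirty list; each one re-checks a block's state under the lock before freeing it, so no block is freed twice.

```python
import threading


class Block:
    """A toy cache block; `state` is 'dirty' or 'free' (hypothetical model)."""

    def __init__(self, block_id):
        self.block_id = block_id
        self.state = "dirty"


class ToyCache:
    def __init__(self, n_blocks):
        self.lock = threading.Lock()
        self.dirty = [Block(i) for i in range(n_blocks)]
        self.free_count = 0  # number of successful frees
        self.skipped = 0     # blocks found already freed by the other flusher

    def flush(self):
        # Both flushers snapshot the same list, as in the reported crash.
        with self.lock:
            snapshot = list(self.dirty)
        for block in snapshot:
            # (Writing the block to disk would happen here, outside the lock.)
            with self.lock:
                # Re-check under the lock: another thread may have freed it.
                if block.state != "dirty":
                    self.skipped += 1
                    continue
                block.state = "free"
                self.dirty.remove(block)
                self.free_count += 1


def run_two_flushers(n_blocks=1000):
    """Run two concurrent flushers over one cache and return the cache."""
    cache = ToyCache(n_blocks)
    threads = [threading.Thread(target=cache.flush) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache
```

Without the state check inside the second `with self.lock:` section, both threads would try to free every block from their snapshot, which is the unguarded assumption the comment describes.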
[6 Mar 2006 15:08]
Ingo Strüwing
After fixing a couple of crash paths I now have infinite loops. Need more time.
[14 Mar 2006 13:46]
Ingo Strüwing
I need more time to fix this satisfactorily. When I started I could quickly repeat a crash. I saw almost instantly what happened and found a way to fix it. But then it crashed at another place. After fixing about four crash paths, I decided to rework the key cache locking more fundamentally. After this I was no longer able to produce any crashes. Instead I now saw index corruptions, but only after about 100 runs of the proposed tests on a 4-CPU machine with 4 threads competing on the key cache. This is pretty good, but I'm not satisfied yet. Unfortunately, I don't know if the corruptions result from my changes, or if they were present before, just masked by the crashes.

Finding the corruptions is much more complicated than finding the crashes. While the crashes leave a stack backtrace behind, which points directly to the place in the code where a problem exists, the corruption is only detected after a command is complete. There is no hint as to where in the code it was corrupted. I plan to instrument the code with a lot of counters, hoping to see whether one or more counters were increased before the corruption, but not increased for other commands. But this takes some time.
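The instrumentation idea at the end of this comment can be sketched as follows. This is a hypothetical illustration, not the instrumentation actually used: `hit`, `run_command`, `suspicious_paths`, and the code-path labels are made-up names. Each interesting code path bumps a named counter, and the per-command delta is recorded so that a command that ends in corruption can be compared against clean ones.

```python
from collections import Counter

counters = Counter()


def hit(path):
    """Record that the code path named `path` (a made-up label) was executed."""
    counters[path] += 1


def run_command(command):
    """Run `command` and return only the counters it increased."""
    before = Counter(counters)
    command()
    return counters - before  # Counter subtraction keeps positive deltas only


def suspicious_paths(corrupt_delta, clean_deltas):
    """Paths hit by the corrupting command but by none of the clean ones."""
    seen_clean = set()
    for delta in clean_deltas:
        seen_clean.update(delta)
    return sorted(set(corrupt_delta) - seen_clean)
```

A path that shows up only in the delta of a corrupting command is a candidate for closer inspection, not proof of the cause.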
[5 Apr 2006 8:59]
Ingo Strüwing
Other, higher priority tasks are stepping in. And I still have no clue what could cause the index corruptions. It is difficult to track down due to the high load required to repeat them.
[20 Apr 2006 23:28]
Shane Bester
5.0.21-bk, crash when INSERT and key_cache_block_size changes.
Attachment: key_cache_block_size.stack.txt (text/plain), 2.09 KiB.
[20 Apr 2006 23:34]
Shane Bester
Ingo, I have a C load testing program to generate these crashes within ~1 minute.
Attached is another crash, but instead of using key_buffer_size, I run the following to
provoke it:
50 threads: SET GLOBAL key_cache_block_size={rand(4096,1048576)}
10 threads: INSERT INTO <table> with primary key.
Crash looks similar to the others reported.
!unlink_block Line 1011 + 0xb C
!free_block Line 2226 + 0xd C
!flush_cached_blocks Line 2303 + 0xd C
!flush_key_blocks_int Line 2456 + 0x1f C
!flush_all_key_blocks Line 2580 + 0x15 C
!resize_key_cache Line 515 + 0x9 C
!ha_resize_key_cache Line 2259 + 0x32 C++
!sys_var_key_cache_long::update Line 2434 + 0x9 C++
!set_var::update Line 3101 + 0x1b C++
!sql_set_variables Line 2986 + 0xf C++
!mysql_execute_command Line 3530 + 0x10 C++
!mysql_parse Line 5709 + 0x9 C++
!dispatch_command Line 1719 + 0x1d C++
!do_command Line 1515 + 0x31 C++
!handle_one_connection Line 1158 + 0x9 C++
!pthread_start Line 63 + 0x7 C
!_threadstart Line 196 + 0xd C
kernel32.dll!_BaseThreadStart@8() + 0x52
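The load pattern described in this comment (50 threads changing key_cache_block_size, 10 threads inserting) can be sketched as below. This is not Shane's C program. It is a hypothetical harness in which `execute` is a caller-supplied function that sends one SQL statement to the server (no particular client library is assumed), and the table name `t1`, the empty-row INSERT, and the iteration count are placeholders.

```python
import random
import threading


def resize_statement(rng):
    # Random block size in the range quoted in the comment.
    return f"SET GLOBAL key_cache_block_size={rng.randint(4096, 1048576)}"


def insert_statement(table):
    # Assumption: the table has an AUTO_INCREMENT primary key,
    # so inserting an empty row is valid.
    return f"INSERT INTO {table} VALUES ()"


def worker(execute, make_statement, iterations):
    for _ in range(iterations):
        execute(make_statement())


def run_load(execute, table="t1", resizers=50, inserters=10,
             iterations=100, seed=None):
    """Start the resizer and inserter threads and wait for them to finish."""
    threads = []
    for i in range(resizers):
        rng = random.Random(None if seed is None else seed + i)
        threads.append(threading.Thread(
            target=worker,
            args=(execute, lambda rng=rng: resize_statement(rng), iterations)))
    for _ in range(inserters):
        threads.append(threading.Thread(
            target=worker,
            args=(execute, lambda: insert_statement(table), iterations)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In a real run, each thread would need its own server connection behind `execute`; that part is left out here on purpose.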
[20 Apr 2006 23:35]
Shane Bester
above crash is on today's 5.0.21-bk built on windows.
[26 Apr 2006 12:44]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/5557
[26 Apr 2006 12:47]
Ingo Strüwing
Back to "In progress". The patch is not yet complete. It is just a preview for testing. Shane, if you have time, please apply the patch to a clean 5.0 tree and test with it. Regards, Ingo
[10 May 2006 12:45]
Ingo Strüwing
I detected an unrelated bug (Bug #19604) that I need to fix before I can continue testing.
[1 Jun 2006 15:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/7163
[23 Jun 2006 3:01]
Shane Bester
multithreaded C program. see top of file for compile details
Attachment: bug20540.c (text/x-csrc), 5.37 KiB.
[23 Jun 2006 3:03]
Shane Bester
Ingo, sorry for the delay. Compile the attached C program and run it like this to kill the server:
create table test(id int not null auto_increment primary key);
Then:
./bug20540 200 60 10 1 "insert into test values (),(),();set global key_buffer_size=2048675;set global key_buffer_size=1048576;insert into test values (),(),()" 192.168.250.3 3306 test root
The crash happened within 1 minute on my box, over ethernet.
[26 Jun 2006 12:51]
Ingo Strüwing
The last patch had a problem with FLUSH_IGNORE_CHANGED.
[26 Jun 2006 12:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/8239
[18 Jul 2006 17:10]
Ingo Strüwing
I will split the fix in smaller pieces for easier review.
[1 Sep 2006 12:35]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/11242

ChangeSet@1.2208, 2006-09-01 12:34:55+02:00, istruewing@chilla.local +2 -0
Bug#17332 - changing key_buffer_size on a running server can crash under load
Resizing a key cache while it was in heavy use could crash the server. There were several race conditions. I reworked some of the algorithms to fix the race conditions.
[4 Oct 2006 20:07]
RIchard Murphy
I am experiencing what would seem to be the same problem using Ver 14.7 Distrib 4.1.20, for pc-linux-gnu
[1 Nov 2006 16:54]
Ingo Strüwing
It is quite possible that this bug exists in 4.1 too. I'll check if we should fix it there too.
[1 Nov 2006 17:38]
Shane Bester
Yes, 4.1.22 is also affected.
Attachment: 4.1_bk.txt (plain/text, text), 1010 bytes.
[2 Nov 2006 15:12]
Ingo Strüwing
In a meeting with Monty, Sanja, and Ingo we came to the following conclusions:

- Ingo's last patch solves a couple of critical problems and needs to be added to 5.0 and 5.1. [Comment by Ingo: and probably 4.1, but this has not been decided yet.]
- Sanja and Monty have done a full review of every single line of Ingo's last patch and only found one problem (which Ingo will correct).
- We don't think that Igor's proposed solution would create a smaller patch. What is certain is that it would be partly the same as Ingo's patch, but less efficient.
- Monty, Sanja, and Ingo did an extra review of all code that is not part of the new resize code to ensure that there are no new bugs in normal operation (i.e., when we don't do a resize). (As resize is already broken, it's not as critical if there were new bugs in this code.)
- Monty found some small cleanup items in the patch that Ingo has promised to fix (mostly moving some repeated code into functions to make the code easier to manage).
- Some of the key cache description from Ingo's email will be moved to internal docs, some will be moved into mf_keycache.c.
- Ingo will spend 3-6 hours on running more tests with code coverage in an effort to ensure that as many as possible of the changed lines are tested.

To ensure we don't break 5.0 (especially as we may not get extensive community testing on this), we came up with the following plan:

- Ingo will add the patch in 5.1. If, within one month after the next 5.1 release, we don't get any bug reports for the new code in the key cache, he will also apply the code in the 5.0 tree. (The alternative would be to disable resizing of the key cache in 5.0 and 5.1.)
[2 Nov 2006 15:23]
Ingo Strüwing
For the purpose of a better test coverage I changed the test script. This revealed deadlocks in the key cache code. I have no fix for it yet. But I guess it will require changes that need another review.
[14 Nov 2006 16:30]
Ingo Strüwing
After almost two weeks of fixing one small synchronization problem after the other and eating all the CPU I could get, I was thrown off the test machine. So I stopped working on this bug for now.
[30 Nov 2006 9:47]
Ingo Strüwing
I do now have test machines for it.
[8 Dec 2006 14:57]
Ingo Strüwing
My current test script. Run on an installed version (BASEDIR, DATADIR).
Attachment: bug17332-8.sh (application/x-sh, text), 24.24 KiB.
[11 Jan 2007 16:12]
Scott Wilson
Hi, I wasn't able to get this test script to work under FreeBSD (didn't have time to try very hard), but I did reproduce the server crash caused by changing key_buffer_size on both 5.0.24 and 5.0.32 under FreeBSD 6.1 64-bit. -scott
[31 Jan 2007 18:45]
Ingo Strüwing
My current test script. Run on an installed version (BASEDIR, DATADIR).
Attachment: bug17332-9.sh (application/x-sh, text), 14.30 KiB.
[31 Jan 2007 18:48]
Ingo Strüwing
Detailed description for today's changeset and some general key cache information.
Attachment: keycache-changes.txt (text/plain), 63.38 KiB.
[31 Jan 2007 18:49]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/19110

ChangeSet@1.2406, 2007-01-31 18:49:07+01:00, istruewing@chilla.local +7 -0
Bug#17332 - changing key_buffer_size on a running server can crash under load
Resizing a key cache while it was in heavy use could crash the server. There were several race conditions. I reworked some of the algorithms to fix the race conditions.
No test case. Repeating the crashes requires heavy concurrent load on the key cache. A test script is attached to the bug report. More explanations to the changes are contained in a text file attached to the bug report.
[27 Feb 2007 14:39]
Ingo Strüwing
I ran the above-mentioned program (bug20540.c) successfully for two hours each on a Pentium 4, a Dual Xeon, and a Quad Itanium (all three with Linux). The latter two could not create 200 threads so I used 100 there.
[27 Feb 2007 20:42]
Ingo Strüwing
I also repeated the tests with the original streams from thread1.zip and thread2.zip.
[22 Mar 2007 9:35]
Ingo Strüwing
Again I am having difficulties with the test machines. This will defer the completion of the tests.
[22 Mar 2007 13:13]
Shane Bester
Ingo, the sql statement "load index into cache `idx`" also crashes if run concurrently. Just noting so you can test it too with your fixes. Uploading stack trace now.
[22 Mar 2007 13:15]
Shane Bester
stack trace and variables
Attachment: load_index_crash.txt (text/plain), 2.93 KiB.
[23 Mar 2007 11:53]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/22752

ChangeSet@1.2496, 2007-03-23 11:52:45+01:00, istruewing@chilla.local +2 -0
Bug#17332 - changing key_buffer_size on a running server can crash under load
After review fixes
[23 Mar 2007 12:17]
Ingo Strüwing
Shane, thank you. I did already notice a couple of problems with LOAD INDEX during my tests. I believe I have fixed them all.
[10 May 2007 17:22]
Ingo Strüwing
Please remember. This will first go to 5.1 only. If it doesn't cause problems for some time, it will go to 5.0 too.
[14 May 2007 11:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/26580

ChangeSet@1.2516, 2007-05-14 11:33:47+02:00, istruewing@chilla.local +1 -0
Bug#17332 - changing key_buffer_size on a running server can crash under load
Post-post-review fixes.
Fixed a typo == -> =
Optimized normal flush at end of statement (FLUSH_KEEP), but let other flush types be stringent.
Added comments.
Fixed debugging.
[14 May 2007 11:53]
Ingo Strüwing
Queued to 5.1-engines. Note that it is intended to push it also to 5.0 after a probationary period.

Docs Team: This patch lifts a restriction of the LOAD INDEX INTO CACHE statement. Only with the IGNORE LEAVES modifier do all indexes in a table need to have the same block size. In other words, we can now load the indexes of a table even if they have different block sizes, but only if we load the leaves too. You can change the first sentence of the last paragraph in http://dev.mysql.com/doc/refman/5.1/en/load-index.html to: LOAD INDEX INTO CACHE ... IGNORE LEAVES fails unless all indexes in a table have the same block size.
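The relaxed rule can be written down as a small predicate. This is only a model of the rule as stated in this comment, not server code; `load_index_allowed` and its parameters are hypothetical names.

```python
def load_index_allowed(index_block_sizes, ignore_leaves):
    """Return True if LOAD INDEX INTO CACHE would be accepted under the new rule.

    `index_block_sizes` lists the block size of each index in the table.
    """
    if not ignore_leaves:
        # Leaves are loaded too: differing block sizes are now fine.
        return True
    # IGNORE LEAVES still requires one common block size.
    return len(set(index_block_sizes)) <= 1
```

Before the patch, the statement failed for mixed block sizes regardless of the IGNORE LEAVES modifier.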
[14 May 2007 20:52]
James Day
The restriction of the LOAD INDEX INTO CACHE statement is in bug #3705. I'm also adding a note there about the partial removal of the restriction.
[24 May 2007 9:05]
Bugs System
Pushed into 5.1.19-beta
[24 May 2007 17:02]
Shane Bester
I tried the fix in 5.1.19-BK and couldn't crash the server, even when running 5000 'set global key_buffer_size' per second in 50 threads, all that in combination with huge inserts into many tables. Fix looks good for now!
[27 May 2007 20:48]
Paul DuBois
Noted in 5.1.19 changelog. Changing the size of a key buffer that is under heavy use could cause a server crash. The fix partially removes the limitation that LOAD INDEX INTO CACHE fails unless all indexes in a table have the same block size. Now the statement fails only if IGNORE LEAVES is specified. Resetting report to Patch Queued pending push of fix into 5.0.x.
[23 Jan 2008 16:41]
Paul DuBois
Closing this report. No changelog entry for 5.0; the patch may go into 5.0 after 5.1 has been GA for a while.
[31 Dec 2008 7:32]
Shane Bester
I think a safer patch for 5.0 would be to simply disallow the dynamic changing of key_buffer_size.
