Bug #38240 | crash in rotate_relay_log during 'flush logs' and 'kill' or shutdown of server | ||
---|---|---|---|
Submitted: | 19 Jul 2008 15:49 | Modified: | 17 Mar 2010 14:00 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.0.66a-debug, 5.0.89, 5.1.26-debug, 5.1.30 | OS: | Any |
Assigned to: | Andrei Elkin | CPU Architecture: | Any |
Tags: | active_mi |
[19 Jul 2008 15:49]
Philip Stoev
[11 Aug 2008 10:35]
MySQL Verification Team
Philip, I got this crash on 5.0.66a. I ran FLUSH LOGS 20 times per second while doing inserts and then 'mysqladmin shutdown' killed the server. Looks like a race condition with mutex lock/free ? mi was null in my case: mysqld-debug.exe!rotate_relay_log(st_master_info * mi=0x00000000) Line 5266 mysqld-debug.exe!reload_acl_and_cache Line 7099 mysqld-debug.exe!mysql_execute_command Line 4271 mysqld-debug.exe!mysql_parse Line 6178 mysqld-debug.exe!dispatch_command Line 1875 mysqld-debug.exe!do_command Line 1581 mysqld-debug.exe!handle_one_connection mysqld-debug.exe!pthread_start
[11 Aug 2008 10:35]
MySQL Verification Team
I meant mi is freed before the mutex is not needed anymore..
[11 Aug 2008 10:37]
Philip Stoev
Shane, many thanks for reproducing this. Can you please attach a test case and then set this bug to Open/Verified? I got this crash only once, that is why the bug is in Analyzing. Thank you!
[14 Aug 2008 16:43]
Philip Stoev
Shane, I am transferring this bug back to you. Please attach the test case you have and then set it to "Open" or "Verified". Thank you.
[20 Aug 2008 2:08]
MySQL Verification Team
to repeat: --------------- setup a single debug mysql server to replicate to itself: mysqld-debug --console --skip-grant-tables --server-id=5 --log-bin --port=3306 --replicate-same-server-id --slave-skip-errors=1050 --skip-innodb change master to master_host='127.0.0.1', master_port=3306, master_user='root', master_password=''; start slave; run the attached bug38240.c testcase. while it runs, execute 'mysqladmin shutdown'. if no crash happens, start the server and retry it. it's timing related.
[20 Aug 2008 2:20]
MySQL Verification Team
testcase that will expose a few different assertions if run as described above.
Attachment: bug38240.c (text/plain), 5.81 KiB.
[21 Aug 2008 10:53]
Lars Thalmann
Shane Bester wrote: > The following may very well all be part of the same overall bug: > > Bug#38240 > Bug#38715 > Bug#38716 > > The above 3 all relate to various locking/mutex issues in the > replication code during concurrent flush logs/stop/start > slave/reset slave. > > A reason for different bug reports was slightly different crashes in > each case. I'm guessing the devs will eventually set them to > duplicates and fix everything in one go.
[29 Nov 2008 9:20]
MySQL Verification Team
still happens in 5.1.30 release build: 0000000078EF1254 ntdll.dll!RtlEnterCriticalSection() 000000014012E6EA mysqld.exe!rotate_relay_log()[slave.cc:4025] 0000000140198EB5 mysqld.exe!reload_acl_and_cache()[sql_parse.cc:6668] 00000001401A03A9 mysqld.exe!mysql_execute_command()[sql_parse.cc:3859] 00000001401A2F06 mysqld.exe!mysql_parse()[sql_parse.cc:5791] 00000001401A3C1A mysqld.exe!dispatch_command()[sql_parse.cc:1202] 00000001401A4CD7 mysqld.exe!do_command()[sql_parse.cc:857] 0000000140246327 mysqld.exe!handle_one_connection()[sql_connect.cc:1115] 00000001402B82C5 mysqld.exe!pthread_start()[my_winthread.c:85] 00000001403CAC37 mysqld.exe!_callthreadstart()[thread.c:295] 00000001403CAD05 mysqld.exe!_threadstart()[thread.c:275] 0000000077D6B69A kernel32.dll!BaseThreadStart() Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 00000000033A68B0=flush logs thd->thread_id=2 thd->killed=KILL_CONNECTION
[16 Jun 2009 15:49]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/76404 2801 Andrei Elkin 2009-06-16 Bug #38240 Crash in safe_mutex_lock () thr_mutex.c line 97 on rotate_relay_log () The reason for the crash was rotate_relay_log (mi=0x0) did not verify the passed value of mi, but might have returned early seeing it as NULL. Fixed with deploying a check of the mi arg value.
[16 Jun 2009 16:01]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/76406 2802 Andrei Elkin 2009-06-16 Bug #38240 Crash in safe_mutex_lock () thr_mutex.c line 97 on rotate_relay_log deploying an assert to guard the only possible mi == NULL case may be as a result of killing of running rotate_relay_log() thread
[16 Jun 2009 20:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/76414 2801 Andrei Elkin 2009-06-16 Bug #38240 Crash in safe_mutex_lock () thr_mutex.c line 97 on rotate_relay_log The reason for the crash was rotate_relay_log (mi=0x0) did not verify the passed value of active_mi. There are more cases where active_mi is supposed to be non-zero e.g change_master(), stop_slave(), and it's reasonable to protect from a similar crash all of them with common fixes. Fixed with spliting end_slave() on slave threads release and slave data clean-up parts (a new close_active_mi()). The new function is invoked at the very end of close_connections() so that all users of active_mi are proven to have left.
[17 Jun 2009 9:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/76430 2801 Andrei Elkin 2009-06-17 Bug #38240 Crash in safe_mutex_lock () thr_mutex.c line 97 on rotate_relay_log The reason for the crash was rotate_relay_log (mi=0x0) did not verify the passed value of active_mi. There are more cases where active_mi is supposed to be non-zero e.g change_master(), stop_slave(), and it's reasonable to protect from a similar crash all of them with common fixes. Fixed with spliting end_slave() on slave threads release and slave data clean-up parts (a new close_active_mi()). The new function is invoked at the very end of close_connections() so that all users of active_mi are proven to have left.
[23 Jun 2009 9:10]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/76882 2972 Andrei Elkin 2009-06-23 Bug #38240 Crash in safe_mutex_lock () thr_mutex.c line 97 on rotate_relay_log The reason for the crash was rotate_relay_log (mi=0x0) did not verify the passed value of active_mi. There are more cases where active_mi is supposed to be non-zero e.g change_master(), stop_slave(), and it's reasonable to protect from a similar crash all of them with common fixes. Fixed with spliting end_slave() in slave threads release and slave data clean-up parts (a new close_active_mi()). The new function is invoked at the very end of close_connections() so that all users of active_mi are proven to have left.
[8 Jul 2009 13:30]
Bugs System
Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source revid:aelkin@mysql.com-20090623091004-x0q3ig5ktrwf7skf) (merge vers: 5.1.37) (pib:11)
[9 Jul 2009 7:37]
Bugs System
Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source revid:aelkin@mysql.com-20090623091004-x0q3ig5ktrwf7skf) (merge vers: 5.1.37) (pib:11)
[9 Jul 2009 11:14]
Jon Stephens
Documented in the 5.1.37 changelog as follows: In certain cases, executing a great many FLUSH LOGS statements during replication could cause the master to crash.
[10 Jul 2009 11:21]
Bugs System
Pushed into 5.4.4-alpha (revid:anozdrin@bk-internal.mysql.com-20090710111017-bnh2cau84ug1hvei) (version source revid:aelkin@mysql.com-20090623113739-wmdndanxrv5723gu) (merge vers: 5.4.4-alpha) (pib:11)
[26 Aug 2009 13:46]
Bugs System
Pushed into 5.1.37-ndb-7.0.8 (revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)
[26 Aug 2009 13:46]
Bugs System
Pushed into 5.1.37-ndb-6.3.27 (revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc) (version source revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc) (merge vers: 5.1.37-ndb-6.3.27) (pib:11)
[26 Aug 2009 13:48]
Bugs System
Pushed into 5.1.37-ndb-6.2.19 (revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4) (version source revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4) (merge vers: 5.1.37-ndb-6.2.19) (pib:11)
[27 Aug 2009 16:33]
Bugs System
Pushed into 5.1.35-ndb-7.1.0 (revid:magnus.blaudd@sun.com-20090827163030-6o3kk6r2oua159hr) (version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)
[26 Jan 2010 23:33]
James Day
Can this really crash the master? Looks like something that can really only affect the slave?
[27 Jan 2010 8:52]
Sven Sandberg
James, I think you are right, the reference to master in the changelog entry is probably a typo. It should only affect servers cofigured as slaves.
[28 Jan 2010 9:03]
Jon Stephens
The changelog entry currently reads, "Replication: Shutting down the server while executing FLUSH LOGS, CHANGE MASTER TO, or STOP SLAVE could sometimes cause mysqld to crash. (Bug#38240)" It doesn't say anything about the master. Would you prefer I change "the server" and/or "mysqld" to "the slave"? Thanks!
[29 Jan 2010 5:59]
MySQL Verification Team
still affects 5.0.89, both release and debug binaries crash during shutdown: ntdll.dll!RtlEnterCriticalSection() mysqld-debug.exe!rotate_relay_log()[slave.cc:5345] mysqld-debug.exe!reload_acl_and_cache()[sql_parse.cc:7394] mysqld-debug.exe!mysql_execute_command()[sql_parse.cc:4521] mysqld-debug.exe!mysql_parse()[sql_parse.cc:6459] mysqld-debug.exe!dispatch_command()[sql_parse.cc:1959] mysqld-debug.exe!do_command()[sql_parse.cc:1640] mysqld-debug.exe!handle_one_connection()[sql_parse.cc:1233] mysqld-debug.exe!pthread_start()[my_winthread.c:85] mysqld-debug.exe!_threadstart()[thread.c:196] kernel32.dll!FlsSetValue() t some variables. s may be invalid and cause the dump to abort... t 053F37D8=flush logs
[29 Jan 2010 21:11]
James Day
Jon, yes, please say "replication slave" or similar to make it unambiguous about what can be affect.
[2 Feb 2010 10:44]
Andrei Elkin
Jon, I comfirm James' suggestion. cheers, Andrei.
[4 Feb 2010 13:51]
Jon Stephens
Modified changelog entry, which now reads: Shutting down the slave while executing FLUSH LOGS, CHANGE MASTER TO, or STOP SLAVE could sometimes cause it to crash. Set Need Merge status, because (a) it appears that this was supposed go into 5.0 as well, but never made it there, and (b) if this affects later versions, then it should be merged to them as well.
[17 Mar 2010 11:52]
Luis Soares
See also: BUG#50364.
[17 Mar 2010 14:00]
Jon Stephens
Appears no further changelog entries should be required, so I'm closing. Thanks!