Bug #25306 | Race conditions during replication slave shutdown | ||
---|---|---|---|
Submitted: | 28 Dec 2006 7:37 | Modified: | 31 May 2007 7:17 |
Reporter: | Ingo Strüwing | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.0.34 | OS: | Any |
Assigned to: | Mats Kindahl | CPU Architecture: | Any |
[28 Dec 2006 7:37]
Ingo Strüwing
[8 Jan 2007 10:26]
Rafal Somla
The four valgrind stacks reported here can be divided into two groups. Stacks #1, #3 and #4 are caused by accessing memory free'd inside end_slave() while stack #2 is related to memory management problems in slave thread. I created a separate BUG#25463 report with stack #2 (as it would probably need a separate solution). From now on this bug will concern only stacks #1, #3 and #4.
[8 Jan 2007 10:32]
Rafal Somla
Most probably the problem is in doing condition broadcast or mutex lock on condition/mutex from MASTER_INFO structure which has been already free'd inside end_slave(). This suggests some race conditions during replication slave shutdown. Unfortunatelly, valgrind doesn't say where the condition/mutex operations were called inside our code...
[8 Feb 2007 19:51]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/19582 ChangeSet@1.2392, 2007-02-03 20:14:16+01:00, rafal@quant.(none) +2 -0 BUG#25306 (Race conditions during replication slave shutdown (valgrind stacks)): The possibility of the race is removed by changing sequence of calls pthread_mutex_unlock(&mi->run_lock); pthread_cond_broadcast(&mi->stop_cond); into pthread_cond_broadcast(&mi->stop_cond); pthread_mutex_unlock(&mi->run_lock); at the end of I/O thread (similar change at the end of SQL thread). This ensures that no thread waiting on the condition executes between the broadcast and the unlock and thus can't delete the mi structure which caused the bug.
[12 Feb 2007 15:30]
Chuck Bell
The patch looks good. The explanation and patch are well documented.
[26 Feb 2007 15:37]
Rafal Somla
Looking at the issue
[28 Feb 2007 10:15]
Rafal Somla
The issue with rpl_ddl test is not related to the fix I made. The problem is present in the before-patch tree. I filled a separate BUG#26687 report. I revert to "patch approved" as the fix was already approved by rewievers.
[28 Feb 2007 10:52]
Rafal Somla
Pushed into mysql-5.0-rpl tree.
[1 Mar 2007 9:53]
Rafal Somla
Merged into 5.1-rpl tree.
[8 Mar 2007 8:13]
Andrei Elkin
pushed to 5.0.38,5.1.17-beta
[9 Mar 2007 14:26]
MC Brown
Internal change, no entry has been recorded in the changelog.
[19 Jun 2007 13:58]
Andrei Elkin
Bug #27675 mysqld safe_mutex assert at shutdown can relate to the latest finding. Terminating thread (after mysqladmin shutdown) does not lock mi->run_lock mutex (differently from a connection that STOP slave). That must be the reason of Bug#27675.