Bug #2921 | Replication problem on mutex lock in mySQL-4.0.18 | ||
---|---|---|---|
Submitted: | 22 Feb 2004 14:11 | Modified: | 11 Mar 2004 7:27 |
Reporter: | Dathan Pattishall | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 4.0.18 | OS: | Linux (RedHat 7.3) |
Assigned to: | Michael Widenius | CPU Architecture: | Any |
[22 Feb 2004 14:11]
Dathan Pattishall
[23 Feb 2004 10:39]
Dathan Pattishall
mysql> show processlist;

| Id | User | Host | db | Command | Time | State | Info |
|---|---|---|---|---|---|---|---|
| 2 | system user | | NULL | Connect | 86957 | Has read all relay log; waiting for the I/O slave thread to update it | NULL |
| 37105473 | mon | 10.10.1.19:50788 | mysql | Query | 79839 | Waiting for slave thread to start | SLAVE START |
| 37105474 | mon | 10.10.1.19:50789 | mysql | Query | 79839 | NULL | SLAVE START |
| 37105477 | system user | | NULL | Connect | 79839 | Waiting for slave mutex on exit | NULL |
| 46076237 | root | localhost | NULL | Sleep | 254 | | NULL |
| 46096195 | root | localhost | NULL | Query | 0 | NULL | show processlist |
[28 Feb 2004 15:45]
Guilhem Bichot
Thanks for your very good bug report!

Comment for myself:

| Id | User | Host | db | Command | Time | State | Info |
|---|---|---|---|---|---|---|---|
| 37105473 | mon | 10.10.1.19:50788 | mysql | Query | 79839 | Waiting for slave thread to start | SLAVE START |
| 37105474 | mon | 10.10.1.19:50789 | mysql | Query | 79839 | NULL | SLAVE START |

The first SLAVE START calls lock_slave_threads(), which locks mi->run_lock and then rli->run_lock. Then it wants to start the I/O thread: it creates this thread, then waits for the thread to say "I have done all start steps, I'm ready". For this wait it goes into a pthread_cond_wait(...,mi->run_lock), thus releasing mi->run_lock. So while it is waiting on the condition, it holds only rli->run_lock and not mi->run_lock (here one sees the problem: unlocking must of course be done in the reverse order of locking).

Then the second SLAVE START comes; it calls lock_slave_threads(), which successfully locks mi->run_lock, then blocks because rli->run_lock is held by the first. When the first wakes up, pthread_cond_wait() tries to re-lock mi->run_lock, but it is held by the second, so it blocks. Deadlock.

Now I just have to fix it :)
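The lock state described above can be reproduced in a small pthreads sketch without actually hanging. This is only an illustration, not MySQL code: `lock_a` and `lock_b` are hypothetical stand-ins for mi->run_lock and rli->run_lock, and the "second SLAVE START" uses pthread_mutex_trylock() where the real code would block, so the program can report the situation and then clean up.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-ins: lock_a plays mi->run_lock, lock_b plays rli->run_lock. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond_a = PTHREAD_COND_INITIALIZER;
static int waiting = 0;  /* set once the first starter is about to wait */
static int go = 0;       /* predicate the first starter waits for */

/* Plays the first SLAVE START: locks A then B, then waits on A's condition. */
static void *first_starter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    waiting = 1;
    while (!go)
        pthread_cond_wait(&cond_a, &lock_a); /* releases lock_a only; lock_b stays held */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

/* Plays the second SLAVE START; returns the result of trying to take lock_b. */
int probe_second_starter(void)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, first_starter, NULL);
    assert(rc == 0);

    /* Take lock_a once the first starter has released it inside pthread_cond_wait(). */
    for (;;) {
        pthread_mutex_lock(&lock_a);
        if (waiting)
            break;
        pthread_mutex_unlock(&lock_a);
        sched_yield();
    }

    /* We now hold lock_a. A blocking lock on lock_b here would be the deadlock:
       the first starter cannot re-acquire lock_a to wake up and release lock_b. */
    int rc_b = pthread_mutex_trylock(&lock_b);
    if (rc_b == 0)
        pthread_mutex_unlock(&lock_b); /* not expected in this scenario */

    /* Back out so the first starter can finish. */
    go = 1;
    pthread_cond_signal(&cond_a);
    pthread_mutex_unlock(&lock_a);
    pthread_join(t, NULL);
    return rc_b;
}
```

Calling `probe_second_starter()` from a `main()` and comparing the result with `EBUSY` shows that the second caller owns the first mutex while the second mutex is still held by the waiter, which is exactly the cycle described in the comment.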
[9 Mar 2004 23:26]
Michael Widenius
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bugfix, yourself. More information about accessing the source trees is available at http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

I fixed this by changing the start order so that the SQL thread is started first. This ensures that the mutexes are unlocked in the right order. The fix will be in 4.0.19 and 4.1.2.
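The ordering principle behind this fix can be sketched in plain pthreads. This is a minimal illustration under assumptions, not the actual patch: `outer_lock` and `inner_lock` are hypothetical names for two mutexes always taken in that order, and the starter waits on the condition with the most recently acquired mutex, so the only lock it drops while waiting is the last one it took. A concurrent starter then blocks on `outer_lock` while holding nothing.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names: both mutexes are always taken outer first, inner second. */
static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inner_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  start_cond = PTHREAD_COND_INITIALIZER;
static int worker_running = 0;   /* protected by inner_lock */
static int workers_launched = 0; /* protected by outer_lock + inner_lock */
static pthread_t worker_tid;

/* The started thread reports "I'm ready" under the inner lock. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&inner_lock);
    worker_running = 1;
    pthread_cond_broadcast(&start_cond);
    pthread_mutex_unlock(&inner_lock);
    return NULL;
}

/* One "start" request: launches the worker unless it is already running. */
static void *starter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&outer_lock);
    pthread_mutex_lock(&inner_lock);
    if (!worker_running) {
        int rc = pthread_create(&worker_tid, NULL, worker, NULL);
        assert(rc == 0);
        workers_launched++;
        /* Wait on the inner (last-acquired) mutex: it is the one released
           during the wait, while outer_lock stays held. */
        while (!worker_running)
            pthread_cond_wait(&start_cond, &inner_lock);
    }
    /* Unlock in the reverse order of locking. */
    pthread_mutex_unlock(&inner_lock);
    pthread_mutex_unlock(&outer_lock);
    return NULL;
}

/* Issues two concurrent start requests; returns how many workers were launched. */
int run_two_starters(void)
{
    pthread_t a, b;
    int rc = pthread_create(&a, NULL, starter, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, starter, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_join(worker_tid, NULL);
    return workers_launched;
}
```

With this ordering both start requests return, and only one of them launches the worker; the other finds it already running.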
[11 Mar 2004 7:27]
Guilhem Bichot
Fixed in 4.0 ChangeSet@1.1738.1.1, 2004-03-11 16:23:35+01:00, guilhem@mysql.com (using LOCK_active_mi).