Description:
TC_LOG_MMAP::log_xid [log.cc] does not acquire locks in a consistent order:
- it acquires active->lock while holding LOCK_active
- it acquires LOCK_sync while holding active->lock
- it acquires LOCK_active while holding LOCK_sync
Hence, if threads T1, T2, T3 execute TC_LOG_MMAP::log_xid [log.cc] in parallel, it is possible to acquire locks in the order outlined in the following table, resulting in a deadlock. [Time increases from top to bottom. + means the lock is acquired, and - means the lock is released. In the last three rows, the threads try to acquire locks held by one of the other threads. We are assuming that syncing==0 (so the body of "if (syncing)" is not executed) and (active&&!active->free)==0 (so the body of that while loop is not executed).]
T1 T2 T3
+ LOCK_active
+ active->lock
- LOCK_active
+ LOCK_active
+ LOCK_sync
- active->lock
+ active->lock
- LOCK_active
+ LOCK_active
+ active->lock [T2]
+ LOCK_sync [T1]
+ LOCK_active [T3]
How to repeat:
Inspect the code.
Probably difficult to reproduce without augmenting the code. Requires the server to be run with binlogging turned off.
Suggested fix:
Whoever fixes this, plese don't just submit a one-line patch. Instead, create a rigorous design for how replication locks interact. A usable design needs the following components:
1. For each lock, specify precisely what it guards.
2. For each lock, specify which threads use it (e.g., the io thread, the sql thread, or client threads).
3. For each function that uses one or more locks, specify which locks are acquired or released, under which conditions.
4. Specify a linear order on locks, and make sure that "lock A acquired while lock B is held" implies "lock A > lock B".
We have 28 locks in replication files (some possibly aliases of others):
$ cd sql
$ grep -i pthread_mutex_lock slave.{cc,h} rpl_*.{cc,h} sql_binlog.cc repl_failsafe.{cc,h} log*.{cc,h} | sed 's/\([a-zA-Z0-9_>\.-]*lock[a-zA-Z0-9_]*\)/\n\1\n/gi' | grep -i lock | sort | uniq