MySQL Bugs: #27675: mysqld safe_mutex assert at shutdown

Bug #27675	mysqld safe_mutex assert at shutdown
Submitted:	5 Apr 2007 18:58	Modified:	11 Aug 2008 8:43
Reporter:	Tomas Ulin
Status:	Duplicate
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.1.20	OS:	Any
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	pbfail, sr5_1

Description:
Observed an assert on mysqld shutdown in mysql-5.1-telco.

Thu Apr 5 11:54:35 2007 tulin [C=1] (13 lines)
    tomas [C=1]

070405 18:16:43 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000002' at position 769
070405 18:16:43 [Note] Event Scheduler: Purging the queue. 0 events
NDB: Found 2 NdbTransaction's that have not been released
NDB: Found 1 NdbReceiver that has not been released
safe_mutex: Trying to destroy a mutex that was locked at slave.cc, line 2482 at rpl_rli.cc, line 66
070405 18:16:43 - mysqld got signal 6;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=2
max_threads=151
threads_connected=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60345 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x40a84850, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x66e017
0x2b15b9d58aa5
0x10e0310
New value of fp=0x40a85940 failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/using-stack-trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do 
resolve it
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file

How to repeat:
.
The problem mutex is run_lock.

The destroy attempt is in:

st_relay_log_info::~st_relay_log_info()
{
  DBUG_ENTER("st_relay_log_info::~st_relay_log_info");

  pthread_mutex_destroy(&run_lock);
...

The mutex is locked in:

pthread_handler_t handle_slave_sql(void *arg)
{
...
  VOID(pthread_mutex_unlock(&LOCK_thread_count));
  thd->proc_info = "Waiting for slave mutex on exit";
  pthread_mutex_lock(&rli->run_lock);
  /* We need data_lock, at least to wake up any waiting master_pos_wait() */
  pthread_mutex_lock(&rli->data_lock);
...

I upgraded this bug to sr5_1 as it causes our test system to fail.
It is also a common crashing bug that causes backtraces in our logs and is likely to confuse our users.

The problem is most probably that the thread performing the shutdown invokes end_slave(),
which cleans up the active_mi struct without regard for the mutexes that slave threads may be holding at that moment.

Note that a similar operation, `STOP SLAVE', also terminates the slave threads, but there the terminator waits until the mutexes are unlocked, via
unlock_slave_threads(mi) in stop_slave().

However, end_slave() is different in that it does not lock the mutexes.

Hence end_slave() cannot simply call unlock_slave_threads(mi), which would otherwise be safe and sufficient.

I'd suggest changing end_slave() to lock, wait for, and release the mutexes, as stop_slave() does.
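
The lock, wait, release pattern suggested above can be sketched with plain pthreads. This is only an illustration of the idea, not the MySQL code: SlaveState, slave_worker(), start_slave_sketch() and end_slave_sketch() are hypothetical names, and the real end_slave() and stop_slave() work on active_mi and several mutexes rather than on one lock. The point it shows is that the terminator destroys run_lock only after the worker has reported that it has stopped and has released the lock.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical stand-in for the slave's run state; NOT the real MySQL struct.
struct SlaveState {
  pthread_mutex_t run_lock;
  pthread_cond_t state_cond;  // signalled whenever a flag below changes
  bool abort_slave;           // set by the terminator to request exit
  bool slave_running;         // cleared by the worker just before it exits
};

// Worker: like the tail of handle_slave_sql(), it holds run_lock while exiting.
static void *slave_worker(void *arg) {
  SlaveState *s = static_cast<SlaveState *>(arg);
  pthread_mutex_lock(&s->run_lock);
  while (!s->abort_slave)
    pthread_cond_wait(&s->state_cond, &s->run_lock);
  s->slave_running = false;                // announce exit under the lock
  pthread_cond_broadcast(&s->state_cond);
  pthread_mutex_unlock(&s->run_lock);
  return nullptr;
}

// Initialise the state and start the worker; returns 0 on success.
int start_slave_sketch(SlaveState *s, pthread_t *tid) {
  pthread_mutex_init(&s->run_lock, nullptr);
  pthread_cond_init(&s->state_cond, nullptr);
  s->abort_slave = false;
  s->slave_running = true;
  int rc = pthread_create(tid, nullptr, slave_worker, s);
  if (rc != 0)
    s->slave_running = false;              // no worker was started
  return rc;
}

// Terminator: lock, request exit, wait for the worker, release, then destroy.
// Returns the result of pthread_mutex_destroy() (0 on success).
int end_slave_sketch(SlaveState *s, pthread_t tid) {
  pthread_mutex_lock(&s->run_lock);
  s->abort_slave = true;
  pthread_cond_broadcast(&s->state_cond);
  while (s->slave_running)                 // wait releases run_lock meanwhile
    pthread_cond_wait(&s->state_cond, &s->run_lock);
  pthread_mutex_unlock(&s->run_lock);
  pthread_join(tid, nullptr);              // worker has left its critical section
  assert(!s->slave_running);
  pthread_cond_destroy(&s->state_cond);
  return pthread_mutex_destroy(&s->run_lock);
}
```

Calling start_slave_sketch() and then end_slave_sketch() on the same SlaveState tears the mutex down only once no thread can still hold it. Destroying run_lock without the wait loop and the join would correspond to the situation the safe_mutex assert reports.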

This bug does not show up after repeated attempts to reproduce it. Without a test case it cannot be repeated. The original reporter will re-open it if it shows up again.

When I wrote the remark that the earlier fix for Bug#25306 could leave a
hole by not handling mysqladmin shutdown, I was under the wrong impression
that the killing thread does not preempt the mutexes the slave threads may
own.

In fact the killer has to acquire them, and that rules out
the scenario I left on the bug page.

Bug #25245 marked as a duplicate of this bug.

Duplicate of Bug #38694.