Bug #32771 events_bugs.test fails randomly
Submitted: 27 Nov 2007 14:54 Modified: 21 Apr 19:42
Reporter: Ingo Strüwing
Status: Closed
Category: Tests: Server    Severity: S1 (Critical)
Version: 6.0.4    OS: Linux (Debian x64)
Assigned to: Konstantin Osipov    Target Version:
Tags: crash

[27 Nov 2007 14:54] Ingo Strüwing
Description:
This may be difficult to reproduce. It has happened to me only once.

main.events_bugs               [ fail ]  timeout

Stopping All Servers
Warning;  Aborted waiting on pid file:
'/home2/mydev/testdir-6.0-axmrg-2/mysql-test/var/run/master.pid' after 70 seconds
Restoring snapshot of databases
Saving core.24823
Saving core.29364
Resuming Tests

The first core is from crash_commit_before.
The second core shows this backtrace:

abort () from /lib/libc.so.6
safe_mutex_lock (mp=0x1d52e90, try_lock=0 '\0', file=0xdde4b5 "event_scheduler.cc", line=702) at thr_mutex.c:103
Event_scheduler::lock_data (this=0x1d52e90, func=0xdded4e "is_running", line=587) at event_scheduler.cc:702
Event_scheduler::is_running (this=0x1d52e90) at event_scheduler.cc:587
Event_scheduler::run (this=0x1d52e90, thd=0x1db7cb8) at event_scheduler.cc:469
event_scheduler_thread (arg=0x1d7d338) at event_scheduler.cc:235
start_thread () from /lib/libpthread.so.0

Error log master.err contains:
safe_mutex: Trying to lock unitialized mutex at event_scheduler.cc, line 702

I am disabling the test case in 6.0, even though the failure happens only rarely. Please re-enable it after the fix.

How to repeat:
OS: Debian GNU/Linux/x86_64
OS: Debian Sid kernel 2.6.22  SMP PREEMPT
gcc (GCC) 4.2.3 20071014 (prerelease) (Debian 4.2.2-3)

bk clone bk-internal.mysql.com:/home/bk/mysql-6.0-engines mysql-6.0-axmrg
cd mysql-6.0-axmrg
BUILD/compile-pentium-debug-max --with-debug=full

make test-force
[28 Nov 2007 11:25] Bugs System
Pushed into 6.0.4-alpha
[28 Nov 2007 14:40] Georgi Kodinov
Not enough information was provided for us to be able to handle this bug. Please re-read
the instructions at http://bugs.mysql.com/how-to-report.php

If you can provide more information, feel free to add it to this bug and change the status
back to 'Open'.

Thank you for your interest in MySQL.

It looks like the event scheduler thread is still alive after the event scheduler has gone
away, but the stack trace doesn't show how that happened. I did some code analysis but
wasn't able to find a path that could lead to such behavior. I also couldn't find any
similar failures on the pushbuild hosts (several of which run 64-bit Linux).
Please try to get a dbug trace of the crash and the output of 'thread apply all where'
from the core; this will hopefully reveal what causes the crash.
[28 Nov 2007 15:52] Ingo Strüwing
What I said still holds: this may be difficult to reproduce. It has happened to me only
once.
I tried to reproduce it, without success.
When closing this bug, please do not forget to re-enable the test case.
[30 Nov 2007 16:01] Ingo Strüwing
Backtrace all threads

Attachment: bug32771-1.txt (text/plain), 9.52 KiB.

[30 Nov 2007 16:02] Ingo Strüwing
It happened again on my local machine. I added the requested information as a file (it is
too big for a comment).
[3 Jan 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[20 Apr 9:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45706

ChangeSet@1.2629, 2008-04-20 11:18:52+04:00, kostja@bodhi.(none) +2 -0
  A fix for Bug#32771 "events_bugs.test fails randomly".
  In Event_scheduler::stop(), which may be called from the destructor,
  wait synchronously for a parallel Event_scheduler::stop() to
  complete before returning. This fixes a race between the
  MySQL shutdown thread and the scheduler thread, which could call
  stop() in parallel.
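The changeset describes the fix only in words. Below is a minimal C++ sketch of the same idea under simplifying assumptions: `MiniScheduler` is a hypothetical class, not MySQL's `Event_scheduler`, and its state names (`INITIALIZED`, `RUNNING`, `STOPPING`) are chosen for illustration. The state is guarded by one `std::mutex` and one `std::condition_variable`. Any caller of `stop()`, including the destructor, waits until the worker thread reports `INITIALIZED`, even when another thread's `stop()` is already in progress. So no caller can return and tear down the mutex while the worker might still lock it. Unlike the real server, the worker never calls `stop()` on itself here (it would wait for its own acknowledgement); two ordinary threads stand in for the shutdown thread and the scheduler thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical, heavily simplified stand-in for Event_scheduler (not the
// MySQL source): one worker thread plus a state guarded by one mutex.
class MiniScheduler {
 public:
  enum class State { INITIALIZED, RUNNING, STOPPING };

  MiniScheduler() = default;
  MiniScheduler(const MiniScheduler&) = delete;
  MiniScheduler& operator=(const MiniScheduler&) = delete;

  // As in the changeset's description, the destructor calls stop().
  ~MiniScheduler() {
    stop();
    if (worker_.joinable()) worker_.join();
  }

  void start() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (state_ != State::INITIALIZED) return;
    // Any previous worker has already released mutex_ and is exiting.
    if (worker_.joinable()) worker_.join();
    state_ = State::RUNNING;
    worker_ = std::thread(&MiniScheduler::run, this);
  }

  // Returns only once the worker has acknowledged the stop, even if another
  // thread's stop() is already in progress. Returning early while
  // state_ == STOPPING would let a caller destroy mutex_ while the worker
  // may still be about to lock it.
  void stop() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (state_ == State::RUNNING) {
      state_ = State::STOPPING;  // this caller initiates the stop
      cond_.notify_all();        // wake the worker
    }
    cond_.wait(lock, [this] { return state_ == State::INITIALIZED; });
  }

  State state() {
    std::lock_guard<std::mutex> lock(mutex_);
    return state_;
  }

 private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    // A real scheduler would execute due events here; the sketch just
    // sleeps until stop() wakes it (the loop also absorbs spurious wakeups).
    while (state_ == State::RUNNING) cond_.wait(lock);
    assert(state_ == State::STOPPING);  // only stop() leaves RUNNING
    state_ = State::INITIALIZED;        // acknowledge the stop
    cond_.notify_all();                 // release every waiting stop() caller
  }

  std::mutex mutex_;
  std::condition_variable cond_;
  State state_ = State::INITIALIZED;
  std::thread worker_;
};

// Usage: two threads race to stop the same scheduler, standing in for the
// shutdown thread and the scheduler thread from the bug report.
MiniScheduler::State stop_from_two_threads() {
  MiniScheduler s;
  s.start();
  std::thread a([&s] { s.stop(); });
  std::thread b([&s] { s.stop(); });
  a.join();
  b.join();
  return s.state();  // both stop() calls have returned by now
}
```

A single condition variable serves both directions in this sketch (stop requests and the worker's acknowledgement), so `notify_all()` is used and every waiter re-checks the state under the mutex.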
[20 Apr 15:01] Bugs System
Pushed into 6.0.6-alpha
[21 Apr 19:42] Paul DuBois
Noted in 6.0.6 changelog.

There was a race condition between the event scheduler and the server
shutdown thread.