Bug #45238 rpl_slave_skip, rpl_change_master failed (lost connection) for STOP SLAVE
Submitted: 1 Jun 2009 9:17 Modified: 23 Sep 2009 20:47
Reporter: Serge Kozlov
Status: Closed
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.0, 5.1, 6.0 OS:Any (Windows, Mac)
Assigned to: Davi Arnaut CPU Architecture:Any
Tags: disabled, pb2, rpl_change_master, rpl_slave_skip, stop slave, tf54

[1 Jun 2009 9:17] Serge Kozlov
Description:
The tests rpl_slave_skip and rpl_change_master failed in PB2 on the STOP SLAVE query or after it. The behavior depends on the version:
1. 5.0, 6.0: crash with dump file 
2. 5.1: - timeout 

The bug appears on Windows and Mac platforms.

How to repeat:
See xfer in PB2 for rpl_slave_skip, rpl_change_master
[30 Jun 2009 7:36] Alexander Nozdrin
For 5.4 the title should be: many RPL tests fail sporadically on Windows.

The thing is that the RPL tests fail sporadically. Sometimes they fail
in STOP SLAVE, sometimes not. That happens on Windows only.
No limited set of tests could be identified.

Adding
  --source include/not_windows.inc
to 'include/master-slave.inc' helps.
[13 Jul 2009 11:54] Luis Soares
It is likely that the following bugs are related:
  * BUG#45242: crash on win in mysql_close() -> free()
  * BUG#45243: crash on win in sql thread clear_tables_to_lock() -> free()
  * BUG#45521: rpl_slave_skip fails in pb2
  * BUG#40796: Crash due to heap corruption in rpl.rpl_extraColmaster_myisam
[13 Aug 2009 20:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/80781

2788 Davi Arnaut	2009-08-13
      Bug#46013: rpl_extraColmaster_myisam fails on pb2
      Bug#45243: crash on win in sql thread clear_tables_to_lock() -> free()
      Bug#45242: crash on win in mysql_close() -> free()
      Bug#45238: rpl_slave_skip, rpl_change_master failed (lost connection) for STOP SLAVE
      Bug#46030: rpl_truncate_3innodb causes server crash on windows
      Bug#46014: rpl_stm_reset_slave crashes the server sporadically in pb2
      
      When killing a user session on the server, it's necessary to
      interrupt (notify) the thread associated with the session that
      the connection is being killed so that the thread is woken up
      if waiting for I/O. On a few platforms (Mac, Windows and HP-UX)
      where the SIGNAL_WITH_VIO_CLOSE flag is defined, this interruption
      procedure is to asynchronously close the underlying socket of
      the connection.
      
      In order to enable this scheme, each connection serving thread
      registers its VIO (I/O interface) so that other threads can
      access it and close the connection. But only the owner thread of
      the VIO may delete it, so as to guarantee that other threads won't
      see freed memory (the thread unregisters the VIO before deleting
      it). A side note: closing the socket introduces a harmless race
      that might cause a thread to attempt to read from a closed socket,
      but this is deemed acceptable.
      
      The problem is that this infrastructure was meant to only be used
      by server threads, but the slave I/O thread was registering the
      VIO of a mysql handle (a client API structure that represents a
      connection to another server instance) as an active connection of
      the thread. But under some circumstances such as network failures,
      the client API might destroy the VIO associated with a handle at
      will, yet the VIO wouldn't be properly unregistered. This could
      lead to accesses to freed data if a thread attempted to kill a
      slave I/O thread whose connection was already broken.
      
      There was an attempt to work around this by checking whether
      the socket was being interrupted, but this hack didn't work as
      intended due to the aforementioned race -- attempting to read
      from the socket would yield a "bad file descriptor" error.
      
      The solution is to add a hook to the client API that is called
      from the client code before the VIO of a handle is deleted.
      This hook allows the slave I/O thread to detach the active VIO
      so it does not point to freed memory.
     @ server-tools/instance-manager/mysql_connection.cc
        Add stub method required for linking.
     @ sql-common/client.c
        Invoke hook.
     @ sql/client_settings.h
        Export hook.
     @ sql/slave.cc
        Introduce hook that clears the active VIO before it is freed
        by the client API.
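The registration scheme and the fix described in the commit message above can be modeled in a few lines of C. This is a minimal single-threaded sketch, not the code from the patch: the types Vio, Thd and MysqlHandle and the functions thd_set_active_vio(), thd_awake(), client_free_vio() and slave_io_before_vio_delete() are hypothetical simplifications, and "closing the socket" is modeled by setting a flag. It shows the point of the hook: once the client side frees the VIO of a broken connection, a thread that later tries to kill the slave I/O thread finds NULL instead of a dangling pointer.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the server's VIO; "closing" just sets a flag. */
typedef struct Vio {
  int fd;      /* pretend socket descriptor */
  bool closed; /* set when another thread interrupts the connection */
} Vio;

/* Hypothetical stand-in for a session: the registered VIO plus its lock. */
typedef struct Thd {
  pthread_mutex_t lock; /* guards active_vio */
  Vio *active_vio;      /* VIO that a killer thread may close */
} Thd;

/* Owner thread registers (or, with NULL, unregisters) its VIO. */
static void thd_set_active_vio(Thd *thd, Vio *vio) {
  pthread_mutex_lock(&thd->lock);
  thd->active_vio = vio;
  pthread_mutex_unlock(&thd->lock);
}

/* Killer side, modeling SIGNAL_WITH_VIO_CLOSE: close the registered socket
   to wake the owner. Returns true if there was something to close. */
static bool thd_awake(Thd *thd) {
  bool interrupted = false;
  pthread_mutex_lock(&thd->lock);
  if (thd->active_vio != NULL) {
    thd->active_vio->closed = true; /* real code would close the socket */
    interrupted = true;
  }
  pthread_mutex_unlock(&thd->lock);
  return interrupted;
}

/* Client-API side: a handle whose VIO the library may free on its own,
   with a hypothetical hook that runs just before the free. */
typedef void (*before_vio_delete_fn)(void *ctx, Vio *vio);

typedef struct MysqlHandle {
  Vio *vio;
  before_vio_delete_fn before_vio_delete; /* may be NULL */
  void *hook_ctx;
} MysqlHandle;

static void client_free_vio(MysqlHandle *mysql) {
  if (mysql->vio == NULL)
    return;
  if (mysql->before_vio_delete != NULL)
    mysql->before_vio_delete(mysql->hook_ctx, mysql->vio);
  free(mysql->vio);
  mysql->vio = NULL;
}

/* Slave I/O thread's hook: detach the VIO if it is the registered one. */
static void slave_io_before_vio_delete(void *ctx, Vio *vio) {
  Thd *thd = ctx;
  pthread_mutex_lock(&thd->lock);
  if (thd->active_vio == vio)
    thd->active_vio = NULL;
  pthread_mutex_unlock(&thd->lock);
}

/* Live connection: a kill closes the registered socket. */
static bool demo_kill_live_connection(void) {
  Thd thd;
  Vio vio = {7, false};
  pthread_mutex_init(&thd.lock, NULL);
  thd.active_vio = NULL;
  thd_set_active_vio(&thd, &vio);
  bool interrupted = thd_awake(&thd);
  thd_set_active_vio(&thd, NULL); /* unregister before the VIO goes away */
  pthread_mutex_destroy(&thd.lock);
  return interrupted && vio.closed;
}

/* Broken connection: the client API frees the VIO, then STOP SLAVE tries
   to kill the thread. The hook has already detached the VIO, so the kill
   finds nothing registered. */
static bool demo_kill_after_network_failure(void) {
  Thd thd;
  pthread_mutex_init(&thd.lock, NULL);
  thd.active_vio = NULL;

  MysqlHandle mysql = {NULL, slave_io_before_vio_delete, &thd};
  mysql.vio = calloc(1, sizeof *mysql.vio);
  if (mysql.vio == NULL) {
    pthread_mutex_destroy(&thd.lock);
    return false;
  }
  thd_set_active_vio(&thd, mysql.vio); /* slave I/O thread registers */
  client_free_vio(&mysql);             /* network error: library frees VIO */
  bool interrupted = thd_awake(&thd);  /* STOP SLAVE tries to interrupt */
  bool detached = (thd.active_vio == NULL);
  pthread_mutex_destroy(&thd.lock);
  return detached && !interrupted;
}
```

Without the hook (before_vio_delete left NULL), thd_awake() in the second scenario would write through a pointer to freed memory, which is the access to freed data the commit message describes; the sketch deliberately never runs that path.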
[13 Aug 2009 21:06] Davi Arnaut
Queued to 5.0-bugteam
[13 Aug 2009 22:00] Davi Arnaut
The user visible effect is that a STOP SLAVE statement might lead to a crash on Windows or Mac.
[27 Aug 2009 16:50] Jon Stephens
See BUG#45243 for documentation info.
[2 Sep 2009 10:25] Bugs System
Pushed into 5.0.86 (revid:joro@sun.com-20090902102337-n5rw8227wwp5cpx8) (version source revid:davi.arnaut@sun.com-20090813200720-utqy73cj0orcy80z) (merge vers: 5.0.86) (pib:11)
[2 Sep 2009 13:00] Jon Stephens
Bugfix also noted in 5.0.86 changelog.

Set status to Patch Pending, waiting for 5.4 push.
[2 Sep 2009 16:41] Bugs System
Pushed into 5.1.39 (revid:joro@sun.com-20090902154533-8actmfcsjfqovgsb) (version source revid:ramil@mysql.com-20090814091316-07dvnrvaj0th0th2) (merge vers: 5.1.38) (pib:11)
[3 Sep 2009 20:42] Jon Stephens
Now documented in the following changelogs: 5.0.86, NDB-6.2.19, NDB-6.3.27, NDB-7.0.8, 5.1.39 (should have documented for Cluster releases, not 5.1.37-main).

Set status to NDI, waiting for push to 5.4.
[14 Sep 2009 16:02] Bugs System
Pushed into 5.4.4-alpha (revid:alik@sun.com-20090914155317-m1g9wodmndzdj4l1) (version source revid:alik@sun.com-20090914155317-m1g9wodmndzdj4l1) (merge vers: 5.4.4-alpha) (pib:11)
[16 Sep 2009 9:35] Jon Stephens
Also documented in the 5.4.4 changelog.

Closed.
[23 Sep 2009 19:22] Alexander Nozdrin
The tests were not properly enabled in 6.0 (grep for 45238 in the 6.0 tree).
Re-opening the bug to enable tests.
[23 Sep 2009 20:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84430

2837 Davi Arnaut	2009-09-23
      Post-merge fix for Bug#45238: re-enable disabled test cases.
[30 Sep 2009 8:16] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20090929093622-1mooerbh12e97zux) (version source revid:alik@sun.com-20090927203924-087s36mrs0uxepwb) (merge vers: 6.0.14-alpha) (pib:11)
[1 Oct 2009 5:58] Bugs System
Pushed into 5.1.39-ndb-6.3.28 (revid:jonas@mysql.com-20091001055605-ap2kiaarr7p40mmv) (version source revid:jonas@mysql.com-20091001055605-ap2kiaarr7p40mmv) (merge vers: 5.1.39-ndb-6.3.28) (pib:11)
[1 Oct 2009 7:25] Bugs System
Pushed into 5.1.39-ndb-7.0.9 (revid:jonas@mysql.com-20091001072547-kv17uu06hfjhgjay) (version source revid:jonas@mysql.com-20091001071652-irejtnumzbpsbgk2) (merge vers: 5.1.39-ndb-7.0.9) (pib:11)
[1 Oct 2009 13:25] Bugs System
Pushed into 5.1.39-ndb-7.1.0 (revid:jonas@mysql.com-20091001123013-g9ob2tsyctpw6zs0) (version source revid:jonas@mysql.com-20091001123013-g9ob2tsyctpw6zs0) (merge vers: 5.1.39-ndb-7.1.0) (pib:11)
[2 Oct 2009 0:06] Paul DuBois
Moved 5.4 changelog entry from 5.4.4 to 5.4.3.
[5 Oct 2009 10:49] Bugs System
Pushed into 5.1.39-ndb-6.2.19 (revid:jonas@mysql.com-20091005103850-dwij2dojwpvf5hi6) (version source revid:jonas@mysql.com-20090930185117-bhud4ek1y0hsj1nv) (merge vers: 5.1.39-ndb-6.2.19) (pib:11)