| Bug #34305 | show slave status handling segfaults when slave io is about to leave | ||
|---|---|---|---|
| Submitted: | 5 Feb 2008 11:35 | Modified: | 16 Feb 2008 10:45 |
| Reporter: | Andrei Elkin | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
| Version: | 5.x, x>=0 | OS: | Any |
| Assigned to: | Andrei Elkin | CPU Architecture: | Any |
[5 Feb 2008 15:36]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/41718 ChangeSet@1.2576, 2008-02-05 17:36:26+02:00, aelkin@koti.dsl.inet.fi +2 -0 Bug #34305 show slave status handling segfaults when slave io is about to leave The artifact was caused by a flaw in concurrent accessing the slave's io thd by the io itself and a handling show slave status thread. Namely, show_master_info did not acquire mi->run_lock mutex that is specified for mi->io_thd member. Fixed with deploying the mutex locking and unlocking. The mutex is kept short time and without interleaving with mi->data_lock mutex. Todo: to report and fix an issue with sys_var_slave_skip_counter::{methods} seem to acquire incorrectly active_mi->rli.run_lock instead of the specified active_mi->rli.data_lock A test case is difficult to compose, so rpl_packet should continue serving as the indicator.
[5 Feb 2008 15:50]
Andrei Elkin
Pushed to mysql-5.0-rpl.
[15 Feb 2008 13:40]
Bugs System
Pushed into 5.0.58
[15 Feb 2008 13:40]
Bugs System
Pushed into 5.1.24-rc
[15 Feb 2008 13:41]
Bugs System
Pushed into 6.0.5-alpha
[16 Feb 2008 10:45]
Jon Stephens
Documented in the 5.0.48, 5.1.24, and 6.0.5 changelogs as follows:
SHOW SLAVE STATUS failed when slave I/O was
about to terminate.
[2 Apr 2008 19:41]
Jon Stephens
Also noted in the 5.1.23-ndb-6.2.14 and 5.1.23-ndb-6.3.11 changelogs.

Description: Failures on pb with rpl_packet rpl_packet [ fail ] mysqltest: In included file "./include/wait_for_slave_io_to_stop.inc": At line 21: Error running query ' SHOW SLAVE STATUS': 2013 Lost connection to MySQL server during query The result from queries just before the failure was: < snip > reset slave; drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9; start slave; drop database if exists DB_NAME_OF_MAX_LENGTH_AKA_NAME_LEN_64_BYTES_____________________; and slave.err having 080204 18:34:49 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:11310', replication started in log 'master-bin.000001' at position 2028 080204 18:34:49 [ERROR] Error reading packet from server: Got packet bigger than 'max_allowed_packet' bytes ( server_errno=2020) 080204 18:34:49 [ERROR] Log entry on master is longer than max_allowed_packet (1485) on slave. If the entry is correct, restart the server with a higher value of max_allowed_packet 080204 18:34:49 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 2138 080204 18:34:49 - mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=1048576 read_buffer_size=131072 max_used_connections=2 max_connections=100 threads_connected=2 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 39424 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd=0xef6ec8 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... Cannot determine thread, fp=0x408801c0, backtrace may not be correct. Stack range sanity check OK, backtrace follows: 0x614da7 0x607d5a 0x797174 0x62f1f6 0x6369bc 0x639499 0x63ac33 0x63bfed seem to be due to a flaw in concurrent accessing the slave's io thd by the io itself and a handling show slave status thread. How to repeat: ./mysql-test-run rpl_packet still it's not so straightforward and requires "adjusting" timing. I did it this way: When the io passed sql_print_information("Slave I/O thread exiting,...) and reached delete thd; i stop the thread and let the handling show slave status to reach protocol->store(mi->io_thd ? mi->io_thd->proc_info : "", &my_charset_bin); Now if we let the io to go `thd' will be destroyed and in the following the show threads segfaults because of mi->io_thd->proc_info is not allocated anymore. Suggested fix: To implement a TODO left in show_master_info to acquire mi->run_lock mutex prior protocol->store().