Bug #17716 Slave crash in net_clear on qnx
Submitted: 25 Feb 2006 9:13 Modified: 27 Feb 2006 13:16
Reporter: Magnus Blåudd
Status: Closed
Category:MySQL Server Severity:S2 (Serious)
Version:5.0.19 OS:Other (QNX 6.2)
Assigned to: Magnus Blåudd CPU Architecture:Any

[25 Feb 2006 9:13] Magnus Blåudd
Description:
rpl000001                      [ fail ]

Errors are (from /home/mysqldev/pb/mysql-5.0/push-paul@snake-hub.snake.net-20060223212135.info/mysql-5.0.19-standard/mysql-test/var/log/mysqltest-time) :
mysqltest: At line 15: query 'stop slave' failed: 2013: Lost connection to MySQL server during query
(the last lines may be the most important ones)

Ending Tests
Shutting-down MySQL daemon

Master(s) shutdown finished
Slave(s) shutdown finished
Resuming Tests

How to repeat:
Run ./mysql-test-run.pl on buildqnx2

Suggested fix:
Most likely caused by the fix for bug#2845.
[25 Feb 2006 9:16] Magnus Blåudd
Compiled a debug build on buildqnx2 from the latest distribution produced by pushbuild. The trace files show that the slave crashes in net_clear. Debugging...

var/log/slave.log:
T@5    : | | | >vio_is_blocking
T@5    : | | | | exit: 0
T@5    : | | | <vio_is_blocking
T@5    : | | | >vio_read_buff
T@5    : | | | | enter: sd: 36, buf: 0x99a7018, size: 4
T@5    : | | | | >vio_read
T@5    : | | | | | enter: sd: 36, buf: 0x9997018, size: 16384
T@5    : | | | | | exit: 11
T@5    : | | | | <vio_read
T@5    : | | | <vio_read_buff
T@5    : | | | packet_header: Memory: 0x99a7018  Bytes: (4)
T@5    : | | | >vio_read_buff
T@5    : | | | | enter: sd: 36, buf: 0x99a7018, size: 7
T@5    : | | | <vio_read_buff
T@5    : | | | exit: Mysql handler: 9945c28
T@5    : | | <mysql_real_connect
T@5    : | | >my_b_flush_io_cache
T@5    : | | | >my_write
T@5    : | | | | my: Fd: 4  Buffer: 0x86fbff8  Count: 43  MyFlags: 20
T@5    : | | | <my_write
T@5    : | | <my_b_flush_io_cache
T@5    : | | exit: slave_was_killed: 0
T@5    : | <connect_to_master
T@5    : | >sql_print_information
T@5    : | | >vprint_msg_to_log
T@5    : | | | >print_buffer_to_file
T@5    : | | | | enter: buffer: Slave I/O thread: connected to master 'root@127.0.0.1:10170',  replication started in log 'FIRST' at position 4
T@5    : | | | <print_buffer_to_file
T@5    : | | <vprint_msg_to_log
T@5    : | <sql_print_information
T@5    : | >my_malloc
T@5    : | | my: size: 108  my_flags: 24
T@5    : | | exit: ptr: 0x86e1db0
T@5    : | <my_malloc
T@5    : | >my_malloc
T@5    : | | my: size: 18  my_flags: 32
T@5    : | | exit: ptr: 0x86e2e40
T@5    : | <my_malloc
T@5    : | >mysql_real_query
T@5    : | | enter: handle: 9945c28
T@5    : | | query: Query = 'SELECT UNIX_TIMESTAMP()'
T@5    : | | >mysql_send_query
T@5    : | | | enter: rpl_parse: 0  rpl_pivot: 1
T@5    : | | <mysql_send_query
T@5    : | | >cli_advanced_command
T@5    : | | | >net_clear
[25 Feb 2006 16:59] Magnus Blåudd
The compile-time constant FD_SETSIZE needs to be defined before we include <sys/select.h>, because the size of the bit array type fd_set is calculated from it. When it is not defined, it defaults to 32, which gives a 32-bit array that can hold descriptors 0 through 31. As soon as we do a FD_SET(fd, &sfds) where fd is 32 or higher, it writes outside the variable. In the trace file example above, the fd (or sd, as it is called there) was 36, so we were writing outside the bit array.
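To illustrate the arithmetic, here is a minimal sketch that models a 32-entry descriptor set as a bit array. It does not use the real <sys/select.h> macros; MODEL_SETSIZE, model_fd_set and the helper functions are hypothetical names made up for this example, and the 32-bit word layout is an assumption rather than the actual QNX implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for a default FD_SETSIZE of 32. */
#define MODEL_SETSIZE  32
#define MODEL_WORDBITS 32
#define MODEL_WORDS    ((MODEL_SETSIZE + MODEL_WORDBITS - 1) / MODEL_WORDBITS)

/* Model of a descriptor set: one bit per descriptor. */
typedef struct {
    uint32_t bits[MODEL_WORDS];
} model_fd_set;

/* Index of the array word that holds the bit for fd. */
static int model_word_index(int fd)
{
    return fd / MODEL_WORDBITS;
}

/* Sets the bit for fd; returns -1 instead of writing out of bounds. */
static int model_fd_set_checked(int fd, model_fd_set *set)
{
    if (fd < 0 || fd >= MODEL_SETSIZE)
        return -1;
    set->bits[model_word_index(fd)] |= (uint32_t)1 << (fd % MODEL_WORDBITS);
    return 0;
}

/* Returns 1 if the bit for fd is set, 0 otherwise (including out of range). */
static int model_fd_isset(int fd, const model_fd_set *set)
{
    if (fd < 0 || fd >= MODEL_SETSIZE)
        return 0;
    return (int)((set->bits[model_word_index(fd)] >> (fd % MODEL_WORDBITS)) & 1u);
}
```

With a set size of 32 the array has a single word, so descriptor 36 maps to word index 1, which is past the end of the array. An unchecked macro would write there anyway, while the checked helper in this sketch refuses and returns -1.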
[25 Feb 2006 17:02] Magnus Blåudd
http://www.qnx.com/developers/docs/momentics621_docs/neutrino/lib_ref/s/select.html
[27 Feb 2006 9:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/3172
[27 Feb 2006 13:16] Magnus Blåudd
Pushed to 5.0.19