Bug #7658 optimize crashes slave thread (1 in 1000)
Submitted: 4 Jan 2005 13:00 Modified: 10 Jan 2005 18:00
Reporter: Martin Friebe (Gold Quality Contributor) (OCA)
Status: Closed
Category: MySQL Server: Replication Severity: S3 (Non-critical)
Version: 4.1.7 and 4.0.22 OS: FreeBSD (freebsd 4.8 and freebsd 4.10)
Assigned to: Guilhem Bichot CPU Architecture: Any

[4 Jan 2005 13:00] Martin Friebe
Description:
I am experiencing the following bug: a MySQL replication slave crashes while executing an OPTIMIZE TABLE statement from the replication log. This happens about once in every 1000 OPTIMIZE statements.

The following setup applies:
- both slaves have the same master, a MySQL 4.1.7 server
- 1st slave: MySQL 4.0.22 running on FreeBSD 4.8
 this slave was running fault-free against a MySQL 4.0.20 server (as optimize was not replicated)
- 2nd slave: MySQL 4.1.7 on FreeBSD 4.10
- both slaves are built from FreeBSD ports, with PTHREADS and optimization, non-static
- both clients have no problems running a stress test of optimize outside the slave thread
- both slaves are running on a dual Xeon CPU server with SMP (not sure if that has anything to do with the problem)
- see the attached my.cnf and debug log (the debug log is for
 mysql  Ver 12.22 Distrib 4.0.18, for portbld-freebsd4.8 (i386)
 built from ports)

- the table to be optimized can already be optimized, and does not have to contain much data or be of any specific structure

There is a slight chance that the problem is related to FreeBSD, but the replication slaves are running stably, except for optimize IN the slave thread.

How to repeat:
Set up replication as above.

Create a random table with some random data.

Start a loop sending 1000 or more optimize commands.
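
A minimal sketch of such a loop, written as a Python script that prints the SQL to send to the master. The table name `t1`, its columns, and the count of 2000 are placeholders chosen for illustration; they are not taken from the report, which says the table structure does not matter.

```python
import sys


def optimize_statements(table, count):
    """Yield `count` identical OPTIMIZE TABLE statements for `table`."""
    for _ in range(count):
        yield f"OPTIMIZE TABLE `{table}`;"


def build_script(table="t1", count=2000):
    """Return SQL that creates a small table and then optimizes it repeatedly.

    The table name and layout are hypothetical; any small table should do.
    """
    lines = [
        f"CREATE TABLE IF NOT EXISTS `{table}` "
        "(id INT NOT NULL PRIMARY KEY, val VARCHAR(32));",
        f"INSERT IGNORE INTO `{table}` VALUES (1, 'a'), (2, 'b'), (3, 'c');",
    ]
    lines.extend(optimize_statements(table, count))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the statements so they can be piped into a client on the master.
    sys.stdout.write(build_script())
```

The output can be piped into the mysql command-line client connected to the master, so that the statements reach the slaves through the replication log rather than being run on a slave directly.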

Suggested fix:
-
[4 Jan 2005 13:01] Martin Friebe
my.cnf for 4.0.22 slave

Attachment: my.cnf (application/octet-stream, text), 1.51 KiB.

[5 Jan 2005 17:17] Martin Friebe
Just an update: the crash does not seem to happen if the slave thread is restarted with
  slave stop; slave start
on a regular basis.

This could indicate a resource leak.
[5 Jan 2005 17:39] Martin Friebe
Replication of analyze table also triggers the crash.

Ok I am going to take a guess here:

From the error dump, it dies after a
# sql_base.cc:   251:    5: send_fields: packet_header: Memory: be1ff430  Bytes: (4)
while previous executions do log further entries of this. This entry is written in
# my_net_write

and optimize, analyze are the only statements in replication that return rows to the client (what is the client in a slave thread? nil?)

Does that help?
[7 Jan 2005 9:24] Guilhem Bichot
Dear Martin,
Thanks much. I was indeed able to repeat a slave crash. Let's hope I can repeat it again to troubleshoot. Will keep you posted.
[7 Jan 2005 9:25] Guilhem Bichot
050107 10:18:07 Slave I/O thread: connected to master 'root@localhost:3306',  replication started in log 'FIRST' at position 4
==7753== Thread 13:
==7753== Invalid read of size 4
==7753==    at 0x810D7F0: net_real_write (net_serv.cc:390)
==7753==    by 0x810D734: net_write_buff(st_net*, char const*, unsigned long) (net_serv.cc:343)
==7753==    by 0x810D4E3: my_net_write (net_serv.cc:252)
==7753==    by 0x81A2755: mysql_admin_table(THD*, st_table_list*, st_ha_check_opt*, char const*, thr_lock_type, bool, unsigned, int (*)(THD*, st_table_list*, st_ha_check_opt*), int (handler::*)(THD*, st_ha_check_opt*)) (sql_string.h:64)
==7753==  Address 0x68 is not stack'd, malloc'd or (recently) free'd
mysqld got signal 11;
[10 Jan 2005 18:00] Guilhem Bichot
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Fixed in 4.0.24 and 4.1.9 in
ChangeSet@1.2024, 2005-01-10 13:52:32+01:00, guilhem@mysql.com
  Fix for BUG#7658 "optimize crashes slave thread (1 in 1000)":
  mysql_admin_table() attempted to write to a vio which was 0. I could have fixed mysql_admin_table()
  but fixing my_net_write() looked more future-proof.
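
The idea behind that choice can be illustrated with a small Python model. This is not the MySQL source: the `Vio` and `Net` classes are stand-ins invented here, and only the name `my_net_write` and the notion of a missing vio come from the changeset comment. The sketch assumes a 4-byte packet header (3 bytes of length plus a sequence byte, fixed at 0 for simplicity) and a return value of 0 for success.

```python
from typing import Optional


class Vio:
    """Hypothetical stand-in for a connection; collects bytes instead of sending."""

    def __init__(self) -> None:
        self.sent = bytearray()

    def write(self, data: bytes) -> int:
        self.sent += data
        return len(data)


class Net:
    """Hypothetical stand-in for the network state; vio is None when no client exists."""

    def __init__(self, vio: Optional[Vio]) -> None:
        self.vio = vio


def my_net_write(net: Net, packet: bytes) -> int:
    """Write one packet and return 0 on success.

    The guard sits in the lowest-level write routine, so every caller
    (not just the admin-table code path) is protected when there is no vio.
    """
    if net.vio is None:
        return 0  # nowhere to write: report success without touching the vio
    header = len(packet).to_bytes(3, "little") + b"\x00"  # length + sequence byte
    net.vio.write(header + packet)
    return 0
```

With the guard in the writer, a caller that tries to send result rows from a thread with no client connection returns cleanly, while a thread with a real connection behaves as before. Guarding each caller instead would work for this bug but would leave any future caller exposed to the same invalid read.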