Bug #64624 Mysql is crashing during replication
Submitted: 12 Mar 2012 17:35 Modified: 12 Jun 2012 12:31
Reporter: Konstantin Malov Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.5.21 OS:Linux (CentOS 2.6.32-220.7.1.el6.x86_64)
Assigned to: CPU Architecture:Any
Tags: regression

[12 Mar 2012 17:35] Konstantin Malov
Mysql is crashing during replication. It happend several times even without any user activity on that slave. Only with replication which was turned on. It happend on 3 servers with different hardware, so it's probaly not a hardware error.  

*** glibc detected *** /usr/libexec/mysqld: malloc(): memory corruption: 0x00007f28f089f600 ***
======= Backtrace: =========
======= Memory map: ========
00400000-00c4b000 r-xp 00000000 fd:00 12192359                           /usr/libexec/mysqld
00e4b000-00f60000 rw-p 0084b000 fd:00 12192359                           /usr/libexec/mysqld
00f60000-00f8c000 rw-p 00000000 00:00 0
02d5d000-41af6000 rw-p 00000000 00:00 0                                  [heap]
7f26f8000000-7f26f88d2000 rw-p 00000000 00:00 0
7f26f88d2000-7f26fc000000 ---p 00000000 00:00 0
7f2700000000-7f2700b23000 rw-p 00000000 00:00 0
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
14:51:35 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 460918 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f28f0000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f3398843758 thread_stack 0x40000

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f28f089f1cc): is an invalid pointer
Connection ID (thread ID): 99

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

How to repeat:
master - slave replication of single database about 60Gb total size. It happend on several servers for last two days.
[12 Mar 2012 17:55] Valeriy Kravchuk
Please, send my.cnf file content and the output of:


Linux command from the system affected.
[12 Mar 2012 21:39] Konstantin Malov
[root@dc01db05 ~]# free
             total       used       free     shared    buffers     cached
Mem:      65961424   18128012   47833412          0     192856    3534636
-/+ buffers/cache:   14400520   51560904
Swap:      4194296          0    4194296



# Use only IP in ACL

# Start Slave only manualy
# Do I need to turn it on with master-master active-passive mode ?

# Use it during maintain time
# skip-networking
max-connections = 1000
max_user_connections = 950

# Replication (SLAVE)
server_id       = 140
log_bin         = binary_log
max_binlog_size = 1G
relay_log       = relay_binary_log
log_slave_updates = ON
read_only       = ON
replicate-ignore-db = mysql,information_schema,performance_schema

thread_cache_size = 128
max_connect_errors = 10000
max_allowed_packet = 64M
table_cache = 10240
net_write_timeout = 3
net_retry_count = 1

key_buffer_size = 64M

tmp_table_size = 64M # total
max_heap_table_size = 64M # per table

# This value is set according to this article
# http://www.mysqlperformanceblog.com/2010/10/25/impact-of-the-sort-buffer-size-in-mysql/
# It would be grate to do oueself tests
sort_buffer_size = 256K

# This value is set according to this article
# http://www.mysqlperformanceblog.com/2007/09/17/mysql-what-read_buffer_size-value-is-optima...
read_buffer_size = 128K

# Don't use big value here, memory allocation could be very painful with big size of the option
# See: http://www.mysqlperformanceblog.com/2010/07/05/how-is-join_buffer_size-allocated/
join_buffer_size = 1M

# It's better to performe our tests for this values
read_rnd_buffer_size = 2M

# Log bad queries
long_query_time = 1
slow-query-log-file     = /opt/mysql/slow-queries.log
#log                    = /opt/mysql/mysql/queries.log

query_cache_size = 64M # total
query_cache_limit = 16M # per query

# INNODB variables
# must be about 60% of all server's memory
innodb_buffer_pool_size = 40G
innodb_additional_mem_pool_size = 32M
#innodb_data_file_path     = ibdata0:512M:autoextend
# Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 1G
innodb_log_buffer_size = 2M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 50
innodb_flush_method = O_DIRECT
innodb_thread_concurrency = 8
innodb_stats_on_metadata = OFF

[19 Mar 2012 16:10] Konstantin Malov
One more thing. Mysql 5.5.16 works fine (I have tried it from remi repo and from percona). But 5.5.20 is dying during replications. So my solution was back to 5.5.16 on all slaves.
[22 Mar 2012 7:19] Thomas Lecomte

We got the exact same problem here. We are running a 3-node setup under CentOS 5.6 x86_64 with the AMD64 Generic Linux RPM :

MA <---> MB

MA (Master A) and MB are replicating each other. MA also logs the MB requests so they are both sent to the slave S.

We upgraded from 5.1 to 5.5.21 this night, and started having malloc()-related problems :

---- 8< ----
120322  3:30:44 [Note] Slave: connected to master 'xxxx@MA:3306',replication resumed in log 'MA-bin.008550' at position 166729332
*** glibc detected *** /usr/sbin/mysqld: malloc(): memory corruption: 0x00002ab290e239c0 ***
======= Backtrace: =========
---- >8 ----

We reinstalled 5.5.18 on the slave (only), it crashed again. We then installed 5.5.16 as suggested and it ran fine (executing queries from the relay log).
Reverting all servers to 5.5.16 enabled us to get everything work again.

Attached below is an extract from the mysqld.err logfile, and our my.cnf.
[22 Mar 2012 7:19] Thomas Lecomte
my.cnf used under 5.5.21 with replication slave crashing

Attachment: my.cnf.out (application/octet-stream, text), 2.38 KiB.

[22 Mar 2012 7:20] Thomas Lecomte
MySQL error log - 5.5.21 with replication crashing

Attachment: mysqld.err-malloc (application/octet-stream, text), 25.03 KiB.

[22 Mar 2012 19:31] Craig Gill
I am also experiencing similar crashes during mysql replication, between a 5.5.15 master and 5.5.20 slave.

In my case, this crash happens every day at approximately the same time.

The slave mysql server is Gentoo Linux with kernel 3.2.1

Attached is the crash report and slave my.cnf file.
[22 Mar 2012 19:32] Craig Gill
mysqld.err crash report

Attachment: mysql5520_crash.txt (text/plain), 13.93 KiB.

[22 Mar 2012 19:33] Craig Gill
my.cnf of slave mysql

Attachment: mysql_5520_mycnf.txt (text/plain), 6.37 KiB.

[23 Mar 2012 22:45] Craig Gill
The same crash also happens with 5.5.22

Attachment: mysql5522_crash.txt (text/plain), 13.53 KiB.

[26 Mar 2012 21:46] Craig Gill
I also have not had the crash since rolling back to 5.5.16 on my mysql slave box.
[3 Apr 2012 11:52] Valeriy Kravchuk
I see ALTER TABLE ... ENABLE KEYS crashing in the last case reported. Does anybody else noted crashes on ALTER table similar to this case?

Craig, please, send the output of

show create table temp_schoolcontactstudents\G
[3 Apr 2012 16:55] Craig Gill
In response to [3 Apr 11:52] Valeriy Kravchuk:

mysql> show create table temp_schoolcontactstudents\G
*************************** 1. row ***************************
       Table: temp_schoolcontactstudents
Create Table: CREATE TABLE `temp_schoolcontactstudents` (
  `cid` int(11) NOT NULL,
  `emailaddress` varchar(254) CHARACTER SET utf8 DEFAULT NULL,
  `orgid` int(11) NOT NULL,
  `ctcid` int(11) NOT NULL,
  `ctcemailaddress` varchar(254) CHARACTER SET utf8 DEFAULT NULL
1 row in set (0.00 sec)

[13 Apr 2012 0:51] Sveta Smirnova
See also bug #62942
[16 Apr 2012 17:53] HC Jehg
I have the same problem here, on 3 different servers, with different hardware and Mandriva Linux 2010.2 64 bit.

I have tried this with the following installations on 64 bit machines

mysql-5.1.62-0.1mdv2010.2.x86_64 rpm -> mysql-5.1.62-0.1mdv2010.2.x86_64 rpm
mysql-5.1.62-0.1mdv2010.2.x86_64 rpm -> mysql-5.1.62.tar.gz source tar
mysql-5.1.62-0.1mdv2010.2.x86_64 rpm -> mysql-5.5.23.tar.gz source tar
mysql-5.5.22.tar.gz source tar       -> mysql-5.5.22.tar.gz source tar 
mysql-5.5.22.tar.gz source tar       -> mysql-5.5.23.tar.gz source tar 

(I am of course perfectly well aware that replication from one version to an other is not recommended)

Machines equipped with Mandriva Linux 2010.2
# uname -a
Linux tars-hs2.hobsoft.net #1 SMP Mon Sep 20 19:01:10 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

Same problem (On slaves only):

** glibc detected *** /usr/sbin/mysqld: malloc(): memory corruption: 0x00007fxxxxxxxxxx ***

...and slave servers stops replicating, but no other error is shown

/etc/init.d/mysqld stop    (or restart)

...takes forever.

If I kill the server (killall -9 mysqld_safe  mysqld) and restart, it seems to continue replicating.... (I did not check consistency of master and slave after this restart)

All tables are InnoDB tables with lots of dependencies.

All queries are resonably simple (Insert, insert on duplicate and a few Create Temporary table, but no ALTER) Replaying the relay log up to the number showed in SHOW SLAVE STATUS ( Relay_Log_Pos: 136575) gives:

# mysqlbinlog /data/hobsoft/mysql/mysqld-relay-bin.000002 | grep "at 1360575" --before-context=15 --after-context=15

#120416 18:52:56 server id 1  end_log_pos 1360157       Xid = 29561466
# at 1360303
#120416 18:52:56 server id 1  end_log_pos 1360225       Query   thread_id=4065  exec_time=0     error_code=0
SET TIMESTAMP=1334595176/*!*/;
# at 1360371
#120416 18:52:56 server id 1  end_log_pos 1360402       Query   thread_id=4065  exec_time=0     error_code=0
SET TIMESTAMP=1334595176/*!*/;
/*servermaintenance.php*/ UPDATE Webservers SET WebserverUpdate_panels="Updating" WHERE WebserverName="xxxxxx xx"
# at 1360548
#120416 18:52:56 server id 1  end_log_pos 1360429       Xid = 29561602
# at 1360575
#120416 18:52:56 server id 1  end_log_pos 1360497       Query   thread_id=3731  exec_time=0     error_code=0
SET TIMESTAMP=1334595176/*!*/;
# at 1360643
# at 1360711
#120416 18:52:56 server id 1  end_log_pos 1360565       Table_map: `xxxx`.`Recipes` mapped to number 3409
#120416 18:52:56 server id 1  end_log_pos 1360663       Write_rows: table id 3409 flags: STMT_END_F


The common thing I see every time is that the line after the relay point "# at 1360575" is a SET TIMESTAMP=

Hans-Christian Jehg
[16 Apr 2012 18:03] HC Jehg


takes forever.
[16 Apr 2012 18:59] MySQL Verification Team
Since crashing is happening so early in the call stack - does anybody see these crashes when slave's query cache is disabled?  (replication traffic does check query cache ... probably this is a bug itself :)
[16 Apr 2012 19:01] MySQL Verification Team
My best guess so far is a regression or improper fix for this: Bug 12589870
I already got this on my 5.5.22 slave:

Conditional jump or move depends on uninitialised value(s)
at: Query_cache::send_result_to_client(sql_cache.cc:1520)
by: mysql_parse (sql_parse.cc:5572)
by: Query_log_event::do_apply_event (log_event.cc:3379)
by: apply_event_and_update_pos (log_event.h:1132)
by: exec_relay_log_event (slave.cc:2505)
by: handle_slave_sql (slave.cc:3322)
by: start_thread (pthread_create.c:301)
by: clone (clone.S:115)

The code in question is:

size_t db_len;
memcpy((char *) &db_len, (sql + query_length + 1), sizeof(size_t));
if (thd->db_length != db_len)   <----------
[16 Apr 2012 19:21] HC Jehg
OK, I have now had the replication running for 100 times longer than it took for the memory corruption to occur before

I am running Mandriva 2012.2 64bit
mysql-5.5.22.tar.gz source tar -> mysql-5.5.23.tar.gz source tar 

but with Row based bin log

No errors so far...

[16 Apr 2012 19:23] HC Jehg

Mixed    : clib Memory Corruption

Row based: Works so far

[16 Apr 2012 20:32] MySQL Verification Team
Making progress now.  Query cache is misbehaving when different or no default database is switched to.   Now seeing a write passed the end of a buffer. It would corrupt memory.

Invalid write of size 8
at 0x54671F: Query_cache::send_result_to_client (sql_cache.cc:1611)
by 0x579276: mysql_parse (sql_parse.cc:5572)
by 0x71647B: Query_log_event::do_apply_event (log_event.cc:3379)
by 0x515904: apply_event_and_update_pos (log_event.h:1132)
by 0x51AFD2: exec_relay_log_event (slave.cc:2505)
by 0x51DF2A: handle_slave_sql (slave.cc:3322)
by 0x30B8806CCA: start_thread (pthread_create.c:301)
by 0x30B84E0C2C: clone (clone.S:115)
Address 0x15057008 is 216 bytes inside a block of size 220 alloc'd
at 0x4A07335: malloc (vg_replace_malloc.c:263)
by 0x796A61: my_malloc (my_malloc.c:38)
by 0x70962F: Query_log_event::Query_log_event (log_event.cc:2897)
by 0x7116E2: Log_event::read_log_event (log_event.cc:1240)
by 0x714A54: Log_event::read_log_event (log_event.cc:1153)
by 0x51ABE4: exec_relay_log_event( (slave.cc:4559)
by 0x51DF2A: handle_slave_sql (slave.cc:3322)
by 0x30B8806CCA: start_thread (pthread_create.c:301)
by 0x30B84E0C2C: clone (clone.S:115)

sql_cache.cc:1611 is this:
memcpy((uchar *)(sql + (tot_length - QUERY_CACHE_FLAGS_SIZE)),
	 (uchar*) &flags, QUERY_CACHE_FLAGS_SIZE);

(gdb) p tot_length
$1 = 215
(gdb) p sql
$2 = 0x15056f39 "/*'trkgwqcnhxhdhtehkdyf' */ \n\n\ninsert into `test`.`t1` set\n`a` = -1 on duplicate key update\n`a` = 0.9107\n\n\n/*   */"
(gdb) p sizeof(Query_cache_query_flags)
$3 = 88

log_event.cc:2897 is this:
if (!(start= data_buf = (Log_event::Byte*) my_malloc(catalog_len + 1 +
       time_zone_len + 1 +
       data_len + 1 +
       user.length + 1 +
       host.length + 1 +
       db_len + 1,
[16 Apr 2012 23:40] HC Jehg
I am still replicating row based without any errors.

[17 Apr 2012 6:32] MySQL Verification Team
all the errors from an overnight run. not good. i'll think of an easy testcase today.

Attachment: bug64624_valgrind_errors_5.5.22.txt (text/plain), 5.42 KiB.

[30 Apr 2012 5:47] MySQL Verification Team
Steps to reproduce this bug are in bug #62942.
Run slave in valgrind.
[7 May 2012 13:47] Greg Stewart
I'm getting the same errors (bugs #64624 & #62942) after building a 5.5.23 system as a slave to 7.1.9a-1 Cluster.

After a lot of research between MySQL and Percona forums, I found that comments embedded in querys using mixed-replication while the 5.5.23 query cache was on caused MySQL to crash in less in than three minutes, consistently.

Setting query_cache_type = OFF fixed my problem temporarily, but this is not a good solution as I need the cache.

With query cache off, I have the box replicating since Friday night (4th May) without errors.

I am looking into whether I can safely change from MBR to RBR on the master in this environment, but this is a real show-stopper of a bug.
[7 May 2012 22:04] Dima Rogozin
For what it worth, I have similar situation. Percona 5.5.22-2. Replication hangs after about an hour with memory corruption message.
Switching query cache off helps.
[12 May 2012 20:57] Matt Johnson
I have a similar problem on Ubuntu 12.04 with MySQL 5.5.22. For me, the solution was to disable log-slave-updates. Unfortunately, I need that since I am doing A -> B -> C style replication.
[31 May 2012 13:54] Greg Stewart
As an update: I have completed our transition from 7.1.9a-1 Cluster to Community 5.5.23, and I have RBR working smoothly with query cache ON for both master and slave.

I see that 5.5.25-1 is GA (although, as of 31st May, the change logs say it isn't), but I have a lot of red tape to go through before I can upgrade these production boxes (and I don't have a test box to use). 

But, since I don't see anything related to QC mentioned in the change logs for 5.5.24 and up, the only change that intrigues me at the moment is the improvement to innodb adaptive flushing, so I'll hold off on upgrading the binaries.

A note: Matt Johnson mentioned he turned off log-slave-updates, but I've had that var off all along, and the problem was unbearable for us when QC was on with SBR.
[5 Jun 2012 10:08] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

[5 Jun 2012 10:09] Jon Stephens
Fixed in 5.6. Documented as follows in the 5.6.6 changelog:

      After upgrading a replication slave to MySQL 5.5.21, having the query
      cache enabled led the slave to fail.

[5 Jun 2012 14:43] Kolbe Kegel
Is there no plan to put a fix for this bug into MySQL 5.5 or MySQL Cluster 7.2?
[12 Jun 2012 12:16] Jon Stephens
This fix was also pushed to MySQL 5.1.64 and MySQL 5.5.26; the 5.1 and 5.5 changelogs will be updated with this information shortly.

Regarding MySQL Cluster: The first set of MySQL Cluster releases to be based on these (or later) MySQL Server versions will also include the fix. The latest MySQL Cluster NDB 7.0 and 7.1 are currently based on MySQL Server 5.1.63, and the latest MySQL Cluster NDB 7.2 is currently based on MySQL Server 5.5.25.) It is not possible at present to say whether or not the MySQL Cluster trees will be re-based on later MySQL Server versions in time for the next set of MySQL Cluster releases--but in any case, the fix should find its way into them within a release or two.
[12 Jun 2012 12:31] Jon Stephens
Documented as follows in the 5.6.6 changelog:

        After upgrading a replication slave to MySQL 5.6.2 or later, 
        enabling the query cache eventually caused the slave to fail.

In the 5.5.26 changelog:

        After upgrading a replication slave to MySQL 5.5.18 or later, 
        enabling the query cache eventually caused the slave to fail.

In the 5.1.64 changelog:

        After upgrading a replication slave to MySQL 5.1.60 or later, 
        enabling the query cache eventually caused the slave to fail.