Bug #64624 | Mysql is crashing during replication | ||
---|---|---|---|
Submitted: | 12 Mar 2012 17:35 | Modified: | 12 Jun 2012 12:31 |
Reporter: | Konstantin Malov | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.5.21 | OS: | Linux (CentOS 2.6.32-220.7.1.el6.x86_64) |
Assigned to: | CPU Architecture: | Any | |
Tags: | regression |
[12 Mar 2012 17:35]
Konstantin Malov
[12 Mar 2012 17:55]
Valeriy Kravchuk
Please, send my.cnf file content and the output of: free Linux command from the system affected.
[12 Mar 2012 21:39]
Konstantin Malov
[root@dc01db05 ~]# free total used free shared buffers cached Mem: 65961424 18128012 47833412 0 192856 3534636 -/+ buffers/cache: 14400520 51560904 Swap: 4194296 0 4194296 [mysqld] user=mysql datadir=/opt/mysql socket=/var/lib/mysql/mysql.sock # Use only IP in ACL skip-name-resolve # Start Slave only manualy skip-slave-start # Do I need to turn it on with master-master active-passive mode ? #log-slave-updates # Use it during maintain time # skip-networking bind-address=0.0.0.0 max-connections = 1000 max_user_connections = 950 connect_timeout=5 # Replication (SLAVE) server_id = 140 log_bin = binary_log max_binlog_size = 1G relay_log = relay_binary_log log_slave_updates = ON read_only = ON #skip_slave_start replicate-ignore-db = mysql,information_schema,performance_schema thread_cache_size = 128 max_connect_errors = 10000 max_allowed_packet = 64M table_cache = 10240 net_write_timeout = 3 net_retry_count = 1 key_buffer_size = 64M tmp_table_size = 64M # total max_heap_table_size = 64M # per table # This value is set according to this article # http://www.mysqlperformanceblog.com/2010/10/25/impact-of-the-sort-buffer-size-in-mysql/ # It would be grate to do oueself tests sort_buffer_size = 256K # This value is set according to this article # http://www.mysqlperformanceblog.com/2007/09/17/mysql-what-read_buffer_size-value-is-optima... read_buffer_size = 128K # Don't use big value here, memory allocation could be very painful with big size of the option # See: http://www.mysqlperformanceblog.com/2010/07/05/how-is-join_buffer_size-allocated/ join_buffer_size = 1M # It's better to performe our tests for this values read_rnd_buffer_size = 2M # Log bad queries #log-queries-not-using-indexes long_query_time = 1 slow-query-log slow-query-log-file = /opt/mysql/slow-queries.log #log = /opt/mysql/mysql/queries.log log-slow-admin-statements query_cache_size = 64M # total query_cache_limit = 16M # per query # INNODB variables # must be about 60% of all server's memory innodb_buffer_pool_size = 40G innodb_additional_mem_pool_size = 32M #innodb_data_file_path = ibdata0:512M:autoextend # Set .._log_file_size to 25 % of buffer pool size innodb_log_file_size = 1G innodb_log_buffer_size = 2M innodb_flush_log_at_trx_commit = 2 innodb_lock_wait_timeout = 50 innodb_flush_method = O_DIRECT innodb_thread_concurrency = 8 innodb_file_per_table innodb_stats_on_metadata = OFF [mysqld_safe] log-error=/var/log/mysqld.log pid-file=/var/run/mysqld/mysqld.pid
[19 Mar 2012 16:10]
Konstantin Malov
One more thing. Mysql 5.5.16 works fine (I have tried it from remi repo and from percona). But 5.5.20 is dying during replications. So my solution was back to 5.5.16 on all slaves.
[22 Mar 2012 7:19]
Thomas Lecomte
Hello, We got the exact same problem here. We are running a 3-node setup under CentOS 5.6 x86_64 with the AMD64 Generic Linux RPM : MA <---> MB | S MA (Master A) and MB are replicating each other. MA also logs the MB requests so they are both sent to the slave S. We upgraded from 5.1 to 5.5.21 this night, and started having malloc()-related problems : ---- 8< ---- 120322 3:30:44 [Note] Slave: connected to master 'xxxx@MA:3306',replication resumed in log 'MA-bin.008550' at position 166729332 *** glibc detected *** /usr/sbin/mysqld: malloc(): memory corruption: 0x00002ab290e239c0 *** ======= Backtrace: ========= /lib64/libc.so.6[0x36148730fe] /lib64/libc.so.6(__libc_malloc+0x6e)[0x3614874e2e] /usr/sbin/mysqld(my_malloc+0x32)[0x797112] /usr/sbin/mysqld(alloc_root+0x104)[0x790f74] /usr/sbin/mysqld(_ZN9base_list9push_backEPv+0x28)[0x57c8c8] /usr/sbin/mysqld(_Z10MYSQLparsePv+0x1292e)[0x6430be] /usr/sbin/mysqld(_Z9parse_sqlP3THDP12Parser_stateP19Object_creation_ctx+0x9d)[0x56f78d] /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x71)[0x5796f1] /usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj+0xd5c)[0x716b3c] /usr/sbin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x125)[0x5158d5] /usr/sbin/mysqld[0x51afa3] /usr/sbin/mysqld(handle_slave_sql+0xa7b)[0x51defb] /lib64/libpthread.so.0[0x361580673d] /lib64/libc.so.6(clone+0x6d)[0x36148d44bd] ---- >8 ---- We reinstalled 5.5.18 on the slave (only), it crashed again. We then installed 5.5.16 as suggested and it ran fine (executing queries from the relay log). Reverting all servers to 5.5.16 enabled us to get everything work again. Attached below is an extract from the mysqld.err logfile, and our my.cnf.
[22 Mar 2012 7:19]
Thomas Lecomte
my.cnf used under 5.5.21 with replication slave crashing
Attachment: my.cnf.out (application/octet-stream, text), 2.38 KiB.
[22 Mar 2012 7:20]
Thomas Lecomte
MySQL error log - 5.5.21 with replication crashing
Attachment: mysqld.err-malloc (application/octet-stream, text), 25.03 KiB.
[22 Mar 2012 19:31]
Craig Gill
I am also experiencing similar crashes during mysql replication, between a 5.5.15 master and 5.5.20 slave. In my case, this crash happens every day at approximately the same time. The slave mysql server is Gentoo Linux with kernel 3.2.1 Attached is the crash report and slave my.cnf file.
[22 Mar 2012 19:32]
Craig Gill
mysqld.err crash report
Attachment: mysql5520_crash.txt (text/plain), 13.93 KiB.
[22 Mar 2012 19:33]
Craig Gill
my.cnf of slave mysql
Attachment: mysql_5520_mycnf.txt (text/plain), 6.37 KiB.
[23 Mar 2012 22:45]
Craig Gill
The same crash also happens with 5.5.22
Attachment: mysql5522_crash.txt (text/plain), 13.53 KiB.
[26 Mar 2012 21:46]
Craig Gill
I also have not had the crash since rolling back to 5.5.16 on my mysql slave box.
[3 Apr 2012 11:52]
Valeriy Kravchuk
I see ALTER TABLE ... ENABLE KEYS crashing in the last case reported. Does anybody else noted crashes on ALTER table similar to this case? Craig, please, send the output of show create table temp_schoolcontactstudents\G
[3 Apr 2012 16:55]
Craig Gill
In response to [3 Apr 11:52] Valeriy Kravchuk: mysql> show create table temp_schoolcontactstudents\G *************************** 1. row *************************** Table: temp_schoolcontactstudents Create Table: CREATE TABLE `temp_schoolcontactstudents` ( `cid` int(11) NOT NULL, `emailaddress` varchar(254) CHARACTER SET utf8 DEFAULT NULL, `orgid` int(11) NOT NULL, `ctcid` int(11) NOT NULL, `ctcemailaddress` varchar(254) CHARACTER SET utf8 DEFAULT NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql>
[13 Apr 2012 0:51]
Sveta Smirnova
See also bug #62942
[16 Apr 2012 17:53]
HC Jehg
I have the same problem here, on 3 different servers, with different hardware and Mandriva Linux 2010.2 64 bit. I have tried this with the following installations on 64 bit machines mysql-5.1.62-0.1mdv2010.2.x86_64 rpm -> mysql-5.1.62-0.1mdv2010.2.x86_64 rpm mysql-5.1.62-0.1mdv2010.2.x86_64 rpm -> mysql-5.1.62.tar.gz source tar mysql-5.1.62-0.1mdv2010.2.x86_64 rpm -> mysql-5.5.23.tar.gz source tar mysql-5.5.22.tar.gz source tar -> mysql-5.5.22.tar.gz source tar mysql-5.5.22.tar.gz source tar -> mysql-5.5.23.tar.gz source tar (I am of course perfectly well aware that replication from one version to an other is not recommended) Machines equipped with Mandriva Linux 2010.2 # uname -a Linux tars-hs2.hobsoft.net 2.6.33.7-server-2mnb #1 SMP Mon Sep 20 19:01:10 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux Same problem (On slaves only): ** glibc detected *** /usr/sbin/mysqld: malloc(): memory corruption: 0x00007fxxxxxxxxxx *** ...and slave servers stops replicating, but no other error is shown /etc/init.d/mysqld stop (or restart) ...takes forever. If I kill the server (killall -9 mysqld_safe mysqld) and restart, it seems to continue replicating.... (I did not check consistency of master and slave after this restart) All tables are InnoDB tables with lots of dependencies. All queries are resonably simple (Insert, insert on duplicate and a few Create Temporary table, but no ALTER) Replaying the relay log up to the number showed in SHOW SLAVE STATUS ( Relay_Log_Pos: 136575) gives: # mysqlbinlog /data/hobsoft/mysql/mysqld-relay-bin.000002 | grep "at 1360575" --before-context=15 --after-context=15 #120416 18:52:56 server id 1 end_log_pos 1360157 Xid = 29561466 COMMIT/*!*/; # at 1360303 #120416 18:52:56 server id 1 end_log_pos 1360225 Query thread_id=4065 exec_time=0 error_code=0 SET TIMESTAMP=1334595176/*!*/; BEGIN /*!*/; # at 1360371 #120416 18:52:56 server id 1 end_log_pos 1360402 Query thread_id=4065 exec_time=0 error_code=0 SET TIMESTAMP=1334595176/*!*/; /*servermaintenance.php*/ UPDATE Webservers SET WebserverUpdate_panels="Updating" WHERE WebserverName="xxxxxx xx" /*!*/; # at 1360548 #120416 18:52:56 server id 1 end_log_pos 1360429 Xid = 29561602 COMMIT/*!*/; # at 1360575 #120416 18:52:56 server id 1 end_log_pos 1360497 Query thread_id=3731 exec_time=0 error_code=0 SET TIMESTAMP=1334595176/*!*/; BEGIN /*!*/; # at 1360643 # at 1360711 #120416 18:52:56 server id 1 end_log_pos 1360565 Table_map: `xxxx`.`Recipes` mapped to number 3409 #120416 18:52:56 server id 1 end_log_pos 1360663 Write_rows: table id 3409 flags: STMT_END_F BINLOG ' xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx= xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx= '/*!*/; The common thing I see every time is that the line after the relay point "# at 1360575" is a SET TIMESTAMP= Hans-Christian Jehg
[16 Apr 2012 18:03]
HC Jehg
Also SLAVE STOP; takes forever.
[16 Apr 2012 18:59]
MySQL Verification Team
Since crashing is happening so early in the call stack - does anybody see these crashes when slave's query cache is disabled? (replication traffic does check query cache ... probably this is a bug itself :)
[16 Apr 2012 19:01]
MySQL Verification Team
My best guess so far is a regression or improper fix for this: Bug 12589870 I already got this on my 5.5.22 slave: Conditional jump or move depends on uninitialised value(s) at: Query_cache::send_result_to_client(sql_cache.cc:1520) by: mysql_parse (sql_parse.cc:5572) by: Query_log_event::do_apply_event (log_event.cc:3379) by: apply_event_and_update_pos (log_event.h:1132) by: exec_relay_log_event (slave.cc:2505) by: handle_slave_sql (slave.cc:3322) by: start_thread (pthread_create.c:301) by: clone (clone.S:115) The code in question is: size_t db_len; memcpy((char *) &db_len, (sql + query_length + 1), sizeof(size_t)); if (thd->db_length != db_len) <---------- {
[16 Apr 2012 19:21]
HC Jehg
OK, I have now had the replication running for 100 times longer than it took for the memory corruption to occur before I am running Mandriva 2012.2 64bit mysql-5.5.22.tar.gz source tar -> mysql-5.5.23.tar.gz source tar but with Row based bin log No errors so far... HC
[16 Apr 2012 19:23]
HC Jehg
Precision: Mixed : clib Memory Corruption Row based: Works so far HC
[16 Apr 2012 20:32]
MySQL Verification Team
Making progress now. Query cache is misbehaving when different or no default database is switched to. Now seeing a write passed the end of a buffer. It would corrupt memory. Invalid write of size 8 at 0x54671F: Query_cache::send_result_to_client (sql_cache.cc:1611) by 0x579276: mysql_parse (sql_parse.cc:5572) by 0x71647B: Query_log_event::do_apply_event (log_event.cc:3379) by 0x515904: apply_event_and_update_pos (log_event.h:1132) by 0x51AFD2: exec_relay_log_event (slave.cc:2505) by 0x51DF2A: handle_slave_sql (slave.cc:3322) by 0x30B8806CCA: start_thread (pthread_create.c:301) by 0x30B84E0C2C: clone (clone.S:115) Address 0x15057008 is 216 bytes inside a block of size 220 alloc'd at 0x4A07335: malloc (vg_replace_malloc.c:263) by 0x796A61: my_malloc (my_malloc.c:38) by 0x70962F: Query_log_event::Query_log_event (log_event.cc:2897) by 0x7116E2: Log_event::read_log_event (log_event.cc:1240) by 0x714A54: Log_event::read_log_event (log_event.cc:1153) by 0x51ABE4: exec_relay_log_event( (slave.cc:4559) by 0x51DF2A: handle_slave_sql (slave.cc:3322) by 0x30B8806CCA: start_thread (pthread_create.c:301) by 0x30B84E0C2C: clone (clone.S:115) sql_cache.cc:1611 is this: memcpy((uchar *)(sql + (tot_length - QUERY_CACHE_FLAGS_SIZE)), (uchar*) &flags, QUERY_CACHE_FLAGS_SIZE); (gdb) p tot_length $1 = 215 (gdb) p sql $2 = 0x15056f39 "/*'trkgwqcnhxhdhtehkdyf' */ \n\n\ninsert into `test`.`t1` set\n`a` = -1 on duplicate key update\n`a` = 0.9107\n\n\n/* */" (gdb) p sizeof(Query_cache_query_flags) $3 = 88 log_event.cc:2897 is this: if (!(start= data_buf = (Log_event::Byte*) my_malloc(catalog_len + 1 + time_zone_len + 1 + data_len + 1 + QUERY_CACHE_FLAGS_SIZE + user.length + 1 + host.length + 1 + db_len + 1, MYF(MY_WME))))
[16 Apr 2012 23:40]
HC Jehg
I am still replicating row based without any errors. HC
[17 Apr 2012 6:32]
MySQL Verification Team
all the errors from an overnight run. not good. i'll think of an easy testcase today.
Attachment: bug64624_valgrind_errors_5.5.22.txt (text/plain), 5.42 KiB.
[30 Apr 2012 5:47]
MySQL Verification Team
Steps to reproduce this bug are in bug #62942. Run slave in valgrind.
[7 May 2012 13:47]
Greg Stewart
I'm getting the same errors (bugs #64624 & #62942) after building a 5.5.23 system as a slave to 7.1.9a-1 Cluster. After a lot of research between MySQL and Percona forums, I found that comments embedded in querys using mixed-replication while the 5.5.23 query cache was on caused MySQL to crash in less in than three minutes, consistently. Setting query_cache_type = OFF fixed my problem temporarily, but this is not a good solution as I need the cache. With query cache off, I have the box replicating since Friday night (4th May) without errors. I am looking into whether I can safely change from MBR to RBR on the master in this environment, but this is a real show-stopper of a bug.
[7 May 2012 22:04]
Dima Rogozin
For what it worth, I have similar situation. Percona 5.5.22-2. Replication hangs after about an hour with memory corruption message. Switching query cache off helps.
[12 May 2012 20:57]
Matt Johnson
I have a similar problem on Ubuntu 12.04 with MySQL 5.5.22. For me, the solution was to disable log-slave-updates. Unfortunately, I need that since I am doing A -> B -> C style replication.
[31 May 2012 13:54]
Greg Stewart
As an update: I have completed our transition from 7.1.9a-1 Cluster to Community 5.5.23, and I have RBR working smoothly with query cache ON for both master and slave. I see that 5.5.25-1 is GA (although, as of 31st May, the change logs say it isn't), but I have a lot of red tape to go through before I can upgrade these production boxes (and I don't have a test box to use). But, since I don't see anything related to QC mentioned in the change logs for 5.5.24 and up, the only change that intrigues me at the moment is the improvement to innodb adaptive flushing, so I'll hold off on upgrading the binaries. A note: Matt Johnson mentioned he turned off log-slave-updates, but I've had that var off all along, and the problem was unbearable for us when QC was on with SBR.
[5 Jun 2012 10:08]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at http://dev.mysql.com/doc/en/installing-source.html
[5 Jun 2012 10:09]
Jon Stephens
Fixed in 5.6. Documented as follows in the 5.6.6 changelog: After upgrading a replication slave to MySQL 5.5.21, having the query cache enabled led the slave to fail. Closed.
[5 Jun 2012 14:43]
Kolbe Kegel
Is there no plan to put a fix for this bug into MySQL 5.5 or MySQL Cluster 7.2?
[12 Jun 2012 12:16]
Jon Stephens
This fix was also pushed to MySQL 5.1.64 and MySQL 5.5.26; the 5.1 and 5.5 changelogs will be updated with this information shortly. Regarding MySQL Cluster: The first set of MySQL Cluster releases to be based on these (or later) MySQL Server versions will also include the fix. The latest MySQL Cluster NDB 7.0 and 7.1 are currently based on MySQL Server 5.1.63, and the latest MySQL Cluster NDB 7.2 is currently based on MySQL Server 5.5.25.) It is not possible at present to say whether or not the MySQL Cluster trees will be re-based on later MySQL Server versions in time for the next set of MySQL Cluster releases--but in any case, the fix should find its way into them within a release or two.
[12 Jun 2012 12:31]
Jon Stephens
Documented as follows in the 5.6.6 changelog: After upgrading a replication slave to MySQL 5.6.2 or later, enabling the query cache eventually caused the slave to fail. In the 5.5.26 changelog: After upgrading a replication slave to MySQL 5.5.18 or later, enabling the query cache eventually caused the slave to fail. In the 5.1.64 changelog: After upgrading a replication slave to MySQL 5.1.60 or later, enabling the query cache eventually caused the slave to fail. Closed.