Bug #47134 | Crash on startup when XA support functions activated by a second engine | ||
---|---|---|---|
Submitted: | 4 Sep 2009 12:32 | Modified: | 4 Mar 2013 18:52 |
Reporter: | Paul McCullagh (Basic Quality Contributor) (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Storage Engine API | Severity: | S1 (Critical) |
Version: | 5.1.35+ | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | crash, pbxt, qc, startup, tc.log, xa |
[4 Sep 2009 12:32]
Paul McCullagh
[4 Sep 2009 13:33]
Paul McCullagh
I have found the immediate source of the crash. In the function TC_LOG_MMAP::open() we have: Line: 4957, file: log.cc: pg->end=(my_xid *)(pg->start + tc_log_page_size); The type of pg->start is (ulonglong *). The code is assuming the type is (char *). So it is adding 8*tc_log_page_size to pg->start instead of 1*tc_log_page_size. This makes me doubt the reliability of this code in general.
[4 Sep 2009 18:22]
MySQL Verification Team
miguel@luz:~/dbs$ 5.1/libexec/mysqld 090904 15:20:27 [Note] Plugin 'FEDERATED' is disabled. 090904 15:20:27 InnoDB: Started; log sequence number 0 44233 090904 15:20:27 [Note] PrimeBase XT (PBXT) Engine 1.0.08d RC loaded... 090904 15:20:27 [Note] Paul McCullagh, PrimeBase Technologies GmbH, http://www.primebase.org mysqld: handler.cc:1586: int ha_recover(HASH*): Assertion `total_ha_2pc == (ulong) opt_bin_log+1' failed. 090904 15:20:27 - mysqld got signal 6 ;
[4 Sep 2009 18:39]
MySQL Verification Team
Thank you for the bug report. Verified against the last tree source 5.1.
[7 Oct 2009 2:14]
Elena Stepanova
I am also getting a crash on 6.0.9; it disappears if i switch off pbxt_support_xa, so I assume it is a related problem but it happens not on startup but after some queries, but has somewhat different stack trace, so I thought it might be worth mentioning here as well. Stack trace: #0 0x00002b492993eea3 in pthread_kill () from /lib64/libpthread.so.0 #1 0x000000000065ceec in handle_segfault (sig=11) at mysqld.cc:2671 #2 <signal handler called> #3 0x00002b492993e51a in pthread_cond_signal@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #4 0x000000000070679c in TC_LOG_MMAP::sync (this=0xf11aa0) at log.cc:7062 #5 0x000000000070d080 in TC_LOG_MMAP::log_xid (this=0xf11aa0, thd=<value optimized out>, xid=<value optimized out>) at log.cc:7031 #6 0x000000000075d911 in ha_commit_trans (thd=0x1fa03e0, all=<value optimized out>) at handler.cc:1172 #7 0x00000000007f7f5c in trans_commit (thd=0x1fa03e0) at transaction.cc:132 #8 0x000000000066e000 in mysql_execute_command (thd=0x1fa03e0) at sql_parse.cc:4026 #9 0x0000000000671d07 in mysql_parse (thd=0x1fa03e0, inBuf=0x20201f8 "commit", length=6, found_semicolon=0x4c89a100) at sql_parse.cc:5735 #10 0x0000000000672606 in dispatch_command (command=COM_QUERY, thd=0x1fa03e0, packet=0x2010bd1 "commit", packet_length=<value optimized out>) at sql_parse.cc:1007 #11 0x00000000006730d2 in do_command (thd=0x1fa03e0) at sql_parse.cc:690 #12 0x0000000000666284 in handle_one_connection (arg=0x1fa03e0) at sql_connect.cc:1145 #13 0x00002b492993a143 in start_thread () from /lib64/libpthread.so.0 #14 0x00002b492a0738cd in clone () from /lib64/libc.so.6 #15 0x0000000000000000 in ?? () # Test case: # (MySQL server 6.0.9, PBXT 1.0.09) drop database if exists pbxt_innodb; create database pbxt_innodb; use pbxt_innodb; create table t_pbxt ( i int not null auto_increment, primary key (i), f varchar (128) ) engine = pbxt; create table t_inno ( i int not null auto_increment, primary key (i), f varchar (128) ) engine = innodb; start transaction; insert into t_pbxt (f) values ( 'abc'), ('def' ); insert into t_inno (f) values ( 'abc'), ('def' ); commit;
[17 May 2010 16:25]
Mark Callaghan
I have no problem reproducing this: 1) start mysqld with InnoDB and PBXT enabled 2) kill -9 mysqld On recovery the server crashes again and again until I remove var/tc.log. In many cases it is probably a bad idea to remove that file. 100517 09:18:26 [Note] The server was not shutdown correctly, recovery required 100517 9:18:26 [Note] Recovering after a crash using tc.log 100517 9:18:26 - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=2097152000 read_buffer_size=131072 max_used_connections=0 max_threads=2000 threads_connected=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6409238 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = (nil) thread_stack 0x40000 /data/51s/libexec/mysqld(my_print_stacktrace+0x32) [0x9ad62a] /data/51s/libexec/mysqld(handle_segfault+0x28a) [0x5e9db8] /lib64/libpthread.so.0 [0x328f80de70] /data/51s/libexec/mysqld(TC_LOG_MMAP::recover()+0x180) [0x6a7030] /data/51s/libexec/mysqld(TC_LOG_MMAP::open(char const*)+0x4a6) [0x6a7592] /data/51s/libexec/mysqld [0x5ebe08] /data/51s/libexec/mysqld(main+0x23d) [0x5ec62d] /lib64/libc.so.6(__libc_start_main+0xf4) [0x328ec1d8b4] /data/51s/libexec/mysqld [0x510089]
[26 Jun 2011 17:38]
Valeriy Kravchuk
See bug #32926 also (looks like the same problem but reported earlier for 6.0.x).
[21 Aug 2012 17:35]
Mark Callaghan
Rich Prohaska from Tokutek posted this: TokuDB is a MySQL storage engine that implements XA. Unfortunately, MySQL 5.6.6-m9 (and prior versions) do not support multiple XA storage engines, as described in MySQL bug #32926. This bug report described how to crash mysqld with the falcon and innodb storage engines. Another way to reproduce this problem is to add a fake prepare method to the example storage engine, install the example plugin, and run MySQL with the binlog disabled. It would be nice (for us) if this problem was fixed sometime prior to the 5.6.6 GA. We made some changes to the 5.6.6-m9 code base to support TokuDB as another XA storage engine. diff -r mysql-5.6.6-m9/sql/handler.cc mysql-5.6.6-m9-patched/sql/handler.cc 1824c1824 < DBUG_ASSERT(total_ha_2pc == (ulong) opt_bin_log+1); // only InnoDB and binlog --- > // DBUG_ASSERT(total_ha_2pc == (ulong) opt_bin_log+1); // only InnoDB and binlog diff -r mysql-5.6.6-m9/sql/log.cc mysql-5.6.6-m9-patched/sql/log.cc 2512c2512 < pg->end=(my_xid *)(pg->start + tc_log_page_size); --- > pg->end=(my_xid *)(data + (i+1)*tc_log_page_size); 2513a2514 > pg->ptr=pg->start; 2774c2775,2776 < mysql_cond_signal(&active->cond); // wake up a new syncer --- > if (active) > mysql_cond_signal(&active->cond); // wake up a new syncer diff -r mysql-5.6.6-m9/storage/example/ha_example.cc mysql-5.6.6-m9-patched/storage/example/ha_example.cc 131a132,136 > static int example_prepare(handlerton *hton, THD *thd, bool all) > { > DBUG_ENTER("example_prepare"); > DBUG_RETURN(EINVAL); > } 146a152 > example_hton->prepare= example_prepare;
[29 Dec 2012 21:26]
Erlend Dahl
Marked Bug#32926 "lack of suppport for multiple two-phase-commit storage engines" as a duplicate, since the work is being carried out in the context of Bug#47134
[4 Mar 2013 18:52]
Paul DuBois
Noted in 5.7.1 changelog. If XA support was activated by multiple storage engines, the server would exit.
[6 Nov 2013 12:22]
Laurynas Biveinis
mysql-server$ bzr log -r 5201.2.1 ------------------------------------------------------------ revno: 5201.2.1 committer: Dmitry Shulga <Dmitry.Shulga@oracle.com> branch nick: mysql-trunk-bug11755370 timestamp: Fri 2013-03-01 19:02:41 +0600 message: This patch fixes bug#11755370 - CRASH ON STARTUP WHEN XA SUPPORT FUNCTIONS ACTIVATED BY A SECOND ENGINE. The reason for server crash was that there was a bug in pointer arithmetics when server calculated begin/end values of the memory pages mmaped to the file. Beside this bug reason there were also a bug related to multi-threaded programming. There were a race condition when several threads concurrently calls TC_LOG_MMAP::commit. Method commit() calls methods log()/unlog() that writes xid values to the mmaped file. This file is divided by pages and those pages is used as IO units during file synchronization. Access to those pages is controlled by a few mutexes. There was a bug in working with those mutexes. To fix the race condition bug the work with mutexes was simplified. Access to shared data now protected by one dedicated mutex. Formerly to protect access to shared data it were used several mutexes. Such approach added finer grained concurrency effect but introduced possible race condition. After analysis was done we decided to rewrite a logic regarding to synchronization of access to shared data in the way that increases robustness and decreases fine graining locking. Also there was a bug in code manipulating with list of pages in the pool. MTR test isn't added since there isn't currently any other engine with XA feature enabled aside from INNODB. I tested my patch using the fake second engine with XA feature enabled that I created from ha_example.cc. I think it won't be correct to modify ha_example.cc just for the sake of patch test coverage. Nevertheless I added unittest that runs 10 threads and calls TC_LOG_MMAP::commit for distributed transactions. Such test should crash the server without changes included in this patch. Also another two new unit tests were added to test TC_LOG_MMAP behavior in situations when its pages are close to be fully filled are fully filled. mysql-server$ bzr log -r 5208 ------------------------------------------------------------ revno: 5208 committer: Dmitry Shulga <Dmitry.Shulga@oracle.com> branch nick: mysql-trunk-new timestamp: Mon 2013-03-04 22:56:49 +0600 message: This is a follow-up for patch for bug#11755370 - CRASH ON STARTUP WHEN XA SUPPORT FUNCTIONS ACTIVATED BY A SECOND ENGINE. This patch fixes the compiler warnings. mysql-server$ bzr log -r 5283 ------------------------------------------------------------ revno: 5283 committer: Dmitry Shulga <Dmitry.Shulga@oracle.com> branch nick: mysql-trunk timestamp: Wed 2013-03-13 22:30:51 +0600 message: This is a follow-up patch for bug#11755370. It fixes compilation warnings reported by Sun Studio compiler. mysql-server$ bzr log -r 5381 ------------------------------------------------------------ revno: 5381 committer: Tor Didriksen <tor.didriksen@oracle.com> branch nick: trunk-merge timestamp: Fri 2013-03-22 14:07:07 +0100 message: Bug#11755370 - CRASH ON STARTUP WHEN XA SUPPORT FUNCTIONS Post push fix: disable msync/fsync in unit test. (reduces execution time from seconds to milliseconds) mysql-server$ bzr log -r 5406 ------------------------------------------------------------ revno: 5406 committer: Tor Didriksen <tor.didriksen@oracle.com> branch nick: trunk-review timestamp: Tue 2013-03-26 12:43:43 +0100 message: Bug#11755370 - CRASH ON STARTUP WHEN XA SUPPORT FUNCTIONS Post-push fix: remove compile warnings for sun studio.
[13 Aug 2015 7:10]
Zhenye Xie
This bug still exists in mysql 5.6 branch. But already fixed in 5.7 branch.