Bug #70854 Tc_log_page_size should be unflushable or server crashes if 2 XA SEs installed
Submitted: 7 Nov 2013 10:40 Modified: 3 Apr 2014 15:15
Reporter: Laurynas Biveinis (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Logging Severity:S2 (Serious)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any
Tags: innodb, SE, tc_log_page_size, TokuDB, xa

[7 Nov 2013 10:40] Laurynas Biveinis
Description:
Credit goes to MariaDB.

With bug 47134 fixed, if TC_LOG_MMAP is in use (because of two XA-supporting storage engines available, such as InnoDB and TokuDB), FLUSH STATUS should cause a server crash on the next XA commit.

The status variable tc_log_page_size is directly bound to tc_log_page_size source variable, which is setup once in TC_LOG_MMAP::open(). A FLUSH would cause it to be reset to zero.  Then TC_LOG_MMAP::unlog() would attempt to divide by it.

How to repeat:
Code analysis. It's not easy to make a testcase due to stock MySQL not having a second XA-supporting storage engine.

Suggested fix:
Make Tc_log_page_size a SHOW_LONG_NOFLUSH variable.
[7 Nov 2013 18:09] Sinisa Milivojevic
We believe you that a crash is possible under circumstances that you describe. However, we can not set this bug to "Verified" unless we have a fully repeatable test case for the MySQL server. A test case can be accepted even if it requires up to 10 runs in order for a crash to occur. 

If you can not come with a fully repeatable test case for MySQL server, we can set this bug to "To be fixed later", with an internal note that this bug should be tested when we have two XA capable storage engines. The other option is that you revisit this bug when we have two XA-enabled stored engines.
[8 Nov 2013 1:53] Davi Arnaut
Somehow Bug#47134 can be verified, fixed and tested without two XA capable storage engines, but this one can't.
[8 Nov 2013 9:10] Laurynas Biveinis
OK.

1) Make EXAMPLE a XA engine and fix -DWITH_EXAMPLE_STORAGE_ENGINE=1:

=== modified file 'storage/example/CMakeLists.txt'
--- storage/example/CMakeLists.txt	2011-09-07 10:08:09 +0000
+++ storage/example/CMakeLists.txt	2013-11-08 09:01:25 +0000
@@ -15,4 +15,4 @@
 
 SET(EXAMPLE_PLUGIN_DYNAMIC "ha_example")
 SET(EXAMPLE_SOURCES ha_example.cc)
-MYSQL_ADD_PLUGIN(example ${EXAMPLE_SOURCES} STORAGE_ENGINE MODULE_ONLY)
+MYSQL_ADD_PLUGIN(example ${EXAMPLE_SOURCES} STORAGE_ENGINE)

=== modified file 'storage/example/ha_example.cc'
--- storage/example/ha_example.cc	2013-05-19 18:46:26 +0000
+++ storage/example/ha_example.cc	2013-11-08 08:59:43 +0000
@@ -97,6 +97,10 @@
                                        TABLE_SHARE *table, 
                                        MEM_ROOT *mem_root);
 
+static int example_prepare_handler(handlerton *hton,
+                                   THD *thd,
+                                   bool all);
+
 handlerton *example_hton;
 
 /* Interface to mysqld, to check system tables supported by SE */
@@ -141,6 +145,7 @@
   example_hton= (handlerton *)p;
   example_hton->state=                     SHOW_OPTION_YES;
   example_hton->create=                    example_create_handler;
+  example_hton->prepare=                   example_prepare_handler;
   example_hton->flags=                     HTON_CAN_RECREATE;
   example_hton->system_database=   example_system_database;
   example_hton->is_supported_system_table= example_is_supported_system_table;
@@ -185,6 +190,13 @@
   return new (mem_root) ha_example(hton, table);
 }
 
+static int example_prepare_handler(handlerton *hton,
+                                   THD *thd,
+                                   bool all)
+{
+  return 0;
+}
+
 ha_example::ha_example(handlerton *hton, TABLE_SHARE *table_arg)
   :handler(hton, table_arg)
 {}

2) Run this:

CREATE TABLE t1 (a INT) ENGINE=EXAMPLE;
CREATE TABLE t2 (a INT) ENGINE=InnoDB;

SHOW STATUS LIKE 'Tc_log_page_size';

BEGIN;
INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (2);
COMMIT;

FLUSH STATUS;

SHOW STATUS LIKE 'Tc_log_page_size';

DROP TABLE t1;
DROP TABLE t2;

3) Observe this:

CREATE TABLE t1 (a INT) ENGINE=EXAMPLE;
CREATE TABLE t2 (a INT) ENGINE=InnoDB;
SHOW STATUS LIKE 'Tc_log_page_size';
Variable_name	Value
Tc_log_page_size	4096
BEGIN;
INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (2);
COMMIT;
main.foo                                 [ fail ]
        Test ended at 2013-11-08 11:09:09

CURRENT_TEST: main.foo
mysqltest: At line 11: query 'FLUSH STATUS' failed: 2013: Lost connection to MySQL server during query
...
Thread 1 (Thread 0x7f1084356700 (LWP 2015)):
#0  0x00007f10924b3f0c in __pthread_kill (threadid=<optimized out>, signo=8) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x0000000000b38cbe in my_write_core (sig=8) at /home/laurynas/percona/src/mysql-server/mysys/stacktrace.c:258
#2  0x000000000081827e in handle_fatal_signal (sig=8) at /home/laurynas/percona/src/mysql-server/sql/signal_handler.cc:216
#3  <signal handler called>
#4  0x00000000007db6d5 in TC_LOG_MMAP::unlog (this=0x1927760 <tc_log_mmap>, cookie=8192, xid=25) at /home/laurynas/percona/src/mysql-server/sql/log.cc:2213
#5  0x00000000007db2b4 in TC_LOG_MMAP::commit (this=0x1927760 <tc_log_mmap>, thd=0x7f1064000a50, all=true) at /home/laurynas/percona/src/mysql-server/sql/log.cc:2081
#6  0x00000000009a53b6 in trans_commit_implicit (thd=0x7f1064000a50) at /home/laurynas/percona/src/mysql-server/sql/transaction.cc:284
#7  0x00000000008c3121 in mysql_execute_command (thd=0x7f1064000a50) at /home/laurynas/percona/src/mysql-server/sql/sql_parse.cc:4794
#8  0x00000000008c4459 in mysql_parse (thd=0x7f1064000a50, rawbuf=0x7f106400dd20 "FLUSH STATUS", length=12, parser_state=0x7f1084354ed0) at /home/laurynas/percona/src/mysql-server/sql/sql_parse.cc:5274
#9  0x00000000008b97ad in dispatch_command (command=COM_QUERY, thd=0x7f1064000a50, packet=0x7f1064004a31 "FLUSH STATUS", packet_length=12) at /home/laurynas/percona/src/mysql-server/sql/sql_parse.cc:1328
#10 0x00000000008b88f7 in do_command (thd=0x7f1064000a50) at /home/laurynas/percona/src/mysql-server/sql/sql_parse.cc:1027
#11 0x00000000009c7036 in handle_connection (arg=0x357eea0) at /home/laurynas/percona/src/mysql-server/sql/conn_handler/connection_handler_per_thread.cc:227
#12 0x0000000000be0e8d in pfs_spawn_thread (arg=0x34cd1c0) at /home/laurynas/percona/src/mysql-server/storage/perfschema/pfs.cc:1983
#13 0x00007f10924aef6e in start_thread (arg=0x7f1084356700) at pthread_create.c:311
#14 0x00007f10919bb9cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

In GDB:

Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7fffec433700 (LWP 1893)]
0x00000000007db6d5 in TC_LOG_MMAP::unlog (this=0x1927760 <tc_log_mmap>, cookie=8192, xid=7) at /home/laurynas/percona/src/mysql-server/sql/log.cc:2213
2213	  PAGE *p= pages + (cookie / tc_log_page_size);
(gdb) print tc_log_page_size
$1 = 0
[8 Nov 2013 14:57] Sinisa Milivojevic
One more question only ...

I suppose that main.foo is the name of the MTR script that should be run by MTR script ???
[8 Nov 2013 15:12] Laurynas Biveinis
Sinisa -

Yes. I put the SQL from step 2) in mysql-test/t/foo.test and run it through MTR.
[6 Dec 2013 17:41] Sinisa Milivojevic
After couple of weeks, I managed to verify this bug. Had problems with interim 5.7 versions, most of all some `cmake` bugs. Small bugs ..... :o)

But, now it is fully verified.
[7 Feb 2014 2:01] Paul Dubois
Noted in 5.7.4 changelog.

FLUSH STATUS cleared a variable that could result a subsequent
implicit commit of an XA transaction causing a server exit.
[3 Apr 2014 15:15] Laurynas Biveinis
5.7$ bzr log -r 7195
------------------------------------------------------------
revno: 7195
committer: Dmitry Shulga <Dmitry.Shulga@oracle.com>
branch nick: mysql-trunk-bug17911445
timestamp: Tue 2013-12-31 16:23:39 +0700
message:
  This patch fixes the bug#17911445 - TC_LOG_PAGE_SIZE SHOULD BE UNFLUSHABLE OR
  SERVER CRASHES IF 2 XA SES INSTALLED.
  
  The status variable tc_log_page_size is directly bound to tc_log_page_size 
  source variable, which is setup once in TC_LOG_MMAP::open().
  Running of statement FLUSH STATUS would cause it to be reset to zero.
  Then during doing implicit commit the TC_LOG_MMAP::unlog() would
  attempt to divide by it that led to crash.
  
  To fix this bug a type of status variable tc_log_page_size was changed to 
  SHOW_LONG_NOFLUSH that denies reset of status variable value to zero.
  
  MTR test isn't added since there isn't currently any other engine with 
  XA feature enabled aside from INNODB. I tested my patch using the fake 
  second engine with XA feature enabled that I created from ha_example.cc.
  I think it won't be correct to modify ha_example.cc just for the sake of patch 
  test coverage.
[3 Apr 2014 15:15] Laurynas Biveinis
5.7$ bzr log -r 7195
------------------------------------------------------------
revno: 7195
committer: Dmitry Shulga <Dmitry.Shulga@oracle.com>
branch nick: mysql-trunk-bug17911445
timestamp: Tue 2013-12-31 16:23:39 +0700
message:
  This patch fixes the bug#17911445 - TC_LOG_PAGE_SIZE SHOULD BE UNFLUSHABLE OR
  SERVER CRASHES IF 2 XA SES INSTALLED.
  
  The status variable tc_log_page_size is directly bound to tc_log_page_size 
  source variable, which is setup once in TC_LOG_MMAP::open().
  Running of statement FLUSH STATUS would cause it to be reset to zero.
  Then during doing implicit commit the TC_LOG_MMAP::unlog() would
  attempt to divide by it that led to crash.
  
  To fix this bug a type of status variable tc_log_page_size was changed to 
  SHOW_LONG_NOFLUSH that denies reset of status variable value to zero.
  
  MTR test isn't added since there isn't currently any other engine with 
  XA feature enabled aside from INNODB. I tested my patch using the fake 
  second engine with XA feature enabled that I created from ha_example.cc.
  I think it won't be correct to modify ha_example.cc just for the sake of patch 
  test coverage.