Bug #18946 Test case rpl_ndb_ddl disabled
Submitted: 10 Apr 2006 13:38 Modified: 26 Apr 2007 10:25
Reporter: Lars Thalmann
Status: Closed
Category: MySQL Cluster: Replication    Severity: S3 (Non-critical)
Version: 5.1-BK    OS: Any
Assigned to: Lars Thalmann    CPU Architecture: Any

[10 Apr 2006 13:38] Lars Thalmann
Description:
rpl_ndb_ddl              : master hangs

How to repeat:
Check disabled.def
[10 Apr 2006 15:15] Valeriy Kravchuk
Verified just as described with 5.1.10-BK (ChangeSet@1.2303.1.1, 2006-04-09 19:43:36-07:00)
[14 Apr 2006 5:39] Tomas Ulin
Test case has been rerun 5 times in pushbuild without issues.

Re-enabled the test in the main tree.
[14 Apr 2006 8:47] Lars Thalmann
Test case is still disabled.
[19 Sep 2006 18:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/12233

ChangeSet@1.2322, 2006-09-19 20:50:28+02:00, jmiller@mysql.com +2 -0
  Fix for Bug#18946
[13 Nov 2006 15:40] Jonathan Miller
Pushed into 5.1 Replication Team tree
[22 Nov 2006 14:25] Matthias Leich
Many more tests are needed.
The current behaviour of InnoDB looks wrong and that of NDB reasonable.
But the NDB behaviour might cause many additional bad effects
when the master runs a transaction like this:
1. modify a persistent table
2. CREATE TEMPORARY TABLE t1 AS SELECT 1;
3. ROLLBACK
# Summary of changes on master: the TEMP TABLE now exists
# Summary of changes on slave:  nothing
4. INSERT INTO persistent table SELECT ... FROM t1;
5. COMMIT
# master: success
# slave: unknown table ?
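The scenario above can be replayed in a toy model to show how the slave could end up with an unknown table. This is a minimal Python sketch under stated assumptions, not the server's binlog code: it assumes that CREATE TEMPORARY TABLE causes no implicit commit, that its binlog event is cached with the open transaction and discarded on ROLLBACK, and that the temporary table itself survives ROLLBACK on the master. The names `ToyMaster`, `replay_on_slave` and `run_scenario` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Hypothetical master: transactional rows plus a per-transaction binlog cache."""
    rows: list = field(default_factory=list)       # committed rows of the persistent table
    pending: list = field(default_factory=list)    # uncommitted rows
    temp_tables: set = field(default_factory=set)  # temporary tables (not undone by ROLLBACK)
    cache: list = field(default_factory=list)      # binlog events of the open transaction
    binlog: list = field(default_factory=list)     # events shipped to the slave

    def modify_persistent(self, value):
        self.pending.append(value)
        self.cache.append(("insert", value))

    def create_temp_table(self, name):
        # Assumption: no implicit commit, so the event is cached with the transaction.
        self.temp_tables.add(name)
        self.cache.append(("create_temp", name))

    def insert_select_from(self, name):
        if name not in self.temp_tables:
            raise LookupError(f"unknown table {name}")
        self.pending.append(f"row from {name}")
        self.cache.append(("insert_select", name))

    def rollback(self):
        # Transactional changes and cached events are discarded,
        # but the temporary table remains on the master.
        self.pending.clear()
        self.cache.clear()

    def commit(self):
        self.rows.extend(self.pending)
        self.pending.clear()
        self.binlog.extend(self.cache)
        self.cache.clear()


def replay_on_slave(binlog):
    """Replay events on an empty slave; return an error string or None."""
    temp_tables = set()
    for kind, arg in binlog:
        if kind == "create_temp":
            temp_tables.add(arg)
        elif kind == "insert_select" and arg not in temp_tables:
            return f"unknown table {arg}"
    return None


def run_scenario():
    master = ToyMaster()
    master.modify_persistent(1)      # 1. modify a persistent table
    master.create_temp_table("t1")   # 2. CREATE TEMPORARY TABLE t1 AS SELECT 1
    master.rollback()                # 3. ROLLBACK
    master.insert_select_from("t1")  # 4. INSERT INTO persistent table SELECT ... FROM t1
    master.commit()                  # 5. COMMIT
    return master, replay_on_slave(master.binlog)
```

Under these assumptions `run_scenario()` returns a master that committed the row read from `t1` and a slave error `"unknown table t1"`, which is the outcome the question mark above asks about.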
[6 Dec 2006 20:27] Jonathan Miller
Matthias,

NDB does not support temp tables.

Cheers, Jeb
[12 Feb 2007 20:43] Matthias Leich
Some news after many hours of experiments on MySQL 5.1 (2007-02):
1. Jonathan, thank you for the hint. This is a weakness of the test scripts
   when they are applied to different storage engines.
   We must get the same result (server response) when executing CREATE
   TEMPORARY TABLE for all storage engines to be tested.
   Checking the storage-engine-related properties of TEMPORARY TABLEs
   is not the intention of this test. It should simply check
   whether a command like CREATE TEMPORARY TABLE causes an implicit commit.
   I will modify the test so that a "standard" storage engine like
   MyISAM or MEMORY is used for the TEMPORARY TABLE.
2. There are several logical script bugs, wrong comments, etc.
   Example: The slave connection is an observer of changes! It must not
            affect actions of the master connection --> the slave should
            run with AUTOCOMMIT = 1.
   I will fix these weaknesses.
3. The test shows, at least in connection with NDB, behaviour that
   ranges from barely acceptable to unacceptable.
   In general it needs a huge runtime (with low CPU and I/O activity),
   which sometimes exceeds the timeouts used by mysql-test-run.pl.
   I currently do not know which part of MySQL is responsible.
   There is one "testing" activity per 30 seconds in
   slave-data/mysql/general_log.CSV.
4. The current outcome of this test in connection with InnoDB and
   Falcon suffers from bugs that have not been reported so far.
   They violate the intended behaviour pattern described in
   Bug#22864 Rollback following CREATE ... SELECT discards
             'CREATE table' from log
   ------------------   ----- ----- ----- -----
   Statement              SC    EC   URB   ERB
   ------------------   ----- ----- ----- -----
   ...
   CREATE TMP TABLE      No    No    Yes   Yes
                                     XXX !
5. The test should be extended.
[6 Mar 2007 17:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/21250

ChangeSet@1.2472, 2007-03-06 18:15:31+01:00, mleich@four.local.lan +9 -0
  Bug#18946 Test case rpl_ndb_ddl disabled
  1. Fixes within the test scripts (affects rpl_ddl.test and rpl_ndb_ddl.test)
     - slave connection is only an observer (-> AUTOCOMMIT = 0)
       This removes the problem with the hanging test around DROP DATABASE (NDB). The hanging test around DROP DATABASE differs from InnoDB/MyISAM behaviour but
       is far from a clear bug. IMHO this behaviour does not violate the SQL standard and should therefore simply be accepted.
     - removal of wrong comments
     - CREATE/DROP TEMPORARY TABLE must not cause an implicit commit of the current transaction.
       NDB behaves correctly here, InnoDB/Falcon do not.
     - Add a missing connection slave
     - Reenable the test rpl_ndb_ddl.
  2. Disable rpl_ddl.test because of Bug#26418.
  3. Reenable rpl_ndb_ddl.test
  4. Improvements (affect rpl_ddl.test and rpl_ndb_ddl.test)
     - Better and extended comments, which should prevent somebody from accidentally destroying the logic of the test
     - Replace SELECTs printing comments by "--echo" (decreases the number of auxiliary SQL commands)
     - Remove the need for include/rpl_stmt_seq2.inc   (was mostly redundant to rpl_stmt_seq.inc)
     - Remove extra/rpl_tests/rpl_ndb_ddl.test         (corrected extra/rpl_tests/rpl_ddl.test is sufficient)  
     - Shift assignment of values to $show_binlog, $manipulate (variables useful for debugging) into the toplevel scripts
     - The temporary tables get now their storage engine from the variable $temp_engine_type. (more deterministic testing conditions)
     - Add an additional protocol line when the connection is switched (was partially missing)
     - Add two DML commands for comparison purposes
[30 Mar 2007 19:23] Matthias Leich
Review by Lars + approval to push via email
(https://intranet.mysql.com/secure/mailarchive/mail.php?folder=104&mail=139310).

Bug fix pushed to mysql-5.1-rpl (5.1.18) after successful testing.

There is no documentation needed.
[31 Mar 2007 14:55] Matthias Leich
My environment:
   Intel Core2Duo (64 Bit), Linux openSUSE 10.2 (X86-64)
   mysql-5.1-rpl last Changeset ChangeSet@1.2541, 2007-03-30
   compile-pentium-debug-max
   MySQL server reports: 5.1.18-beta-debug-log

make test-ns:
cd mysql-test ; \
        /usr/bin/perl ./mysql-test-run.pl   --mysqld=--binlog-format=mixed
Logging: ./mysql-test-run.pl --mysqld=--binlog-format=mixed
MySQL Version 5.1.18
Using binlog format 'mixed'
....
TEST                           RESULT         TIME (ms)
-------------------------------------------------------

1st                            [ pass ]              2
alias                          [ pass ]             83
....
rpl_ndb_commit_afterflush      [ pass ]          36525
rpl_ndb_dd_advance             [ disabled ]  Bug#25913 ...
rpl_ndb_dd_basic               [ skipped ]   Not running with ....
rpl_ndb_dd_partitions          [ disabled ]  BUG#19259 ...
rpl_ndb_ddl                    [ pass ]         173586
rpl_ndb_delete_nowhere         [ skipped ]   Not running with ...
rpl_ndb_do_db                  [ skipped ]   Not running with ...
rpl_ndb_do_table               [ skipped ]   Not running with ... 
rpl_ndb_extraCol               [ skipped ]   Not running with ...
rpl_ndb_func003                [ skipped ]   Not running with ...
rpl_ndb_idempotent             [ skipped ]   Not running with ...
rpl_ndb_innodb2ndb             [ disabled ]  Bug #19710 ...
rpl_ndb_innodb_trans           [ pass ]          16380
...
-------------------------------------------------------
Stopping All Servers
Shutting-down Instance Manager
All 568 tests were successful.
The servers where restarted 139 times
Spent 4384.545 seconds actually executing testcases

That means I cannot reproduce the problem.

Earlier in the history of this bug report, we had the problem
that the test case hung endlessly near the end of
the test (DROP DATABASE ...).
This problem
- was deterministic (also reproducible on my box)
- could not be called a bug
- is now avoided by appropriate coding

That means the current symptom is the same, but the
cause cannot be the old one. And we do not have this
problem in all environments.

my box  is openSUSE 10.2 (X86-64), Intel Core2Duo
           compile-pentium-debug-max
sapsrv1 is SUSE LINUX 10.1 (X86-64), some Intel Processor
           compile-pentium-debug-max
           parallel pushbuild runs ??
It is also possible that sapsrv1 did not have
enough spare CPU and/or I/O capacity for the test.
My proposal is to 
1) rerun the test on sapsrv1 when the general load is low
2) experiment with the "mysql-test-run.pl" options
   "--testcase-timeout" and "--suite-timeout".
[31 Mar 2007 23:55] Bugs System
Pushed into 5.1.18-beta
[13 Apr 2007 1:38] Jonathan Miller
Assigning it back to Lars, so he can decide what to do with this.

After a lengthy review it is apparent that the NDB engine is acting correctly but InnoDB does not act correctly in 2 instances. I am not sure whether it is really engine-related or whether it is the difference between the NDB Injector Thread and MySQLD Replication.

In both of the test cases listed below the insert is supposed to be rolled back, but is actually committed when using the InnoDB engine.

######## DROP TEMPORARY TABLE mysqltest1.t23 ########

-------- switch to master -------
INSERT INTO t1 SET f1= 5 + 1;
SELECT MAX(f1) FROM t1;
MAX(f1)
6

-------- switch to slave --------
SELECT MAX(f1) FROM t1;
MAX(f1)
5

-------- switch to master -------
DROP TEMPORARY TABLE mysqltest1.t23;
SELECT MAX(f1) FROM t1;
MAX(f1)
6

-------- switch to slave --------
SELECT MAX(f1) FROM t1;
MAX(f1)
5

-------- switch to master -------
ROLLBACK;
SELECT MAX(f1) FROM t1;
MAX(f1)
5

TEST-INFO: MASTER: The INSERT is not committed (Succeeded)

-------- switch to slave --------
SELECT MAX(f1) FROM t1;
MAX(f1)
6 <-------- ************** Should be (5) ***************************

TEST-INFO: SLAVE:  The INSERT is committed (Failed)

-------- switch to master -------
SHOW TABLES LIKE 't23';
Tables_in_mysqltest1 (t23)

-------- switch to slave --------
SHOW TABLES LIKE 't23';
Tables_in_mysqltest1 (t23)

-------- switch to master -------

*****************************************************************

######## CREATE TEMPORARY TABLE mysqltest1.t22 (f1 BIGINT) ENGINE=MEMORY ########

-------- switch to master -------
INSERT INTO t1 SET f1= 8 + 1;
SELECT MAX(f1) FROM t1;
MAX(f1)
9

-------- switch to slave --------
SELECT MAX(f1) FROM t1;
MAX(f1)
8

-------- switch to master -------
CREATE TEMPORARY TABLE mysqltest1.t22 (f1 BIGINT) ENGINE=MEMORY;
SELECT MAX(f1) FROM t1;
MAX(f1)
9

-------- switch to slave --------
SELECT MAX(f1) FROM t1;
MAX(f1)
8

-------- switch to master -------
ROLLBACK;
SELECT MAX(f1) FROM t1;
MAX(f1)
8

TEST-INFO: MASTER: The INSERT is not committed (Succeeded)

-------- switch to slave --------
SELECT MAX(f1) FROM t1;
MAX(f1)
9 <-------- ************** Should be (8) ***************************

TEST-INFO: SLAVE:  The INSERT is committed (Failed)
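Both traces follow the same pattern, and their expected and observed values can be reproduced by a toy model whose only variable is whether the temporary-table statement implicitly commits the open transaction. This Python sketch is a hypothetical illustration: `max_after_rollback` is not a server function, only MAX(f1) is visible in the traces so the model keeps just that committed value, and reading the slave result as an implicit commit is one possible interpretation, not a confirmed cause.

```python
def max_after_rollback(committed, inserted, implicit_commit):
    """Toy model of: INSERT INTO t1 SET f1= <inserted>; <temp-table DDL>; ROLLBACK.

    committed       -- f1 values already committed in t1
    inserted        -- f1 value inserted inside the open transaction
    implicit_commit -- True if the DDL statement commits the open transaction
    Returns SELECT MAX(f1) FROM t1 after the ROLLBACK.
    """
    rows = list(committed)
    pending = [inserted]        # INSERT runs inside the open transaction
    if implicit_commit:
        rows.extend(pending)    # the DDL commits the INSERT as a side effect
        pending.clear()
    pending.clear()             # ROLLBACK discards whatever is still uncommitted
    return max(rows)


# DROP TEMPORARY TABLE trace: committed MAX(f1) is 5, the transaction inserts 6.
print(max_after_rollback([5], 6, implicit_commit=False))  # 5 (master)
print(max_after_rollback([5], 6, implicit_commit=True))   # 6 (slave)
# CREATE TEMPORARY TABLE trace: committed MAX(f1) is 8, the transaction inserts 9.
print(max_after_rollback([8], 9, implicit_commit=False))  # 8 (master)
print(max_after_rollback([8], 9, implicit_commit=True))   # 9 (slave)
```

With `implicit_commit=False` the model gives 5 and 8, the values the master shows after ROLLBACK; the slave values 6 and 9 match the `implicit_commit=True` case.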

Note: The rpl_ndb_ddl.test is passing. I can make a commit for 5.1 that removes this test from disabled.def.

Please let me know if you would like me to create that patch.
Best wishes,
/Jeb
[20 Apr 2007 12:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/25003

ChangeSet@1.2570, 2007-04-20 16:02:14+02:00, mleich@four.local.lan +1 -0
  The fix for Bug#18946 (Test case rpl_ndb_ddl disabled) around the end of March 2007 enabled this test case.
  It was later disabled because the test failed with a timeout on one testing box.
  The reason for this failure could not be found because we do not have information about the conditions on the box during this test.
  Jeb and I tried this test on other boxes and it passed.
  My experience is that
  - tests using NDB often need significantly more runtime
    than comparable tests of other storage engines
  - the current load of the box where the test is running and the
    filesystem where the tests are executed (NFS can be extremely slow)
    might have a huge impact on the test performance
    (runtime * 2 to 3)
  - there are sometimes problems with the ports, most probably
    caused by OS properties (NDB+RPL need many ports) or by
    parallel tests accidentally running with the same ports.
  AFAIK these are the reasons why the NDB tests sometimes fail with a timeout.
  Conclusion: We enable rpl_ndb_ddl again because the failure happens in rare cases and does not seem to be caused by errors in the server or test code.
[20 Apr 2007 15:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/25021

ChangeSet@1.2571, 2007-04-20 18:39:01+02:00, mleich@four.local.lan +1 -0
  The fix for Bug#18946 (Test case rpl_ndb_ddl disabled) around the end
  of March 2007 enabled this test case.
  It was later disabled because the test failed with a timeout on one
  testing box.
  The reason for this failure could not be found because we do not
  have information about the conditions on the box during this test.
  Jeb and I tried this test on other boxes and it passed.
  My experience is that
  - tests using NDB often need significantly more runtime
    than comparable tests of other storage engines
  - the current load of the box where the test is running and the
    filesystem where the tests are executed (NFS can be extremely slow)
    might have a huge impact on the test performance
    (runtime * 2 to 3)
  - there are sometimes problems with the ports, most probably
    caused by OS properties (NDB+RPL need many ports) or by
    parallel tests accidentally running with the same ports.
  AFAIK these are the reasons why the NDB tests sometimes fail with a
  timeout.
  Conclusion: We enable rpl_ndb_ddl again because the failure happens
  in rare cases and does not seem to be caused by errors in the server or
  test code.
[23 Apr 2007 9:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/25089

ChangeSet@1.2571, 2007-04-24 11:29:52+02:00, mleich@four.local.lan +1 -0
  The fix for Bug#18946 (Test case rpl_ndb_ddl disabled) around the end
  of March 2007 enabled this test case.
  It was later disabled because the test failed with a timeout on one
  testing box.
  The reason for this failure could not be found because we do not
  have information about the conditions on the box during this test.
  Jeb and I tried this test on other boxes and it passed.
  My experience is that
  - tests using NDB often need significantly more runtime
    than comparable tests of other storage engines
  - the current load of the box where the test is running and the
    filesystem where the tests are executed (NFS can be extremely slow)
    might have a huge impact on the test performance
    (runtime * 2 to 3)
  - there are sometimes problems with the ports, most probably
    caused by OS properties (NDB+RPL need many ports) or by
    parallel tests accidentally running with the same ports.
  AFAIK these are the reasons why the NDB tests sometimes fail with a
  timeout.
  Conclusion: We enable rpl_ndb_ddl again because the failure happens
  in rare cases and does not seem to be caused by errors in the server
  or test code.
[23 Apr 2007 17:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/25147

ChangeSet@1.2571, 2007-04-24 19:05:19+02:00, mleich@four.local.lan +1 -0
  The fix for Bug#18946 (Test case rpl_ndb_ddl disabled), pushed around the end of March 2007, enabled this test case.
  It was later disabled because the test failed with a timeout on one testing box.
  The reason for this failure could not be found because we do not have information about the conditions on the box during this test.
  Jeb and I tried this test on other boxes and it passed.
  My experience is that
  - tests using NDB often need significantly more runtime
    than comparable tests of other storage engines
  - the current load of the box where the test is running and the
    filesystem where the tests are executed (NFS can be extremely slow)
    might have a huge impact on the test performance
    (runtime * 2 to 3)
  - there are sometimes problems with the ports, most probably
    caused by OS properties (NDB+RPL need many ports) or by
    parallel tests accidentally running with the same ports.
  AFAIK these are the reasons why the NDB tests sometimes fail with a timeout.
  Conclusion: We enable rpl_ndb_ddl again because the failure happens in rare cases
  and does not seem to be caused by errors in the server or test code.
[24 Apr 2007 9:00] Matthias Leich
rpl_ndb_ddl unfortunately failed again because of a timeout on
two pushbuild boxes.
NDB is far too slow with statement-based replication.
[24 Apr 2007 12:11] Jonathan Miller
This test should only run as RBR for NDB

If rpl_ndb_ddl.test is running under SBR, then we need to add:
-- source include/have_binlog_format_row.inc
[26 Apr 2007 10:24] Matthias Leich
Jeb, thank you for the hint and the valuable discussion.

I already fixed the problem by
  --source include/have_binlog_format_mixed_or_row.inc
  (skips this test when running with SBR)
but your solution with
  --source include/have_binlog_format_row.inc
  (skips this test when running with SBR or MIXED)
is much better.
Reason:
   This test executes DML statements on an NDB table to detect
   if some SQL statements of special interest commit the ongoing
   transaction.
   When running in MIXED mode, automatic switching from statement-
   based to row-based replication takes place when a DML statement
   updates an NDB table.
   That means running this test on NDB with binlog-format=mixed and 
   binlog-format=row mostly checks the same routines twice.
   Therefore we skip the variant with binlog-format=mixed.
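The effect of the two include files can be summarized in a small Python model. It is a sketch based only on the description above (which binlog formats each file skips, and that MIXED switches DML on NDB tables to row-based logging); it does not reproduce the mysqltest code inside the `.inc` files, and the names `ALLOWED_FORMATS`, `variants_run` and `logged_format_for_ndb_dml` are hypothetical.

```python
# Binlog formats under which each include file lets the test run,
# as described in the comment above.
ALLOWED_FORMATS = {
    "include/have_binlog_format_mixed_or_row.inc": {"mixed", "row"},
    "include/have_binlog_format_row.inc": {"row"},
}
ALL_FORMATS = ("statement", "mixed", "row")


def variants_run(include_file):
    """Return the --binlog-format values for which the test is not skipped."""
    allowed = ALLOWED_FORMATS[include_file]
    return [fmt for fmt in ALL_FORMATS if fmt in allowed]


def logged_format_for_ndb_dml(binlog_format):
    """Format used for a DML statement that updates an NDB table:
    MIXED switches such statements to row-based logging."""
    return "row" if binlog_format == "mixed" else binlog_format


for inc in ALLOWED_FORMATS:
    runs = variants_run(inc)
    print(inc, runs, [logged_format_for_ndb_dml(f) for f in runs])
```

For the mixed-or-row include both test runs end up logging the NDB DML row-based, which is the duplicated checking described above; the row-only include runs the test once.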

Fix pushed to mysql-5.1-rpl (5.1.18) after successful testing.
There is no documentation needed.
[1 Jun 2007 19:24] Bugs System
Pushed into 5.1.20-beta