| Bug #53676 | Unexpected errors and possible table corruption on ADD PARTITION and LOCK TABLE | ||
|---|---|---|---|
| Submitted: | 15 May 2010 20:42 | Modified: | 2 Sep 2010 14:19 |
| Reporter: | Elena Stepanova | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Partitions | Severity: | S2 (Serious) |
| Version: | 5.5.3-m3, 5.5.5-m3, 5.6.99 | OS: | Any |
| Assigned to: | Mattias Jonsson | CPU Architecture: | Any |
[17 May 2010 22:11]
Elena Stepanova
Test #1 can also be continued -- after connection con1 unlocks the table, we switch back to con2 and try to rename the table: --connection con2 RENAME TABLE t1 TO t1_tmp; The statement returns error 1025: ERROR HY000: Error on rename of './test/t1' to './test/t1_tmp' (errno: 155) (MySQL error code 155: The table does not exist in engine)
[17 May 2010 23:09]
Elena Stepanova
The test case below combines variations of the problem listed before. It also shows that the damage is not limited to one connection.
# Test #3
--source include/have_innodb.inc
--disable_abort_on_error
--connect (con1,localhost,root,,)
--disable_warnings
DROP TABLE IF EXISTS t1;
--enable_warnings
CREATE TABLE t1 ( i INT NOT NULL AUTO_INCREMENT PRIMARY KEY, f INT )
ENGINE = InnoDB PARTITION BY HASH(i) PARTITIONS 2;
--connect (con2,localhost,root,,)
SET lock_wait_timeout = 2;
--connection con1
--echo # Connection 1 locks the table
LOCK TABLE t1 READ;
--connection con2
--echo # Connection 2 tries to add partitions (timeout):
ALTER TABLE t1 ADD PARTITION PARTITIONS 2;
--connect (con3,localhost,root,,)
--echo # Connection 3 tries to add partitions (partition already exists):
ALTER TABLE t1 ADD PARTITION PARTITIONS 2;
--connect (con4,localhost,root,,)
--echo # Connection 4 tries to rename the table:
send RENAME TABLE t1 TO t2;
--connection con1
--real_sleep 1
--echo # Connection 1 unlocks the table:
UNLOCK TABLES;
--connection con4
--echo # Connection 4 gets error on rename:
--reap
--connect (con5,localhost,root,,)
--echo # SHOW TABLES returns the table (not renamed):
SHOW TABLES;
--echo # Connection 5 attempts to read from the table (table does not exist):
SELECT * FROM t1;
# End of Test #3
[8 Jun 2010 9:34]
Mattias Jonsson
related to bug#53770.
[11 Jun 2010 0:00]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/110773 3092 Mattias Jonsson 2010-06-11 Bug#53676: Unexpected errors and possible table corruption on ADD PARTITION and LOCK TABLE Bug#53770: Server crash at handler.cc:2076 on LOAD DATA after timed out COALESCE PARTITION 5.5 fix for: Bug#51042: REORGANIZE PARTITION can leave table in an inconsistent state in case of crash Needs to be back-ported to 5.1 5.5 fix for: Bug#50418: DROP PARTITION does not interact with transactions Main problem was non-persistent operations done before meta-data lock was taken (53770+53676). And 53676 needed to keep the table/partitions opened and locked while copying the data to the new partitions. Also added thorough tests to spot some additional bugs in the ddl_log code, which could result in bad state between the .frm and partitions. @ mysql-test/suite/parts/inc/partition_crash.inc recovery test including a crash @ mysql-test/suite/parts/inc/partition_crash_add.inc test all states in fast_alter_partition_table ADD PARTITION branch @ mysql-test/suite/parts/inc/partition_crash_change.inc test all states in fast_alter_partition_table CHANGE PARTITION branch @ mysql-test/suite/parts/inc/partition_crash_drop.inc test all states in fast_alter_partition_table DROP PARTITION branch @ mysql-test/suite/parts/inc/partition_fail.inc recovery test including an injected error @ mysql-test/suite/parts/inc/partition_fail_add.inc test all states in fast_alter_partition_table ADD PARTITION branch @ mysql-test/suite/parts/inc/partition_fail_change.inc test all states in fast_alter_partition_table ADD PARTITION branch @ mysql-test/suite/parts/inc/partition_fail_drop.inc test all states in fast_alter_partition_table DROP PARTITION branch @ mysql-test/suite/parts/t/partition_debug-master.opt opt file for testing crash recovery @ mysql-test/suite/parts/t/partition_debug.test Test for ALTER PARTITION commands involving crash and error injections. @ mysql-test/t/partition_innodb.test Also fixes bug#50418 @ sql/sql_base.cc Removed abort_and_upgrade_lock_and_close_table and exporting alter_close_tables instead @ sql/sql_base.h removed some non existing functions, added alter_close_tables. @ sql/sql_partition.cc fast_alter_partition_table: Splitted abort_and_upgrade_lock_and_close_table to its parts (wait_while_table_is_used and close_alter_tables) and always have wait_while_table_is_used before any persistent operations (including logs, which will be executed on failure) and close_alter_tables after create/read/write operations and before drop operations. Added error injections for better test coverage. write_log_final_change_partition: fixed a log_entry linking bug (delete_frm was not linked to change/drop partition) and drop partition must be executed before change partition (change partition can rename a partition to an old name, like REORG p1 INTO (p1,p2). write_log_add_change_partition: need to use drop_frm first, and relinking that entry and reusing its execute entry. @ sql/table.h removed a duplicate declaration.
[11 Jun 2010 11:04]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/110809 3092 Mattias Jonsson 2010-06-11 Bug#53676: Unexpected errors and possible table corruption on ADD PARTITION and LOCK TABLE Bug#53770: Server crash at handler.cc:2076 on LOAD DATA after timed out COALESCE PARTITION 5.5 fix for: Bug#51042: REORGANIZE PARTITION can leave table in an inconsistent state in case of crash Needs to be back-ported to 5.1 5.5 fix for: Bug#50418: DROP PARTITION does not interact with transactions Main problem was non-persistent operations done before meta-data lock was taken (53770+53676). And 53676 needed to keep the table/partitions opened and locked while copying the data to the new partitions. Also added thorough tests to spot some additional bugs in the ddl_log code, which could result in bad state between the .frm and partitions. Updated patch with InnoDB crash and fail tests and fixed a glitch in ADD PARTITION and DROP PARTITION. @ mysql-test/suite/parts/inc/partition_crash.inc recovery test including a crash @ mysql-test/suite/parts/inc/partition_crash_add.inc test all states in fast_alter_partition_table ADD PARTITION branch @ mysql-test/suite/parts/inc/partition_crash_change.inc test all states in fast_alter_partition_table CHANGE PARTITION branch @ mysql-test/suite/parts/inc/partition_crash_drop.inc test all states in fast_alter_partition_table DROP PARTITION branch @ mysql-test/suite/parts/inc/partition_fail.inc recovery test including an injected error @ mysql-test/suite/parts/inc/partition_fail_add.inc test all states in fast_alter_partition_table ADD PARTITION branch @ mysql-test/suite/parts/inc/partition_fail_change.inc test all states in fast_alter_partition_table ADD PARTITION branch @ mysql-test/suite/parts/inc/partition_fail_drop.inc test all states in fast_alter_partition_table DROP PARTITION branch @ mysql-test/suite/parts/t/partition_debug_myisam-master.opt opt file for testing crash recovery @ mysql-test/suite/parts/t/partition_debug_myisam.test Test for ALTER PARTITION commands involving crash and error injections. @ mysql-test/t/partition_innodb.test Also fixes bug#50418 @ sql/sql_base.cc Removed abort_and_upgrade_lock_and_close_table and exporting alter_close_tables instead @ sql/sql_base.h removed some non existing functions, added alter_close_tables. @ sql/sql_partition.cc fast_alter_partition_table: Splitted abort_and_upgrade_lock_and_close_table to its parts (wait_while_table_is_used and close_alter_tables) and always have wait_while_table_is_used before any persistent operations (including logs, which will be executed on failure) and close_alter_tables after create/read/write operations and before drop operations. Added error injections for better test coverage. write_log_final_change_partition: fixed a log_entry linking bug (delete_frm was not linked to change/drop partition) and drop partition must be executed before change partition (change partition can rename a partition to an old name, like REORG p1 INTO (p1,p2). write_log_add_change_partition: need to use drop_frm first, and relinking that entry and reusing its execute entry. @ sql/table.h removed a duplicate declaration.
[8 Jul 2010 12:48]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/113127 3093 Mattias Jonsson 2010-07-08 Bug#53676: Unexpected errors and possible table corruption on ADD PARTITION and LOCK TABLE Review fixes after Mikaels review. @ sql/sql_partition.cc Changed from using mysql_unlock_tables to use mysql_lock_remove instead, to avoid unlocking tables under LOCK TABLE. Fixed comments and style.
[13 Aug 2010 7:53]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/115640 3174 Mattias Jonsson 2010-08-13 Bug#53676: Unexpected errors and possible table corruption on ADD PARTITION and LOCK TABLE Bug#53770: Server crash at handler.cc:2076 on LOAD DATA after timed out COALESCE PARTITION 5.5 fix for: Bug#51042: REORGANIZE PARTITION can leave table in an inconsistent state in case of crash Needs to be back-ported to 5.1 5.5 fix for: Bug#50418: DROP PARTITION does not interact with transactions Main problem was non-persistent operations done before meta-data lock was taken (53770+53676). And 53676 needed to keep the table/partitions opened and locked while copying the data to the new partitions. Also added thorough tests to spot some additional bugs in the ddl_log code, which could result in bad state between the .frm and partitions. Collapsed patch, includes all fixes required from the reviewers. @ mysql-test/r/partition_innodb.result updated result with new test @ mysql-test/suite/parts/inc/partition_crash.inc crash test include file @ mysql-test/suite/parts/inc/partition_crash_add.inc test all states in fast_alter_partition_table ADD PARTITION branch @ mysql-test/suite/parts/inc/partition_crash_change.inc test all states in fast_alter_partition_table CHANGE PARTITION branch @ mysql-test/suite/parts/inc/partition_crash_drop.inc test all states in fast_alter_partition_table DROP PARTITION branch @ mysql-test/suite/parts/inc/partition_fail.inc recovery test including injecting errors @ mysql-test/suite/parts/inc/partition_fail_add.inc test all states in fast_alter_partition_table ADD PARTITION branch @ mysql-test/suite/parts/inc/partition_fail_change.inc test all states in fast_alter_partition_table CHANGE PARTITION branch @ mysql-test/suite/parts/inc/partition_fail_drop.inc test all states in fast_alter_partition_table DROP PARTITION branch @ mysql-test/suite/parts/inc/partition_mgm_crash.inc include file that runs all crash and failure injection tests. @ mysql-test/suite/parts/r/partition_debug_innodb.result new test result file @ mysql-test/suite/parts/r/partition_debug_myisam.result new test result file @ mysql-test/suite/parts/r/partition_special_innodb.result updated result @ mysql-test/suite/parts/r/partition_special_myisam.result updated result @ mysql-test/suite/parts/t/partition_debug_innodb-master.opt opt file for using with crashing tests of partitioned innodb @ mysql-test/suite/parts/t/partition_debug_innodb.test partitioned innodb test that require debug builds @ mysql-test/suite/parts/t/partition_debug_myisam-master.opt opt file for using with crashing tests of partitioned myisam @ mysql-test/suite/parts/t/partition_debug_myisam.test partitioned myisam test that require debug builds @ mysql-test/suite/parts/t/partition_special_innodb-master.opt added innodb-file-per-table to easier verify partition status on disk @ mysql-test/suite/parts/t/partition_special_innodb.test added test case @ mysql-test/suite/parts/t/partition_special_myisam.test added test case @ mysql-test/t/partition_innodb.test added test case @ sql/sql_base.cc Moved alter_close_tables to sql_partition.cc @ sql/sql_base.h removed some non existing and duplicated functions. @ sql/sql_partition.cc fast_alter_partition_table: Spletted abort_and_upgrad_lock_and_close_table to its parts (wait_while_table_is_used and alter_close_tables) and always have wait_while_table_is_used before any persistent operations (including logs, which will be executed on failure) and alter_close_tables after create/read/write operations and before drop operations. moved alter_close_tables here from sql_base.cc Added error injections for better test coverage. write_log_final_change_partition: fixed a log_entry linking bug (delete_frm was not linked to change/drop partition) and drop partition must be executed before change partition (change partition can rename a partition to an old name, like REORG p1 INTO (p1,p2). write_log_add_change_partition: need to use drop_frm first, and relinking that entry and reusing its execute entry. @ sql/sql_table.cc added initialization of next_active_log_entry. @ sql/table.h removed a duplicate declaration.
[13 Aug 2010 10:27]
Mattias Jonsson
pushed into mysql-5.5-bugfixing, mysql-trunk-bugfixing and mysql-next-mr-bugfixing
[25 Aug 2010 9:23]
Bugs System
Pushed into mysql-5.5 5.5.6-m3 (revid:alik@ibmvm-20100825092002-2yvkb3iwu43ycpnm) (version source revid:alik@ibmvm-20100825092002-2yvkb3iwu43ycpnm) (merge vers: 5.5.6-m3) (pib:20)
[26 Aug 2010 10:44]
Jon Stephens
Documented in the 5.5.6 changelog as follows:
ALTER TABLE ... ADD PARTITION run concurrently with a read lock caused
spurious ER_TABLE_EXISTS_ERROR and ER_NO_SUCH_TABLE errors on subsequent
attempts.
Set NM, waiting for push to trunk.
[30 Aug 2010 8:30]
Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@sun.com-20100830082732-n2eyijnv86exc5ci) (version source revid:alik@sun.com-20100830082732-n2eyijnv86exc5ci) (merge vers: 5.6.1-m4) (pib:21)
[30 Aug 2010 8:33]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100830082745-n6sh01wlwh3itasv) (version source revid:alik@sun.com-20100830082745-n6sh01wlwh3itasv) (pib:21)
[30 Aug 2010 11:16]
Jon Stephens
Also documented in the 5.6.1 changelog. Closed.
[2 Sep 2010 14:19]
Jon Stephens
No new changelog entry required. Returned to Closed state.
[15 Oct 2010 15:59]
Mattias Jonsson
marked bug#51042 as duplicate.

Description: One connection sets a READ lock on a partitioned table (partitioned by hash), and a concurrent connection attempts to perform ALTER .. ADD PARTITION on the table. The first time the query finishes with ER_LOCK_WAIT_TIMEOUT as expected, the second attempt does not wait for a lock and fails with ER_TABLE_EXISTS_ERROR (saying that a partition already exists). Then the first connection unlocks the table and locks it again, after that the next attempt of the second connection to add a partition ends with ER_NO_SUCH_TABLE, and the same diagnostics is shown by CHECK TABLE. This test flow is provided as Test1 in 'How to repeat' section. In some variations of this test case ER_NO_SUCH_TABLE becomes persistent -- even new connections trying to access the table keep getting the same error. One of such variations is provided as Test2. However, this effect is sporadic, for the same test executed several times on the same box, CHECK TABLE in a new connection can sometimes return an error and sometimes give 'OK' status. Could not reproduce with 5.1 or 5.5.2, as the first attempt to add a partition keeps waiting for a lock, no timeout there. How to repeat: # Test1 --source include/have_innodb.inc --disable_abort_on_error --connect (con1,localhost,root,,) --disable_warnings DROP TABLE IF EXISTS t1; --enable_warnings CREATE TABLE t1 ( i INT NOT NULL AUTO_INCREMENT PRIMARY KEY, f INT ) ENGINE = InnoDB PARTITION BY HASH(i) PARTITIONS 2; --connect (con2,localhost,root,,) SET lock_wait_timeout = 2; --connection con1 --echo #Connection 1 locks the table LOCK TABLE t1 READ; --connection con2 --echo # Connection 2 tries to add partitions: --echo # First attempt: lock wait timeout (as expected) ALTER TABLE t1 ADD PARTITION PARTITIONS 2; --echo # Second attempt: says that partition already exists ALTER TABLE t1 ADD PARTITION PARTITIONS 2; --connection con1 --echo # Connection 1 unlocks the table and locks it again: UNLOCK TABLES; --real_sleep 1 LOCK TABLE t1 READ; --connection con2 --echo # Connection 2 tries again to add partitions: --echo # Third attempt: says that the table does not exist ALTER TABLE t1 ADD PARTITION PARTITIONS 2; --echo # Check table returns the same CHECK TABLE t1; --connection con1 UNLOCK TABLES; # End of Test1 -------------- # Test2 --source include/have_innodb.inc --disable_abort_on_error --connect (con1,localhost,root,,) --disable_warnings DROP TABLE IF EXISTS t1; --enable_warnings CREATE TABLE t1 ( i INT NOT NULL AUTO_INCREMENT PRIMARY KEY, f INT ) ENGINE = InnoDB PARTITION BY HASH(i) PARTITIONS 2; --connect (con2,localhost,root,,) SET lock_wait_timeout = 2; --connection con1 LOCK TABLE t1 READ; --connection con2 ALTER TABLE t1 ADD PARTITION PARTITIONS 2; send ALTER TABLE t1 ADD PARTITION PARTITIONS 2; --connection con1 UNLOCK TABLES; --connection con2 --reap --connect (con3,localhost,root,,) CHECK TABLE t1; SELECT * FROM t1; # End of Test2