Bug #50561 | ALTER PARTITIONS does not have adequate lock, breaks with concurrent I_S query | ||
---|---|---|---|
Submitted: | 23 Jan 2010 1:12 | Modified: | 14 Oct 2010 12:58 |
Reporter: | Mattias Jonsson | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Partitions | Severity: | S2 (Serious) |
Version: | 5.1+ | OS: | Any |
Assigned to: | Mattias Jonsson | CPU Architecture: | Any |
[23 Jan 2010 1:12]
Mattias Jonsson
[25 Jan 2010 10:06]
Sveta Smirnova
Thank you for the report. What result is expected after the test case is run?
[26 Jan 2010 16:01]
Mattias Jonsson
Apply this diff to a mysql-5.1(-bugteam) tree
Attachment: b50561.how_to_repeat.diff (application/octet-stream, text), 2.28 KiB.
[26 Jan 2010 16:02]
Mattias Jonsson
unpack these test files under mysql-test and run ./mtr b50561
Attachment: b50561-test.tgz (application/x-gzip, text), 1.33 KiB.
[26 Jan 2010 16:08]
Mattias Jonsson
Then test with the current behavior (i.e. allow I_S to ignore flush and name locks) by changing '#if 1' to '#if 0' in sql/sql_show.cc (see diff). Then re-run the test and you should get:
1) a 'left-over' t1#P#p0#TMP#.ibd file in the test db
2) 11, 22 and 33 as duplicate rows!

And this in the error log:

100126 19:00:09 [ERROR] Failed to open table test/t1#P#p10 after 10 attemtps.
100126 19:00:09 [ERROR] Cannot find or open table test/t1#P#p10 from the internal data dictionary of InnoDB though the .frm file for the table exists. Maybe you have deleted and recreated InnoDB data files but have forgotten to delete the corresponding .frm files of InnoDB tables, or you have moved .frm files to another database? or, the table contains indexes that this version of the engine doesn't support. See http://dev.mysql.com/doc/refman/5.1/en/innodb-troubleshooting.html how you can resolve the problem.
100126 19:00:09 InnoDB: Warning: MySQL is trying to drop table `test`.`t1` /* Partition `p0` */
InnoDB: though there are still open handles to it.
InnoDB: Adding the table to the background drop queue.
100126 19:00:10 InnoDB: Error; possible reasons:
InnoDB: 1) Table rename would cause two FOREIGN KEY constraints
InnoDB: to have the same internal name in case-insensitive comparison.
InnoDB: 2) table `test`.`t1` /* Partition `p0` */ exists in the InnoDB internal data
InnoDB: dictionary though MySQL is trying to rename table `test`.`t1` /* Temporary Partition `p0` */ to it.
InnoDB: Have you deleted the .frm file and not used DROP TABLE?
InnoDB: You can look for further help from
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/innodb-troubleshooting.html
InnoDB: If table `test`.`t1` /* Partition `p0` */ is a temporary table #sql..., then it can be that
InnoDB: there are still queries running on the table, and it will be
InnoDB: dropped automatically when the queries end.
InnoDB: You can drop the orphaned table inside InnoDB by
InnoDB: creating an InnoDB table with the same name in another
InnoDB: database and copying the .frm file to the current database.
InnoDB: Then MySQL thinks the table exists, and DROP TABLE will
InnoDB: succeed.
[27 Jan 2010 6:54]
Sveta Smirnova
Thank you for the report. Verified as described.
[8 Feb 2010 11:28]
Mattias Jonsson
I will try to fix the test case from bug#47343 which will fail after the MDL implementation has been pushed and probably add a DBUG_EXECUTE_IF to return failure instead to simulate error in the storage engine.
[10 Feb 2010 12:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/99810

3332 Mattias Jonsson 2010-02-10
Bug#50561: ALTER PARTITIONS does not have adequate lock, breaks with concurrent I_S query

The problem was that the mutex LOCK_open was released while handling the frm-file and renaming partitions. This allowed the table to be opened with the MYSQL_LOCK_IGNORE_FLUSH flag, resulting in renaming a partition that was already in use, which could cause the table to be unusable.

The solution was to hold LOCK_open from the time of waiting on all other instances of the table to be closed until all renaming operations are done.

@ mysql-test/suite/parts/r/partition_debug_sync.result
  New test result file
@ mysql-test/suite/parts/t/partition_debug_sync.test
  New test file
@ sql/ha_partition.cc
  Added two DEBUG_SYNC points to be able to verify the test.
  Minor spelling correction.
@ sql/ha_partition.h
  Minor spelling correction.
@ sql/mysql_priv.h
  Modified wait_while_table_is_used to also be able to check if the thread is killed.
  Added WFRM_HOLDING_LOCK for signalling that LOCK_open is held when calling mysql_write_frm.
@ sql/sql_base.cc
  Added one DEBUG_SYNC point.
  Updated the wait_while_table_is_used calls.
  Removed abort_and_upgrade_lock, since partitioning now uses the underlying calls directly, as mysql_alter_table does.
  Removed close_open_tables_and_downgrade, since that was dead code. (Note that mysql_lock_downgrade_write still exists unused!)
@ sql/sql_partition.cc
  Changed the flow in fast_alter_partition_table to take LOCK_open and call wait_while_table_is_used and close_data_files_and_morph_locks directly instead of relying on wrapper functions, since LOCK_open must be held during this critical section.
  Removed alter_close_tables and abort_and_upgrade_lock, since they were only used in fast_alter_partition_table and are no longer used.
@ sql/sql_show.cc
  Added one DEBUG_SYNC point.
@ sql/sql_table.cc
  Added a flag to mysql_write_frm to be able to signal whether LOCK_open is already held or needs to be taken.
  Added a flag to wait_while_table_is_used to allow the function to be used by partitioning.
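The critical-section argument in this commit message can be illustrated with a small toy model. This is only a sketch in Python, not server code: `TableCache`, `try_open`, and both `alter_*` functions are hypothetical names invented for the illustration, and a plain `threading.Lock` stands in for LOCK_open. The concurrent reader is passed in as a callback so the interleaving is deterministic.

```python
import threading


class TableCache:
    """Toy stand-in for the server's table cache (hypothetical, not MySQL code)."""

    def __init__(self):
        self.lock_open = threading.Lock()  # plays the role of LOCK_open
        self.open_handles = 0              # handles held by other sessions

    def try_open(self):
        """Reader-side open. Returns False when it would have to wait for the lock."""
        if not self.lock_open.acquire(blocking=False):
            return False
        try:
            self.open_handles += 1
            return True
        finally:
            self.lock_open.release()


def alter_release_between_steps(cache, concurrent_open):
    """Old flow: the lock is dropped between closing tables and renaming."""
    with cache.lock_open:
        cache.open_handles = 0          # all other instances are closed
    concurrent_open()                   # window in which a reader can slip in
    with cache.lock_open:
        return cache.open_handles == 0  # True means renaming is safe


def alter_hold_lock(cache, concurrent_open):
    """Patched flow: one critical section covers both steps."""
    with cache.lock_open:
        cache.open_handles = 0
        concurrent_open()               # reader cannot get the lock here
        return cache.open_handles == 0
```

Calling `alter_release_between_steps(cache, cache.try_open)` on a fresh `TableCache` reports that renaming is not safe, because the reader re-opened the table in the gap; `alter_hold_lock` with the same arguments reports that it is safe, because the reader could not obtain the lock.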
[17 Feb 2010 8:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/100579

3332 Mattias Jonsson 2010-02-17
Bug#50561: ALTER PARTITIONS does not have adequate lock, breaks with concurrent I_S query

The problem was that the mutex LOCK_open was released while handling the frm-file and renaming partitions. This allowed the table to be opened with the MYSQL_LOCK_IGNORE_FLUSH flag, resulting in renaming a partition that was already in use, which could cause the table to be unusable.

The solution was to hold LOCK_open from the time of waiting on all other instances of the table to be closed until all renaming operations are done.

(Update: removed ifdef dependency for include of debug_sync.h)

@ mysql-test/suite/parts/r/partition_debug_sync.result
  New test result file
@ mysql-test/suite/parts/t/partition_debug_sync.test
  New test file
@ sql/ha_partition.cc
  Added two DEBUG_SYNC points to be able to verify the test.
  Minor spelling correction.
@ sql/ha_partition.h
  Minor spelling correction.
@ sql/mysql_priv.h
  Modified wait_while_table_is_used to also be able to check if the thread is killed.
  Added WFRM_HOLDING_LOCK for signalling that LOCK_open is held when calling mysql_write_frm.
@ sql/sql_base.cc
  Added one DEBUG_SYNC point.
  Updated the wait_while_table_is_used calls.
  Removed abort_and_upgrade_lock, since partitioning now uses the underlying calls directly, as mysql_alter_table does.
  Removed close_open_tables_and_downgrade, since that was dead code. (Note that mysql_lock_downgrade_write still exists unused!)
@ sql/sql_partition.cc
  Changed the flow in fast_alter_partition_table to take LOCK_open and call wait_while_table_is_used and close_data_files_and_morph_locks directly instead of relying on wrapper functions, since LOCK_open must be held during this critical section.
  Removed alter_close_tables and abort_and_upgrade_lock, since they were only used in fast_alter_partition_table and are no longer used.
@ sql/sql_show.cc
  Added one DEBUG_SYNC point.
@ sql/sql_table.cc
  Added a flag to mysql_write_frm to be able to signal whether LOCK_open is already held or needs to be taken.
  Added a flag to wait_while_table_is_used to allow the function to be used by partitioning.
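The DEBUG_SYNC points mentioned above exist so that the test can force the ALTER and the I_S query into one specific interleaving. The idea can be sketched in Python with named events. This is a toy analogue only: `SyncPoints`, `signal`, `wait_for`, and the signal names are assumptions made for the illustration and are not the server's DEBUG_SYNC interface.

```python
import threading


class SyncPoints:
    """Toy analogue of named sync points (hypothetical, not the DEBUG_SYNC API)."""

    def __init__(self):
        self._events = {}
        self._guard = threading.Lock()

    def _event(self, name):
        # Create the event on first use so signal/wait order does not matter.
        with self._guard:
            return self._events.setdefault(name, threading.Event())

    def signal(self, name):
        self._event(name).set()

    def wait_for(self, name, timeout=5.0):
        return self._event(name).wait(timeout)


def run_forced_interleaving():
    """Force the reader to run exactly between the two ALTER steps."""
    sync = SyncPoints()
    order = []

    def alter():
        order.append("alter: tables closed")
        sync.signal("alter_paused")       # open the window for the reader
        sync.wait_for("reader_done")      # hold here until the reader has run
        order.append("alter: rename partitions")

    def reader():
        sync.wait_for("alter_paused")
        order.append("reader: query I_S")
        sync.signal("reader_done")

    threads = [threading.Thread(target=reader), threading.Thread(target=alter)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```

`run_forced_interleaving()` returns the three steps in the same order on every run, with the reader in the middle, which is what makes a race like this one testable without relying on timing.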
[16 Mar 2010 23:01]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/103516

3404 Mattias Jonsson 2010-03-17
Bug#50561: ALTER PARTITIONS does not have adequate lock, breaks with concurrent I_S query

There were two problems:
1) MYSQL_LOCK_IGNORE_FLUSH also ignored name locks
2) there was a race between abort_and_upgrade_locks and alter_close_tables (i.e. remove_table_from_cache and close_data_files_and_morph_locks)

These allowed the table to be opened with the MYSQL_LOCK_IGNORE_FLUSH flag, resulting in renaming a partition that was already in use, which could cause the table to be unusable.

The solution was to not release the LOCK_open mutex in abort_and_upgrade_locks and not take it again in alter_close_tables (i.e. keep it locked), and to not allow IGNORE_FLUSH to skip waiting for a name-locked table.

@ mysql-test/suite/parts/r/partition_debug_sync_innodb.result
  Added test result
@ mysql-test/suite/parts/t/partition_debug_sync_innodb-master.opt
  Added test option
@ mysql-test/suite/parts/t/partition_debug_sync_innodb.test
  Added test file
@ sql/authors.h
  Time to be acknowledged :)
@ sql/ha_partition.cc
  Added DEBUG_SYNC for deterministic testing
@ sql/sql_base.cc
  Changed MYSQL_LOCK_IGNORE_FLUSH to not ignore name locks.
  Do not release LOCK_open in abort_and_upgrade_locks (to be released in alter_close_tables).
  Added DEBUG_SYNC for deterministic testing.
@ sql/sql_partition.cc
  Do not take LOCK_open in alter_close_tables (since it is now never released in abort_and_upgrade_locks).
@ sql/sql_show.cc
  Added DEBUG_SYNC for deterministic testing
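The first of the two problems, IGNORE_FLUSH also skipping name locks, comes down to the order of two checks. A minimal sketch of that decision, assuming a much simplified view of a cached table: `CachedTable`, `needs_refresh`, and both `must_wait_*` functions are hypothetical names for this illustration, and only `open_placeholder` is a term taken from the patch description.

```python
from dataclasses import dataclass


@dataclass
class CachedTable:
    """Simplified cache entry (hypothetical model, not the server's TABLE struct)."""
    needs_refresh: bool = False     # an old version is waiting to be flushed
    open_placeholder: bool = False  # a name lock is held on the table


def must_wait_before_fix(table, ignore_flush):
    """Old behavior: IGNORE_FLUSH skips every wait, including name locks."""
    if ignore_flush:
        return False
    return table.needs_refresh or table.open_placeholder


def must_wait_after_fix(table, ignore_flush):
    """Patched behavior: a name lock is always honored, only flush waits are skipped."""
    if table.open_placeholder:
        return True
    if ignore_flush:
        return False
    return table.needs_refresh
```

For a name-locked table opened with `ignore_flush=True`, the old check lets the open proceed while the patched check makes it wait. A table that only needs a refresh is still opened without waiting in both versions, so the purpose of IGNORE_FLUSH for I_S queries is preserved.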
[16 Mar 2010 23:04]
Mattias Jonsson
New patch proposed, davi agreed to look at the locking part of the patch and correct it if needed. dlenev had some worries about deadlocks if IGNORE_FLUSH did not ignore name locks too if I understood him correctly.
[17 Mar 2010 13:00]
Mattias Jonsson
merging the two functions abort_and_upgrade_lock and alter_close_tables into one function, so that LOCK_open is not taken in one function and released in another... Also trying to create a test case where the LOCK_open glitch can be verified. (the code line for only ignore flush and not open_placeholder does fix the reported test case).
[17 Mar 2010 14:11]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/103601

3404 Mattias Jonsson 2010-03-17
Bug#50561: ALTER PARTITIONS does not have adequate lock, breaks with concurrent I_S query

There were two problems:
1) MYSQL_LOCK_IGNORE_FLUSH also ignored name locks
2) there was a race between abort_and_upgrade_locks and alter_close_tables (i.e. remove_table_from_cache and close_data_files_and_morph_locks)

These allowed the table to be opened with the MYSQL_LOCK_IGNORE_FLUSH flag, resulting in renaming a partition that was already in use, which could cause the table to be unusable.

The solution was to not allow IGNORE_FLUSH to skip waiting for a name-locked table, and to not release the LOCK_open mutex between the calls to remove_table_from_cache and close_data_files_and_morph_locks, by merging the functions abort_and_upgrade_locks and alter_close_tables.

@ mysql-test/suite/parts/r/partition_debug_sync_innodb.result
  Added test result
@ mysql-test/suite/parts/t/partition_debug_sync_innodb-master.opt
  Added test option
@ mysql-test/suite/parts/t/partition_debug_sync_innodb.test
  Added test file
@ sql/authors.h
  Time to be acknowledged :)
@ sql/ha_partition.cc
  Added DEBUG_SYNC for deterministic testing
@ sql/mysql_priv.h
  Renamed function since merging alter_close_tables into abort_and_upgrade_lock.
@ sql/sql_base.cc
  Changed MYSQL_LOCK_IGNORE_FLUSH to not ignore name locks (open_placeholder).
  Merged alter_close_tables into abort_and_upgrade_locks (and added _and_close_table to the name) to not release LOCK_open between remove_table_from_cache and close_data_files_and_morph_locks.
  Added DEBUG_SYNC for deterministic testing.
@ sql/sql_partition.cc
  Removed alter_close_tables (merged it into abort_and_upgrade_lock) so that LOCK_open is never released between remove_table_from_cache and close_data_files_and_morph_locks.
@ sql/sql_show.cc
  Added DEBUG_SYNC for deterministic testing
[18 Mar 2010 13:05]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/103693 3405 Mattias Jonsson 2010-03-18 Additional fix for DEBUG_SYNC which failed for some rpl-tests, due to DBUG_ASSERT. (added in bug#50561) @ sql/sql_base.cc DEBUG_SYNC asserts that thd->debug_sync_control is set.
[10 May 2010 13:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/107845 4055 Mattias Jonsson 2010-05-10 [merge] Manual merge of bug#50561 into mysql-pe.
[10 May 2010 13:47]
Mattias Jonsson
Pushed to mysql-pe and mysql-5.1-bugteam
[28 May 2010 6:00]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:alik@sun.com-20100512070920-xgpmqeytp0gc183c) (pib:16)
[28 May 2010 6:29]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:alik@sun.com-20100514054548-91z72f0mcskr84kj) (merge vers: 6.0.14-alpha) (pib:16)
[28 May 2010 6:57]
Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:alexey.kopytov@sun.com-20100511160250-pevxq2uoey47dw1f) (merge vers: 5.5.5-m3) (pib:16)
[1 Jun 2010 16:59]
Jon Stephens
Documented bugfix in the 5.5.5 and 6.0.14 changelogs as follows:

ALTER TABLE statements that cause partitions of InnoDB tables to be renamed or dropped (such as ALTER TABLE ... ADD PARTITION, ALTER TABLE ... DROP PARTITION, and ALTER TABLE ... REORGANIZE PARTITION) -- when run concurrently with queries against the INFORMATION_SCHEMA.PARTITIONS table -- could fail, cause the affected partitioned tables to become unusable, or both. This was due to the fact that the INFORMATION_SCHEMA database ignored the name lock imposed by the ALTER TABLE statement on the partitions affected. In particular, this led to problems with InnoDB tables, because InnoDB would accept the rename operation, but put it in a background queue, so that subsequent rename operations failed when InnoDB was unable to find the correct partition. Now, INFORMATION_SCHEMA honors name locks imposed by ongoing ALTER TABLE statements that cause partitions to be renamed or dropped.

See also BUG#45808 and BUG#47343.

Status = NM -- waiting for push to 5.1 tree.
[2 Jun 2010 8:50]
Bugs System
Pushed into 5.1.48 (revid:georgi.kodinov@oracle.com-20100602084411-2yu607bslbmgufl3) (version source revid:mattias.jonsson@sun.com-20100510131706-wfci7c0gb7072wwa) (merge vers: 5.1.47) (pib:16)
[2 Jun 2010 12:31]
Jon Stephens
Also documented in the 5.1.48 changelog. Closed.
[6 Sep 2010 11:16]
John Embretsen
This bugfix is suggested to be the cause of the regression reported as Bug#56541.
[14 Oct 2010 8:30]
Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (version source revid:vasil.dimov@oracle.com-20100513074652-0cvlhgkesgbb2bfh) (merge vers: 5.5.5-m3) (pib:21)
[14 Oct 2010 8:45]
Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.39 (revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (version source revid:vasil.dimov@oracle.com-20100513074652-0cvlhgkesgbb2bfh) (merge vers: 5.5.5-m3) (pib:21)
[14 Oct 2010 8:59]
Bugs System
Pushed into mysql-5.1-telco-6.2 5.1.51-ndb-6.2.19 (revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (version source revid:vasil.dimov@oracle.com-20100513074652-0cvlhgkesgbb2bfh) (merge vers: 5.5.5-m3) (pib:21)
[14 Oct 2010 12:58]
Jon Stephens
Already documented in 5.1.48 changelog; no new changelog entries required. Setting back to Closed.