Bug #25700 merge base tables get corrupted by optimize/analyze/repair table
Submitted: 18 Jan 2007 17:32 Modified: 3 Dec 2007 9:34
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Merge storage engine Severity:S2 (Serious)
Version:5.0 OS:Any (*)
Assigned to: Ingo Strüwing CPU Architecture:Any
Tags: bfsm_2007_02_15, corruption, merge, Optimize

[18 Jan 2007 17:32] Shane Bester
Description:
According to the documentation, http://dev.mysql.com/doc/refman/5.0/en/merge-table-problems.html

we have to FLUSH tables before making changes to base tables mapped to an open merge table.  

this limit is ridiculous, and it's not possible to always know for sure
whether somebody has even created a merge table.

subsequently, running 'mysqlcheck -o <db>' can cause widespread corruption.

How to repeat:
known problem that tables mapped to merges can get corrupted.
a proper testcase can be provided later, if needed.

Suggested fix:
allow the 'optimize/analyze/repair table' to wait until the base table is available.

alternate, but partial solution is to have mysqlcheck issue a LOCK TABLE x;FLUSH TABLE x;OPTIMIZE TABLE x;FLUSH TABLE x;UNLOCK TABLES;
[18 Jan 2007 18:02] MySQL Verification Team
I filed a separate bug to address the risk via documentation, until a proper solution is found: http://bugs.mysql.com/bug.php?id=25647
[24 Jan 2007 13:39] MySQL Verification Team
testcase for corruption. see top of file for gcc build instructions

Attachment: testcase.c (text/plain), 5.87 KiB.

[25 Jan 2007 11:57] MySQL Verification Team
output of testcase on 5.0.36BK on suse9-x86:

sbester@linux:~> ./testcase 
running initializations..
completed spawning new database worker threads
testcase is now running, so watch for error output
query failed 'insert into `m1`(`c1`,`c2`,`c3`) values ('-1807816428','-1539112928','-2041529681')' : 126 (Incorrect key file for table './test/t1.MYI'; try to repair it)
query failed 'insert into `m1`(`c1`,`c2`,`c3`) values ('-1807816428','-1539112928','-2041529681')' : 126 (Incorrect key file for table './test/t1.MYI'; try to repair it)
[25 Jan 2007 13:51] MySQL Verification Team
output of testcase on 4.1.23BK on suse9-x86

sbester@linux:~> ./testcase 
running initializations..
completed spawning new database worker threads
testcase is now running, so watch for error output
query failed 'insert into `m1`(`c1`,`c2`,`c3`) values ('-1807816428','-1539112928','-2041529681')' : 1015 (Can't lock file (errno: 9))
query failed 'insert into `m1`(`c1`,`c2`,`c3`) values ('-1807816428','-1539112928','-2041529681')' : 1015 (Can't lock file (errno: 9))
[3 Apr 2007 14:28] Ingo Strüwing
Shane, great testcase again! testcase.c showed two problems on my machine:

1. FLUSH TABLES in LOCK TABLES, which I will fix with Bug#26379,

2. Server crash in key cache, which I will fix with Bug#17332.

I ran testcase.c on a tree with the fix for Bug#26379 installed and added "db_query(dbc,"SET GLOBAL key_buffer_size= 0",2);" to the initializations in testcase.c. Then the program completed its test without any error.

But I'm not sure if this means that the subject of this bug report is fixed. It does only mean that testcase.c will not be able to repeat a problem when the mentioned fixes are applied. Another test will be required.

However, while typing this, Monty called me and suggested a different fix for Bug#26379. This has the potential to also solve the subject of this bug report properly. Please stay tuned.
[3 Apr 2007 16:45] MySQL Verification Team
Ingo, I noticed that I had left out a FLUSH TABLES before the UNLOCK TABLES in the testcase .  Does that cause this entire testcase to be invalid now ?
[3 Apr 2007 18:09] Ingo Strüwing
To be honest, I don't know. I do not even know exactly what the reason for the flushes is. Monty said that table administration statements need exclusive acces to the table. No other thread must even try to open the table. FLUSH TABLES cannot asure this. It does the trick only if thereafter no new accesses on the MERGE table are attempted. A FLUSH TABLES after the administration statement is probably not required.

When trying to access a MERGE table while an admimistration statement is running on a sub-table, opening the table is not prevented in the current code. So your test can cause problems with and without flushes.

My upcoming patch for Bug#26379 tries to prevent the open of a MERGE table if one of its sub-tables is in an administration statement.
[16 Nov 2007 8:09] Ingo Strüwing
For the patch see Bug#26379 - Combination of FLUSH TABLE and REPAIR TABLE corrupts a MERGE table.
[28 Nov 2007 10:35] Ingo Strüwing
For documentation see Bug#26379 - Combination of FLUSH TABLE and REPAIR TABLE corrupts a MERGE table.
[3 Dec 2007 9:34] MC Brown
A note has been added to the 5.1.23 and 6.0.4 changelogs: 

Important Change: Incompatible Change: A number of problems existed in the implementation of MERGE tables that could cause problems. The problems are summarized below:

Bug#26379 - Combination of FLUSH TABLE and REPAIR TABLE corrupts a MERGE table. This was caused in a number of situations:

A thread trying to lock a MERGE table performs busy waiting while REPAIR TABLE or a similar table administration task is ongoing on one or more of its MyISAM tables.

A thread trying to lock a MERGE table performs busy waiting until all threads that did REPAIR TABLE or similar table administration tasks on one or more of its MyISAM tables in LOCK TABLES segments do UNLOCK TABLES. The difference against problem #1 is that the busy waiting takes place after the administration task. It is terminated by UNLOCK TABLES only.

Two FLUSH TABLES within a LOCK TABLES segment can invalidate the lock. This does not require a MERGE table. The first FLUSH TABLES can be replaced by any statement that requires other threads to reopen the table. In 5.0 and 5.1 a single FLUSH TABLES can provoke the problem.

Bug#26867 - Simultaneously executing LOCK TABLES and REPAIR TABLE on a MERGE table would result in memory/cpu hogging.

Trying DML on a MERGE table, which has a child locked and repaired by another thread, made an infinite loop in the server.

Bug#26377 - Deadlock with MERGE and FLUSH TABLE

Locking a MERGE table and its children in parent-child order and flushing the child deadlocked the server.

Bug#25038 - Waiting TRUNCATE

Truncating a MERGE child, while the MERGE table was in use, let the truncate fail instead of waiting for the table to become free.

Bug#25700 - MERGE base tables get corrupted by OPTIMIZE/ANALYZE/REPAIR TABLE

Repairing a child of an open MERGE table corrupted the child. It was necessary to FLUSH the child first.

Bug#30275 - MERGE tables: FLUSH TABLES or UNLOCK TABLES causes server to crash.

Flushing and optimizing locked MERGE children crashed the server.

Bug#19627 - temporary merge table locking

Use of a temporary MERGE table with non-temporary children could corrupt the children.

Temporary tables are never locked. Creation of tables with non-temporary chidlren of a temporary MERGE table is now prohibited.

Bug#27660 - Falcon: MERGE table possible

It was possible to create a MERGE table with non-MyISAM children.

Bug#30273 - MERGE tables: Can't lock file (errno: 155)

This was a Windows-only bug. Table administration statements sometimes failed with "Can't lock file (errno: 155)".

The fix introduces the following changes in behavior:

This patch changes the behavior of temporary MERGE tables. Temporary MERGE must have temporary children. The old behavior was wrong. A temporary table is not locked. Hence even non-temporary children were not locked. See Bug#19627.

You cannot change the union list of a non-temporary MERGE table when LOCK TABLES is in effect. The following does not work:

CREATE TABLE m1 ... ENGINE=MRG_MYISAM ...;
      LOCK TABLES t1 WRITE, t2 WRITE, m1 WRITE;
      ALTER TABLE m1 ... UNION=(t1,t2) ...;
However, you can do this with a temporary MERGE table.

You cannot create a MERGE table with CREATE ... SELECT, neither as a temporary MERGE table, nor as a non-temporary MERGE table. For example:

CREATE TABLE m1 ... ENGINE=MRG_MYISAM ... SELECT ...;
Gives error message: table is not BASE TABLE.