Bug #25700 merge base tables get corrupted by optimize/analyze/repair table
Submitted: 18 Jan 2007 18:32 Modified: 3 Dec 2007 10:34
Reporter: Shane Bester
Status: Closed
Category:Server: Merge Severity:S2 (Serious)
Version:5.0 OS:Any (*)
Assigned to: Ingo Strüwing Target Version:
Tags: bfsm_2007_02_15, corruption, Optimize, merge

[18 Jan 2007 18:32] Shane Bester
Description:
According to the documentation,
http://dev.mysql.com/doc/refman/5.0/en/merge-table-problems.html

we have to FLUSH tables before making changes to base tables mapped to an open merge
table.  

this limit is ridiculous, and it's not possible to always know for sure
whether somebody has even created a merge table.

subsequently, running 'mysqlcheck -o <db>' can cause widespread corruption.

How to repeat:
known problem that tables mapped to merges can get corrupted.
a proper testcase can be provided later, if needed.

Suggested fix:
allow the 'optimize/analyze/repair table' to wait until the base table is available.

alternate, but partial solution is to have mysqlcheck issue a LOCK TABLE x;FLUSH TABLE
x;OPTIMIZE TABLE x;FLUSH TABLE x;UNLOCK TABLES;
[18 Jan 2007 19:02] Shane Bester
I filed a separate bug to address the risk via documentation, until a proper solution is
found: http://bugs.mysql.com/bug.php?id=25647
[24 Jan 2007 14:39] Shane Bester
testcase for corruption. see top of file for gcc build instructions

Attachment: testcase.c (text/plain), 5.87 KiB.

[25 Jan 2007 12:57] Shane Bester
output of testcase on 5.0.36BK on suse9-x86:

sbester@linux:~> ./testcase 
running initializations..
completed spawning new database worker threads
testcase is now running, so watch for error output
query failed 'insert into `m1`(`c1`,`c2`,`c3`) values
('-1807816428','-1539112928','-2041529681')' : 126 (Incorrect key file for table
'./test/t1.MYI'; try to repair it)
query failed 'insert into `m1`(`c1`,`c2`,`c3`) values
('-1807816428','-1539112928','-2041529681')' : 126 (Incorrect key file for table
'./test/t1.MYI'; try to repair it)
[25 Jan 2007 14:51] Shane Bester
output of testcase on 4.1.23BK on suse9-x86

sbester@linux:~> ./testcase 
running initializations..
completed spawning new database worker threads
testcase is now running, so watch for error output
query failed 'insert into `m1`(`c1`,`c2`,`c3`) values
('-1807816428','-1539112928','-2041529681')' : 1015 (Can't lock file (errno: 9))
query failed 'insert into `m1`(`c1`,`c2`,`c3`) values
('-1807816428','-1539112928','-2041529681')' : 1015 (Can't lock file (errno: 9))
[3 Apr 2007 16:28] Ingo Strüwing
Shane, great testcase again! testcase.c showed two problems on my machine:

1. FLUSH TABLES in LOCK TABLES, which I will fix with Bug#26379,

2. Server crash in key cache, which I will fix with Bug#17332.

I ran testcase.c on a tree with the fix for Bug#26379 installed and added
"db_query(dbc,"SET GLOBAL key_buffer_size= 0",2);" to the initializations in testcase.c.
Then the program completed its test without any error.

But I'm not sure if this means that the subject of this bug report is fixed. It does only
mean that testcase.c will not be able to repeat a problem when the mentioned fixes are
applied. Another test will be required.

However, while typing this, Monty called me and suggested a different fix for Bug#26379.
This has the potential to also solve the subject of this bug report properly. Please stay
tuned.
[3 Apr 2007 18:45] Shane Bester
Ingo, I noticed that I had left out a FLUSH TABLES before the UNLOCK TABLES in the
testcase .  Does that cause this entire testcase to be invalid now ?
[3 Apr 2007 20:09] Ingo Strüwing
To be honest, I don't know. I do not even know exactly what the reason for the flushes is.
Monty said that table administration statements need exclusive acces to the table. No
other thread must even try to open the table. FLUSH TABLES cannot asure this. It does the
trick only if thereafter no new accesses on the MERGE table are attempted. A FLUSH TABLES
after the administration statement is probably not required.

When trying to access a MERGE table while an admimistration statement is running on a
sub-table, opening the table is not prevented in the current code. So your test can cause
problems with and without flushes.

My upcoming patch for Bug#26379 tries to prevent the open of a MERGE table if one of its
sub-tables is in an administration statement.
[16 Nov 2007 9:09] Ingo Strüwing
For the patch see Bug#26379 - Combination of FLUSH TABLE and REPAIR TABLE corrupts a MERGE
table.
[28 Nov 2007 11:35] Ingo Strüwing
For documentation see Bug#26379 - Combination of FLUSH TABLE and REPAIR TABLE corrupts a
MERGE table.
[3 Dec 2007 10:34] MC Brown
A note has been added to the 5.1.23 and 6.0.4 changelogs: 

Important Change: Incompatible Change: A number of problems existed in the implementation
of MERGE tables that could cause problems. The problems are summarized below:

Bug#26379 - Combination of FLUSH TABLE and REPAIR TABLE corrupts a MERGE table. This was
caused in a number of situations:

A thread trying to lock a MERGE table performs busy waiting while REPAIR TABLE or a
similar table administration task is ongoing on one or more of its MyISAM tables.

A thread trying to lock a MERGE table performs busy waiting until all threads that did
REPAIR TABLE or similar table administration tasks on one or more of its MyISAM tables in
LOCK TABLES segments do UNLOCK TABLES. The difference against problem #1 is that the busy
waiting takes place after the administration task. It is terminated by UNLOCK TABLES
only.

Two FLUSH TABLES within a LOCK TABLES segment can invalidate the lock. This does not
require a MERGE table. The first FLUSH TABLES can be replaced by any statement that
requires other threads to reopen the table. In 5.0 and 5.1 a single FLUSH TABLES can
provoke the problem.

Bug#26867 - Simultaneously executing LOCK TABLES and REPAIR TABLE on a MERGE table would
result in memory/cpu hogging.

Trying DML on a MERGE table, which has a child locked and repaired by another thread,
made an infinite loop in the server.

Bug#26377 - Deadlock with MERGE and FLUSH TABLE

Locking a MERGE table and its children in parent-child order and flushing the child
deadlocked the server.

Bug#25038 - Waiting TRUNCATE

Truncating a MERGE child, while the MERGE table was in use, let the truncate fail instead
of waiting for the table to become free.

Bug#25700 - MERGE base tables get corrupted by OPTIMIZE/ANALYZE/REPAIR TABLE

Repairing a child of an open MERGE table corrupted the child. It was necessary to FLUSH
the child first.

Bug#30275 - MERGE tables: FLUSH TABLES or UNLOCK TABLES causes server to crash.

Flushing and optimizing locked MERGE children crashed the server.

Bug#19627 - temporary merge table locking

Use of a temporary MERGE table with non-temporary children could corrupt the children.

Temporary tables are never locked. Creation of tables with non-temporary chidlren of a
temporary MERGE table is now prohibited.

Bug#27660 - Falcon: MERGE table possible

It was possible to create a MERGE table with non-MyISAM children.

Bug#30273 - MERGE tables: Can't lock file (errno: 155)

This was a Windows-only bug. Table administration statements sometimes failed with "Can't
lock file (errno: 155)".

The fix introduces the following changes in behavior:

This patch changes the behavior of temporary MERGE tables. Temporary MERGE must have
temporary children. The old behavior was wrong. A temporary table is not locked. Hence
even non-temporary children were not locked. See Bug#19627.

You cannot change the union list of a non-temporary MERGE table when LOCK TABLES is in
effect. The following does not work:

CREATE TABLE m1 ... ENGINE=MRG_MYISAM ...;
      LOCK TABLES t1 WRITE, t2 WRITE, m1 WRITE;
      ALTER TABLE m1 ... UNION=(t1,t2) ...;
However, you can do this with a temporary MERGE table.

You cannot create a MERGE table with CREATE ... SELECT, neither as a temporary MERGE
table, nor as a non-temporary MERGE table. For example:

CREATE TABLE m1 ... ENGINE=MRG_MYISAM ... SELECT ...;
Gives error message: table is not BASE TABLE.