MySQL Bugs: #20814: Using LOCK (MyISAM) causes table corruption and data loss

Bug #20814	Using LOCK (MyISAM) causes table corruption and data loss
Submitted:	2 Jul 2006 15:04	Modified:	11 Feb 2007 21:23
Reporter:	[ name withheld ]	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: MyISAM storage engine	Severity:	S1 (Critical)
Version:	5.0.22	OS:	Linux (Linux Suse ES9 )
Assigned to:		CPU Architecture:	Any

Description:
Using LOCK seems to corrupt db tables as number of connections becomes high.
All below tables are MyISAM.

Mysql 5.0.22 build used for Suse 9 is the one provided on MySQL site.

Suse ES9 kernel: 2.6.5-7.191-bigs

How to repeat:
The scenario that brings correption to table I below within 1-3 hours is:

A number of connection run this:
LOCK TABLES I WRITE, J WRITE
-- sql on above
UNLOCK TABLES

With another number of connections trying to run this:
LOCK TABLES A WRITE, B READ, J READ, I READ
-- sql on above
UNLOCK TABLES

And many other connections running sql queries that read or write from/to the above tables.

Within 1-3 hours (depending on number of connections, the higher the sooner) errors like this start appearing in the log of mysql server:

[ERROR] Got error 127 when reading table './DB/I'
One more error number appeared too sometimes, i think it was 130.

After running a repair on table I, the log gets rows like below:
Found 1723 of 1724 rows when repairing './DB/I'

After the repair, the DB is operational again for a few hours, until disaster hits again. However, the repair leads to data loss at times.

Unfortunately, we cannot provide a set of sql statements that reproduce this as it is very related to multiple connections from many machines (does not seem to occur with a few connections only).

We are near certain that LOCK is the culprit here for the table corruption, because we rewrote our application to not issue LOCK statements anymroe and have been operational for 2-3 weeks without issues.

This looks related to Bug #18544 but it looks that 5.0.22 has already taken the fix to that bug.

Suggested fix:
See above

Thank you for a problem report. Sorry, but we need a more detailed description of how to repeat/test case. Like that in bug #18544. Can you repeat the behaviour described with a small number of connections?

Unfortunately this happens intermittently and only when many connections are active (more than 100).

Send SHOW PROCESSLIST results then. Just to understand what all that connections are doing with MyISAM tables.

As desribed above, we removed the use of LOCK TABLES from our app.
The only way to reproduce this problem is to bring back the use of LOCK. We cannot do this anymore, as we will make our users very unhappy and also corrupt data.

What I can say is that SHOW PROCESSLIST would show queries that do selects joining data from two or more of the A, B, J, I tables; many of them would be in "lock" state as tables are LOCKed by the LOCK command use. Also many connections would try at the same time to issue the same "LOCK TABLES" statemement, as mentioned earlier.

If it helps, let me also add that we make use of a lot of "application locks" in our application, those obtained with GET_LOCK.

I understand this may not be a useful bug report (I do write software for a living, so I know that bugs that cannot be reproduced, cannot be that helpful).
I just wanted to bring to your (and users') attention that there is a problem when using LOCK with MyIsam in an environment where many simultaneous connections are occurring and it looks to be a rather serious one.

At the very least, just archiving this bug report may be useful for someone else who experiences a similar problem in the future.

The error 127 appears even after the removal of LOCK statements.
This occurs in exaclty the same table, once every 12-15 days with server restarting once a week. Memory is at 600MB when it happens (on a 4GB mem machine).

Some more info that could help:
- There are MEMORY tables joined with myisam tables in a few of the very frequent queries.
- The table which gets corrupted is a table that is small (in terms or row size) in comparion to others. It also receives very frequent UPDATE statements that update single integers or dates in a number of its rows.
- Application locks are used a lot in the application (GET_LOCK etc)
- DB is shrunk every week (many DELETE's) or so with optimise/analyse run after the srhink.

Please, try to repeat with a newer version, 5.0.24, just released. Do you have any FULLTEXT indexes on problematic tables? Have you checked your RAM for possible faults?

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

I couldn't ever repeat this bug.  Instead, I found a new bug with LOCK TABLES which I filed as bug #25966.

Perhaps older versions of the server didn't behave correctly during times of memory shortage and that introduced MyISAM corruption?

correction, please set thread_cache_size=0 in my.cnf and see if problems go away.  I suspect that could be a cause.