Bug #1983 INSERT "DELAYED" INTO x SELECT FROM y crashes index
Submitted: 29 Nov 2003 6:23 Modified: 17 Dec 2003 8:17
Reporter: Peter Lieven Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: MyISAM storage engine Severity:S1 (Critical)
Version:4.0.16 OS:Linux (Linux/x86)
Assigned to: Michael Widenius

[29 Nov 2003 6:23] Peter Lieven
Description:
I use mysql to store IP accounting information in our network. The table that holds the information looks like this:

CREATE TABLE `log_YYYYMM` (
  `ip` int(4) unsigned NOT NULL default '0',
  `date` int(4) unsigned NOT NULL default '0',
  `in` int(4) unsigned NOT NULL default '0',
  `out` int(4) unsigned NOT NULL default '0',
  PRIMARY KEY  (`ip`,`date`)
) TYPE=MyISAM;

The accounting information is collected by serveral machines which basically might not be trustable. To make sure only valid information gets into the master log, each of those machines has a sperate spool table of the same structure.

On the mysql server runs a script which runs every x minutes and copies data from each spool table to the master log.

This query is of the following structure:

INSERT INTO log_YYYYMM SELECT * FROM spool_xxx WHERE <condition>

<condition> checks that only those entries are read from spool_xxx which have a valid time and ip for the according machine to that spool table.

After this query is executed my script deletes all entries from spool_xxx table that are not longer needed. After this an OPTIMIZE TABLE statement for the spool_xxx table is run.

This construction worked fine for almost one year now.

Then one day I started to optimize some queries and when I did this I changed the INSERT query to an INSERT DELAYED query.

This at first look worked fine, but after a few hours the master log table log_YYYYMM crashed with a corrupted index.

A check table showed the following error:
"Found 14169593 keys of 14169590". I repaired the table, but soon the table crashed again. The error is always the same "Found x1 keys of x2" where x1 is always greater than x2.

I first did not realize, that that query was causing the crash and the server run almost 2 weeks with that problem. The log_YYYYMM crashed irregulary, sometimes a few times a day, other times i had to wait even a 2-3 days.

The data that is transferred from spool to log is almost the same amount of rows each time. The problem also seems not to depend on the number of rows in log_YYYYMM.

I checked back my notfication emails about corrupted tables and cross checked with changes to queries I made and I came across the idea that the crash could have been caused by the added "delayed" statement in the insert query.

It's almost two weeks now since I removed the "delayed" statement again and there was no crash on the table since then.

Any ideas?

Regards,
Peter

How to repeat:
?
[30 Nov 2003 3:37] Alexander Keremidarski
Can you please provide little more information to help us repeat the problem?

Probably my.cnf file will be usefull and also if you have any clues like if specific size of table, number of rows which seems relevant.

It would be great if you can repeat it with some simple script generating random data. This is what I am going to try now and see if I will succeed. 

Don't hesitate to go into details. 2 years ago I created very similar application collecting NetFlow data from Cisco routers and I used almost exatctly the same approach.
[30 Nov 2003 3:50] Alexander Keremidarski
No need to send any more information.

I was able to corrupt table with random data filled in spool_xxx file and now I am going to narrow down test case and check where exactly corruptiopn happens.

Thank you a lot about reporting this bug as well as for detailed description which was enough to repeat it.
[30 Nov 2003 4:41] Peter Lieven
Hi Alex,

thanks for the fast response. I was just setting up a test environment myself.
I stop here, but if you need any assistense please let me know.
[30 Nov 2003 7:27] Alexander Keremidarski
Some more information.

I was able to repeat it only when many rows are inserted. With 100 000 rows no corruption occurs while with 1M rows I get various different corruptions. (At some point I even got perror 22 which does not make much sense)

INSERT DELAYED INTO log_YYYYMM SELECT * FROM spool_xxx LIMIT 100000;
ERROR 1030: Got error 22 from table handler

It also seems that corruption is random. Starting with empty table sometimes it corrupts upon first INSERT, sometimes it needs 10 or more INSERTs before corruption. The more the rows are inserted the higher the chance for corruption.

Next weirdness is that table got corrupted at some point, but after that corrution disappears :( This should not happen when:

| myisam_recover_options | OFF   |

It pretty much looks like INSERT DELAYED ... SELECT ...; does not lock table it inserts into. What I observed below should not happen with locked table (first CHECK statement should not succeed at all, but wait for table to be unlocked)

with 2 session opened:

# Session 1
mysql> INSERT DELAYED INTO log_YYYYMM SELECT * FROM spool_xxx LIMIT 1000000;

# Session 2 (while Session 1 statement is in progress!!!):

mysql> check table log_YYYYMM;
+-----------------+-------+----------+---------------------------------------------------------------+
| Table           | Op    | Msg_type | Msg_text                      |
+-----------------+-------+----------+---------------------------------------------------------------+
| bugs.log_YYYYMM | check | warning  | Size of indexfile is: 2537472      Should be: 1024            |
| bugs.log_YYYYMM | check | warning  | Size of datafile is: 2097152       Should be: 0               |
| bugs.log_YYYYMM | check | error    | Wrong pagepointer: 3072 at page: 73728                      |
| bugs.log_YYYYMM | check | error    | Found key at page 1024 that points to record outside datafile |
| bugs.log_YYYYMM | check | error    | Corrupt                      |
+-----------------+-------+----------+---------------------------------------------------------------+

# IMHO this CHECK statement must wait until table is released!

# Session 1 statement ends with:

Query OK, 1000000 rows affected (29.49 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

mysql> check table log_YYYYMM;
+-----------------+-------+----------+----------------------------+
| Table           | Op    | Msg_type | Msg_text                   |
+-----------------+-------+----------+----------------------------+
| bugs.log_YYYYMM | check | warning  | Table is marked as crashed |
| bugs.log_YYYYMM | check | status   | OK                         |
+-----------------+-------+----------+----------------------------+

mysql> check table log_YYYYMM;
+-----------------+-------+----------+----------+
| Table           | Op    | Msg_type | Msg_text |
+-----------------+-------+----------+----------+
| bugs.log_YYYYMM | check | status   | OK       |
+-----------------+-------+----------+----------+

mysql> show variables like "myisam_recover%";
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| myisam_recover_options | OFF   |
+------------------------+-------+

Note above sequence!
CHECK first reports "Table is marked as crashed", but when running it for second time table is Ok!
[6 Dec 2003 8:14] Peter Lieven
Hi Alex,

since I don't heard from you almost a week now, is there anything I can you support with?

In my environment I add about 10.000 rows every 5 Minutes. This is enough to make corruption occur. But there are many processes accessing the table and a few other writing to the same table, so it could really be a lock problem.

My current version without the "delayed" has still not crashed.

If you need help I could try to set up a test case with various scripts and things accessing the table, but I think this will not make sense, or does it?

Regards,
Peter
[13 Dec 2003 19:06] Michael Widenius
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

The INSERT DELAYED only works with insert with value lists, not with INSERT ... SELECT.

The bug is that we try to lock the tables with an INSERT_DELAYED lock when one used INSERT DELAYED ... SELECT and this can cause a lot of strange problem as internally INSERT_DELAYED isn't a real lock.

I have now fixed this so that MySQL ignores DELAYED when one does INSERT DELAYED ...  SELECT

Regards,
Monty
[17 Dec 2003 8:17] Michael Widenius
The fix for this will be in 4.0.18
(Should have been in 4.0.17 but becasue of a type this didn't happen :( )
[26 Jan 2004 10:38] Matt Ryan
Is there any way to manually bypass this check?

I insert delayed 10,000 records every few hours, if I have to lock the table for every insert the customer definitly wont be happy.

In my senario, the bulk insert delayed select * from input table is the only source of new records, clients are not allowed to update these tables.