Bug #10437 After large delete repair table causes data loss!!!
Submitted: 7 May 2005 15:11 Modified: 25 Sep 2005 8:33
Reporter: Greg Whalin Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:4.1.11 OS:Linux (Linux 2.4.30)
Assigned to: CPU Architecture:Any

[7 May 2005 15:11] Greg Whalin
Description:
Background:
We have a large table that needs to be archived and then cleaned monthly.  Recently, due to performance and stability problems on Opteron w/ Fedora Core2 (2.6 kernel), we rolled back to Core1 to use a 2.4 kernel.  In doing so, and reading other bug reports here, we found we needed to disable NPTL threading in order to run w/ this config.

After archiving the data (simple select into another table where date is < the 1st of the month), we then needed to clean the data from the table.  We did this by running DELETE FROM email WHERE ctime < '2005-05-01 00:00:00' (this was run yesterday the 6th of the month).  This left 165k rows (aprox) in the table.  This number was verified on all replicants (others are Xeon machines).  And in fact, when we did a SELECT COUNT(*) FROM email;, we indeed saw the correct row count.  We then ran REPAIR TABLE email; (as we usually do after a large delete).  After churning for longer than expected, it came back w/ a message claiming that the row count in the index was wrong (165k) and that it was dropping it to 0.  Sure enough, a SELECT COUNT(*) now shows 0 rows.  Examining the data file on disk, it was 12k.   This exact same scenario played out on our other Opteron + 2.4 kernel box.  Luckily, we had our Xeon replicants around so I was able to recover the missing rows.

How to repeat:
Our table:

| email |CREATE TABLE `email` (
  `email_id` int(16) NOT NULL auto_increment,
  `member_id` int(16) NOT NULL default '0',
  `member_type` varchar(32) NOT NULL default 'member',
  `member_from` varchar(64) NOT NULL default 'info@meetup.com',
  `member_replyto` varchar(64) NOT NULL default '',
  `ip` varchar(16) NOT NULL default '0.0.0.0',
  `recip_id` int(16) NOT NULL default '0',
  `recip_type` varchar(16) NOT NULL default 'member',
  `recip_subset` varchar(32) NOT NULL default '',
  `recip_count` int(16) NOT NULL default '0',
  `cc_member_id` int(16) NOT NULL default '0',
  `subject` varchar(255) NOT NULL default '',
  `body` text NOT NULL,
  `status` int(2) NOT NULL default '0',
  `interpolate` int(2) NOT NULL default '0',
  `handler` varchar(64) NOT NULL default '',
  `mtime` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  `ctime` timestamp NOT NULL default '2002-06-14 15:00:00',
  `stime` datetime default NULL,
  `etime` datetime default NULL,
  `dtime` datetime default NULL,
  `priority` int(2) NOT NULL default '1',
  PRIMARY KEY  (`email_id`),
  KEY `member_id` (`member_id`,`recip_type`),
  KEY `recip_type` (`recip_type`,`recip_id`),
  KEY `member_type` (`member_type`,`recip_type`,`recip_id`),
  KEY `status_priority` (`status`,`priority`,`ctime`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

SQL: DELETE FROM email WHERE ctime < '2005-05-01 00:00:00';  REPAIR TABLE email;

This deletes all rows from the db, even though there were 165k rows created (and indeed had a ctime > 2005-05-01).

Suggested fix:
No idea?  You tell me.  This is a frightening bug as it causes complete data loss.
[16 May 2005 23:22] Greg Whalin
Just confirmed this bug again using custom kernel build (2.4.30) w/ no NPTL enabled.  Also had recompiled mysql.  I am going to try mysql 4.1.12 now to see if still happening in that release.  Here is terminal output from my test:

mysql> delete from email where ctime < '2005-05-16 00:00:00';
Query OK, 478484 rows affected (1 min 48.07 sec)

mysql> select count(*) from email;
+----------+
| count(*) |
+----------+
|   101273 |
+----------+
1 row in set (0.00 sec)

mysql> repair table email;
+-----------------+--------+----------+-----------------------------------------+
| Table           | Op     | Msg_type | Msg_text                                |
+-----------------+--------+----------+-----------------------------------------+
| chapstick.email | repair | warning  | Number of rows changed from 101277 to 0 |
| chapstick.email | repair | status   | OK                                      |
+-----------------+--------+----------+-----------------------------------------+
2 rows in set (0.72 sec)

mysql> select count(*) from email;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> 

gw
[17 May 2005 0:27] Greg Whalin
I just ran 4.1.12 and same thing ....

mysql> select count(*) from email;
+----------+
| count(*) |
+----------+
|   578709 |
+----------+
1 row in set (0.03 sec)

mysql> delete from email where ctime < '2005-05-16 00:00:00';
Query OK, 478484 rows affected (38.37 sec)

mysql> select count(*) from email;
+----------+
| count(*) |
+----------+
|   100225 |
+----------+
1 row in set (0.00 sec)

mysql> repair table email;
+-----------------+--------+----------+-----------------------------------------+
| Table           | Op     | Msg_type | Msg_text                                |
+-----------------+--------+----------+-----------------------------------------+
| chapstick.email | repair | warning  | Number of rows changed from 100225 to 0 |
| chapstick.email | repair | status   | OK                                      |
+-----------------+--------+----------+-----------------------------------------+
2 rows in set (0.55 sec)

mysql> select count(*) from email;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql>
[17 May 2005 0:31] Greg Whalin
Seems to only happen when I run REPAIR TABLE from mysql client.  If I shut down and run myisamchck then everything is ok.

mysql> select count(*) from email;
+----------+
| count(*) |
+----------+
|   578709 |
+----------+
1 row in set (0.00 sec)

mysql> delete from email where ctime < '2005-05-16 00:00:00';
Query OK, 478484 rows affected (35.84 sec)

mysql> Bye
[db-ro1] [/usr/local/mysql/data/chapstick] > /etc/init.d/mysql.server stop
Killing mysqld with pid 25676
STOPPING server from pid file /usr/local/mysql/data/db-ro1.meetup.com.pid
050516 20:29:23  mysqld ended

[db-ro1] [/usr/local/mysql/data/chapstick] > 
[db-ro1] [/usr/local/mysql/data/chapstick] > 
[db-ro1] [/usr/local/mysql/data/chapstick] > myisamchk 
[db-ro1] [/usr/local/mysql/data/chapstick] > ../../bin/myisamchk -r email.MYI
- recovering (with sort) MyISAM-table 'email.MYI'
Data records: 100225
- Fixing index 1
- Fixing index 2
- Fixing index 3
- Fixing index 4
- Fixing index 5
[db-ro1] [/usr/local/mysql/data/chapstick] > /etc/init.d/mysql.server start
[db-ro1] [/usr/local/mysql/data/chapstick] > Starting mysqld daemon with databases from /usr/local/mysql/data

[db-ro1] [/usr/local/mysql/data/chapstick] > 
[db-ro1] [/usr/local/mysql/data/chapstick] > mysql -u root -p chapstick
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 4.1.12

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select count(*) from email;
+----------+
| count(*) |
+----------+
|   100225 |
+----------+
1 row in set (0.00 sec)

mysql>
[23 Jul 2005 7:45] MySQL Verification Team
I generated test table and tried to reproduce the problem. I wasn't able to repeat it.
Can you upload table files to our ftp?
ftp://ftp.mysql.com/pub/mysql/upload/
[23 Jul 2005 13:37] Greg Whalin
We have since switched to using Innodb and therefore the repair issue is not an issue for us.  You should be able to create a table using the schema def I added to comment earlier.  I did not notice the problem when the delete involved was small.  Try adding a few million rows of dummy data to that schema and doing a ctime based delete (one that will leave you with a few hundred thousand rows) and then try the test again. 

I really suspect that this was an issue w/ Opteron and the version of glibc on Fedora Core 2, so it may not be reproducable if you try something other than FC 2.

I do know that I was able to reproduce every single time I tried and on multiple machines (same OS and config)
[25 Sep 2005 8:33] Valeriy Kravchuk
I tried to repeat you test with newer 4.1.14 version (with default server variables values), but was unable:

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> CREATE TABLE `email` (
    ->   `email_id` int(16) NOT NULL auto_increment,
    ->   `member_id` int(16) NOT NULL default '0',
    ->   `member_type` varchar(32) NOT NULL default 'member',
    ->   `member_from` varchar(64) NOT NULL default 'info@meetup.com',
    ->   `member_replyto` varchar(64) NOT NULL default '',
    ->   `ip` varchar(16) NOT NULL default '0.0.0.0',
    ->   `recip_id` int(16) NOT NULL default '0',
    ->   `recip_type` varchar(16) NOT NULL default 'member',
    ->   `recip_subset` varchar(32) NOT NULL default '',
    ->   `recip_count` int(16) NOT NULL default '0',
    ->   `cc_member_id` int(16) NOT NULL default '0',
    ->   `subject` varchar(255) NOT NULL default '',
    ->   `body` text NOT NULL,
    ->   `status` int(2) NOT NULL default '0',
    ->   `interpolate` int(2) NOT NULL default '0',
    ->   `handler` varchar(64) NOT NULL default '',
    ->   `mtime` timestamp NOT NULL default CURRENT_TIMESTAMP on update
    -> CURRENT_TIMESTAMP,
    ->   `ctime` timestamp NOT NULL default '2002-06-14 15:00:00',
    ->   `stime` datetime default NULL,
    ->   `etime` datetime default NULL,
    ->   `dtime` datetime default NULL,
    ->   `priority` int(2) NOT NULL default '1',
    ->   PRIMARY KEY  (`email_id`),
    ->   KEY `member_id` (`member_id`,`recip_type`),
    ->   KEY `recip_type` (`recip_type`,`recip_id`),
    ->   KEY `member_type` (`member_type`,`recip_type`,`recip_id`),
    ->   KEY `status_priority` (`status`,`priority`,`ctime`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.09 sec)

mysql> insert into email(body) values('Message');
Query OK, 1 row affected (0.00 sec)

mysql> insert into email(body, ctime) select body, now() from email;
Query OK, 1 row affected (0.02 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into email(body, ctime) select body, now() from email;
Query OK, 2 rows affected (0.03 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into email(body, ctime) select body, now() from email;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0
...
mysql> insert into email(body, ctime) select body, now() from email;
Query OK, 65536 rows affected (12.47 sec)
Records: 65536  Duplicates: 0  Warnings: 0

mysql> select count(*) from email;
+----------+
| count(*) |
+----------+
|   131072 |
+----------+
1 row in set (0.00 sec)

mysql> insert into email(body, ctime) select body, now() from email;
Query OK, 131072 rows affected (26.70 sec)
Records: 131072  Duplicates: 0  Warnings: 0

mysql> insert into email(body, ctime) select body, now() from email limit 100000;
Query OK, 100000 rows affected (22.00 sec)
Records: 100000  Duplicates: 0  Warnings: 0

mysql> select distinct ctime from email;
+---------------------+
| ctime               |
+---------------------+
| 2002-06-14 15:00:00 |
| 2005-09-25 09:37:33 |
| 2005-09-25 09:37:35 |
| 2005-09-25 09:37:37 |
| 2005-09-25 09:43:01 |
| 2005-09-25 09:43:03 |
| 2005-09-25 09:43:05 |
| 2005-09-25 09:43:06 |
| 2005-09-25 09:43:07 |
| 2005-09-25 09:43:08 |
| 2005-09-25 09:43:09 |
| 2005-09-25 09:43:11 |
| 2005-09-25 09:43:12 |
| 2005-09-25 09:43:19 |
| 2005-09-25 09:43:22 |
| 2005-09-25 09:44:06 |
| 2005-09-25 09:44:49 |
| 2005-09-25 09:45:17 |
| 2005-09-25 09:45:57 |
| 2005-09-25 09:47:06 |
+---------------------+
20 rows in set (0.66 sec)

mysql> delete from email where ctime < '2005-09-25 09:47:06';
Query OK, 262144 rows affected (2 min 4.43 sec)

mysql> repair table email;
+------------+--------+----------+----------+
| Table      | Op     | Msg_type | Msg_text |
+------------+--------+----------+----------+
| test.email | repair | status   | OK       |
+------------+--------+----------+----------+
1 row in set (3.17 sec)

mysql> select count(*) from email;
+----------+
| count(*) |
+----------+
|   100000 |
+----------+
1 row in set (0.01 sec)

Then I tried with even larger table:

mysql> insert into email(body, ctime) select body, now() from email;
Query OK, 100000 rows affected (19.16 sec)
Records: 100000  Duplicates: 0  Warnings: 0

mysql> insert into email(body, ctime) select body, now() from email;
Query OK, 200000 rows affected (45.87 sec)
Records: 200000  Duplicates: 0  Warnings: 0

mysql> insert into email(body, ctime) select body, now() from email;
Query OK, 400000 rows affected (1 min 38.48 sec)
Records: 400000  Duplicates: 0  Warnings: 0

mysql> insert into email(body, ctime) select body, now() from email;
Query OK, 800000 rows affected (3 min 45.08 sec)
Records: 800000  Duplicates: 0  Warnings: 0

mysql> select count(*) from email;
+----------+
| count(*) |
+----------+
|  1600000 |
+----------+
1 row in set (0.00 sec)

mysql> insert into email(body, ctime) select body, now() from email limit 100000;
Query OK, 100000 rows affected (31.44 sec)
Records: 100000  Duplicates: 0  Warnings: 0

mysql> select distinct ctime from email;
+---------------------+
| ctime               |
+---------------------+
| 2005-09-25 09:47:06 |
| 2005-09-25 10:00:12 |
| 2005-09-25 10:00:37 |
| 2005-09-25 10:01:27 |
| 2005-09-25 10:03:12 |
| 2005-09-25 10:09:08 |
+---------------------+
6 rows in set (4.19 sec)

mysql> delete from email where ctime < '2005-09-25 10:09:00';

Query OK, 1600000 rows affected (15 min 50.42 sec)

mysql> select count(*) from email;
+----------+
| count(*) |
+----------+
|   100000 |
+----------+
1 row in set (0.00 sec)

mysql> repair table email;
+------------+--------+----------+----------+
| Table      | Op     | Msg_type | Msg_text |
+------------+--------+----------+----------+
| test.email | repair | status   | OK       |
+------------+--------+----------+----------+
1 row in set (3.48 sec)

mysql> select count(*) from email;
+----------+
| count(*) |
+----------+
|   100000 |
+----------+
1 row in set (0.01 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 4.1.14    |
+-----------+
1 row in set (0.00 sec)

mysql> exit
Bye
-bash-2.05b$ uname -a
Linux nocona 2.4.21-32.0.1.ELsmp #1 SMP Tue May 17 17:46:36 EDT 2005 x
86_64 x86_64 x86_64 GNU/Linux

So, I think it was either a hardware-related issue, configuration issue or a problem of 4.1.11. Please, inform us if you will encounter similar problems on 4.1.14.