Bug #68302 MySQL crashed when trying to free page that is already marked as free
Submitted: 7 Feb 2013 13:47 Modified: 31 Dec 2013 11:14
Reporter: Michael Ivanov Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:5.5.32 OS:Linux
Assigned to: CPU Architecture:Any

[7 Feb 2013 13:47] Michael Ivanov
Description:
MySQL crashes with an error in logs:

InnoDB: Dump of the tablespace extent descriptor:  len 40; hex 0000000000000002000980000d1e0004c0000f9e00000004aaaaaaaafaaaaaabaefaaaaaaaaaaaaa; asc                                         ;
InnoDB: Serious error! InnoDB is trying to free page 667538
InnoDB: though it is already marked as free in the tablespace!
InnoDB: The tablespace free space info is corrupt.
InnoDB: You may need to dump your InnoDB tables and recreate the whole
InnoDB: database!

After this error it can't start again even with force_recovery set to 6. The crash also occurred on all slave servers so it is definitely not a hardware data failure. I inspected all *.ibd files with innodbchecksum utility and it reports no problems. After I analyzed the page number 667538 I found that it contains a record inside our customers table. Table structure is the following

CREATE TABLE `customers` (
  `login` varchar(128) NOT NULL,
  `usertype` char(1) NOT NULL DEFAULT '',
  `password` varchar(255) NOT NULL DEFAULT '',
  `password_hint` varchar(128) NOT NULL DEFAULT '',
  `password_hint_answer` varchar(128) NOT NULL DEFAULT '',
  `b_title` varchar(32) NOT NULL DEFAULT '',
  `b_firstname` varchar(128) NOT NULL DEFAULT '',
  `b_lastname` varchar(128) NOT NULL DEFAULT '',
  `b_address` varchar(64) NOT NULL DEFAULT '',
  `b_city` varchar(64) NOT NULL DEFAULT '',
  `b_county` varchar(32) NOT NULL DEFAULT '',
  `b_state` varchar(32) NOT NULL DEFAULT '',
  `b_country` char(2) NOT NULL DEFAULT '',
  `b_zipcode` varchar(32) NOT NULL DEFAULT '',
  `title` varchar(32) NOT NULL DEFAULT '',
  `firstname` varchar(128) NOT NULL DEFAULT '',
  `lastname` varchar(128) NOT NULL DEFAULT '',
  `company` varchar(255) NOT NULL DEFAULT '',
  `s_title` varchar(32) NOT NULL DEFAULT '',
  `s_firstname` varchar(128) NOT NULL DEFAULT '',
  `s_lastname` varchar(128) NOT NULL DEFAULT '',
  `s_address` varchar(255) NOT NULL DEFAULT '',
  `s_city` varchar(255) NOT NULL DEFAULT '',
  `s_county` varchar(32) NOT NULL DEFAULT '',
  `s_state` varchar(32) NOT NULL DEFAULT '',
  `s_country` char(2) NOT NULL DEFAULT '',
  `s_zipcode` varchar(32) NOT NULL DEFAULT '',
  `email` varchar(128) NOT NULL DEFAULT '',
  `phone` varchar(32) NOT NULL DEFAULT '',
  `fax` varchar(32) NOT NULL DEFAULT '',
  `url` varchar(128) NOT NULL DEFAULT '',
  `card_name` varchar(255) NOT NULL DEFAULT '',
  `card_type` varchar(16) NOT NULL DEFAULT '',
  `card_number` varchar(128) NOT NULL DEFAULT '',
  `card_expire` varchar(4) NOT NULL DEFAULT '',
  `card_cvv2` varchar(64) NOT NULL DEFAULT '',
  `last_login` int(11) NOT NULL DEFAULT '0',
  `first_login` int(11) NOT NULL DEFAULT '0',
  `status` char(1) NOT NULL DEFAULT 'Y',
  `referer` varchar(255) NOT NULL DEFAULT '',
  `ssn` varchar(32) NOT NULL DEFAULT '',
  `language` char(2) NOT NULL DEFAULT 'US',
  `cart` mediumtext NOT NULL,
  `change_password` char(1) NOT NULL DEFAULT 'N',
  `parent` varchar(32) NOT NULL DEFAULT '',
  `pending_plan_id` int(11) NOT NULL DEFAULT '0',
  `activity` char(1) NOT NULL DEFAULT 'Y',
  `membershipid` int(11) NOT NULL DEFAULT '0',
  `is_qa` tinyint(1) NOT NULL DEFAULT '0',
  `pending_membershipid` int(11) NOT NULL DEFAULT '0',
  `tax_number` varchar(50) NOT NULL DEFAULT '',
  `tax_exempt` char(1) NOT NULL DEFAULT 'N',
  `s_company` varchar(255) NOT NULL DEFAULT '',
  `s_phone` varchar(32) NOT NULL DEFAULT '',
  `phone2` varchar(32) NOT NULL DEFAULT '',
  `initials` char(3) NOT NULL DEFAULT '',
  `notify` tinyint(1) NOT NULL DEFAULT '0',
  `cart_completed` mediumtext NOT NULL,
  `sales_goal` varchar(64) NOT NULL DEFAULT '',
  `bml_account` varchar(32) NOT NULL DEFAULT '',
  PRIMARY KEY (`login`),
  KEY `usertype` (`usertype`),
  KEY `last_login` (`last_login`),
  KEY `first_login` (`first_login`),
  KEY `status` (`status`),
  KEY `membershipid` (`membershipid`),
  KEY `notify_idx` (`usertype`,`notify`),
  KEY `email` (`email`(16))
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

One of the related query that was found in binlog a few seconds before crash occurred is

update customers set cart_completed='I am a very long text data containing php-serialized array of customer cart' where login='sensitive information' limit 1

The record updated is the record from page 667538.

When I removed customers.ibd file and copied the old version of that table the servers were able to start again.

How to repeat:
Perform the query specified above after the yet unknown conditions.
[7 Feb 2013 15:51] Michael Ivanov
I have done more tests and it seems the above query does not produce errors on clean dump with the table data restored via mysqldump. So it seems to me the real table corruption occurred on this record before and the above update query just provoked a crash.
[15 Feb 2013 18:40] Sveta Smirnova
Thank you for the report.

According to last comment this is not a bug, so I am closing it as such. If you would be able to prove that this is MySQL server causes table corruption feel free to reopen the report.
[18 Feb 2013 9:43] Michael Ivanov
It is clear that it was MySQL causing the corruption because it was propagated to all slave servers as I wrote above.
[18 Feb 2013 11:03] Sveta Smirnova
Thank you for the feedback.

Do you use row-based or mixed replication?
[18 Feb 2013 11:29] Michael Ivanov
I use statement based replication.
[18 Feb 2013 11:35] Sveta Smirnova
Thank you for the feedback.

Could you please send us binary logs, created in time when corruption happened on master first time? We need binary log with statement, caused corruption.
[18 Feb 2013 12:34] Michael Ivanov
uploaded last 15 minutes binlog before crash occurred (/support/incoming/binlog-68302.sql.gz)
[18 Feb 2013 19:09] Sveta Smirnova
Thank you for the feedback.

I don't see the file. Please re-upload.
[19 Feb 2013 6:49] Michael Ivanov
Ok I reuploaded the file. If you still don't see it, I can upload it to my public server and provide you URL.
[19 Feb 2013 11:53] Sveta Smirnova
Thank you for the file!

Now I could access it.
[19 Feb 2013 16:58] Sveta Smirnova
Thank you for the feedback.

I can not repeat corruption with dummy data. Please send us more details about your installation: exact version of MySQL server and configuration file for master and slaves.
[25 Feb 2013 7:54] Michael Ivanov
Exact version of MySQL is 5.5.30-log MySQL Community Server (GPL) by Remi on one of our crashed slaves. Also this crash had happened on 5.5.28 and 5.5.29 versions of MySQL. I will also attach the config file of this server.
[29 Jul 2013 7:30] Michael Ivanov
I've just got the same crash on MySQL 5.5.32
[29 Jul 2013 14:30] Michael Ivanov
Based on http://bugs.mysql.com/bug.php?id=55284 I made an assumption that changing innodb_file_format from Antelope to Barrucuda may help. Im gonna try it.
[24 Dec 2013 14:57] Sveta Smirnova
Thank you for the feedback.

Can you repeat the failure if remove option O_DIRECT?
[30 Dec 2013 13:47] Michael Ivanov
Thanks for your suggestion Sveta. I removed O_DIRECT option from two our slave servers. So far (running for 2 days with this option) no crashes occured, but usually we are getting those crashes once in a month or so. However can you please elaborate how this option can cause the crashes since we are getting the crashes on all servers via the replication?
[31 Dec 2013 11:14] Sveta Smirnova
Thank you for the feedback.

Regarding O_DIRECT please read at http://man7.org/linux/man-pages/man2/open.2.html what this flag does: "The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred.  To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT."

Using this flag you can increase performance for MySQL, but make IO operations less stable which can lead to corruptions.

I'll mark this report as "Can't repeat" for now. If you could repeat the issue without using O_DIRECT feel free to reopen the report.