Bug #10132 | Crashing the server on corrupt InnoDB page is unhelpful | |
---|---|---|---
Submitted: | 25 Apr 2005 3:00 | Modified: | 23 Jan 2014 12:37 |
Reporter: | James Day | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S1 (Critical) |
Version: | 5.1, 6.0 | OS: | Linux (Linux) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[25 Apr 2005 3:00]
James Day
[25 Apr 2005 9:21]
Heikki Tuuri
James, nice meeting you at the UC 2005!

Also an undo log page, or even worse, an allocation bitmap page can become corrupt. Then there is no associated table. I think InnoDB currently does not know for sure in which index of which table a page is, unless you use the innodb_file_per_table option.

Marking a corrupt table 'crashed', as MyISAM does, is something we could possibly do. There is no built-in error return mechanism in InnoDB's B-tree operations: encountering a corrupt page is a fatal error. But we could mark the table as 'crashed' and reboot mysqld, so that further queries would be blocked until the DBA marks the table as OK again.

A couple of years ago, typical Linux file corruption was so severe that InnoDB would probably seg fault if it accessed a corrupt page at all. And file corruption often affects many pages, so that crashes are inevitable. It may be that corruption is less severe nowadays. As a first step, we could change the InnoDB behavior and let it use a corrupt page. Hmm... then there is a risk of further corruption. That is bad :(. Note also that sometimes rebooting the computer fixes the file corruption, if it is only present in the file cache of the OS. We should then remove the 'crashed' mark from the table. As you see, changing the current behavior also has some downsides.

I think the MyISAM approach is the best. But we must let the DBA remove the 'crashed' mark without running CHECK TABLE, because that can take hours. The next question is where we can store that 'crashed' bit. The .frm file would be an easy place. Another possibility is the InnoDB internal data dictionary. If the bit were set, InnoDB would refuse to open the table at all, unless innodb_force_recovery is used.

The plan:

1. Find out if we can always recognize which table a corrupt page belongs to. If innodb_file_per_table is used, at least then we know it for sure.
2. Put a bit in the InnoDB internal data dictionary, to mark the corruption of a table.
3. Refuse to open a table where the bit is set.
4. Give the DBA some SQL command to erase that bit.

This is something for 5.1 or 5.2.

Regards, Heikki
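Editor's note: the four-step plan above can be modeled as a small sketch. All names here (`DataDictionary`, `open_table`, the `force_recovery` flag) are hypothetical stand-ins; the real implementation would live in the server's internal data dictionary code, not in Python.

```python
# Toy model of the proposed 'crashed' bit: stored in the data dictionary,
# checked on every table open, and erasable by an explicit DBA action.

class DataDictionary:
    """Hypothetical stand-in for the InnoDB internal data dictionary."""
    def __init__(self):
        self.crashed = set()  # names of tables flagged as corrupt

    def mark_crashed(self, table):
        # Step 2: set the corruption bit when a corrupt page is detected.
        self.crashed.add(table)

    def clear_crashed(self, table):
        # Step 4: a DBA command erases the bit without running CHECK TABLE.
        self.crashed.discard(table)

def open_table(dd, table, force_recovery=False):
    # Step 3: refuse to open a table whose 'crashed' bit is set,
    # unless the DBA overrides, analogous to innodb_force_recovery.
    if table in dd.crashed and not force_recovery:
        raise RuntimeError(f"Table '{table}' is marked crashed; "
                           "repair it or use innodb_force_recovery")
    return f"handle:{table}"

dd = DataDictionary()
dd.mark_crashed("t1")
try:
    open_table(dd, "t1")                     # normal open is refused
except RuntimeError as e:
    print(e)
print(open_table(dd, "t1", force_recovery=True))  # DBA can still dump data
dd.clear_crashed("t1")
print(open_table(dd, "t1"))                  # bit erased, table opens again
```

The key design point of the plan is visible here: corruption blocks ordinary queries, but the override path still lets the DBA salvage data, and clearing the bit is cheap.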
[25 Apr 2005 22:24]
James Day
Was good to meet you and others there as well - lots of productive discussions! It was interesting to note how many people were recommending InnoDB as the engine of choice. That's what I did as well, for anyone with reliability and availability needs, when asked during the Wikipedia presentation. The handling of corrupt pages is the biggest problem with that, which is why you're reading this report. :)

I like your use of the word "currently" for not knowing which table is affected. :)

My first reaction was to tell the B-tree engine the type of page, database and table it is being asked for. Then it can report the location, and it may have ways to attempt a partial repair, or to return a crash-proof but "empty" page marked "do not save", sufficient to allow data to be dumped. Sadly, that is sure to be desirable for some users, who can be expected not to have a backup or binlog.

I wonder if it is possible to read enough data to give a good chance of purging the OS cache before trying another read of the page? We would still need to tell the DBA, but it might work tolerably well in a bad situation. Of course, people would inevitably start reporting "InnoDB is slow" bugs because of this, without fixing the real cause. :(

The plan seems good. A very big improvement over the current behavior when bad things are happening.
[1 Sep 2010 5:13]
James Day
One useful option here would be to skip damaged pages that are in tables. Is that viable? Then we could export all readable pages in a table during disaster recovery. That's what we try to do today anyway; we just have to jump through hoops (ORDER BY PK DESC or secondary keys) to do it instead of using a simple mysqldump. Or perhaps a save-and-erase page command: save the page to disk, then replace all its contents with an empty page so no error is produced.
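Editor's note: the "skip damaged pages" idea amounts to verifying each page's checksum and continuing past failures instead of aborting. A minimal sketch, with an invented toy page layout (real InnoDB pages are 16 KiB with header and trailer checksums, not this 64-byte CRC32 format):

```python
import zlib

PAGE_SIZE = 64  # toy page size for illustration only

def make_page(payload: bytes) -> bytes:
    """Build a toy page: 4-byte big-endian CRC32 of the body, then the body."""
    body = payload.ljust(PAGE_SIZE - 4, b"\x00")
    return zlib.crc32(body).to_bytes(4, "big") + body

def readable_pages(tablespace: bytes):
    """Yield (page_no, payload) for every page whose checksum verifies,
    skipping corrupt pages instead of aborting the whole export."""
    for page_no in range(len(tablespace) // PAGE_SIZE):
        page = tablespace[page_no * PAGE_SIZE:(page_no + 1) * PAGE_SIZE]
        stored = int.from_bytes(page[:4], "big")
        if zlib.crc32(page[4:]) == stored:
            yield page_no, page[4:].rstrip(b"\x00")
        # else: corrupt page -- log it and carry on rather than crash

# Three pages; the middle one is deliberately overwritten with garbage.
pages = [make_page(b"row1"), b"\xff" * PAGE_SIZE, make_page(b"row3")]
space = b"".join(pages)
print([payload for _, payload in readable_pages(space)])
```

This is exactly the disaster-recovery behavior requested: every intact page is exported, and a single bad page costs only its own rows, not the whole dump.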
[9 Sep 2010 11:12]
Marko Mäkelä
Bug #56373 is a duplicate (special case) of this one.
[24 Dec 2010 15:23]
MySQL Verification Team
See Bug #59156.
[10 Feb 2011 9:37]
Marko Mäkelä
I believe that this bug has to be fixed before Bug #59385 (mysqld crashes in various ways when starting to run into OS open files limit) can be fixed.
[10 Dec 2011 0:15]
Roel Van de Paar
As Bug #59385 was marked a duplicate of this one, please make sure that when this bug (#10132) is fixed, any issues reported in Bug #59385 are fixed also. If not, please re-open that bug.
[5 Jun 2012 19:19]
Sveta Smirnova
Bug #65485 was marked as duplicate of this one.
[12 Aug 2013 23:43]
Jonathan Nicol
Which bug is this a duplicate of? Will it ever be fixed?
[23 Jan 2014 12:37]
Marko Mäkelä
Sorry, I think that this bug was incorrectly closed as a duplicate. Jimmy Yang fixed an internally filed bug that implemented some infrastructure that would allow us to flag a table or index as corrupted. That was a prerequisite for fixing this bug, but it did not fix this bug.

When we read a corrupted page, we currently simply crash the server. Instead, we should propagate the error all the way up to the actual client connection. If it was an index page, we should mark the index tree corrupted. If it was a page in the clustered index, we should mark the entire table corrupted. If it was some other page, we should perhaps return a warning and mark the whole tablespace file read-only. It could also make sense to make the file (or the whole database instance) read-only. This choice (go read-only, abort, or try to carry on read-write) could be controlled by some configuration parameter.

Likewise, when writing to a file, we have different options. If a redo log write fails, we probably should not crash, but instead try to flush the dirty pages from the buffer pool, so that the database can be recovered without the corrupted redo log.

On top of this there are some "internal" issues discussed in Bug #59385: what we should do when the InnoDB internal bookkeeping says that we have run out of space in a fixed-size tablespace, or when the internal bookkeeping itself is corrupted.

As you can see, a complete fix will be a huge amount of work. Unfortunately, I cannot promise a complete fix any time soon.
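Editor's note: the decision tree described above can be sketched as a small policy table. Everything here is hypothetical (the enum names, the actions, and the imagined configuration parameter); the real fix would sit deep in the server's I/O and B-tree layers.

```python
from enum import Enum

class PageKind(Enum):
    SECONDARY_INDEX = 1
    CLUSTERED_INDEX = 2
    OTHER = 3

class Policy(Enum):
    # Imagined values of a configuration parameter choosing the response.
    READ_ONLY = 1   # go read-only
    ABORT = 2       # current behavior: abort the server
    CONTINUE = 3    # try to carry on read-write

def on_corrupt_read(kind: PageKind, policy: Policy) -> list[str]:
    """Return the actions to take on a corrupt page read, instead of
    unconditionally crashing the server."""
    actions = ["report error to client connection"]  # propagate, don't crash
    if kind is PageKind.SECONDARY_INDEX:
        actions.append("mark index tree corrupted")
    elif kind is PageKind.CLUSTERED_INDEX:
        actions.append("mark table corrupted")
    else:
        actions.append("mark tablespace read-only")
    if policy is Policy.READ_ONLY:
        actions.append("switch instance to read-only")
    elif policy is Policy.ABORT:
        actions.append("abort server")
    return actions

print(on_corrupt_read(PageKind.CLUSTERED_INDEX, Policy.CONTINUE))
```

The point of the sketch is the separation of concerns: what gets marked corrupted depends on the page kind, while how drastically the server reacts is an orthogonal, configurable choice.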
[24 Jan 2014 0:03]
Roel Van de Paar
Marko, great ideas. This rocks. Looking forward to a day when all this is added. This sort of feature would definitely be in my "Top 3 most wanted features" vote. I expect many others to think likewise. Can this be blueprinted (maybe even publicly), community feedback be sought, and then triaged for implementation?
[24 Oct 2014 7:01]
Marko Mäkelä
A duplicate bug was reported: Bug#72645 INNODB SHOULD ABORT WHEN A PAGE IN SYSTEM, UNDO OR TEMP TABLESPACE IS CORRUPTTED
[7 Nov 2014 12:51]
James Day
There's really no one perfect way to respond to this, because the desired response depends on the environment:

1. Highly redundant systems. These places will tend to want an immediate total failure, because their strategy tends to be to replace or re-clone a slave.
2. Less highly redundant systems or shared hosting. These places will tend to want to continue limited service to the maximum extent possible.

The first set of places may have code to handle a propagated failure, but probably won't care. The second set probably won't have very good error handling code, in general. Killing the server immediately and using the innodb_force_recovery modes is the way to go for set 1, but not for set 2. So a flag to pick between the two behaviors is desirable if we're to handle the range of user needs.

James Day, MySQL Senior Principal Support Engineer, Oracle