Bug #43964 | Falcon Exception: corrupted record after recovery | ||
---|---|---|---|
Submitted: | 30 Mar 2009 16:21 | Modified: | 26 May 2010 17:45 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Unsupported | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
Version: | 6.0-falcon-team | OS: | Any |
Assigned to: | Christopher Powers | CPU Architecture: | Any |
Tags: | F_ENCODING |
[30 Mar 2009 16:21]
Philip Stoev
[23 Apr 2009 21:13]
Vladislav Vaintroub
Why this is a critical bug? Did it affect any functionality? Is there anything else than some printout in the log file?
[25 Apr 2009 0:57]
Philip Stoev
Vlad, when it comes to cryptic recovery messages, I need to assume the worst. In this case, I assume that one or more records are either inaccessible or have bad data. If you are not happy with any of the triage values, please modify them and/or set the "need triage" flag so that this bug is triaged again.
[29 Apr 2009 20:58]
Kevin Lewis
We need to determine if this message truly indicates data corruption as Philip rightfully assumes. Also, Vlad suggested that this bug may have been fixed by recent RecordLocatorPage bugs, like Bug#39890
[15 May 2009 19:53]
Christopher Powers
Unable to recreate the error with the tablespace files provided. The recovery process cannot initialize the FALCON_USER tablespace because it cannot find the Page Inventory Page. (Note that the RQG testcase is falcon_pagesize_32K and that falcon_page_size = 32K.) Recovering database /home/cpowers/work/dev/dev-03/mysql/mysql-test/var/mysqld.1/data/falcon_master.fts ... First block in the serial log is 202 Last block in the serial log is 3023 First block used in recovery is 3023 Recovery phase 1... Processed: 2 Recovery phase 2... TableSpaceManager::bootstrap() : table space FALCON_USER, id 1, type 0, filename falcon_user.fts TableSpaceManager::bootstrap() : table space FALCON_TEMPORARY, id 2, type 0, filename falcon_temporary.fts Recovering from lost page inventory page 1 Exception: Transaction inventory page not found Can't open tablespace FALCON_USER, id 1 : Transaction inventory page not found Processed: 2 Recovery phase 3... Processed: 2 Recovery complete Exception: write error on page 1 (4096/4096/-1) of "/home/cpowers/work/dev/dev-03/mysql/mysql-test/var/mysqld.1/data/falcon_user.fts": Bad file descriptor (9) Thread::thread: thread 1082132816: write error on page 1 (4096/4096/-1) of "/home/cpowers/work/dev/dev-03/mysql/mysql-test/var/mysqld.1/data/falcon_user.fts": Bad file descriptor (9) The exception occurs when Dbb::getLastPage() is called. Dbb::getLastPage() is not essential to the recovery process, and if it is disabled, the recovery completes but still leaves the tablespaces unusable. It appears that the database is not viable. There have been several Recovery fixes since the bug was opened. I suggest monitoring PB2 for another occurrence.
[24 Jun 2009 8:21]
Olav Sandstå
This situation happened again. This time it was during validation of the tables in the database after a "successful" recovery in the falcon_many_indexes test. Multiple records in a table are reported as "corrupted". The log file contains: # 21:59:56 Verifying table: table100_falcon_key_pk_parts_2_int_autoinc; database: test # 21:59:56 28: TransID=19 CommitNoUpdates (null) # 21:59:56 28: TransID=20 CommitNoUpdates (null) # 21:59:59 corrupted record (overrun), table 26, record 1049 # 21:59:59 Compressed # 21:59:59 11, 27, 11, 27, 11, 27, 11, 27, 11, 27, # 21:59:59 11, 27, 11, 27, 11, 27, 11, 27, 11, 27, # 21:59:59 11, 27, 11, 27, 11, 27, 11, 27, 11, 27, # 21:59:59 11, 27, 11, 27, 11, 27, 11, 27, 11, 27, # 21:59:59 11, 27, 11, 27, 11, 27, 11, 27, 11, 27, # 21:59:59 11, 27, 11, 27, 11, 27, 11, 27, 11, 27, # 21:59:59 11, 27, 11, 27, 11, 27, 11, 27, 11, 27, # 21:59:59 11, 27, 11, 27, 11, 27, 11, 27, 11, 27, # 21:59:59 11, 27, 11, 27, 11, 27, 11, 27, 11, 27, # 21:59:59 11, 27, 11, 27, 11, 27, 11, 27, 11, 27, # 21:59:59 11, 27, 11, 27, 11, 27, 11, 27, 11, 27, # 21:59:59 11, 27, 11, 27, 11, 27, 11, 27, 11, 27, # 21:59:59 5131, 18964, -5120, 26726, 26726, 31076, 31076, 29043, 29043, 26229, # 21:59:59 26229, 28010, 28010, 26722, 26722, 25963, 25963, 26982, 26982, 31091, # 21:59:59 31091, 27499, 27499, 27238, 27238, 29801, 29801, 26477, 26477, 26731, # 21:59:59 26731, 30066, 30066, 26729, 26729, 29295, 29295, 30050, 30050, 24944, # 21:59:59 24944, 30570, 30570, 28534, 28534, 30312, 30312, 27746, 27746, 27249, # 21:59:59 27249, 25716, 25716, 30576, 30576, 27750, 27750, 26218, 26218, 30320, # 21:59:59 30320, 26216, 26216, 28784, 28784, 30832, 30832, 28259, 28259, 25195, # 21:59:59 25195, 30063, 30063, 28262, 28262, 25190, 25190, 30064, 30064, 25700, # 21:59:59 25700, 30070, 30070, 27246, 27246, 28777, 28777, 31092, 31092, 28013, # 21:59:59 28013, 29815, 29815, 28779, 28779, 26988, 26988, 28513, 28513, 24933, # 21:59:59 24933, 27505, 27505, 26724, 26724, 30580, 30580, 25718, 25718, 27237, # 21:59:59 27237, 29556, 29556, 29815, 29815, 28769, 28769, 26726, 26726, 27753, # 21:59:59 27753, 330, 12670, -28989, -15567, 12686, -28989, -15567, 12686, -28989, # 21:59:59 -15567, 12686, -28989, -15567, 12686, -28989, -15567, 12686, -28989, -15567, # 21:59:59 12686, -28989, -15567, 12686, -28989, -15567, 12686, -28989, -15567, 12686, # 21:59:59 -28989, -15567, 12686, -28989, -15567, 12686, -28989, -15567, 12686, -28989, # 21:59:59 -15567, 12686, -28989, -15567, 12686, -28989, -15567, 12686, -28989, -15567,
[28 Jun 2009 21:27]
Christopher Powers
Associated push: http://pb2.norway.sun.com/web.py?template=push_details&push=427310
[29 Jun 2009 23:50]
Christopher Powers
This is not a recovery problem. There appears to be an issue with records that are stored across one or more overflow pages. The error occurs when Stream::decompress() encounters record 1049 in table100_falcon_key_pk_parts_2_int_autoinc. Record 1049 was apparently too large to fit onto the one page, so the record "tail" was relegated to an overflow page. The failure can be easily recreated using the database files noted above. The error occurs when record 1049 is accessed for the first time, after recovery completes. The error does not occur after that. Debugging reveals that record 1049 is stored on primary page 2516 and overflow page is 810. Neither of these pages is accessed during recovery, nor is record 1049 accessed during recovery. Interestingly, none of the record data is stored on the primary page, only the overflow page. The next step is to review the overflow mechanism, especially for records that equal the page size.