Bug #43459 | falcon_ddl fails due to missing record after recovery | ||
---|---|---|---|
Submitted: | 6 Mar 2009 15:52 | Modified: | 21 Feb 2014 18:04 |
Reporter: | Olav Sandstå | Email Updates: | |
Status: | Unsupported | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | 6.0.10-alpha | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | error handling, F_DDL |
[6 Mar 2009 15:52]
Olav Sandstå
[7 Mar 2009 6:04]
Vladislav Vaintroub
philip, please explain the error message. What did fail?
[7 Mar 2009 6:36]
Philip Stoev
One of the SELECT queries that is run after recovery in order to validate the table failed with a "can't find record" message, which is not acceptable. My understanding is that this message is caused by Falcon reporting that a given row exists , e.g. during an index scan, however refusing to retrieve the data from the record at the second pass. This signifies that the table is corrupted. I will try to have the RQG print out the exact SQL of the failing query.
[7 Mar 2009 8:12]
Philip Stoev
Binary: http://loki08.norway.sun.com:8080/372537.mysql-6.0.11-alpha-solaris10-sparc.tar.gz Tablespace: http://loki08.norway.sun.com:8080/372630.mysql-6.0.11-alpha-solaris10-sparc-test.tar.gz , in master-data To reproduce: MTR_VERSION=1 perl mysql-test-run.pl --start-and-exit --mysqld=--datadir=master-data 1st and then issue: mysql> select * from `x`.`c`; Empty set (0.00 sec) mysql> SELECT * FROM `x`.`c` WHERE `pk` >= -9; ERROR 1032: Can't find record in 'c'
[9 Mar 2009 17:32]
Olav Sandstå
Here is an update on what I have found by using the Falcon database from the failing test: The MySQL server claims that is has one index on the table named "c": +-------+---------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------+---------+------+-----+---------+----------------+ | pk | int(11) | NO | PRI | NULL | auto_increment | +-------+---------+------+-----+---------+----------------+ 1 row in set (0.08 sec) mysql> show index from c; +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_Comment | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | c | 0 | PRIMARY | 1 | pk | NULL | NULL | NULL | NULL | | BTREE | | | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 1 row in set (0.03 sec) But by adding tracing to the Falcon code it seems like Falcon does not have any index on this table after recovery. Some details: 1. In the call to StorageInterface::index_init() we are not finding the requested index but return 0 (OK). 2. In the call to StorageInterface::read_range_first() does a call to StorageInterface::scanRange() which again calls StorageTable::checkCurrentIndex(). The result of this due to the index not existing is that error code 120 (HA_ERR_KEY_NOT_FOUND). This results in the client get the error message: ERROR 1032 (HY000): Can't find record in 'c'
[9 Mar 2009 19:00]
Olav Sandstå
The failing statement will "work as expected" if we changes the return value in StorageInterface::read_range_first() from HA_ERR_KEY_NOT_FOUND to HA_ERR_END_OF_FILE when StorageInterface::scanRange() returns an error. The same conversion of return value is done a few lines later in the function if index_next() returns an error. Still, I do not consider this as a solution since we should not hide the fact that the index is missing. A "real solution" should involve avoiding that the server and Falcon have different views on what indexes a table has. Or at least we should detect this and either report or repair the missing index after recovery. Setting the status back to verified and un-assign myself until we have a more clearer strategy for how this should be solved.
[9 Mar 2009 20:39]
Ann Harrison
It may "work as expected" if we return EOF, but it probably won't get the right answer - unless we were very lucky and the statement was in fact at the end of the result set.
[10 Mar 2009 12:41]
Philip Stoev
In order for work to continue on this bug, I need to modify the RQG so that it uses kill -11 and not kill -9 when killing the server before recovery. This way a record of the threads that existed at the time of the crash will be preserved (alternatively, gdb can be used to stop the process, list the threads and then kill the server). Once this is in place and the bug has been reproduced again, we will know better what caused it, including the actual query that was in progress at the time of the kill.