Bug #43948 recv_apply_hashed_log_recs() may hang up when meets DB_TABLESPACE_DELETED pages
Submitted: 30 Mar 2009 1:31 Modified: 12 Feb 2013 4:38
Reporter: Yasufumi Kinoshita Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.0.75 5.1.32 OS:Any
Assigned to: Yasufumi Kinoshita CPU Architecture:Any
Tags: Contribution

[30 Mar 2009 1:31] Yasufumi Kinoshita
Description:
Sometimes, the recovery from crash hang up at recv_apply_hashed_log_recs() with waiting (recv_sys->n_addrs == 0).

If buf_page_init_for_read() from buf_read_page_low() results error, recv_recover_page() is not called and recv_sys->n_addrs is not decremented.

How to repeat:
It may be rare case.
But, crash with high load may cause more.

Suggested fix:
--- a/innobase/buf/buf0rea.c	2008-12-19 02:19:35.000000000 +0900
+++ b/innobase/buf/buf0rea.c	2009-03-27 17:34:11.000000000 +0900
@@ -125,6 +125,45 @@
 	block = buf_page_init_for_read(err, mode, space, tablespace_version,
 								offset);
 	if (block == NULL) {
+		if (recv_recovery_is_on() && *err == DB_TABLESPACE_DELETED) {
+			/* hashed log recs must be treated here */
+			recv_addr_t*    recv_addr;
+
+			mutex_enter(&(recv_sys->mutex));
+
+			if (recv_sys->apply_log_recs == FALSE) {
+				mutex_exit(&(recv_sys->mutex));
+				goto not_to_recover;
+			}
+
+			/* recv_get_fil_addr_struct() */
+			recv_addr = HASH_GET_FIRST(recv_sys->addr_hash,
+					hash_calc_hash(ut_fold_ulint_pair(space, offset),
+						recv_sys->addr_hash));
+			while (recv_addr) {
+				if ((recv_addr->space == space)
+					&& (recv_addr->page_no == offset)) {
+					break;
+				}
+				recv_addr = HASH_GET_NEXT(addr_hash, recv_addr);
+			}
+
+			if ((recv_addr == NULL)
+			    || (recv_addr->state == RECV_BEING_PROCESSED)
+			    || (recv_addr->state == RECV_PROCESSED)) {
+				mutex_exit(&(recv_sys->mutex));
+				goto not_to_recover;
+			}
+
+			fprintf(stderr, " (space:%lu is deleted)", space);
+			recv_addr->state = RECV_PROCESSED;
+
+			ut_a(recv_sys->n_addrs);
+			recv_sys->n_addrs--;
+
+			mutex_exit(&(recv_sys->mutex));
+		}
+not_to_recover:
 		
 		return(0);
 	}
[1 Apr 2009 13:06] Heikki Tuuri
Yasufumi, thank you for the bug report! The bug manifests if innodb_file_per_table is used and tables are dropped just prior to the crash. I think I made this bug when I implemented tablespaces back in 2002. Strange that no user has reported this.
[1 Apr 2009 13:06] Heikki Tuuri
Assigning Marko to this.
[27 May 2009 15:49] Vasil Dimov
This could be the same as Bug#45097 Hang during recovery, redo logs for doublewrite buffer pages
[27 May 2009 15:59] Vasil Dimov
Yasufumi, can you reproduce the bug that you have reported here? Maybe you can confirm that the patch for Bug#45097 fixes also the present bug?
[30 Nov 2009 11:32] MySQL Verification Team
Need feedback for question [27 May 17:59] Vasil Dimov. Thanks in advance.
[1 Dec 2009 1:23] Yasufumi Kinoshita
Miguel,

At Bug#45097 after my comment,

> [28 May 15:04] Heikki Tuuri
> 
> Yasufumi's bug is due to .ibd files having been deleted.
> 
> Vasil's bug was due to redo logging to the doublewrite buffer.

Why do you think the fix for the different bug (Bug#45097) fix it?
I am always using the fix I suggested here now.
I have never met such situation since I applied it to our revision.

I think simple way to reproduce is that
removing one of the needed .ibd file before recovery.
("needed" means InnoDB has transaction log which should be applied to the file.)

Regards,
Yasufumi
[21 Oct 2010 9:11] Vasil Dimov
Yasufumi, can you help me to reproduce this. I am trying the following, but in vain:

start mysqld with --innodb-file-per-table=1

CREATE TABLE t (a VARCHAR(64) PRIMARY KEY) ENGINE=INNODB;

DROP PROCEDURE ins;
DELIMITER /
CREATE PROCEDURE ins()
BEGIN
        DECLARE i INT DEFAULT 100000;

        TRUNCATE TABLE t;

        START TRANSACTION;
        WHILE i > 0 DO
                INSERT INTO t VALUES (CONCAT(MD5(RAND()),MD5(RAND())));
                SET i = i - 1;
        END WHILE;
        COMMIT;
END;
/
DELIMITER ;

CALL ins();

While the above ins() is running, killall -9 mysqld.

Then remove t.ibd and start mysqld - but the bug does not happen.

Any ideas on how to reproduce this would be welcome.

Thanks!
[25 May 2011 14:40] Mark Callaghan
Heikki - we might have encountered this
[10 Feb 2013 11:13] Laurynas Biveinis
I don't see how the described situation may ever occur, at least with the current trunks.  The patch handles the situation of recovery trying to read a page from a deleted tablespace into the buffer pool.  But recovery initializes the fil_space structures by looking at the actual ibds on the disk (fil_load_single_table_tablespaces()) and log records are only stored to the recovery hash table if the tablespace is found to exist at the log parse time (recv_add_to_hash_table() calls fil_tablespace_deleted_or_being_deleted_in_mem()).

Thus a tablespace should somehow disappear between the fil_load_single_table_tablespaces() and some recv_add_to_hash_table() call with a redo log rec for that tablespace.  One way for this to happen would be to replay a MLOG_FILE_DELETE, but these aren't replayed during the crash recovery, and even if there is such rec in the redo log, its corresponding ibd is already deleted, meaning that fil_load_single_table_tablespaces() would not load it in the first place.

I'd replace the Yasufumi's patch with something like
ut_ad(!recv_recovery_is_on() || *err != DB_TABLESPACE_DELETED);
to be sure.
[12 Feb 2013 4:38] Yasufumi Kinoshita
> One way for this to happen would be to replay a MLOG_FILE_DELETE,
> but these aren't replayed during the crash recovery,

Yes, MLOG_FILE_DELETE is not replayed at mysqld recovery process, I have confirmed too.

The patch is needed only for backup tools like InnoDB Hot Backup or XtraBackup, because they might replay the MLOG_FILE_DELETE during their recovery process.

Thank you.
[12 Feb 2013 5:45] Laurynas Biveinis
Yasufumi -

Thank you for your comment.

Perhaps you have an opinion re. the last part of my previous comment?

"(assuming that MLOG_FILE_DELETE is replayed), and even if there is such rec in the redo log, its corresponding ibd is already deleted, meaning that fil_load_single_table_tablespaces() would not load it in the first place."