Bug #69356 ibuf_restore pos failed due to wrong tuple constructed for search
Submitted: 30 May 2013 21:30 Modified: 24 Jan 2014 5:11
Reporter: Rongrong Zhong Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.6 OS:Linux
Assigned to: CPU Architecture:Any

[30 May 2013 21:30] Rongrong Zhong
Description:
Marko has confirmed this should be a bug. Opening a bug to officially keep track of it.

ibuf_restore_pos failed when trying to restore cursor to the first record of a leaf page. I looked into the code and here's what I found:
btr_pcur_restore_position_func use the pcur->old_rec and pcur->old_n_fields to build a tuple used later by btr_search_to_nth_level to reposition the cursor. 

// btr0pcur.cc:btr_pcur_restore_position_func
        tuple = dict_index_build_data_tuple(index, cursor->old_rec,
                                            cursor->old_n_fields, heap);

Since the tuple is build based on a full user record on the leaf level, it has both the key fields and the payload. Thus when the tuple is used for comparison on the non-leaf level (in this case the root page of ibuf index, space 0 page 4), it's comparing the full leaf-level record with node-ptr records. In the failure case I have, the user record happens to be the first record on the page so its key is used in page 4 as index. However, since the last field differs, and this time the value of the last field on the leaf page happened to be smaller, search goes on to the previous page and ends up at the last user record on the previous page.

The data tuple used for search in btr_pcur_restore_position:
DATA TUPLE: 11 fields;
0: len 4; hex 0000004a; asc J;;
1: len 1; hex 00; asc ;;
2: len 4; hex 00008b21; asc !;;
3: len 46; hex 00000001860800088000860800088000860100018000860300048000860800088000010f00ff8008860800088000; asc ;;
4: len 8; hex 00005af34106c134; asc Z A 4;;
5: len 8; hex 00007b26960df353; asc {& S;;
6: len 1; hex 80; asc ;;
7: len 4; hex 50767b17; asc Pv{ ;;
8: len 8; hex 0000000000000001; asc ;;
9: len 71; hex 7b22686f775f666f756e64223a31372c22726571756573745f74696d65223a313334393737333538362c22666c696473223a22222c226d6573736167655f6964223a6e756c6c7d; asc {"how_found":17,"request_time":1349773586,"flids":"","message_id":null};;
10: len 8; hex 000000005ce402b3; asc \ ;;

 The node pointer record:
PHYSICAL RECORD: n_fields 11; 2-byte offsets; info bits 0
0: len 4; hex 0000004a; asc J;;
1: len 1; hex 00; asc ;;
2: len 4; hex 00008b21; asc !;;
3: len 46; hex 00000001860800088000860800088000860100018000860300048000860800088000010f00ff8008860800088000; asc ;;
4: len 8; hex 00005af34106c134; asc Z A 4;;
5: len 8; hex 00007b26960df353; asc {& S;;
6: len 1; hex 80; asc ;;
7: len 4; hex 50767b17; asc Pv{ ;;
8: len 8; hex 0000000000000001; asc ;;
9: len 71; hex 7b22686f775f666f756e64223a31372c22726571756573745f74696d65223a313334393737333538362c22666c696473223a22222c226d6573736167655f6964223a6e756c6c7d; asc {"how_found":17,"request_time":1349773586,"flids":"","message_id":null};;
10: len 4; hex 000002a3; asc ;;

Another thing I found, the search_tuple passed to ibuf_restore_pos is not really used other than being printed out after btr_pcur_restore_position failed. I guess that should be the tuple used to reposition the cursor?

How to repeat:
Read the source code. I have a db instance that would fail to start due to this error. But I don't know insert buffer well enough to create a simplified test case for it.
[3 Jun 2013 9:50] Arnaud Adant
Thanks for the code review !

This is a duplicate of this internal published bug : Bug 16884120 - TOO MANY FIELDS PASSED CHANGE BUFFER CURSOR STORE/RESTORE
[12 Sep 2013 9:13] Annamalai Gurusami
Can I have access to the data directory that can reproduce this problem?
[12 Sep 2013 15:40] Annamalai Gurusami
Both the node pointer record and the data tuple record contains 10 fields.  One of them is not correct.  Can you please provide the table definition?
[20 Sep 2013 0:52] Rongrong Zhong
The table definition would be some table in our production that I can not disclose. Unfortunately I don't know insert buffer well enough to reproduce this bug with a simpler test case. But the issue is described pretty clearly in original bug report.
[24 Sep 2013 8:31] Annamalai Gurusami
Hi Rongrong Zhong,

I don't need the exact table definition, a sanitized version of the table definition would also be helpful.  

Our analysis so far suggests that the cursor store/restore will always be made only on a record on a leaf page.   In your bug description, both the search tuple and the node pointer record are having 11 fields (the point is both are having equal number of fields.)   This is possible only if the store/restore of a cursor happens on a non-leaf page.  But I need to confirm this.  Can you calculate and tell me whether the search tuple is including the child pointer too?  Or please provide a sanitized table definition so that I can calculate it myself.

Thank you.

Rgds,
anna
[24 Sep 2013 9:35] Annamalai Gurusami
Would it be possible to provide the following information?

*  The information available in IBUF_DUMMY
*  The printout of the record pcur->old_rec
*  The gdb output for "print *index->fields@index->n_fields"

Since the search tuple that you provided above is not really used for cursor restore, it would be useful to get pcur->old_rec which is actually used for cursor restoration.
[24 Sep 2013 10:10] Marko Mäkelä
Rongrong, you are right that the search_tuple is only being used for diagnostics. In this case, it does not help us, and we need all this output:

		rec_print_old(stderr, btr_pcur_get_rec(pcur));
		rec_print_old(stderr, pcur->old_rec);
		dtuple_print(stderr, search_tuple);

		rec_print_old(stderr,
			      page_rec_get_next(btr_pcur_get_rec(pcur)));

The cursor positioning is based on pcur->old_rec. It is supposed to contain a prefix of the current record.

It looks like in your scenario, the btr_pcur_store_position/btr_pcur_restore_position fails on the very first record (the ibuf operation count is 0 in field 3).

You should be able to trigger this bug easier if you disable the "optimistic restore" (buf_page_optimistic_get) in btr_pcur_restore_position(). This shortcut could be why we have never seen this bug in our internal testing, to the best of my knowledge.
[24 Sep 2013 11:43] Marko Mäkelä
Rongrong, can you please tell us where exactly the
"The data tuple used for search in btr_pcur_restore_position"
in the Description is coming from?

As far as I see the code in ibuf_merge_or_delete_for_page(), it is constructing the initial search tuple like this:

	search_tuple = ibuf_search_tuple_build(space, page_no, heap);

It should contain only 3 fields like this (adapting your description):

DATA TUPLE: 3 fields;
0: len 4; hex 0000004a; asc J;;
1: len 1; hex 00; asc ;;
2: len 4; hex 00008b21; asc !;;

We would then look for the smallest record containing this (space_id,0,page_no).

I would also appreciate the call stack of the failure, from ibuf_merge_or_delete_for_page() on.

Note: I thought that I repeated this bug last May before my vacation, but I fail to repeat it any more, even when using older versions of our code. This is why we need more information.
[24 Sep 2013 12:16] Marko Mäkelä
I was fishing the bug database for possible duplicates of this bug, and found these:

Bug#27276 InnoDB Error: ibuf cursor restoration fails
Bug#28112 delete from table using results from another query fails

Bug#27276 was addressed by suppressing the warning message if the tablespace had been discarded and reimported with the same space_id. We should really have checked with the original reporter if this was the case. Now that I know InnoDB better, I doubt that we fixed a non-bug there. (Instead of assuming that the record did not exist, we should have checked if we are indeed searching for the right record. And we should have reproduced the bug. We were sloppy with test cases back then.)

For Bug#28112, we did not get any feedback. Because the output does not include the original value of pcur->rel_pos, it is difficult to say what really happened there.

When we invoke btr_pcur_restore_position() on the change buffer tree in the two places in ibuf0ibuf.cc, we are really asking for the smallest record that carries (space_id,0,page_no). The change buffer merge should process the records in alphabetic order, and at most one thread should be accessing the records for a given (space_id,0,page_no) prefix.
[11 Oct 2013 16:00] Ben Krug
Hello Rongrong -

Are you able to review the questions from Sep 24 and answer if possible?

Thank you,

Ben Krug
Senior MySQL Support Engineer
[11 Oct 2013 17:33] Rongrong Zhong
Sorry I was on another project for a while so I lost context on this. Now to answer the question:

It is correct that in ibuf_merge_or_delete_for_page the search tuple is correctly created. But when the search_tuple is passed to ibuf_restore_pos, it is not used other than printing out an error message. Instead, ibuf_restore_pos just calls btr_pcur_restore_position (mode, pcur, mtr), inside which a data tuple is created later with:

        tuple = dict_index_build_data_tuple(index, cursor->old_rec,
                                            cursor->old_n_fields, heap);

And this is the one that's actually used in restoring the cursor.

Does this answer your question?
[14 Jan 2014 13:59] Marko Mäkelä
Rongrong,
Sorry for taking so long to answer, but this bug is not assigned to me.

I would still like to have an answer to my question from Sep 24.
In your Description, it looks strange that the DATA TUPLE (which should be built on a change buffer leaf page record) has the same number of fields as the PHYSICAL RECORD (the node pointer).

The change buffer should internally be a non-unique index tree. This means that the key in the change buffer tree should include all fields of the record. But, in your output, the node pointer record seems to be missing one 8-byte column before the 4-byte child page number. Quoting from Description:

10: len 8; hex 000000005ce402b3; asc \ ;;
[snip]
10: len 4; hex 000002a3; asc ;;

The node pointer record should have been created by dict_index_build_node_ptr(). It seems to me that the last field (the one before the page number) could have been removed by incorrectly passing level>0 for a record that is in a leaf page.

We have been unable to reproduce this bug internally.

Did you reproduce this with an unmodified MySQL server?
If not, which patches were applied?

What is the PRIMARY KEY of the table? You can obfuscate the column names; I only need the column types.
[22 Jan 2014 20:43] Rongrong Zhong
Marko,

It's been a long time since I came across this bug so I don't really have the table definition of the table anymore. It's not likely that the level passed is > 0 cause ibuf_delete_rec calls btr_pcur_restore_position which is a macro hard wired to calling btr_pcur_restore_position_func with level = 0. If you are having a hard time reproduce this bug, I'd suggest we put this aside for now. Next time I encounter this bug I will make sure I record more details and figure out how to reproduce it.

Thanks!
Rongrong
[24 Jan 2014 5:11] Annamalai Gurusami
Not enough information was provided for us to be able to handle this bug. Please re-read the instructions at http://bugs.mysql.com/how-to-report.php

If you can provide more information, feel free to add it to this bug and change the status back to 'Open'.

Thank you for your interest in MySQL.