Bug #53756 ALTER TABLE ADD PRIMARY KEY affects crash recovery
Submitted: 18 May 2010 16:49 Modified: 15 Feb 2011 6:48
Reporter: Ingo Strüwing Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.1, 5.6.99 OS:Any
Assigned to: Jimmy Yang
Triage: Needs Triage: D2 (Serious) / R6 (Needs Assessment) / E6 (Needs Assessment)

[18 May 2010 16:49] Ingo Strüwing
Description:
If I create a table like this:
CREATE TABLE t1 (pk INT, c1 INT) ENGINE=InnoDB;
ALTER TABLE t1 ADD PRIMARY KEY (pk);
run a few transactions on it, keep some open, crash the server, and restart,
then the open transactions seem to be implicitly committed.
This doesn't seem right to me.

If I create a table like this:
CREATE TABLE t1 (pk INT PRIMARY KEY, c1 INT) ENGINE=InnoDB;
run a few transactions on it, keep some open, crash the server, and restart,
then the open transactions seem to be implicitly rolled back.
This seems right to me.

How to repeat:
I will add a tar file with the two test cases:

  recover_alter   (first case with ALTER as above)
  recover_simple  (second case as above)

Each test case includes a test script, options file (to suppress MTR stack backtrace after the crash), and result file.

recover_alter does also include the reject file. Diff this against recover_alter.result to see, which two rows are not treated as expected.

Diffs recover_alter.test <-> recover_simple.test and
recover_alter.result <-> recover_simple.result show that only the creation
of the table makes a difference.

If you like to re-run the test, save the tar file, [branch from mysql-5.1,]
cd mysql-test, tar xf <the-tar-file>, cd .., [build,] and run the above
test cases.
[18 May 2010 16:51] Ingo Strüwing
tar file with the two testcases

Attachment: bug53756.tar (application/x-tar, text), 30.00 KiB.

[26 May 2010 6:49] Sveta Smirnova
Thank you for the report.

Verified as described.
[26 May 2010 22:04] Mikhail Izioumtchenko
I was going to try the testcase today but didn't find the time,
will try get back to it.
[27 May 2010 21:25] Mikhail Izioumtchenko
reproduces with mysql-5.1-innodb, both builtin and plugin.
Further experiments with the plugin:

1. will not reproduce if we replace ALTER TABLE t1 ADD PRIMARY KEY (pk);
with ALTER TABLE t1 ADD KEY (pk);
2. will not reproduce if instead of 

SET SESSION debug="+d,crash_commit_before";
COMMIT;

something else is used to kill the server, like replacing it with

--sleep 20

and killing the server manually when it goes to sleep
3. reproduces if crash_commit_before action of abort() is replaced with kill(getpid(),9);
4. no asserts with UNIV*DEBUG
5. with recover_simple, it's 3 txns that are rollback according to the log.
This is to be expected, there are open connections 2,5 and the default one that does the crash. But when the bug reproduces with recover_alter, we have two 
txns being rolled back in the log.

very puzzling. The fact that the bug wouldn't reproduce when PRIMARY 
is removed from ALTER should be investigated.
[27 May 2010 22:10] Mikhail Izioumtchenko
Plugin: we don't have wrong result error with KEY or UNIQUE KEY instead
of PRIMARY key. We do have result error if we then ADD and DROP a column
of the table after UNIQUE KEY, or if we add ADD+DROP after the table
creation in recover_simple.test.
Builtin: we have wrong result with UNIQUE KEY and KEY as well.
This proves that we need to execute a kind of ALTER that rebuilds the table
to trigger the error, if we use FIC, there's no error.
[27 May 2010 23:38] Mikhail Izioumtchenko
reproducibility is affected by timing, this is why killing the server
manually didn't work. This is how inserting sleep before the crashing COMMIT
affects the reproducibility:

sleep 1,2,4,8,9: reproduces, in fact row 666 becomes now visible after restart
sleep 10,12,16,20: no longer reproduces
[31 May 2010 12:14] Marko Mäkelä
If this bug is caused by a table-copying ALTER, as Michael suggests, then the fix could be proper meta-data locking, so that we can get rid of the background DROP TABLE queue. See also Bug #42643, which is about TRUNCATE.
[1 Jun 2010 6:17] Jimmy Yang
The problem is due to following ussue:

When scanning systables row in dict_load_table_on_id() during crash recovery, there are actually two rows for the same table_id in the secondary index of systables for test/t1. The first row is a deleted marked record for a temp table created during alter table operation, the second row is an active row for the actual table in operation (test/t1). However, in dict_load_table_on_id(), it only checks the first row, and found it is "deleted", and thus declare no such table (test/t1) and return, the end result is that the undo log did not apply to the table (test/t1).

Stack for above description:

dict_load_table_on_id
dict_table_get_on_id_low
dict_table_get_on_id
row_undo_ins_parse_undo_rec
row_undo_ins
row_undo
row_undo_step

debug info:

-- first row (with table id of 0,14)
(gdb) x field
0xb6d9c0d7:	0x00000000
(gdb) x/2 field
0xb6d9c0d7:	0x00000000	0x0e000000
(gdb) n
986		field = rec_get_nth_field_old(rec, 1, &len);
(gdb) 
988		table = dict_load_table(mem_heap_strdupl(heap, (char*) field, len));
(gdb) p field
$38 = (unsigned char *) 0xb6d9c0df "test/#sql-2aca_2\031\b "

So the first row is for a table named "test/#sql-2aca_2\031\b" and it is marked as deleted.

Let's call btr_pcur_move_to_next_user_rec(&pcur, &mtr) to get the next row:

(gdb) call btr_pcur_move_to_next_user_rec(&pcur, &mtr)

(gdb) x/2 field
0xb6d9c118:	0x00000000	0x0e000000 <== also (0, 14)
(gdb) n
986		field = rec_get_nth_field_old(rec, 1, &len);
(gdb) 
988		table = dict_load_table(mem_heap_strdupl(heap, (char*) field, len));
(gdb) p field
$49 = (unsigned char *) 0xb6d9c120 "test/t1"

So the next row is also (0, 14) and it is of table "test/t1", and it is not deleted.

After this additional call of btr_pcur_move_to_next_user_rec(), continue the process, and you will find the undo applied correctly, and everything runs fine.

This also explains the timing issue. If we put sleep 10 right after "alter table" operation (no need to put before crash), you will see the problem disappear, most likely the deleted marked record in secondary index got removed. So crash recovery runs without problem.

One solution for this issue is that we shall scan the index and find the first "non-delete marked" record, rather than first record matches the table id.

Thanks
Jimmy
[29 Jun 2010 2:50] Jimmy Yang
Fix checked in mysql-5.1-innodb

3528 Jimmy Yang        2010-06-28
      Check in fix for bug #53756: "ALTER TABLE ADD PRIMARY KEY affects
      crash recovery"
[4 Aug 2010 7:51] Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100731131027-1n61gseejyxsqk5d) (version source revid:alik@sun.com-20100731074942-o840woifuqioxxe4) (merge vers: 5.5.6-m3) (pib:18)
[4 Aug 2010 7:55] Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100731131027-1n61gseejyxsqk5d) (version source revid:alik@sun.com-20100731074942-o840woifuqioxxe4) (merge vers: 5.5.6-m3) (pib:18)
[4 Aug 2010 7:56] Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100731131027-1n61gseejyxsqk5d) (version source revid:alik@sun.com-20100731074942-o840woifuqioxxe4) (merge vers: 5.5.6-m3) (pib:18)
[4 Aug 2010 7:57] Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100731131027-1n61gseejyxsqk5d) (version source revid:alik@sun.com-20100731074942-o840woifuqioxxe4) (merge vers: 5.5.6-m3) (pib:18)
[4 Aug 2010 7:58] Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100731131027-1n61gseejyxsqk5d) (version source revid:alik@sun.com-20100731074942-o840woifuqioxxe4) (merge vers: 5.5.6-m3) (pib:18)
[4 Aug 2010 7:59] Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100731131027-1n61gseejyxsqk5d) (version source revid:alik@sun.com-20100731074942-o840woifuqioxxe4) (merge vers: 5.5.6-m3) (pib:18)
[4 Aug 2010 8:00] Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100731131027-1n61gseejyxsqk5d) (version source revid:alik@sun.com-20100731074942-o840woifuqioxxe4) (merge vers: 5.5.6-m3) (pib:18)
[4 Aug 2010 8:03] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:alik@sun.com-20100731075120-qz9z8c25zum2wgmm) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:13] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:alik@sun.com-20100731075120-qz9z8c25zum2wgmm) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:14] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:alik@sun.com-20100731075120-qz9z8c25zum2wgmm) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:15] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:alik@sun.com-20100731075120-qz9z8c25zum2wgmm) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:16] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:alik@sun.com-20100731075120-qz9z8c25zum2wgmm) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:17] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:alik@sun.com-20100731075120-qz9z8c25zum2wgmm) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:19] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804081533-c1d3rbipo9e8rt1s) (version source revid:alik@sun.com-20100731075120-qz9z8c25zum2wgmm) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 9:00] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804081533-c1d3rbipo9e8rt1s) (version source revid:alik@sun.com-20100731075120-qz9z8c25zum2wgmm) (merge vers: 5.6.99-m4) (pib:20)
[4 Aug 2010 9:01] Bugs System
Pushed into mysql-next-mr (revid:alik@ibmvm-20100804081630-ntapn8bf9pko9vj3) (version source revid:alik@sun.com-20100731075120-qz9z8c25zum2wgmm) (pib:20)
[17 Aug 2010 20:48] John Russell
Added to 5.1, 5.5, 5.6 change log:

Transactions could be incorrectly committed during recovery, rather
than rolled back, if the server crashed and was restarted after
performing ALTER TABLE...ADD PRIMARY KEY on an InnoDB table,
or some other operation that
involves copying the entire table.
[19 Aug 2010 15:41] Bugs System
Pushed into mysql-5.1 5.1.51 (revid:build@mysql.com-20100819151858-muaaor6jojb5ouzj) (version source revid:build@mysql.com-20100819151858-muaaor6jojb5ouzj) (merge vers: 5.1.51) (pib:20)
[30 Aug 2010 17:44] Mark Callaghan
Where is the commit for this? This is broken for the plugin in 5.1.50 and innodb.innodb_bug53576.test fails for the plugin in 5.1.50.
[6 Sep 2010 9:43] Shane Bester
on debug 5.1.50 build, i can confirm 5.1.50 builtin passes but 5.1.50 plugin failed. will test bzr&file new bug as needed

Attachment: bug53756_not_fixed_in_5.1.50_plugin.txt (text/plain), 3.80 KiB.

[14 Oct 2010 8:37] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (version source revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (merge vers: 5.1.51-ndb-7.0.20) (pib:21)
[14 Oct 2010 8:52] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.39 (revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (version source revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (merge vers: 5.1.51-ndb-6.3.39) (pib:21)
[14 Oct 2010 9:07] Bugs System
Pushed into mysql-5.1-telco-6.2 5.1.51-ndb-6.2.19 (revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (version source revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (merge vers: 5.1.51-ndb-6.2.19) (pib:21)
[14 Oct 2010 15:23] Jon Stephens
Looks like this is in 5.1.51. Please verify release version. Thanks!
[1 Nov 2010 19:00] Bugs System
Pushed into mysql-5.1 5.1.53 (revid:build@mysql.com-20101101184443-o2olipi8vkaxzsqk) (version source revid:build@mysql.com-20101101184443-o2olipi8vkaxzsqk) (merge vers: 5.1.53) (pib:21)
[9 Nov 2010 19:48] Bugs System
Pushed into mysql-5.5 5.5.7-rc (revid:sunanda.menon@sun.com-20101109182959-otkxq8vo2dcd13la) (version source revid:marko.makela@oracle.com-20100824081003-v4ecy0tga99cpxw2) (merge vers: 5.1.50) (pib:21)
[13 Nov 2010 16:20] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:marko.makela@oracle.com-20100824081003-v4ecy0tga99cpxw2) (merge vers: 5.1.50) (pib:21)
[13 Nov 2010 16:40] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:marko.makela@oracle.com-20100824081003-v4ecy0tga99cpxw2) (pib:21)
[9 Feb 2011 13:20] Marko Mäkelä
It looks like the fix was never merged to mysql-5.5-innodb and mysql-trunk-innodb. This resurfaced as Bug #59622.
[9 Feb 2011 13:23] Marko Mäkelä
The commits for this bug in 5.1:

revno: 3528
revision-id: jimmy.yang@oracle.com-20100629024137-690sacm5sogruzvb
parent: sunny.bains@oracle.com-20100625081841-ppulnkjk1qlazh82
committer: Jimmy Yang <jimmy.yang@oracle.com>
branch nick: mysql-5.1-innodb
timestamp: Mon 2010-06-28 19:41:37 -0700
message:
  Check in fix for bug #53756: "ALTER TABLE ADD PRIMARY KEY affects
  crash recovery"
  
  rb://369 approved by Marko
------------------------------------------------------------
revno: 3552
revision-id: jimmy.yang@oracle.com-20100907054137-tpuior7hez4f5ghl
parent: vasil.dimov@oracle.com-20100826150607-w5cgvqrrr0ubhily
committer: Jimmy Yang <jimmy.yang@oracle.com>
branch nick: mysql-5.1-innodb
timestamp: Mon 2010-09-06 22:41:37 -0700
message:
  Port fix for bug #53756 from 5.1 built-in to 5.1 plugin.
[10 Feb 2011 6:49] Jimmy Yang
Even though multiple messages state the fix is ported to 5.5 and trunk, it seems the fix is not present in 5.5-innodb and trunk-innodb. Just ported it to 5.5 and trunk again.
[10 Feb 2011 8:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130966

3313 Jimmy Yang	2011-02-10
      Merge a fix for Bug #53756 ALTER TABLE ADD PRIMARY KEY affects crash recovery
      This was already pushed to mysql-5.1-innodb some time ago
      (revision id jimmy.yang@oracle.com-20100907054137-tpuior7hez4f5ghl)
      but it was not merged to the 5.5 trees.
[10 Feb 2011 9:01] Marko Mäkelä
The fix could perform slightly better. I filed Bug #60079 for the improvement.