Bug #57345 btr_pcur_store_position abort for load with concurrent lock/unlock tables
Submitted: 8 Oct 2010 20:42 Modified: 14 Dec 2010 19:25
Reporter: Victor Kirkebo Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.1.52 OS:Any
Assigned to: Jimmy Yang CPU Architecture:Any

[8 Oct 2010 20:42] Victor Kirkebo
Description:
This might be a duplicate of bug#57248. I'm doing release testing of 5.1.52 and I have only tested for this version so far.

Basic steps:
-----------
Start database and do this:

create database test;
use tst:
create table t1(f1 varchar(255));

then run a stress test with two treads:

client1:
use tst;
ALTER TABLE t1 MODIFY f1 CHAR(255) CHARACTER SET latin1;
ALTER TABLE t1 MODIFY f1 CHAR(255) CHARACTER SET latin2;

client2:
use tst;
set autocommit=0;
lock table t1 write;
unlock tables;

trace:
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(my_print_stacktrace+0x2e)[0x8b420e]
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(handle_segfault+0x322)[0x5e37c2]
/lib64/libpthread.so.0[0x33a8c0de70]
/lib64/libc.so.6(gsignal+0x35)[0x33a8030155]
/lib64/libc.so.6(abort+0x110)[0x33a8031bf0]
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(btr_pcur_store_position+0x397)[0x8458b7]
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(row_search_for_mysql+0x71c)[0x810a3c]
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(_ZN11ha_innobase10index_readEPhPKhj16ha_rkey_function+0x192)[0x781c82]
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(_ZN11ha_innobase11index_firstEPh+0x37)[0x780847]
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(_ZN11ha_innobase8rnd_nextEPh+0x37)[0x7816a7]
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(_Z13rr_sequentialP11READ_RECORD+0x1a)[0x6cffca]
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(_Z17mysql_alter_tableP3THDPcS1_P24st_ha_create_informationP10TABLE_LISTP10Alter_infojP8st_orderb+0x2e45)[0x6f1525]
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(_Z21mysql_execute_commandP3THD+0x9dc)[0x5f47bc]
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(_Z11mysql_parseP3THDPcjPPKc+0x3d8)[0x5f9968]
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x542)[0x5f9eb2]
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(_Z10do_commandP3THD+0xe6)[0x5fb1b6]
/export/home/tmp/vk136562/bug/mysql-advanced-5.1.52-linux-x86_64-glibc23/bin/mysqld(handle_one_connection+0x246)[0x5ed816]
/lib64/libpthread.so.0[0x33a8c062f7]
/lib64/libc.so.6(clone+0x6d)[0x33a80d1b6d]

How to repeat:
1) Download the attached 'bug_test.tar.gz' file and untar it in the mysql-test
directory
2) If running in a source tree, edit the run_bug_test' script and change the value
   of CLIENT_PATH=$CUR_PATH/bin to CLIENT_PATH=$CUR_PATH/client
3) Run the ./run_bug_test script
[8 Oct 2010 20:45] Victor Kirkebo
script for reproducing the bug

Attachment: bug_test.tar.gz (application/x-gzip, text), 1009 bytes.

[8 Oct 2010 20:47] Victor Kirkebo
I tested 5.1.51. I did not see the bug for that version.
[9 Oct 2010 5:02] Sveta Smirnova
Thank you for the report.

I can not repeat described behavior with current BZR sources. How long should I run the test?
[10 Oct 2010 11:08] Victor Kirkebo
I am also able to reproduce this bug with a very simple insert/delete load in concurrency with a lock tables <tblname> write;/unlock tables; load. It also seems to be necessary to use "set autocommit=0" before the lock tables command.

I will upload another test script shortly. Unlike the previous test script (for alter table load) this one needs to run with more threads (10 instead of 2) and also the run_load.txt file contains many more entries for the lock tables/unlock table test (10 instead of just 1 before).

The two test types that are running concurrently are doing this:

test1:
------
use tst;
set autocommit=0;
lock tables t1 write;
unlock tables;

test2:
------
use tst;
insert into t1(f1) values('a');
delete from t1;
[10 Oct 2010 11:10] Victor Kirkebo
This test is using an insert/delete load while the table is being locked/unlocked

Attachment: bug57345_test.tar.gz (application/x-gzip, text), 1016 bytes.

[10 Oct 2010 11:13] Victor Kirkebo
I am changing the synopsis of the bug report to better match the issue at hand.
[10 Oct 2010 11:22] Victor Kirkebo
How to repeat:
1) Download the attached 'bug57345_test.tar.gz' file and untar it in the mysql-test directory
2) If running in a source tree, edit the run_wait' script and change the value
   of CLIENT_PATH=$CUR_PATH/bin to CLIENT_PATH=$CUR_PATH/client
3) Run the ./run_bug_test script
4) This should reproduce the bug within a short time; often within a few seconds. However, the test will run for up to 5 minutes if nothing goes wrong - in which case you might consider adding more threads for the load. Do this by editing the line containing "--threads=10" in the ./run_bug_test script.
[11 Oct 2010 6:40] Jimmy Yang
This does seem to related to bug #56716 fix.

When the fix moves lock table code up, it is possible we call "goto lock_wait_or_error" without pcur being positioned and hit assertion failure in btr_pcur_store_position():

if (!prebuilt->sql_stat_start) {
   ....
} else if (prebuilt->select_lock_type == LOCK_NONE) {
   ...
} else {
 err = lock_table(0, index->table,
                  prebuilt->select_lock_type == LOCK_S
                  ? LOCK_IS : LOCK_IX, thr);
  
                if (err != DB_SUCCESS) {

                         goto lock_wait_or_error; <====
                 }
                 prebuilt->sql_stat_start = FALSE;
         }
}

lock_wait_or_error:
    ..
    btr_pcur_store_position();  <====

Assertion failure in btr_pcur_store_position():  ut_a(cursor->pos_state == BTR_PCUR_IS_POSITIONED);

It is confirmed by following debugger session:
Put a break point at above "goto lock_wait_or_error", and run the repro. It immediately hit this breakpoint. And as described, then triggered the assertion failure:

Breakpoint 2, row_search_for_mysql (buf=0x8d5dd00 "\377", mode=1, 
    prebuilt=0xb75b0068, match_mode=0, direction=0) at row/row0sel.c:3631
3631                            goto lock_wait_or_error;
Current language:  auto
The current source language is "auto; currently c".
(gdb) where
#0  row_search_for_mysql (buf=0x8d5dd00 "\377", mode=1, prebuilt=0xb75b0068, 
    match_mode=0, direction=0) at row/row0sel.c:3631
#1  0x084539f6 in ha_innobase::index_read (this=0x8d5dbb0, 
    buf=0x8d5dd00 "\377", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY)
    at handler/ha_innodb.cc:4725
#2  0x0845425c in ha_innobase::index_first (this=0x8d5dbb0, 
    buf=0x8d5dd00 "\377") at handler/ha_innodb.cc:4989
#3  0x084544b7 in ha_innobase::rnd_next (this=0x8d5dbb0, buf=0x8d5dd00 "\377")
    at handler/ha_innodb.cc:5086
#4  0x0836c55b in rr_sequential (info=0xb3177ce4) at records.cc:385
#5  0x083988d6 in copy_data_between_tables (from=0x8d360c0, to=0x8d35610, 
    create=..., ignore=false, order_num=0, order=0x0, copied=0xb31782d0, 
    deleted=0xb31782c8, keys_onoff=LEAVE_AS_IS, error_if_not_empty=false)
    at sql_table.cc:7810
#6  0x08396919 in mysql_alter_table (thd=0x8d5bc18, new_db=0x8d6d708 "tst", 
    new_name=0x8d6d508 "t1", create_info=0xb3179370, table_list=0x8d6d530, 
    alter_info=0xb3179660, order_num=0, order=0x0, ignore=false)
    at sql_table.cc:7279
#7  0x08248740 in mysql_execute_command (thd=0x8d5bc18) at sql_parse.cc:2968
#8  0x082524b1 in mysql_parse (thd=0x8d5bc18, 
    rawbuf=0x8d6d450 "ALTER TABLE t1 MODIFY f1 CHAR(255) CHARACTER SET latin1", length=55, found_semicolon=0xb317a05c) at sql_parse.cc:6051
#9  0x082442c6 in dispatch_command (command=COM_QUERY, thd=0x8d5bc18,

(gdb) p pcur->pos_state
$1 = 2
(gdb) n
4400            if (UNIV_UNLIKELY(prebuilt->row_read_type
(gdb) 
4404            did_semi_consistent_read = FALSE;
(gdb) 
4408            btr_pcur_store_position(pcur, &mtr);
(gdb) p pcur->pos_state
$2 = 2
(gdb) step
btr_pcur_store_position (cursor=0xb75adb68, mtr=0xb31776f0)
    at btr/btr0pcur.c:83
83              ut_a(cursor->pos_state == BTR_PCUR_IS_POSITIONED);
(gdb) n
101011  9:35:17  InnoDB: Assertion failure in thread 3004672880 in file btr/btr0pcur.c line 83
InnoDB: Failing assertion: cursor->pos_state == BTR_PCUR_IS_POSITIONED
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
[11 Oct 2010 7:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120469

3624 Jimmy Yang	2010-10-11
      Fix Bug #57345  btr_pcur_store_position abort for load with concurrent
      lock/unlock tables.
      
      Fix approved by Marko.
[11 Oct 2010 7:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120471

3625 Jimmy Yang	2010-10-11
      Fix Bug #57345 btr_pcur_store_position abort for load with concurrent
      lock/unlock tables
      
      Fix approved by Marko
[11 Oct 2010 8:04] Marko Mäkelä
OK to push. Sorry, this was a bug planted by me in the Bug #56716 fix. The function row_search_for_mysql() is really too big to maintain reliably.
[11 Oct 2010 8:35] Jimmy Yang
Fix checked in mysql-5.1-innodb built-in and zip. Ported the fix to mysql-5.5-innodb.
[11 Oct 2010 12:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120499

3624 Jimmy Yang	2010-10-11
      A more complete fix for bug #57345 btr_pcur_store_position abort for load
      with concurrent lock/unlock tables
      
      Approved by Marko
[11 Oct 2010 12:46] Marko Mäkelä
I approve the complete fix.
[14 Oct 2010 22:04] Omer Barnir
triage: setting to SR51MRU (just for values to be consistent)
[1 Nov 2010 19:00] Bugs System
Pushed into mysql-5.1 5.1.53 (revid:build@mysql.com-20101101184443-o2olipi8vkaxzsqk) (version source revid:build@mysql.com-20101101184443-o2olipi8vkaxzsqk) (merge vers: 5.1.53) (pib:21)
[1 Nov 2010 21:01] John Russell
Added to change log:

The server could crash with a high volume of concurrent LOCK TABLE
and UNLOCK TABLES statements.
[9 Nov 2010 19:44] Bugs System
Pushed into mysql-5.5 5.5.7-rc (revid:sunanda.menon@sun.com-20101109182959-otkxq8vo2dcd13la) (version source revid:sunanda.menon@sun.com-20101109182959-otkxq8vo2dcd13la) (merge vers: 5.5.7-rc) (pib:21)
[13 Nov 2010 16:23] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:alexander.nozdrin@oracle.com-20101113152450-2zzcm50e7i4j35v7) (merge vers: 5.6.1-m4) (pib:21)
[13 Nov 2010 16:31] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:alexander.nozdrin@oracle.com-20101113152540-gxro4g0v29l27f5x) (pib:21)
[18 Nov 2010 15:54] Bugs System
Pushed into mysql-5.1 5.1.54 (revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (version source revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (merge vers: 5.1.54) (pib:21)
[5 Dec 2010 12:38] Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (version source revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (merge vers: 5.6.1) (pib:23)
[16 Dec 2010 22:27] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)