Bug #41641 Entire server not accessible during RESTORE with --log-output=TABLE
Submitted: 19 Dec 2008 14:45 Modified: 4 Dec 2009 13:20
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Backup Severity:S2 (Serious)
Version:6.0-backup(2894) OS:Any
Assigned to: Chuck Bell CPU Architecture:Any

[19 Dec 2008 14:45] Philip Stoev
Description:
When executing a RESTORE, the RESTORE thread appears to hold a lock that prevents the log to table feature from working. This means that it becomes impossible for new clients to connect and the old clients are unable to issue any new queries during the RESTORE.

I believe this is an issue for the sysadmin because he will be running RESTORE under pressure and she should not be locked out of the server for the duration of the RESTORE.

How to repeat:
Attempt a long RESTORE operation while running as --log-output=TABLE . Observe that no new connections are accepted and no new queries can be issued on the already connected clients for the duration of the RESTORE.

Suggested fix:
Make RESTORE not hold the lock it is holding or the log-to-table not request that lock while operating.
[19 Dec 2008 14:46] Philip Stoev
Here is the RESTORE thread waiting for Falcon to drop a table:

Thread 3 (Thread 0x7f3571e58950 (LWP 15674)):
#0  0x000000315b00b58d in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000a5ff67 in Synchronize::sleep (this=0x70684b0, milliseconds=10000, callersMutex=0x7f358149c1a0) at Synchronize.cpp:183
---Type <return> to continue, or q <return> to quit---
#2  0x0000000000970a69 in SyncObject::wait (this=0x7f358149c190, type=Exclusive, thread=0x70684b0, sync=0x7f3571e51eb0, timeout=0) at SyncObject.cpp:658
#3  0x0000000000970fea in SyncObject::lock (this=0x7f358149c190, sync=0x7f3571e51eb0, type=Exclusive, timeout=0) at SyncObject.cpp:449
#4  0x000000000096f495 in Sync::lock (this=0x7f3571e51eb0, type=Exclusive) at Sync.cpp:58
#5  0x00000000009b79b1 in Database::dropTable (this=0x7f358130abc8, table=0x7f35815052e8, transaction=0x0) at Database.cpp:1454
#6  0x0000000000a59047 in Nfs::Statement::dropTable (this=0x7f35811185f8, syntax=0x7f358110c980) at Statement.cpp:1145
#7  0x0000000000a5d10b in Nfs::Statement::executeDDL (this=0x7f35811185f8) at Statement.cpp:833
#8  0x0000000000a5e4e1 in Nfs::Statement::execute (this=0x7f35811185f8, sqlString=0x7f3571e52110 "drop table TEST.\"TABLE100000_FALCON_INT_AUTOINC\"",
    isQuery=false) at Statement.cpp:2702
#9  0x0000000000a5e564 in Nfs::Statement::execute (this=0x7f35811185f8, sqlString=0x7f3571e52110 "drop table TEST.\"TABLE100000_FALCON_INT_AUTOINC\"")
    at Statement.cpp:162
#10 0x0000000000960b32 in StorageDatabase::deleteTable (this=0x7f3581109210, storageConnection=0x7f358129f650, tableShare=0x7f3581146078)
    at StorageDatabase.cpp:543
#11 0x000000000096b343 in StorageTableShare::deleteTable (this=0x7f3581146078, storageConnection=0x7f358129f650) at StorageTableShare.cpp:244
#12 0x00000000009688b7 in StorageTable::deleteTable (this=0x72602d8) at StorageTable.cpp:88
#13 0x000000000095afbe in StorageInterface::delete_table (this=0x5e5de18, tableName=0x7f3571e53760 "./test/table100000_falcon_int_autoinc")
    at ha_falcon.cpp:1036
#14 0x0000000000800a53 in handler::ha_delete_table (this=0x5e5de18, name=0x7f3571e53760 "./test/table100000_falcon_int_autoinc") at handler.cc:3268
#15 0x0000000000807155 in ha_delete_table (thd=0x4ffdb88, table_type=0x38690d8, path=0x7f3571e53760 "./test/table100000_falcon_int_autoinc",
    db=0x5e5db98 "test", alias=0x5e5db9d "table100000_falcon_int_autoinc", generate_warning=false) at handler.cc:1910
#16 0x000000000082a62a in mysql_rm_table_part2 (thd=0x4ffdb88, tables=0x5e5d838, if_exists=true, drop_temporary=false, drop_view=true, dont_log_query=true)
    at sql_table.cc:1733
#17 0x00000000008181a5 in mysql_rm_known_files (thd=0x4ffdb88, dirp=0x5be9078, db=0x5e5d830 "test", org_path=0x7f3571e53fc0 "./test/", level=0,
    dropped_tables=0x7f3571e542d0) at sql_db.cc:1160
#18 0x0000000000818545 in mysql_rm_db (thd=0x4ffdb88, db=0x5e5d830 "test", if_exists=true, silent=false) at sql_db.cc:925
#19 0x00000000006cfc68 in mysql_execute_command (thd=0x4ffdb88) at sql_parse.cc:3610
#20 0x0000000000773101 in Execute_sql_statement::execute_server_code (this=0x7f3571e559a0, thd=0x4ffdb88) at sql_prepare.cc:2946
#21 0x0000000000772e82 in Prepared_statement::execute_server_runnable (this=0x7f3571e55530, server_runnable=0x7f3571e559a0) at sql_prepare.cc:3426
#22 0x0000000000773bb5 in Ed_connection::execute_direct (this=0x7f3571e55a00, server_runnable=0x7f3571e559a0) at sql_prepare.cc:3957
#23 0x0000000000773ca3 in Ed_connection::execute_direct (this=0x7f3571e55a00, sql_text={str = 0x41031c8 "DROP DATABASE IF EXISTS `test`", length = 30})
    at sql_prepare.cc:3923
#24 0x00000000008c7834 in obs::Abstract_obj::drop (this=0x415f680, thd=0x4ffdb88) at si_objects.cc:827
#25 0x00000000008c78ff in obs::Abstract_obj::create (this=0x415f680, thd=0x4ffdb88) at si_objects.cc:758
#26 0x0000000000cc4e16 in bcat_create_item (catalogue=0x41fb278, item=0x1d43e00, create_stmt=
      {begin = 0x4103288 "63 CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */\n", end = 0x41032cb ""}, other_meta_data=
      {begin = 0x0, end = 0x0}) at kernel.cc:2028
#27 0x0000000000cd5a03 in read_and_create_items (s=0x5e66b58, cat=0x41fb278, kind=GLOBAL_ITEM) at stream_v1.c:1641
#28 0x0000000000cd5411 in bstream_rd_meta_data (s=0x5e66b58, cat=0x41fb278) at stream_v1.c:1305
warning: (Internal error: pc 0xccb4bb in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0xccb4ba in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0xccb4ba in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0xccb4ba in read in psymtab, but not in symtab.)

#29 0x0000000000ccb4bb in backup::read_meta_data (info=warning: (Internal error: pc 0xccb4ba in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0xccb496 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0xccb4ba in read in psymtab, but not in symtab.)

@0x41fb270, s=@0x5e66b50) at stream.h:210
#30 0x0000000000cc6b0c in Backup_restore_ctx::do_restore (this=0x7f3571e56580, overwrite=true) at kernel.cc:1222
#31 0x0000000000cc80b4 in execute_backup_command (thd=0x4ffdb88, lex=0x4fff678, backupdir=0x7f3571e572a0, overwrite=true) at kernel.cc:225
#32 0x00000000006cbf9c in mysql_execute_command (thd=0x4ffdb88) at sql_parse.cc:2338
#33 0x00000000006d39f9 in mysql_parse (thd=0x4ffdb88, inBuf=0x5e83990 "RESTORE FROM '/tmp/rqg_backup' OVERWRITE", length=40, found_semicolon=0x7f3571e57f00)
    at sql_parse.cc:5732
#34 0x00000000006d45e4 in dispatch_command (command=COM_QUERY, thd=0x4ffdb88, packet=0x52b8429 "RESTORE FROM '/tmp/rqg_backup' OVERWRITE", packet_length=40)
    at sql_parse.cc:1007
---Type <return> to continue, or q <return> to quit---
#35 0x00000000006d5b0d in do_command (thd=0x4ffdb88) at sql_parse.cc:690
#36 0x00000000006c3c91 in handle_one_connection (arg=0x4ffdb88) at sql_connect.cc:1145
#37 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#38 0x000000315a4e627d in clone () from /lib64/libc.so.6
[19 Dec 2008 14:46] Philip Stoev
And here is a new client attempting to connect and hanging because of a lock in logging to table:

#2  0x000000315b008a9e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000c32141 in safe_mutex_lock (mp=0x12dda00, try_lock=0 '\0', file=0xdacf8d "sql_base.cc", line=2549) at thr_mutex.c:153
#4  0x000000000071d60e in open_table (thd=0x5c00288, table_list=0x7f3571c4ea60, mem_root=0x5c02d28, action=0x7f3571c4e99c, flags=0) at sql_base.cc:2549
#5  0x000000000071e11d in open_ltable (thd=0x5c00288, table_list=0x7f3571c4ea60, lock_type=TL_WRITE_CONCURRENT_INSERT, lock_flags=27) at sql_base.cc:4063
#6  0x000000000071e3e5 in open_performance_schema_table (thd=0x5c00288, one_table=0x7f3571c4ea60, backup=0x7f3571c4efd0) at sql_base.cc:8026
#7  0x000000000079a96d in Log_to_csv_event_handler::log_general (this=0x386bd90, thd=0x5c00288, event_time=1229696213,
    user_host=0x7f3571c4f120 "root[root] @ localhost [127.0.0.1]", user_host_len=34, thread_id=69, command_type=0xd5b3c0 "Connect", command_type_len=7,
    sql_text=0x7f3571c4f3b0 "root@localhost on ", sql_text_len=18, client_cs=0x12d2f60) at log.cc:414
#8  0x00000000007975ad in LOGGER::general_log_write (this=0x12e5360, thd=0x5c00288, command=COM_CONNECT, query=0x7f3571c4f3b0 "root@localhost on ",
    query_length=18) at log.cc:2009
#9  0x00000000007976fb in LOGGER::general_log_print (this=0x12e5360, thd=0x5c00288, command=COM_CONNECT, format=0xd59adf "%s@%s on %s", args=0x7f3571c4f800)
    at log.cc:2028
#10 0x000000000079781e in general_log_print (thd=0x5c00288, command=COM_CONNECT, format=0xd59adf "%s@%s on %s") at log.cc:5968
#11 0x00000000006c2784 in check_user (thd=0x5c00288, command=COM_CONNECT, passwd=0x4f81c2e "", passwd_len=0, db=0x0, check_count=true) at sql_connect.cc:435
#12 0x00000000006c3a5f in check_connection (thd=0x5c00288) at sql_connect.cc:916
#13 0x00000000006c3af5 in login_connection (thd=0x5c00288) at sql_connect.cc:974
#14 0x00000000006c3c70 in handle_one_connection (arg=0x5c00288) at sql_connect.cc:1133
#15 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#16 0x000000315a4e627d in clone () from /lib64/libc.so.6
[19 Dec 2008 14:46] Philip Stoev
See related bug http://bugs.mysql.com/bug.php?id=35049
[23 Dec 2008 15:26] Sveta Smirnova
Thank you for the report.

I can not repeat described behavior with test data. Could you please send backup file you have problem with?
[15 Jan 2009 11:39] Susanne Ebrecht
Philip,

I also can't reproduce this.

Also I don't understand if this is Falcon related or not.
[10 Sep 2009 10:47] Philip Stoev
Susanne,

I have been unable to reproduce consistently . I do not have the original backup file and I have no idea if this is Falcon-related.

I will try to build a new test for this in the future and see if I can hit it. In the meantime, let's leave it in Analyzing, do not consider it an active RESTORE bug and do not concern yourself with it.

Also note that this bug is about --log-output=TABLE. This feature is known to have issues, has been known to cause extra deadlocks, and is OFF by default.
[26 Nov 2009 12:54] Philip Stoev
Ranger, please be on the lookout for this situation when you do large-scale backup/restore testing. During RESTORE, the server should not lock up with --log-output=table. Thank you!
[2 Dec 2009 10:29] Lars Thalmann
Most likely this has been fixed.
It is not reproducible.
Closing bug
[2 Dec 2009 16:50] Alexey Stroganov
I was able to repeat this issue with the recent 6.0-backup tree.

I've started server with --log-output=TABLE option and started RESTORE of big enough database(backup image 9G) and tried to execute another statement in parallel that was blocked and aborted with following info:

mysql> create database sbtest;
ERROR 1749 (HY000): The backup wait timeout has expired for query 'create database sbtest'.
+----+------+-----------------+----------+---------+------+-----------------------------+-------------------------------------------------------------------+
| Id | User | Host            | db       | Command | Time | State                       | Info                                                              |
+----+------+-----------------+----------+---------+------+-----------------------------+-------------------------------------------------------------------+
| 6  | root | localhost:48696 | dbt3sf10 | Query   | 357  |                             | restore from '/data0/ab/tmp/dbt3sf10.INNODB.bk-1259770970' |
| 13 | root | localhost       |          | Query   | 14   | BML: waiting until released | create database sbtest                                            |
| 18 | root | localhost       |          | Query   | 0    |                             | show processlist                                                  |
+----+------+-----------------+----------+---------+------+-----------------------------+-------------------------------------------------------------------+

below backtraces for two threads - one that performs backup and blocked one:

Thread 4 (Thread 1079863616 (LWP 24956)):
#0  0x00002b6fa642d457 in _int_malloc () from /lib64/libc.so.6
#1  0x00002b6fa642ed36 in malloc () from /lib64/libc.so.6
#2  0x000000000084041b in mem_heap_create_block (heap=0x0, n=168, init_block=0x0, type=0,
    file_name=0xae4605 "page/page0cur.c", line=934) at mem/mem0mem.c:362
#3  0x0000000000859398 in rec_get_offsets_func (rec=0x2aab3480ef7c "\200K\017�\200", index=0x1936e5e0,
    offsets=<value optimized out>, n_fields=<value optimized out>, heap=0x405d1108, file=0x20 <Address 0x20 out of bounds>,
    line=934) at ../../storage/innobase/include/mem0mem.ic:424
#4  0x000000000084cd39 in page_cur_insert_rec_low (cursor=0x405d2368, tuple=0x2aad95901be0, index=0x1936e5e0, rec=0x0,
    offsets=0x0, mtr=0x405d1780) at page/page0cur.c:933
#5  0x00000000007ebbc3 in btr_cur_optimistic_insert (flags=0, cursor=0x405d2360, entry=<value optimized out>,
    rec=0x405d2408, big_rec=0x405d2400, thr=0x2aad95901488, mtr=0x405d1780) at btr/btr0cur.c:1052
#6  0x000000000085eed3 in row_ins_index_entry_low (mode=2, index=0x1936e5e0, entry=0x2aad95901be0, ext_vec=0x0,
    n_ext_vec=0, thr=0x2aad95901488) at row/row0ins.c:2093
#7  0x000000000085f682 in row_ins (node=0x2aad95901040, thr=0x2aad95901488) at row/row0ins.c:2180
#8  0x000000000085f99b in row_ins_step (thr=0x2aad95901488) at row/row0ins.c:2501
#9  0x0000000000862f26 in row_insert_for_mysql (mysql_rec=0x19392f28 "", prebuilt=0x193942e0) at row/row0mysql.c:1161
#10 0x00000000007e906e in ha_innobase::write_row (this=0x19392b48, record=0x19392f28 "") at handler/ha_innodb.cc:3832
#11 0x000000000071faa3 in handler::ha_write_row (this=0x19392b48, buf=0x19392f28 "") at handler.cc:5583
#12 0x00000000009fbe6a in default_backup::Restore::send_data (this=0x1939c170, buf=@0x405d3680) at be_default.cc:886
#13 0x00000000009fa3b8 in backup::restore_table_data (thd=<value optimized out>, info=@0x1935aff0, s=@0x193520e0)
    at data_backup.cc:1771
#14 0x00000000009f7dfe in Backup_restore_ctx::do_restore (this=0x405d3850, overwrite=<value optimized out>)
    at kernel.cc:1659
#15 0x00000000009f82bc in execute_backup_command (thd=0x19315dc0, lex=0x193175f8, backupdir=0x405d4df0, overwrite=208,
    skip_gap_event=false) at kernel.cc:296
#16 0x00000000006232f7 in mysql_execute_command (thd=0x19315dc0) at sql_parse.cc:2480
---Type <return> to continue, or q <return> to quit---
#17 0x0000000000627065 in mysql_parse (thd=0x19315dc0,
    inBuf=0x19341108 "restore from '/data0/ranger/ab/tmp/dbt3sf10.INNODB.bk-1259770970'", length=65,
    found_semicolon=0x405d6120) at sql_parse.cc:5983
#18 0x0000000000627f64 in dispatch_command (command=COM_QUERY, thd=0x19315dc0,
    packet=0x193390d1 "restore from '/data0/ranger/ab/tmp/dbt3sf10.INNODB.bk-1259770970'",
    packet_length=<value optimized out>) at sql_parse.cc:1078
#19 0x000000000061b555 in handle_one_connection (arg=<value optimized out>) at sql_connect.cc:1164
#20 0x00002b6fa5c07143 in start_thread () from /lib64/libpthread.so.0
#21 0x00002b6fa647f74d in clone () from /lib64/libc.so.6
#22 0x0000000000000000 in ?? ()

Thread 3 (Thread 1080129856 (LWP 25211)):
#0  0x00002b6fa5c0b3b7 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000007b2502 in BML_class::bml_enter (this=0xe1f2d0, thd=0x193210a0) at bml.cc:152
#2  0x0000000000621e1f in mysql_execute_command (thd=0x193210a0) at sql_parse.cc:2065
#3  0x0000000000627065 in mysql_parse (thd=0x193210a0, inBuf=0x19368868 "create database sbtest", length=22,
    found_semicolon=0x40617120) at sql_parse.cc:5983
#4  0x0000000000627f64 in dispatch_command (command=COM_QUERY, thd=0x193210a0, packet=0x19323dd1 "create database sbtest",
    packet_length=<value optimized out>) at sql_parse.cc:1078
#5  0x000000000061b555 in handle_one_connection (arg=<value optimized out>) at sql_connect.cc:1164
#6  0x00002b6fa5c07143 in start_thread () from /lib64/libpthread.so.0
#7  0x00002b6fa647f74d in clone () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()
[2 Dec 2009 17:18] Alexey Stroganov
more details:

- both BACKUP and RESTORE are affected
- only DDL statements are blocked with 'BML: waiting until released' and after timeout aborted with The backup wait timeout has expired for query '...'
- i've observed issue while performed backup/restore with both InnoDB and MyISAM engines
[3 Dec 2009 7:33] Philip Stoev
My original complaint was not about the DDL blocker, it was about that under --log-output=table, *ALL* queries hang because the log table remains locked. This includes new connections, DML and DDL regardless of the database. Anything that would cause a write to the general_log.CSV could not proceed.
[4 Dec 2009 13:20] Alexey Stroganov
While I checked --log-output=TABLE I somehow missed that BACKUP/RESTORE blocks DDL completely so please disregard my previous comments. No differences in behavior between --log-output=TABLE or --log-output=FILE so it seems that original issue was fixed so I close the bug.