Bug #47770 100% CPU loop in the MDL_context::wait_for_locks hash
Submitted: 1 Oct 2009 17:46 Modified: 24 Oct 2009 11:38
Reporter: Philip Stoev Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Locking Severity:S3 (Non-critical)
Version:6.0 OS:Any
Assigned to: Assigned Account CPU Architecture:Any

[1 Oct 2009 17:46] Philip Stoev
Description:
When executing a high-concurrency stress test on MDL, the server hanged with a 100% CPU usage and the following backtrace:

#0  0x0000000000a6269c in my_hash_sort_bin (cs=0x106de20, key=0x7f748b6a3e04 "", len=12, nr1=0x7f748b6a35a8, nr2=0x7f748b6a35a0) at ctype-bin.c:305
#1  0x0000000000a1884d in calc_hash (hash=0x1088f40, key=0x7f748b6a3df8 "", length=12) at hash.c:44
#2  0x0000000000a18c98 in my_hash_first (hash=0x1088f40, key=0x7f748b6a3df8 "", length=12, current_record=0x7f748b6a366c) at hash.c:234
#3  0x0000000000a18c04 in my_hash_search (hash=0x1088f40, key=0x7f748b6a3df8 "", length=12) at hash.c:214
#4  0x00000000008b26bc in MDL_context::wait_for_locks (this=0x7f7498433170, mdl_requests=0x7f748b6a39c0) at mdl.cc:1249
#5  0x00000000006deee1 in Open_table_context::recover_from_failed_open_table_attempt (this=0x7f748b6a39c0, thd=0x7f7498433098, table=0x7f748b6a3a80)
    at sql_base.cc:3696
#6  0x00000000006e0583 in open_ltable (thd=0x7f7498433098, table_list=0x7f748b6a3a80, lock_type=TL_WRITE, lock_flags=0) at sql_base.cc:4653
#7  0x00000000006e0a07 in open_system_table_for_update (thd=0x7f7498433098, one_table=0x7f748b6a3a80) at sql_base.cc:8646
#8  0x000000000087ee61 in open_proc_table_for_update (thd=0x7f7498433098) at sp.cc:270
#9  0x000000000087f614 in sp_create_routine (thd=0x7f7498433098, type=1, sp=0x2da19b0) at sp.cc:758
#10 0x000000000087785d in sp_head::create (this=0x2da19b0, thd=0x7f7498433098) at sp_head.cc:731
#11 0x000000000068a0d4 in mysql_execute_command (thd=0x7f7498433098) at sql_parse.cc:4274
#12 0x000000000068c396 in mysql_parse (thd=0x7f7498433098,
    inBuf=0x2eb8770 "CREATE FUNCTION testdb_S . f1_2_N  () RETURNS INTEGER BEGIN SELECT MAX( `int` ) FROM testdb_S . t1_base_2_N  INTO @my_var ; SELECT MAX( `int` ) FROM testdb_S . t1_temp_2_S  INTO @my_var ; RETURN 1 ; E"..., length=202, found_semicolon=0x7f748b6a5f00) at sql_parse.cc:5991
#13 0x000000000068cfcf in dispatch_command (command=COM_QUERY, thd=0x7f7498433098,
    packet=0x7f7498448b59 "CREATE FUNCTION testdb_S . f1_2_N  () RETURNS INTEGER BEGIN SELECT MAX( `int` ) FROM testdb_S . t1_base_2_N  INTO @my_var ; SELECT MAX( `int` ) FROM testdb_S . t1_temp_2_S  INTO @my_var ; RETURN 1 ; E"..., packet_length=202) at sql_parse.cc:1074
#14 0x000000000068e4be in do_command (thd=0x7f7498433098) at sql_parse.cc:756
#15 0x000000000067b048 in handle_one_connection (arg=0x7f7498433098) at sql_connect.cc:1164
#16 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#17 0x000000315a4e627d in clone () from /lib64/libc.so.6
(gdb) frame 4
#4  0x00000000008b26bc in MDL_context::wait_for_locks (this=0x7f7498433170, mdl_requests=0x7f748b6a39c0) at mdl.cc:1249
1249          if (mdl_request->is_shared() &&
Current language:  auto; currently c++
(gdb) bt
#0  0x0000000000a6269c in my_hash_sort_bin (cs=0x106de20, key=0x7f748b6a3e04 "", len=12, nr1=0x7f748b6a35a8, nr2=0x7f748b6a35a0) at ctype-bin.c:305
#1  0x0000000000a1884d in calc_hash (hash=0x1088f40, key=0x7f748b6a3df8 "", length=12) at hash.c:44
#2  0x0000000000a18c98 in my_hash_first (hash=0x1088f40, key=0x7f748b6a3df8 "", length=12, current_record=0x7f748b6a366c) at hash.c:234
#3  0x0000000000a18c04 in my_hash_search (hash=0x1088f40, key=0x7f748b6a3df8 "", length=12) at hash.c:214
#4  0x00000000008b26bc in MDL_context::wait_for_locks (this=0x7f7498433170, mdl_requests=0x7f748b6a39c0) at mdl.cc:1249
#5  0x00000000006deee1 in Open_table_context::recover_from_failed_open_table_attempt (this=0x7f748b6a39c0, thd=0x7f7498433098, table=0x7f748b6a3a80)
    at sql_base.cc:3696
#6  0x00000000006e0583 in open_ltable (thd=0x7f7498433098, table_list=0x7f748b6a3a80, lock_type=TL_WRITE, lock_flags=0) at sql_base.cc:4653
#7  0x00000000006e0a07 in open_system_table_for_update (thd=0x7f7498433098, one_table=0x7f748b6a3a80) at sql_base.cc:8646
#8  0x000000000087ee61 in open_proc_table_for_update (thd=0x7f7498433098) at sp.cc:270
#9  0x000000000087f614 in sp_create_routine (thd=0x7f7498433098, type=1, sp=0x2da19b0) at sp.cc:758
#10 0x000000000087785d in sp_head::create (this=0x2da19b0, thd=0x7f7498433098) at sp_head.cc:731
#11 0x000000000068a0d4 in mysql_execute_command (thd=0x7f7498433098) at sql_parse.cc:4274
#12 0x000000000068c396 in mysql_parse (thd=0x7f7498433098,
    inBuf=0x2eb8770 "CREATE FUNCTION testdb_S . f1_2_N  () RETURNS INTEGER BEGIN SELECT MAX( `int` ) FROM testdb_S . t1_base_2_N  INTO @my_var ; SELECT MAX( `int` ) FROM testdb_S . t1_temp_2_S  INTO @my_var ; RETURN 1 ; E"..., length=202, found_semicolon=0x7f748b6a5f00) at sql_parse.cc:5991
#13 0x000000000068cfcf in dispatch_command (command=COM_QUERY, thd=0x7f7498433098,
    packet=0x7f7498448b59 "CREATE FUNCTION testdb_S . f1_2_N  () RETURNS INTEGER BEGIN SELECT MAX( `int` ) FROM testdb_S . t1_base_2_N  INTO @my_var ; SELECT MAX( `int` ) FROM testdb_S . t1_temp_2_S  INTO @my_var ; RETURN 1 ; E"..., packet_length=202) at sql_parse.cc:1074
#14 0x000000000068e4be in do_command (thd=0x7f7498433098) at sql_parse.cc:756
#15 0x000000000067b048 in handle_one_connection (arg=0x7f7498433098) at sql_connect.cc:1164
#16 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#17 0x000000315a4e627d in clone () from /lib64/libc.so.6

The reason is that the mdl_request data structure contains an endless loop:

(gdb) print mdl_request
$7 = (MDL_request *) 0x7f748b6a3dd8
(gdb) print * mdl_request
$2 = {type = MDL_SHARED, next_in_list = 0x7f748b6a3dd8, prev_in_list = 0x7f748b6a39c0, ticket = 0x0, key = {
    m_ptr = "\000mysql\000proc", '\0' <repeats 503 times>, m_length = 12, m_db_name_length = 5}}
(gdb) print * mdl_request->next_in_list
$4 = {type = MDL_SHARED, next_in_list = 0x7f748b6a3dd8, prev_in_list = 0x7f748b6a39c0, ticket = 0x0, key = {
    m_ptr = "\000mysql\000proc", '\0' <repeats 503 times>, m_length = 12, m_db_name_length = 5}}
(gdb) print * mdl_request->next_in_list->next_in_list
$6 = {type = MDL_SHARED, next_in_list = 0x7f748b6a3dd8, prev_in_list = 0x7f748b6a39c0, ticket = 0x0, key = {
    m_ptr = "\000mysql\000proc", '\0' <repeats 503 times>, m_length = 12, m_db_name_length = 5}}

The "prev_in_list" pointer also appears to be wrong.

How to repeat:
A test case will be provided once I am able to simplify this.
[4 Oct 2009 13:32] Philip Stoev
The following patch needs to be applied in order to convert the 100% CPU loop into an assertion:

=== modified file 'sql/mdl.cc'
--- sql/mdl.cc  2009-09-23 06:32:02 +0000
+++ sql/mdl.cc  2009-10-01 17:50:34 +0000
@@ -1240,6 +1240,7 @@
     {
       MDL_key *key= &mdl_request->key;
       DBUG_ASSERT(mdl_request->ticket == NULL);
+      DBUG_ASSERT(mdl_request != mdl_request->next_in_list);
       if (!global_lock.is_lock_type_compatible(mdl_request->type, FALSE))
         break;
       /*
[12 Oct 2009 11:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/86558

3647 Magne Mahre	2009-10-12
      Bug #47770 100% CPU loop in the MDL_context::wait_for_locks
                 hash
      
      When executing a high-concurrency stress test on MDL, the 
      server hung with a 100% CPU load. 
      
      At the request of QA, an additional ASSERT has been
      added to MDL_context::wait_for_locks to prevent the
      hang situation and instead bail if an mdl_request
      list object contains a reference back to itself.
[24 Oct 2009 11:38] Philip Stoev
Duplicate of bug #48152
[31 Oct 2009 8:17] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091031081410-qkxmjsdzjmj840aq) (version source revid:magne.mahre@sun.com-20091012115109-idl62rk8hlmxy4xw) (merge vers: 6.0.14-alpha) (pib:13)