Bug #51377 Crash in information_schema / processlist on concurrent DDL workload
Submitted: 22 Feb 2010 9:21 Modified: 18 Jun 2010 2:05
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Information schema Severity:S2 (Serious)
Version:5.1.46-bzr, next-mr OS:Any
Assigned to: Alexey Botchkov CPU Architecture:Any
Tags: rqg_pb2
Triage: Triaged: D1 (Critical)

[22 Feb 2010 9:21] Philip Stoev
Description:
When executing a random DDL workload, mysqld is crashing as follows:

#2  0x000000000064257d in handle_segfault (sig=11) at mysqld.cc:2741
#3  <signal handler called>
#4  0x000000315a4841ae in memcpy () from /lib64/libc.so.6
#5  0x0000000000984475 in _mi_rec_pack (info=0x7f014c015f48, to=0x7f014c0109d3 "\r", from=0x7f014c0150ea "\r") at mi_dynrec.c:978
#6  0x00000000009817fa in _mi_write_blob_record (info=0x7f014c015f48, record=0x7f014c014e38 "Ь╠\021") at mi_dynrec.c:271
#7  0x000000000098875d in mi_write (info=0x7f014c015f48, record=0x7f014c014e38 "Ь╠\021") at mi_write.c:146
#8  0x000000000099f9c0 in ha_myisam::write_row (this=0x7f014c0070a0, buf=0x7f014c014e38 "Ь╠\021") at ha_myisam.cc:785
#9  0x0000000000793e5f in handler::ha_write_row (this=0x7f014c0070a0, buf=0x7f014c014e38 "Ь╠\021") at handler.cc:4670
#10 0x00000000007d5a64 in schema_table_store_record (thd=0x7f0148005328, table=0x7f014c013c28) at sql_show.cc:2317
#11 0x00000000007dfacf in fill_schema_processlist (thd=0x7f0148005328, tables=0x3c49010, cond=0x0) at sql_show.cc:1898
#12 0x00000000007d154e in get_schema_tables_result (join=0x7f014c0028f8, executed_place=PROCESSED_BY_JOIN_EXEC) at sql_show.cc:6630
#13 0x00000000006ed41e in JOIN::exec (this=0x7f014c0028f8) at sql_select.cc:1809
#14 0x00000000005ed37b in subselect_single_select_engine::exec (this=0x3c49618) at item_subselect.cc:1965
#15 0x00000000005efc90 in Item_subselect::exec (this=0x3c49528) at item_subselect.cc:261
#16 0x00000000005eba9f in Item_singlerow_subselect::val_int (this=0x3c49528) at item_subselect.cc:571
#17 0x00000000005aa770 in Arg_comparator::compare_int_signed (this=0x3c49708) at item_cmpfunc.cc:1454
#18 0x000000000058daf2 in Arg_comparator::compare (this=0x3c49708) at item_cmpfunc.h:84
#19 0x00000000005aea65 in Item_func_lt::val_int (this=0x3c49658) at item_cmpfunc.cc:1911
#20 0x0000000000568686 in Item::val_bool (this=0x3c49658) at item.cc:184
#21 0x00000000005ad8f5 in Item_cond_and::val_int (this=0x3c497d0) at item_cmpfunc.cc:4389
#22 0x0000000000594e0a in Item_func_set_user_var::check (this=0x7f014c001760, use_result_field=false) at item_func.cc:4254
#23 0x00000000006608b5 in set_var_user::check (this=0x7f014c001ba0, thd=0x7f0148005328) at set_var.cc:656
#24 0x0000000000660a6b in sql_set_variables (thd=0x7f0148005328, var_list=0x7f0148007770) at set_var.cc:538
#25 0x00000000006585df in mysql_execute_command (thd=0x7f0148005328) at sql_parse.cc:3242
#26 0x000000000065c832 in mysql_parse (thd=0x7f0148005328,
    inBuf=0x3c48028 "SET @kill_id = CONNECTION_ID() AND 8 + 3 < (SELECT MAX(id) FROM information_schema.processlist) AND 10 > (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME='Uptime')", length=195, found_semicolon=0x7f01570deee0) at sql_parse.cc:5581
#27 0x000000000065d44b in dispatch_command (command=COM_QUERY, thd=0x7f0148005328,
    packet=0x7f0148026769 "SET @kill_id = CONNECTION_ID() AND 8 + 3 < (SELECT MAX(id) FROM information_schema.processlist) AND 10 > (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME='Uptime') ", packet_length=196) at sql_parse.cc:1023
#28 0x000000000065e8eb in do_command (thd=0x7f0148005328) at sql_parse.cc:709
#29 0x000000000064cacb in do_handle_one_connection (thd_arg=0x7f0148005328) at sql_connect.cc:1174
#30 0x000000000064cb9a in handle_one_connection (arg=0x7f0148005328) at sql_connect.cc:1113
#31 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#32 0x000000315a4e627d in clone () from /lib64/libc.so.6

How to repeat:
The core and the binary will be made available.
[22 Feb 2010 9:27] Philip Stoev
Core and binary:

http://mysql-systemqa.s3.amazonaws.com/var-bug51377.zip

Source:

revision-id: alik@sun.com-20100220092730-paoo5t9gcajs5dq8
date: 2010-02-20 12:27:30 +0300
build-date: 2010-02-22 11:21:38 +0200
revno: 2936
branch-nick: mysql-next-mr
[22 Feb 2010 9:41] Philip Stoev
Here is another type of backtrace that happens on the same test for the same type of queries:

#4  0x0000000000a3285d in my_utf8_uni (cs=0x1146340, pwc=0x7f18107625b8, s=0x0, e=0x26 <Address 0x26 out of bounds>) at ctype-utf8.c:1954
#5  0x0000000000a2a853 in my_well_formed_len_mb (cs=0x1146340, b=0x0, e=0x26 <Address 0x26 out of bounds>, pos=38, error=0x7f181076264c) at ctype-mb.c:401
#6  0x0000000000639f5c in well_formed_copy_nchars (to_cs=0x1146340,
    to=0x26288c8 "SAVEPOINT B@kill_idITH CONSISTENT SNAPSHOTVALUES ( 7 )_S . t1_base2_N   A WHERE A.`pk` IN ( SELECT `pk` FROM testdb_N . t1_merge1_N  AS B WHERE B.`pk` = 5 )` BETWEEN 5 AND 5 + 1 LIMIT 1E='Uptime')4 + "..., to_length=114, from_cs=0x1146340, from=0x0, from_length=38, nchars=38,
    well_formed_error_pos=0x7f1810762780, cannot_convert_error_pos=0x7f1810762778, from_end_pos=0x7f1810762770) at sql_string.cc:978
#7  0x0000000000615140 in Field_blob::store (this=0x29c2908, from=0x0, length=38, cs=0x1146340) at field.cc:7432
#8  0x00000000007dfaa6 in fill_schema_processlist (thd=0x7f180420a418, tables=0x27b0088, cond=0x0) at sql_show.cc:1893
#9  0x00000000007d154e in get_schema_tables_result (join=0x2674a78, executed_place=PROCESSED_BY_JOIN_EXEC) at sql_show.cc:6630
#10 0x00000000006ed41e in JOIN::exec (this=0x2674a78) at sql_select.cc:1809
#11 0x00000000005ed37b in subselect_single_select_engine::exec (this=0x27b0690) at item_subselect.cc:1965
#12 0x00000000005efc90 in Item_subselect::exec (this=0x27b05a0) at item_subselect.cc:261
#13 0x00000000005eba9f in Item_singlerow_subselect::val_int (this=0x27b05a0) at item_subselect.cc:571
#14 0x00000000005aa770 in Arg_comparator::compare_int_signed (this=0x27b0780) at item_cmpfunc.cc:1454
#15 0x000000000058daf2 in Arg_comparator::compare (this=0x27b0780) at item_cmpfunc.h:84
#16 0x00000000005aea65 in Item_func_lt::val_int (this=0x27b06d0) at item_cmpfunc.cc:1911
#17 0x0000000000568686 in Item::val_bool (this=0x27b06d0) at item.cc:184
#18 0x00000000005ad8f5 in Item_cond_and::val_int (this=0x27b0848) at item_cmpfunc.cc:4389
#19 0x0000000000594e0a in Item_func_set_user_var::check (this=0x2798e00, use_result_field=false) at item_func.cc:4254
#20 0x00000000006608b5 in set_var_user::check (this=0x2799240, thd=0x7f180420a418) at set_var.cc:656
#21 0x0000000000660a6b in sql_set_variables (thd=0x7f180420a418, var_list=0x7f180420c860) at set_var.cc:538
#22 0x00000000006585df in mysql_execute_command (thd=0x7f180420a418) at sql_parse.cc:3242
#23 0x000000000065c832 in mysql_parse (thd=0x7f180420a418,
    inBuf=0x27af098 "SET @kill_id = CONNECTION_ID() AND 64 + 3 < (SELECT MAX(id) FROM information_schema.processlist) AND 10 > (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME='Uptime')", length=196, found_semicolon=0x7f1810764ee0) at sql_parse.cc:5581
#24 0x000000000065d44b in dispatch_command (command=COM_QUERY, thd=0x7f180420a418,
    packet=0x7f18042f52b9 "SET @kill_id = CONNECTION_ID() AND 64 + 3 < (SELECT MAX(id) FROM information_schema.processlist) AND 10 > (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME='Uptime') ", packet_length=197) at sql_parse.cc:1023
#25 0x000000000065e8eb in do_command (thd=0x7f180420a418) at sql_parse.cc:709
#26 0x000000000064cacb in do_handle_one_connection (thd_arg=0x7f180420a418) at sql_connect.cc:1174
#27 0x000000000064cb9a in handle_one_connection (arg=0x7f180420a418) at sql_connect.cc:1113
#28 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#29 0x000000315a4e627d in clone () from /lib64/libc.so.6

(gdb) list
1949      uchar c;
1950
1951      if (s >= e)
1952        return MY_CS_TOOSMALL;
1953
1954      c= s[0];
1955      if (c < 0x80)
1956      {
1957        *pwc = c;
1958        return 1;

(gdb) print c
$1 = 0 '\0'

(gdb) print s[0]
Cannot access memory at address 0x0
[22 Feb 2010 10:52] Philip Stoev
Valgrind also agrees:

==11358== Thread 19:
==11358== Invalid read of size 1
==11358==    at 0xA2ECF9: my_utf8_uni (ctype-utf8.c:1954)
==11358==    by 0xA26CEE: my_well_formed_len_mb (ctype-mb.c:401)
==11358==    by 0x639DAF: well_formed_copy_nchars(charset_info_st*, char*, unsigned, charset_info_st*, char const*, unsigned, unsigned, char const**, char const**, char const**) (sql_string.cc:978)
==11358==    by 0x6151C1: Field_blob::store(char const*, unsigned, charset_info_st*) (field.cc:7432)
==11358==    by 0x7DF2F9: fill_schema_processlist(THD*, TABLE_LIST*, Item*) (sql_show.cc:1900)
==11358==    by 0x7D0C58: get_schema_tables_result(JOIN*, enum_schema_table_state) (sql_show.cc:6642)
==11358==    by 0x6ED257: JOIN::exec() (sql_select.cc:1809)
==11358==    by 0x5ED3EF: subselect_single_select_engine::exec() (item_subselect.cc:1965)
==11358==    by 0x5EFD03: Item_subselect::exec() (item_subselect.cc:261)
==11358==    by 0x5EBB12: Item_singlerow_subselect::val_int() (item_subselect.cc:571)
==11358==    by 0x5AA756: Arg_comparator::compare_int_signed() (item_cmpfunc.cc:1454)
==11358==    by 0x58DBE1: Arg_comparator::compare() (item_cmpfunc.h:84)
==11358==    by 0x5AEA5A: Item_func_lt::val_int() (item_cmpfunc.cc:1911)
==11358==    by 0x590EC5: eval_const_cond(Item*) (item_func.cc:64)
==11358==    by 0x6D71B9: _ZL24internal_remove_eq_condsP3THDP4ItemPNS1_11cond_resultE (sql_select.cc:9322)
==11358==    by 0x6D6D6B: _ZL24internal_remove_eq_condsP3THDP4ItemPNS1_11cond_resultE (sql_select.cc:9238)
==11358==  Address 0x14f1d42f is 23 bytes inside a block of size 151 free'd
==11358==    at 0x4A0609F: free (vg_replace_malloc.c:323)
==11358==    by 0x9E5939: my_no_flags_free (my_malloc.c:59)
==11358==    by 0x9E6876: free_root (my_alloc.c:349)
==11358==    by 0x65E2F5: dispatch_command(enum_server_command, THD*, char*, unsigned) (sql_parse.cc:1399)
==11358==    by 0x65E557: do_command(THD*) (sql_parse.cc:709)
==11358==    by 0x64C7F2: do_handle_one_connection(THD*) (sql_connect.cc:1174)
==11358==    by 0x64C8C1: handle_one_connection (sql_connect.cc:1113)
==11358==    by 0x315B0073D9: start_thread (in /lib64/libpthread-2.9.so)
==11358==    by 0x315A4E627C: clone (in /lib64/libc-2.9.so)
[4 Mar 2010 11:40] Dmitry Lenev
Preliminary investigation shows that these crashes are caused
by unguarded access to thd->query string in fill_schema_processlist()
function.

The following patch seems to fix problem:

=== modified file 'sql/sql_show.cc'
--- sql/sql_show.cc     2010-02-20 10:07:32 +0000
+++ sql/sql_show.cc     2010-03-02 15:01:21 +0000
@@ -1894,6 +1894,8 @@ int fill_schema_processlist(THD* thd, TA
         mysql_mutex_unlock(&mysys_var->mutex);

       /* INFO */
+      /* Lock THD mutex that protects its data when looking at it. */
+      mysql_mutex_lock(&tmp->LOCK_thd_data);
       if (tmp->query())
       {
         table->field[7]->store(tmp->query(),
@@ -1901,6 +1903,7 @@ int fill_schema_processlist(THD* thd, TA
                                tmp->query_length()), cs);
         table->field[7]->set_notnull();
       }
+      mysql_mutex_unlock(&tmp->LOCK_thd_data);

       if (schema_table_store_record(thd, table))
       {

Note that similar place in mysqld_list_processes() has been causing
bug #38816 which was fixed in similar fashion.

Also given that the above code is present starting from 5.1 and #38816
was repeatable even in 5.0 I strongly suspect that this bug is also
repeatable in 5.1 (may be with a different/adjusted test case).

Asking for re-triage as probably this means that target version should
be reconsidered.
[5 Mar 2010 16:44] Omer Barnir
triage: adding SR51MRU tag as bug should be fixed in 5.1 as well
[10 Mar 2010 10:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/102839

3385 Alexey Botchkov	2010-03-09
      Bug#51377      Crash in information_schema / processlist on concurrent DDL workload
          the fill_schema_processlist function accesses THD::query() without proper protection
          so the parallel thread killing can lead to access to the freed meemory.
      
      per-file comments:
        sql/sql_show.cc
      Bug#51377      Crash in information_schema / processlist on concurrent DDL workload
          protect the THD::query() access with the THD::LOCK_thd_data mutex.
[10 Mar 2010 10:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/102856

3385 Alexey Botchkov	2010-03-09
      Bug#51377      Crash in information_schema / processlist on concurrent DDL workload
          the fill_schema_processlist function accesses THD::query() without proper protection
          so the parallel thread killing can lead to access to the freed meemory.
      
      per-file comments:
        sql/sql_load.cc
      Bug#51377      Crash in information_schema / processlist on concurrent DDL workload
          the THD::set_query_inner() call needs to be protected.
          But here we don't need to change the original thd->query() at all.
        sql/sql_show.cc
      Bug#51377      Crash in information_schema / processlist on concurrent DDL workload
          protect the THD::query() access with the THD::LOCK_thd_data mutex.
[25 Mar 2010 13:13] Alexey Botchkov
pushed into mysql-5.1-bugteam/mysql-pe
[6 Apr 2010 7:57] Bugs System
Pushed into 5.1.46 (revid:sergey.glukhov@sun.com-20100405111026-7kz1p8qlzglqgfmu) (version source revid:holyfoot@mysql.com-20100324120329-4705ek7nnbyx1zru) (merge vers: 5.1.46) (pib:16)
[16 Apr 2010 17:19] Paul Dubois
Noted in 5.1.46 changelog.

The server could crash populating the INFORMATION_SCHEMA.PROCESSLIST
table due to lack of mutex protection. 

Setting report to Need Merge pending push to Celosia.
[28 May 2010 5:54] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:alik@sun.com-20100422150750-vp0n37kp9ywq5ghf) (pib:16)
[28 May 2010 6:23] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:alik@sun.com-20100422150658-fkhgnwwkyugtxrmu) (merge vers: 6.0.14-alpha) (pib:16)
[28 May 2010 6:51] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:alexey.kopytov@sun.com-20100402151743-xowc2u930h729jsy) (merge vers: 5.5.4-m3) (pib:16)
[30 May 2010 0:20] Paul Dubois
Noted in 5.5.5, 6.0.14 changelogs.
[17 Jun 2010 11:55] Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:33] Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:martin.skold@mysql.com-20100609211156-tsac5qhw951miwtt) (merge vers: 5.1.46-ndb-6.2.19) (pib:16)
[17 Jun 2010 13:20] Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[30 Jun 2010 8:05] Shane Bester
MySQL Enterprise Monitor can trigger this crash because there is an advisor that does the following query;

SELECT COUNT(*) AS num_long_running, @@long_query_time AS long_running_time FROM information_schema.processlis
t WHERE time > @@long_query_time AND command != 'Sleep'
[3 Aug 2010 6:34] Shane Bester
bug #55034 is about similar crashes after this was supposedly fixed.
[11 Nov 2010 16:23] Shane Bester
see bug #53822