Bug #51377 | Crash in information_schema / processlist on concurrent DDL workload | ||
---|---|---|---|
Submitted: | 22 Feb 2010 9:21 | Modified: | 18 Jun 2010 2:05 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Information schema | Severity: | S2 (Serious) |
Version: | 5.1.46-bzr, next-mr | OS: | Any |
Assigned to: | Alexey Botchkov | CPU Architecture: | Any |
Tags: | rqg_pb2 |
[22 Feb 2010 9:21]
Philip Stoev
[22 Feb 2010 9:27]
Philip Stoev
Core and binary: http://mysql-systemqa.s3.amazonaws.com/var-bug51377.zip Source: revision-id: alik@sun.com-20100220092730-paoo5t9gcajs5dq8 date: 2010-02-20 12:27:30 +0300 build-date: 2010-02-22 11:21:38 +0200 revno: 2936 branch-nick: mysql-next-mr
[22 Feb 2010 9:41]
Philip Stoev
Here is another type of backtrace that happens on the same test for the same type of queries: #4 0x0000000000a3285d in my_utf8_uni (cs=0x1146340, pwc=0x7f18107625b8, s=0x0, e=0x26 <Address 0x26 out of bounds>) at ctype-utf8.c:1954 #5 0x0000000000a2a853 in my_well_formed_len_mb (cs=0x1146340, b=0x0, e=0x26 <Address 0x26 out of bounds>, pos=38, error=0x7f181076264c) at ctype-mb.c:401 #6 0x0000000000639f5c in well_formed_copy_nchars (to_cs=0x1146340, to=0x26288c8 "SAVEPOINT B@kill_idITH CONSISTENT SNAPSHOTVALUES ( 7 )_S . t1_base2_N A WHERE A.`pk` IN ( SELECT `pk` FROM testdb_N . t1_merge1_N AS B WHERE B.`pk` = 5 )` BETWEEN 5 AND 5 + 1 LIMIT 1E='Uptime')4 + "..., to_length=114, from_cs=0x1146340, from=0x0, from_length=38, nchars=38, well_formed_error_pos=0x7f1810762780, cannot_convert_error_pos=0x7f1810762778, from_end_pos=0x7f1810762770) at sql_string.cc:978 #7 0x0000000000615140 in Field_blob::store (this=0x29c2908, from=0x0, length=38, cs=0x1146340) at field.cc:7432 #8 0x00000000007dfaa6 in fill_schema_processlist (thd=0x7f180420a418, tables=0x27b0088, cond=0x0) at sql_show.cc:1893 #9 0x00000000007d154e in get_schema_tables_result (join=0x2674a78, executed_place=PROCESSED_BY_JOIN_EXEC) at sql_show.cc:6630 #10 0x00000000006ed41e in JOIN::exec (this=0x2674a78) at sql_select.cc:1809 #11 0x00000000005ed37b in subselect_single_select_engine::exec (this=0x27b0690) at item_subselect.cc:1965 #12 0x00000000005efc90 in Item_subselect::exec (this=0x27b05a0) at item_subselect.cc:261 #13 0x00000000005eba9f in Item_singlerow_subselect::val_int (this=0x27b05a0) at item_subselect.cc:571 #14 0x00000000005aa770 in Arg_comparator::compare_int_signed (this=0x27b0780) at item_cmpfunc.cc:1454 #15 0x000000000058daf2 in Arg_comparator::compare (this=0x27b0780) at item_cmpfunc.h:84 #16 0x00000000005aea65 in Item_func_lt::val_int (this=0x27b06d0) at item_cmpfunc.cc:1911 #17 0x0000000000568686 in Item::val_bool (this=0x27b06d0) at item.cc:184 #18 0x00000000005ad8f5 in Item_cond_and::val_int (this=0x27b0848) at item_cmpfunc.cc:4389 #19 0x0000000000594e0a in Item_func_set_user_var::check (this=0x2798e00, use_result_field=false) at item_func.cc:4254 #20 0x00000000006608b5 in set_var_user::check (this=0x2799240, thd=0x7f180420a418) at set_var.cc:656 #21 0x0000000000660a6b in sql_set_variables (thd=0x7f180420a418, var_list=0x7f180420c860) at set_var.cc:538 #22 0x00000000006585df in mysql_execute_command (thd=0x7f180420a418) at sql_parse.cc:3242 #23 0x000000000065c832 in mysql_parse (thd=0x7f180420a418, inBuf=0x27af098 "SET @kill_id = CONNECTION_ID() AND 64 + 3 < (SELECT MAX(id) FROM information_schema.processlist) AND 10 > (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME='Uptime')", length=196, found_semicolon=0x7f1810764ee0) at sql_parse.cc:5581 #24 0x000000000065d44b in dispatch_command (command=COM_QUERY, thd=0x7f180420a418, packet=0x7f18042f52b9 "SET @kill_id = CONNECTION_ID() AND 64 + 3 < (SELECT MAX(id) FROM information_schema.processlist) AND 10 > (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME='Uptime') ", packet_length=197) at sql_parse.cc:1023 #25 0x000000000065e8eb in do_command (thd=0x7f180420a418) at sql_parse.cc:709 #26 0x000000000064cacb in do_handle_one_connection (thd_arg=0x7f180420a418) at sql_connect.cc:1174 #27 0x000000000064cb9a in handle_one_connection (arg=0x7f180420a418) at sql_connect.cc:1113 #28 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0 #29 0x000000315a4e627d in clone () from /lib64/libc.so.6 (gdb) list 1949 uchar c; 1950 1951 if (s >= e) 1952 return MY_CS_TOOSMALL; 1953 1954 c= s[0]; 1955 if (c < 0x80) 1956 { 1957 *pwc = c; 1958 return 1; (gdb) print c $1 = 0 '\0' (gdb) print s[0] Cannot access memory at address 0x0
[22 Feb 2010 10:52]
Philip Stoev
Valgrind also agrees: ==11358== Thread 19: ==11358== Invalid read of size 1 ==11358== at 0xA2ECF9: my_utf8_uni (ctype-utf8.c:1954) ==11358== by 0xA26CEE: my_well_formed_len_mb (ctype-mb.c:401) ==11358== by 0x639DAF: well_formed_copy_nchars(charset_info_st*, char*, unsigned, charset_info_st*, char const*, unsigned, unsigned, char const**, char const**, char const**) (sql_string.cc:978) ==11358== by 0x6151C1: Field_blob::store(char const*, unsigned, charset_info_st*) (field.cc:7432) ==11358== by 0x7DF2F9: fill_schema_processlist(THD*, TABLE_LIST*, Item*) (sql_show.cc:1900) ==11358== by 0x7D0C58: get_schema_tables_result(JOIN*, enum_schema_table_state) (sql_show.cc:6642) ==11358== by 0x6ED257: JOIN::exec() (sql_select.cc:1809) ==11358== by 0x5ED3EF: subselect_single_select_engine::exec() (item_subselect.cc:1965) ==11358== by 0x5EFD03: Item_subselect::exec() (item_subselect.cc:261) ==11358== by 0x5EBB12: Item_singlerow_subselect::val_int() (item_subselect.cc:571) ==11358== by 0x5AA756: Arg_comparator::compare_int_signed() (item_cmpfunc.cc:1454) ==11358== by 0x58DBE1: Arg_comparator::compare() (item_cmpfunc.h:84) ==11358== by 0x5AEA5A: Item_func_lt::val_int() (item_cmpfunc.cc:1911) ==11358== by 0x590EC5: eval_const_cond(Item*) (item_func.cc:64) ==11358== by 0x6D71B9: _ZL24internal_remove_eq_condsP3THDP4ItemPNS1_11cond_resultE (sql_select.cc:9322) ==11358== by 0x6D6D6B: _ZL24internal_remove_eq_condsP3THDP4ItemPNS1_11cond_resultE (sql_select.cc:9238) ==11358== Address 0x14f1d42f is 23 bytes inside a block of size 151 free'd ==11358== at 0x4A0609F: free (vg_replace_malloc.c:323) ==11358== by 0x9E5939: my_no_flags_free (my_malloc.c:59) ==11358== by 0x9E6876: free_root (my_alloc.c:349) ==11358== by 0x65E2F5: dispatch_command(enum_server_command, THD*, char*, unsigned) (sql_parse.cc:1399) ==11358== by 0x65E557: do_command(THD*) (sql_parse.cc:709) ==11358== by 0x64C7F2: do_handle_one_connection(THD*) (sql_connect.cc:1174) ==11358== by 0x64C8C1: handle_one_connection (sql_connect.cc:1113) ==11358== by 0x315B0073D9: start_thread (in /lib64/libpthread-2.9.so) ==11358== by 0x315A4E627C: clone (in /lib64/libc-2.9.so)
[4 Mar 2010 11:40]
Dmitry Lenev
Preliminary investigation shows that these crashes are caused by unguarded access to thd->query string in fill_schema_processlist() function. The following patch seems to fix problem: === modified file 'sql/sql_show.cc' --- sql/sql_show.cc 2010-02-20 10:07:32 +0000 +++ sql/sql_show.cc 2010-03-02 15:01:21 +0000 @@ -1894,6 +1894,8 @@ int fill_schema_processlist(THD* thd, TA mysql_mutex_unlock(&mysys_var->mutex); /* INFO */ + /* Lock THD mutex that protects its data when looking at it. */ + mysql_mutex_lock(&tmp->LOCK_thd_data); if (tmp->query()) { table->field[7]->store(tmp->query(), @@ -1901,6 +1903,7 @@ int fill_schema_processlist(THD* thd, TA tmp->query_length()), cs); table->field[7]->set_notnull(); } + mysql_mutex_unlock(&tmp->LOCK_thd_data); if (schema_table_store_record(thd, table)) { Note that similar place in mysqld_list_processes() has been causing bug #38816 which was fixed in similar fashion. Also given that the above code is present starting from 5.1 and #38816 was repeatable even in 5.0 I strongly suspect that this bug is also repeatable in 5.1 (may be with a different/adjusted test case). Asking for re-triage as probably this means that target version should be reconsidered.
[5 Mar 2010 16:44]
Omer Barnir
triage: adding SR51MRU tag as bug should be fixed in 5.1 as well
[10 Mar 2010 10:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/102839 3385 Alexey Botchkov 2010-03-09 Bug#51377 Crash in information_schema / processlist on concurrent DDL workload the fill_schema_processlist function accesses THD::query() without proper protection so the parallel thread killing can lead to access to the freed meemory. per-file comments: sql/sql_show.cc Bug#51377 Crash in information_schema / processlist on concurrent DDL workload protect the THD::query() access with the THD::LOCK_thd_data mutex.
[10 Mar 2010 10:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/102856 3385 Alexey Botchkov 2010-03-09 Bug#51377 Crash in information_schema / processlist on concurrent DDL workload the fill_schema_processlist function accesses THD::query() without proper protection so the parallel thread killing can lead to access to the freed meemory. per-file comments: sql/sql_load.cc Bug#51377 Crash in information_schema / processlist on concurrent DDL workload the THD::set_query_inner() call needs to be protected. But here we don't need to change the original thd->query() at all. sql/sql_show.cc Bug#51377 Crash in information_schema / processlist on concurrent DDL workload protect the THD::query() access with the THD::LOCK_thd_data mutex.
[25 Mar 2010 13:13]
Alexey Botchkov
pushed into mysql-5.1-bugteam/mysql-pe
[6 Apr 2010 7:57]
Bugs System
Pushed into 5.1.46 (revid:sergey.glukhov@sun.com-20100405111026-7kz1p8qlzglqgfmu) (version source revid:holyfoot@mysql.com-20100324120329-4705ek7nnbyx1zru) (merge vers: 5.1.46) (pib:16)
[16 Apr 2010 17:19]
Paul DuBois
Noted in 5.1.46 changelog. The server could crash populating the INFORMATION_SCHEMA.PROCESSLIST table due to lack of mutex protection. Setting report to Need Merge pending push to Celosia.
[28 May 2010 5:54]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:alik@sun.com-20100422150750-vp0n37kp9ywq5ghf) (pib:16)
[28 May 2010 6:23]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:alik@sun.com-20100422150658-fkhgnwwkyugtxrmu) (merge vers: 6.0.14-alpha) (pib:16)
[28 May 2010 6:51]
Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:alexey.kopytov@sun.com-20100402151743-xowc2u930h729jsy) (merge vers: 5.5.4-m3) (pib:16)
[30 May 2010 0:20]
Paul DuBois
Noted in 5.5.5, 6.0.14 changelogs.
[17 Jun 2010 11:55]
Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:33]
Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:martin.skold@mysql.com-20100609211156-tsac5qhw951miwtt) (merge vers: 5.1.46-ndb-6.2.19) (pib:16)
[17 Jun 2010 13:20]
Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[30 Jun 2010 8:05]
MySQL Verification Team
MySQL Enterprise Monitor can trigger this crash because there is an advisor that does the following query; SELECT COUNT(*) AS num_long_running, @@long_query_time AS long_running_time FROM information_schema.processlis t WHERE time > @@long_query_time AND command != 'Sleep'
[3 Aug 2010 6:34]
MySQL Verification Team
bug #55034 is about similar crashes after this was supposedly fixed.
[11 Nov 2010 16:23]
MySQL Verification Team
see bug #53822