Bug #82312 Crash on concurrent access to events_statements_current
Submitted: 21 Jul 2016 21:37 Modified: 22 Jul 2016 5:38
Reporter: Rene' Cannao' Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Performance Schema Severity:S1 (Critical)
Version:5.6.30 OS:Ubuntu (12.04)
Assigned to: CPU Architecture:Any

[21 Jul 2016 21:37] Rene' Cannao'
Description:
Concurrent access (from multiple connections) to any of the follow tables in performance_schema can cause a crash:
- events_statements_current
- events_statements_history
- events_statements_history_long

Our initial backtrace was the follow:

Thread pointer: 0x37d9970
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f58d72cae10 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x8e71ec]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x66c3e1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fad7d469cb0]
/usr/sbin/mysqld(_ZN30table_events_statements_common15make_row_part_1EP21PFS_events_statementsP18sql_digest_storage+0xbd)[0xb38fcd]
/usr/sbin/mysqld(_ZN36table_events_statements_history_long8make_rowEP21PFS_events_statements+0x5e)[0xb39cae]
/usr/sbin/mysqld(_ZN36table_events_statements_history_long8rnd_nextEv+0xb0)[0xb39d90]
/usr/sbin/mysqld(_ZN13ha_perfschema8rnd_nextEPh+0x4a)[0xb2215a]
/usr/sbin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x94)[0x5b0254]
/usr/sbin/mysqld(_Z13rr_sequentialP11READ_RECORD+0x20)[0x81a740]
/usr/sbin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x101)[0x6cc171]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0x2b8)[0x6cb3b8]
/usr/sbin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_P10SQL_I_ListI8st_orderESB_S7_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x275)[0x715095]
/usr/sbin/mysqld(_Z13handle_selectP3THDP13select_resultm+0x165)[0x7158f5]
/usr/sbin/mysqld[0x570812]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x28bc)[0x6ef9ac]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x5a8)[0x6f4218]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x106b)[0x6f5a1b]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x162)[0x6c2202]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x6c22f0]
/usr/sbin/mysqld(pfs_spawn_thread+0x143)[0xb4ad83]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fad7d461e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fad7c97d38d]

While investigating how to reproduce it, I noticed that the bug is very easy to reproduce when multiple connections, while cannot be reproduced when running with just one connection.
Furthermore, when running mysqld with valgrind the crash is not reproducible, that suggests that probably the threads are accessing areas of memory without proper locking.

How to repeat:
Run mysqld (in this case I was running it under gdb, but that is not relevant):

gdb /home/rcannao/5630/mysql-5.6.30/mybuild/sql/mysqld
(gdb) run --basedir=/usr --datadir=/mnt/data --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/mnt/data/error.err --open-files-limit=8192 --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 --skip-slave-start

Run some random traffic, for example using sysbench:
sysbench --test=oltp --mysql-user=rcannao --mysql-password=rcannao --mysql-db=sbtest --oltp-table-size=10000000 --num-threads=16 --max-time=10 --oltp-point-selects=100 --max-requests=0 run

Run simultaneous access to an events_statements_* table, for example:
mysqlslap -u rcannao -prcannao --create-schema=performance_schema -q "SELECT * FROM events_statements_current" -c 16 -i 1000

===

Further details.
When running with gdb I got the following bt:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffea43c1700 (LWP 50729)]
table_events_statements_common::make_row_part_1 (this=0x7ffe700355c0, statement=0x7fffb1fe0840, digest=0x7ffea43bea80) at /home/rcannao/5630/mysql-5.6.30/storage/perfschema/table_events_statements.cc:343
343       if (cs->mbmaxlen > 1)
(gdb) bt
#0  table_events_statements_common::make_row_part_1 (this=0x7ffe700355c0, statement=0x7fffb1fe0840, digest=0x7ffea43bea80) at /home/rcannao/5630/mysql-5.6.30/storage/perfschema/table_events_statements.cc:343
#1  0x0000000000aeeab9 in table_events_statements_current::make_row (this=0x7ffe700355c0, pfs_thread=0x7fffdab31d40, statement=<optimized out>) at /home/rcannao/5630/mysql-5.6.30/storage/perfschema/table_events_statements.cc:750
#2  0x0000000000aeec87 in table_events_statements_current::rnd_next (this=0x7ffe700355c0) at /home/rcannao/5630/mysql-5.6.30/storage/perfschema/table_events_statements.cc:691
#3  0x0000000000ad7307 in rnd_next (buf=0x7ffe440079f0 "\301\336", <incomplete sequence \306>, this=0x7ffe44032320) at /home/rcannao/5630/mysql-5.6.30/storage/perfschema/ha_perfschema.cc:332
#4  ha_perfschema::rnd_next (this=0x7ffe44032320, buf=0x7ffe440079f0 "\301\336", <incomplete sequence \306>) at /home/rcannao/5630/mysql-5.6.30/storage/perfschema/ha_perfschema.cc:320
#5  0x00000000005c3d2c in handler::ha_rnd_next (this=0x7ffe44032320, buf=0x7ffe440079f0 "\301\336", <incomplete sequence \306>) at /home/rcannao/5630/mysql-5.6.30/sql/handler.cc:2688
#6  0x00000000008016de in rr_sequential (info=0x7ffe7000d340) at /home/rcannao/5630/mysql-5.6.30/sql/records.cc:480
#7  0x00000000006c5de9 in sub_select (join_tab=0x7ffe7000d2b0, join=0x7ffe70005438, end_of_records=<optimized out>) at /home/rcannao/5630/mysql-5.6.30/sql/sql_executor.cc:1262
#8  sub_select (join=0x7ffe70005438, join_tab=0x7ffe7000d2b0, end_of_records=<optimized out>) at /home/rcannao/5630/mysql-5.6.30/sql/sql_executor.cc:1204
#9  0x00000000006c50fa in do_select (join=0x7ffe70005438) at /home/rcannao/5630/mysql-5.6.30/sql/sql_executor.cc:936
#10 JOIN::exec (this=0x7ffe70005438) at /home/rcannao/5630/mysql-5.6.30/sql/sql_executor.cc:194
#11 0x000000000070b195 in mysql_execute_select (free_join=true, select_lex=0x68f86b0, thd=0x68f6200) at /home/rcannao/5630/mysql-5.6.30/sql/sql_select.cc:1101
#12 mysql_select (thd=0x68f6200, tables=0x7ffe70004de8, wild_num=1, fields=..., conds=0x0, order=<optimized out>, group=0x68f87b0, having=0x0, select_options=2147748608, result=0x7ffe70005410, unit=0x68f8068, select_lex=0x68f86b0)
    at /home/rcannao/5630/mysql-5.6.30/sql/sql_select.cc:1222
#13 0x000000000070ba0f in handle_select (thd=0x68f6200, result=0x7ffe70005410, setup_tables_done_option=0) at /home/rcannao/5630/mysql-5.6.30/sql/sql_select.cc:110
#14 0x000000000058f444 in execute_sqlcom_select (thd=0x68f6200, all_tables=<optimized out>) at /home/rcannao/5630/mysql-5.6.30/sql/sql_parse.cc:5133
#15 0x00000000006e7400 in mysql_execute_command (thd=0x68f6200) at /home/rcannao/5630/mysql-5.6.30/sql/sql_parse.cc:2655
#16 0x00000000006ec930 in mysql_parse (thd=0x68f6200, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /home/rcannao/5630/mysql-5.6.30/sql/sql_parse.cc:6385
#17 0x00000000006ee459 in dispatch_command (command=COM_QUERY, thd=0x68f6200, packet=0x7ffe70004bb7 "", packet_length=39) at /home/rcannao/5630/mysql-5.6.30/sql/sql_parse.cc:1339
#18 0x00000000006bc1af in do_handle_one_connection (thd_arg=<optimized out>) at /home/rcannao/5630/mysql-5.6.30/sql/sql_connect.cc:982
#19 0x00000000006bc257 in handle_one_connection (arg=0x68d5600) at /home/rcannao/5630/mysql-5.6.30/sql/sql_connect.cc:898
#20 0x0000000000b02f83 in pfs_spawn_thread (arg=0x68dd730) at /home/rcannao/5630/mysql-5.6.30/storage/perfschema/pfs.cc:1860
#21 0x00007ffff737de9a in start_thread (arg=0x7ffea43c1700) at pthread_create.c:308
#22 0x00007ffff689938d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#23 0x0000000000000000 in ?? ()

===

I run several hours trying to better understand what was triggering the bug and it seems that get_internal_charset() returns NULL .
I modified charset.c just to add an extra brakepoint :

rcannao@sjc8c-rq3-3a:~/5630/mysql-5.6.30/mybuild$ cp ../mysys/charset.c ../mysys/charset.c.orig
rcannao@sjc8c-rq3-3a:~/5630/mysql-5.6.30/mybuild$ vim ../mysys/charset.c 

rcannao@sjc8c-rq3-3a:~/5630/mysql-5.6.30/mybuild$ diff ../mysys/charset.c ../mysys/charset.c.orig -u
--- ../mysys/charset.c  2016-07-21 20:21:36.648133242 +0000
+++ ../mysys/charset.c.orig     2016-07-21 20:20:21.665569631 +0000
@@ -639,11 +639,7 @@

     mysql_mutex_unlock(&THR_LOCK_charset);
   }
-  if (cs) {
-    return cs;
-  } else {
-    return cs;
-  }
+  return cs;
 }

Then recompiled without any optimization (maybe there is a better way of doing this) :
rcannao@sjc8c-rq3-3a:~/5630/mysql-5.6.30/mybuild$ for i in `grep '\-O3' * -r | awk -F ':' '{print $1}' | grep make` ; do sed -i -e 's/\-O3/\-O0/g' $i ; done
rcannao@sjc8c-rq3-3a:~/5630/mysql-5.6.30/mybuild$ for i in `grep '\-O2' * -r | awk -F ':' '{print $1}' | grep make` ; do sed -i -e 's/\-O2/\-O0/g' $i ; done
rcannao@sjc8c-rq3-3a:~/5630/mysql-5.6.30/mybuild$
rcannao@sjc8c-rq3-3a:~/5630/mysql-5.6.30/mybuild$ make clean && make

Then set a breakpoint in case get_internal_charset() wants to return NULL and restarted mysqld:
(gdb) b charset.c:645
Breakpoint 1 at 0x8a84b0: file /home/rcannao/5630/mysql-5.6.30/mysys/charset.c, line 645.
(gdb) run --basedir=/usr --datadir=/mnt/data --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/mnt/data/error.err --open-files-limit=8192 --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 --skip-slave-start

After running sysbench for few seconds and again mysqlslap, the new backtrace is interesting:

Breakpoint 1, get_internal_charset (loader=0x7ffea1b70580, cs_number=0, flags=0) at /home/rcannao/5630/mysql-5.6.30/mysys/charset.c:645
645         return cs;
(gdb) bt
#0  get_internal_charset (loader=0x7ffea1b70580, cs_number=0, flags=0) at /home/rcannao/5630/mysql-5.6.30/mysys/charset.c:645
#1  0x0000000000a01b21 in get_charset (cs_number=0, flags=0) at /home/rcannao/5630/mysql-5.6.30/mysys/charset.c:664
#2  0x0000000000c80586 in table_events_statements_common::make_row_part_1 (this=0x7ffe480350f0, statement=0x7fffb2211840, digest=0x7ffea1b70720) at /home/rcannao/5630/mysql-5.6.30/storage/perfschema/table_events_statements.cc:340
#3  0x0000000000c817ae in table_events_statements_current::make_row (this=0x7ffe480350f0, pfs_thread=0x7fffdae35340, statement=0x7fffb2211840) at /home/rcannao/5630/mysql-5.6.30/storage/perfschema/table_events_statements.cc:750
#4  0x0000000000c815a8 in table_events_statements_current::rnd_next (this=0x7ffe480350f0) at /home/rcannao/5630/mysql-5.6.30/storage/perfschema/table_events_statements.cc:691
#5  0x0000000000c66830 in ha_perfschema::rnd_next (this=0x7ffe5c008ae0, buf=0x7ffe5c009aa0 "\301\336\003(\005") at /home/rcannao/5630/mysql-5.6.30/storage/perfschema/ha_perfschema.cc:332
#6  0x00000000006415bb in handler::ha_rnd_next (this=0x7ffe5c008ae0, buf=0x7ffe5c009aa0 "\301\336\003(\005") at /home/rcannao/5630/mysql-5.6.30/sql/handler.cc:2688
#7  0x0000000000931a3d in rr_sequential (info=0x7ffe4800d360) at /home/rcannao/5630/mysql-5.6.30/sql/records.cc:480
#8  0x000000000078785b in sub_select (join=0x7ffe480053b8, join_tab=0x7ffe4800d2d0, end_of_records=false) at /home/rcannao/5630/mysql-5.6.30/sql/sql_executor.cc:1262
#9  0x0000000000787452 in do_select (join=0x7ffe480053b8) at /home/rcannao/5630/mysql-5.6.30/sql/sql_executor.cc:936
#10 0x00000000007858e4 in JOIN::exec (this=0x7ffe480053b8) at /home/rcannao/5630/mysql-5.6.30/sql/sql_executor.cc:194
#11 0x00000000007dc1b1 in mysql_execute_select (thd=0x6b01cf0, select_lex=0x6b041a0, free_join=true) at /home/rcannao/5630/mysql-5.6.30/sql/sql_select.cc:1101
#12 0x00000000007dc43a in mysql_select (thd=0x6b01cf0, tables=0x7ffe48004d68, wild_num=1, fields=..., conds=0x0, order=0x6b04368, group=0x6b042a0, having=0x0, select_options=2147748608, result=0x7ffe48005390, unit=0x6b03b58, select_lex=0x6b041a0)
    at /home/rcannao/5630/mysql-5.6.30/sql/sql_select.cc:1222
#13 0x00000000007da8fa in handle_select (thd=0x6b01cf0, result=0x7ffe48005390, setup_tables_done_option=0) at /home/rcannao/5630/mysql-5.6.30/sql/sql_select.cc:110
#14 0x00000000007b8c4d in execute_sqlcom_select (thd=0x6b01cf0, all_tables=0x7ffe48004d68) at /home/rcannao/5630/mysql-5.6.30/sql/sql_parse.cc:5133
#15 0x00000000007b2304 in mysql_execute_command (thd=0x6b01cf0) at /home/rcannao/5630/mysql-5.6.30/sql/sql_parse.cc:2655
#16 0x00000000007bb11a in mysql_parse (thd=0x6b01cf0, rawbuf=0x7ffe48004b10 "SELECT * FROM events_statements_current", length=39, parser_state=0x7ffea1b72170) at /home/rcannao/5630/mysql-5.6.30/sql/sql_parse.cc:6385
#17 0x00000000007af88e in dispatch_command (command=COM_QUERY, thd=0x6b01cf0, packet=0x6b745b1 "", packet_length=39) at /home/rcannao/5630/mysql-5.6.30/sql/sql_parse.cc:1339
#18 0x00000000007aeb3e in do_command (thd=0x6b01cf0) at /home/rcannao/5630/mysql-5.6.30/sql/sql_parse.cc:1036
#19 0x000000000077b466 in do_handle_one_connection (thd_arg=0x6b63fb0) at /home/rcannao/5630/mysql-5.6.30/sql/sql_connect.cc:982
#20 0x000000000077aef9 in handle_one_connection (arg=0x6b63fb0) at /home/rcannao/5630/mysql-5.6.30/sql/sql_connect.cc:898
#21 0x0000000000c95db6 in pfs_spawn_thread (arg=0x6b6c0e0) at /home/rcannao/5630/mysql-5.6.30/storage/perfschema/pfs.cc:1860
#22 0x00007ffff737de9a in start_thread (arg=0x7ffea1b73700) at pthread_create.c:308
#23 0x00007ffff689938d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#24 0x0000000000000000 in ?? ()
(gdb) up
#1  0x0000000000a01b21 in get_charset (cs_number=0, flags=0) at /home/rcannao/5630/mysql-5.6.30/mysys/charset.c:664
664       cs= get_internal_charset(&loader, cs_number, flags);
(gdb) up
#2  0x0000000000c80586 in table_events_statements_common::make_row_part_1 (this=0x7ffe480350f0, statement=0x7fffb2211840, digest=0x7ffea1b70720) at /home/rcannao/5630/mysql-5.6.30/storage/perfschema/table_events_statements.cc:340
340       CHARSET_INFO *cs= get_charset(statement->m_sqltext_cs_number, MYF(0));
(gdb) p *statement
$3 = {<PFS_events> = {m_thread_internal_id = 1321, m_event_id = 1, m_end_event_id = 0, m_event_type = EVENT_TYPE_STATEMENT, m_nesting_event_id = 0, m_nesting_event_type = 0, m_class = 0x15d9dc0, m_timer_start = 1469133034913692442, m_timer_end = 0,
    m_source_file = 0xd4d890 "/home/rcannao/5630/mysql-5.6.30/sql/mysqld.cc", m_source_line = 962}, m_current_schema_name = "performance_schema", '\000' <repeats 173 times>, m_current_schema_name_length = 18,
  m_sqltext = "SELECT * FROM events_statements_current", '\000' <repeats 984 times>, m_sqltext_length = 39, m_lock_time = 67, m_message_text = '\000' <repeats 512 times>, m_sql_errno = 0, m_sqlstate = "\000\000\000\000", m_error_count = 0, m_warning_count = 0, m_rows_affected = 0,
  m_rows_sent = 0, m_rows_examined = 0, m_created_tmp_disk_tables = 0, m_created_tmp_tables = 0, m_select_full_join = 0, m_select_full_range_join = 0, m_select_range = 0, m_select_range_check = 0, m_select_scan = 1, m_sort_merge_passes = 0, m_sort_range = 0, m_sort_rows = 0,
  m_sort_scan = 0, m_no_index_used = 1, m_no_good_index_used = 0, m_sqltext_truncated = false, m_sqltext_cs_number = 192, m_digest_storage = {m_full = false, m_byte_count = 0, m_md5 = '\000' <repeats 15 times>, m_charset_number = 0, m_token_array = 0x7fff92d52440 "",
    m_token_array_length = 1024}}
(gdb) p statement->m_sqltext_cs_number
$4 = 192

table_events_statements_common::make_row_part_1() calls get_charset() with cs_number=0, but now statement->m_sqltext_cs_number isn't 0 but 192 .

Not sure what causes this, but seems that while a thread is reading and processing a PFS_events_statements , another thread is modifying it.

Suggested fix:
If my hypothesis is correct and the issue is that the bug is caused by multiple threads unsafely accessing shared memory, I am not sure how to fix this properly.

Although, as a hot fix, I would be happy with something like the following:

--- ../mysys/charset.c.orig     2016-07-21 20:20:21.665569631 +0000
+++ ../mysys/charset.c  2016-07-21 21:30:21.635697937 +0000
@@ -648,7 +648,7 @@
   CHARSET_INFO *cs;
   MY_CHARSET_LOADER loader;

-  if (cs_number == default_charset_info->number)
+  if (cs_number == default_charset_info->number || cs_number == 0)
     return default_charset_info;

   my_pthread_once(&charsets_initialized, init_available_charsets);

In this way get_internal_charset() will return the default charset info instead of a NULL pointer.
Although this is not a perfect fix, it will prevent the server to crash.
[22 Jul 2016 5:35] MySQL Verification Team
Hello Rene,

Thank you for the report.
This is duplicate of Mikiya-San's internal Bug#23540008  and fixed and noted in upcoming 5.6.32, 5.7.14 changelogs. 

The code for reading character set information from Performance 
Schema statement events tables (for example, 
events_statements_current) did not prevent simultaneous writing to 
that information. As a result, the SQL query text character set could 
be invalid, which could result in a server exit. Now an invalid 
character set causes SQL_TEXT column truncation. 

Thanks,
Umesh