Bug #97593 mysql 5.6 crashes at fill_trx_row
Submitted: 12 Nov 2019 9:19    Modified: 16 Dec 2019 13:54
Reporter: jeff.ding jeff.ding
Status: Can't repeat           Impact on me: None
Category: MySQL Server         Severity: S2 (Serious)
Version: 5.6                   OS: CentOS
Assigned to:                   CPU Architecture: x86

[12 Nov 2019 9:19] jeff.ding jeff.ding
Description:
MySQL 5.6 occasionally crashes in fill_trx_row when querying transaction state from information_schema. The query is like:

"SELECT trx_state, count(*) as count FROM information_schema.innodb_trx WHERE trx_started < date_sub(now(), interval 1 second) GROUP BY trx_state"

How to repeat:
Query transaction state from information_schema with a query like:

"SELECT trx_state, count(*) as count FROM information_schema.innodb_trx WHERE trx_started < date_sub(now(), interval 1 second) GROUP BY trx_state"

This will crash the MySQL server occasionally.

Suggested fix:
The function 'fetch_data_into_cache_low' should call trx_mutex_enter(trx) before 'add_trx_relevant_locks_to_cache'. I found that this issue is already fixed in MySQL 8.0.
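A paraphrased sketch of the proposed ordering inside fetch_data_into_cache_low (modeled on the 8.0 code as I read it; not a verbatim excerpt from either tree, and the surrounding per-transaction loop is elided):

    trx_mutex_enter(trx);  /* take trx->mutex first, as 8.0 does */

    /* with the mutex held, the lock rows and the trx fields read
       below come from one consistent snapshot */
    if (!add_trx_relevant_locks_to_cache(cache, trx, &requested_lock_row)) {
        cache->is_truncated = TRUE;
        trx_mutex_exit(trx);
        return;
    }

    /* ... allocate the I_S row, then ... */
    fill_trx_row(row, trx, requested_lock_row, cache);

    trx_mutex_exit(trx);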
[12 Nov 2019 12:44] MySQL Verification Team
Hi Mr. jeff.ding,

Thank you for your bug report.

However, we need additional info from you. First, I have run your query 10 (ten) times and it never crashed our server running the latest 5.6.

Hence, the questions:

* Are you using the latest available 5.6? If not, please try that one first of all.

* What are the exact conditions under which you get the crash? We need a fully repeatable test case in order to verify this report. That means a test case that would ALWAYS crash a server.
[13 Nov 2019 1:42] jeff.ding jeff.ding
Hi, Sinisa Milivojevic,

The crash happened only once, but I captured the core stack:

#0  0x00007fe395bc79b1 in pthread_kill () from /lib64/libpthread.so.0
#1  0x00000000006aff8e in handle_fatal_signal (sig=6)
    at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/sql/signal_handler.cc:321
#2  <signal handler called>
#3  0x00007fe3949d91f7 in raise () from /lib64/libc.so.6
#4  0x00007fe3949da8e8 in abort () from /lib64/libc.so.6
#5  0x0000000000af9af3 in fill_trx_row (row=0x7fc9645f8af8, trx=trx@entry=0x7fc24f7fc168, requested_lock_row=requested_lock_row@entry=0x0,
    cache=cache@entry=0x156ac00 <trx_i_s_cache_static>)
    at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/storage/innobase/trx/trx0i_s.cc:488
#6  0x0000000000afa78b in fetch_data_into_cache_low (cache=cache@entry=0x156ac00 <trx_i_s_cache_static>, trx_list=<optimized out>,
    read_write=true) at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/storage/innobase/trx/trx0i_s.cc:1331
#7  0x0000000000afae30 in fetch_data_into_cache (cache=0x156ac00 <trx_i_s_cache_static>)
    at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/storage/innobase/trx/trx0i_s.cc:1357
#8  trx_i_s_possibly_fetch_data_into_cache (cache=cache@entry=0x156ac00 <trx_i_s_cache_static>)
    at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/storage/innobase/trx/trx0i_s.cc:1386
#9  0x0000000000a16eb5 in trx_i_s_common_fill_table (thd=thd@entry=0x7fc5023f0000, tables=tables@entry=0x7fc78656d6a0)
    at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/storage/innobase/handler/i_s.cc:1340
#10 0x0000000000795052 in do_fill_table (join_table=0x7fc4e86120b8, table_list=0x7fc78656d6a0, thd=0x7fc5023f0000)
    at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/sql/sql_show.cc:8012
#11 get_schema_tables_result (join=join@entry=0x7fc3a7dea1d0, executed_place=executed_place@entry=PROCESSED_BY_JOIN_EXEC)
    at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/sql/sql_show.cc:8113
#12 0x0000000000773745 in JOIN::prepare_result (this=this@entry=0x7fc3a7dea1d0, columns_list=columns_list@entry=0x7fc87c5e5370)
    at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/sql/sql_select.cc:824
#13 0x0000000000719f87 in JOIN::exec (this=0x7fc3a7dea1d0)
    at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/sql/sql_executor.cc:116
#14 0x000000000077440d in mysql_execute_select (free_join=true, select_lex=0x7fc5023f2c80, thd=0x7fc5023f0000)
    at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/sql/sql_select.cc:1114
#15 mysql_select (thd=thd@entry=0x7fc5023f0000, tables=0x7fc78656d6a0, wild_num=0, fields=..., conds=0x7fc78656e058,
    order=order@entry=0x7fc5023f2e50, group=group@entry=0x7fc5023f2d88, having=0x0, select_options=2684619520,
    result=result@entry=0x7fc3a7dea1a8, unit=unit@entry=0x7fc5023f2620, select_lex=select_lex@entry=0x7fc5023f2c80)
    at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/sql/sql_select.cc:1235
#16 0x0000000000774665 in handle_select (thd=thd@entry=0x7fc5023f0000, result=result@entry=0x7fc3a7dea1a8,
    setup_tables_done_option=setup_tables_done_option@entry=0)
    at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/sql/sql_select.cc:111
#17 0x0000000000597518 in execute_sqlcom_select (thd=thd@entry=0x7fc5023f0000, all_tables=<optimized out>)
    at /home/admin/35_20191015101601795_107527411_code/rpm_workspace/sql/sql_parse.cc:6188

At the time of this crash, trx->lock.wait_lock (0x7fc8deb583c8) was not NULL while requested_lock_row was 0x0 (which means trx->lock.que_state was TRX_QUE_RUNNING). So I checked and found that trx->lock.wait_lock and trx->lock.que_state are modified under trx->mutex (please see the function lock_cancel_waiting_and_release). When we query transaction state from information_schema, fetch_data_into_cache_low accesses these two variables without acquiring trx->mutex, which I think is what makes the server crash.
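For context, the abort at trx0i_s.cc:488 comes from a consistency check in fill_trx_row of roughly this shape (paraphrased from the 5.6 tree, not a verbatim excerpt):

    /* requested_lock_row was derived earlier from trx->lock.que_state,
       without holding trx->mutex; trx->lock.wait_lock is re-read here.
       If another thread changed the lock state in between, the two
       disagree and ut_a() aborts the server. */
    if (trx->lock.wait_lock != NULL) {
        ut_a(requested_lock_row != NULL);  /* this is the check that fired */
        row->trx_wait_started = (ib_time_t) trx->lock.wait_started;
    } else {
        ut_a(requested_lock_row == NULL);
        row->trx_wait_started = 0;
    }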

I checked fetch_data_into_cache_low in MySQL 8.0: there it calls trx_mutex_enter(trx) before add_trx_relevant_locks_to_cache. Please check this :)
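To make the window concrete, here is a minimal, self-contained C++ analogue of the race (a toy model, not MySQL code; the names wait_lock, que_state and the mutex only mirror the trx_t fields discussed above). The writer flips both fields together under the mutex, the way the lock subsystem does; a reader that skips the mutex can observe wait_lock set while que_state still says RUNNING, exactly the inconsistent pair that trips the assertion. With the lock_guard enabled in the reader, as in the 8.0 fix, the assert can never fire:

    #include <atomic>
    #include <cassert>
    #include <mutex>
    #include <thread>

    enum QueState { TRX_QUE_RUNNING, TRX_QUE_LOCK_WAIT };

    struct Trx {                        // toy stand-in for trx_t
        std::mutex mtx;                 // stand-in for trx->mutex
        void*      wait_lock = nullptr; // stand-in for trx->lock.wait_lock
        QueState   que_state = TRX_QUE_RUNNING;
    };

    int main() {
        Trx               trx;
        int               dummy_lock;
        std::atomic<bool> stop{false};

        // Writer: always updates both fields under trx.mtx, the way
        // lock_cancel_waiting_and_release and friends do.
        std::thread writer([&] {
            for (int i = 0; i < 100000; ++i) {
                std::lock_guard<std::mutex> g(trx.mtx);
                trx.wait_lock = &dummy_lock;
                trx.que_state = TRX_QUE_LOCK_WAIT;
                trx.wait_lock = nullptr;
                trx.que_state = TRX_QUE_RUNNING;
            }
            stop = true;
        });

        // Reader: the I_S code path reads que_state first, wait_lock
        // later. Comment out the lock_guard to get the 5.6 behaviour,
        // where the assert below can fire.
        while (!stop) {
            std::lock_guard<std::mutex> g(trx.mtx); // the 8.0-style fix
            bool waiting = (trx.que_state == TRX_QUE_LOCK_WAIT);
            if (trx.wait_lock != nullptr) {
                assert(waiting); // analogue of ut_a(requested_lock_row != NULL)
            }
        }

        writer.join();
        return 0;
    }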
[13 Nov 2019 12:48] MySQL Verification Team
Hi Mr. ding,

Please re-read and answer the questions that I asked in my previous comment.
[13 Nov 2019 12:49] MySQL Verification Team
We simply have to make sure that you are using the latest release, and if you are, we must be able to repeat the behaviour.
[14 Dec 2019 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".