Bug #75782 Failing assertion: trx->read_only && trx->auto_commit && trx->isolation_level==1
Submitted: 5 Feb 2015 3:23 Modified: 20 May 2015 18:32
Reporter: Roel Van de Paar Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: InnoDB storage engine Severity:S6 (Debug Builds)
Version:5.7.5-m15 OS:Any
Assigned to: CPU Architecture:Any

[5 Feb 2015 3:23] Roel Van de Paar
2015-02-05 14:07:57 0x7f76274ac700  InnoDB: Assertion failure in thread 140145442080512 in file trx0trx.cc line 1270
InnoDB: Failing assertion: trx->read_only && trx->auto_commit && trx->isolation_level == 1


(gdb) bt
#0  0x00007fe20efbd771 in pthread_kill () from /lib64/libpthread.so.0
#1  0x0000000000e15d58 in my_write_core (sig=6) at /bzr/mysql-5.7-percona-patches_dbg/mysys/stacktrace.c:247
#2  0x0000000000822044 in handle_fatal_signal (sig=6) at /bzr/mysql-5.7-percona-patches_dbg/sql/signal_handler.cc:219
#3  <signal handler called>
#4  0x00007fe20dbc15c9 in raise () from /lib64/libc.so.6
#5  0x00007fe20dbc2cd8 in abort () from /lib64/libc.so.6
#6  0x00000000010d4ccf in ut_dbg_assertion_failed (expr=0x14e8448 "trx->read_only && trx->auto_commit && trx->isolation_level == 1", file=0x14e7870 "/bzr/mysql-5.7-percona-patches_dbg/storage/innobase/trx/trx0trx.cc", line=1270) at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/ut/ut0dbg.cc:68
#7  0x00000000010c1818 in trx_start_low (trx=0x7fe1e9c00c90, read_write=true) at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/trx/trx0trx.cc:1269
#8  0x00000000010c55e0 in trx_start_internal_low (trx=0x7fe1e9c00c90) at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/trx/trx0trx.cc:3012
#9  0x0000000001197483 in dict_stats_fetch_from_ps (table=0x7fe1d711d598) at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/dict/dict0stats.cc:2919
#10 0x0000000001197cb1 in dict_stats_update (table=0x7fe1ef3b0c98, stats_upd_option=DICT_STATS_FETCH_ONLY_IF_NOT_IN_MEMORY) at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/dict/dict0stats.cc:3201
#11 0x0000000000f084ae in dict_stats_init (table=0x7fe1ef3b0c98) at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/include/dict0stats.ic:173
#12 0x0000000000f0e859 in ha_innobase::open (this=0x7fe1d5ede420, name=0x7fe1d72c9390 "./test/t3", mode=2, test_if_locked=2) at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/handler/ha_innodb.cc:5098
#13 0x0000000000880baf in handler::ha_open (this=0x7fe1d5ede420, table_arg=0x7fe1d721ec10, name=0x7fe1d72c9390 "./test/t3", mode=2, test_if_locked=2) at /bzr/mysql-5.7-percona-patches_dbg/sql/handler.cc:2471
#14 0x0000000000c179e0 in open_table_from_share (thd=0x7fe1d701c000, share=0x7fe1d72c9020, alias=0x7fe1d7028ae8 "t3", db_stat=39, prgflag=44, ha_open_flags=0, outparam=0x7fe1d721ec10, is_create_table=false) at /bzr/mysql-5.7-percona-patches_dbg/sql/table.cc:2378
#15 0x0000000000acafb7 in open_table (thd=0x7fe1d701c000, table_list=0x7fe1d734e020, ot_ctx=0x7fe20f572a40) at /bzr/mysql-5.7-percona-patches_dbg/sql/sql_base.cc:3417
#16 0x0000000000acd770 in open_and_process_table (thd=0x7fe1d701c000, lex=0x7fe1d701e370, tables=0x7fe1d734e020, counter=0x7fe1d701e420, flags=0, prelocking_strategy=0x7fe20f572b40, has_prelocking_list=false, ot_ctx=0x7fe20f572a40) at /bzr/mysql-5.7-percona-patches_dbg/sql/sql_base.cc:4903
#17 0x0000000000ace4fe in open_tables (thd=0x7fe1d701c000, start=0x7fe20f572b00, counter=0x7fe1d701e420, flags=0, prelocking_strategy=0x7fe20f572b40) at /bzr/mysql-5.7-percona-patches_dbg/sql/sql_base.cc:5357
#18 0x0000000000acf745 in open_normal_and_derived_tables (thd=0x7fe1d701c000, tables=0x7fe1d7028558, flags=0) at /bzr/mysql-5.7-percona-patches_dbg/sql/sql_base.cc:6085
#19 0x0000000000b4ecca in execute_sqlcom_select (thd=0x7fe1d701c000, all_tables=0x7fe1d7028558) at /bzr/mysql-5.7-percona-patches_dbg/sql/sql_parse.cc:5004
#20 0x0000000000b47775 in mysql_execute_command (thd=0x7fe1d701c000) at /bzr/mysql-5.7-percona-patches_dbg/sql/sql_parse.cc:2488
#21 0x0000000000b4fcc1 in mysql_parse (thd=0x7fe1d701c000, parser_state=0x7fe20f573e60) at /bzr/mysql-5.7-percona-patches_dbg/sql/sql_parse.cc:5427
#22 0x0000000000b44814 in dispatch_command (command=COM_QUERY, thd=0x7fe1d701c000, packet=0x7fe1d7033011 "SELECT t1.c1,t3.c1 FROM t1 NATURAL RIGHT JOIN t3  WHERE t1.c1 <=> 5;", packet_length=68) at /bzr/mysql-5.7-percona-patches_dbg/sql/sql_parse.cc:1250
#23 0x0000000000b4354d in do_command (thd=0x7fe1d701c000) at /bzr/mysql-5.7-percona-patches_dbg/sql/sql_parse.cc:834
#24 0x0000000000c4d49e in handle_connection (arg=0x7fe1f0fffee0) at /bzr/mysql-5.7-percona-patches_dbg/sql/conn_handler/connection_handler_per_thread.cc:298
#25 0x0000000000e39538 in pfs_spawn_thread (arg=0x7fe1f17f70d0) at /bzr/mysql-5.7-percona-patches_dbg/storage/perfschema/pfs.cc:2137
#26 0x00007fe20efb8df3 in start_thread () from /lib64/libpthread.so.0
#27 0x00007fe20dc821ad in clone () from /lib64/libc.so.6

How to repeat:
This bug is extremely hard to reproduce. The attached scripts will make it semi-easy, but expect to see some hacky methods employed (you'll see lot of unrelated errors both in the error log and in pquery output...)

$ vi {epoch}_mybase     # Update base path in this file (the only change required!)
$ ./{epoch}_init        # Initializes the data dir
$ ./{epoch}_start       # Starts mysqld
$ ./{epoch}_cl          # To check mysqld is up
$ while(true); do ./1422600023_run_pquery; done
  # This runs the testcase in a loop. Expect to see lots of unrelated errors (Error 1040: Too many connections/double free or corruption/trying to destroy uninialized mutex/invalid next size/segmentation faults/Aborts/... and so on). Ignore them all. ** Watch/wait ** for the occurence of "Error 2002: Can't connect to local MySQL server through socket '/dev/shm/1422600023/socket.sock' (111)" - maybe quickly, maybe after a while - it will scroll very fast over the screen (much faster then other original errors):

Error 2002: Can't connect to local MySQL server through socket '/dev/shm/1422600023/socket.sock' (111)
* PQUERY: Unable to continue [2], exiting
Executing testcase ./1422600023.sql against mysqld with socket /dev/shm/1422600023/socket.sock using pquery...
Infile is ./1422600023.sql
Database is test
Starting with 300 threads
Query limit per thread is 100000
User is root
Socket is /dev/shm/1422600023/socket.sock
Error 2002: Can't connect to local MySQL server through socket '/dev/shm/1422600023/socket.sock' (111)

 - This is your sign that the mysqld server is gone/has crashed. Press CTRL+C and review error log/gdb to confirm.
$ vi /dev/shm/{epoch}/error.log.out  # Verify the error log
$ ./{epoch}_gdb         # Brings you to a gdb prompt attached to correct mysqld & generated core
$ ./{epoch}_parse_core  # Create {epoch}_STD.gdb and {epoch}_FULL.gdb; standard and full var gdb stack traces
[5 Feb 2015 3:23] Roel Van de Paar
Stopping the while() loop a few times with CTRL+C and restarting may also aid reproduction.
[5 Feb 2015 3:31] Roel Van de Paar
Tarball with all scripts required

Attachment: bug_75782.tar.gz (application/gzip, text), 93.78 KiB.

[9 Feb 2015 20:27] Sveta Smirnova
Thank you for the report.

Verified as described.

To repeat report I had to modify paths in the scripts provided to reflect them my environment, change export LD_LIBRARY_PATH=$(cat ${SCRIPT_PWD}/1422600023_mybase)/lib to export LD_LIBRARY_PATH=$(cat ${SCRIPT_PWD}/1422600023_mybase)/lib:$LD_LIBRARY_PATH in 1422600023_run_pquery , so my custom LD_LIBRARY_PATH still works and re-compile pquery, taken from http://bazaar.launchpad.net/~percona-core/percona-qa/trunk/files

Also I had to run test for 3 hours before crash occurred.
[9 Feb 2015 20:30] Sveta Smirnova
Posted by developer:
I have different back trace:

2015-02-09 21:19:27 0x50c39940  InnoDB: Assertion failure in thread 1354996032 in file ha_innodb.cc line 3537
InnoDB: Failing assertion: !trx_is_started(trx)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
20:19:27 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68128 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x2aaab83c6000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 50c39010 thread_stack 0x40000

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Connection ID (thread ID): 83905

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
[19 Feb 2015 7:11] Naga Satyanarayana Bodapati
Posted by developer:
From the bug reporter stack trace, it looks like

1. attachable trx is started
2. trx->is_dd_trx is set
3. trx returns to pool without clearing the is_dd_trx flag <--- Bug probably here
4. Some other thread(dict stats thread in this case) requesting trx, got the trx which is earlier used for attachable trx
5. Assertion raised because of left over flag trx->is_dd_trx
[19 Feb 2015 7:32] Naga Satyanarayana Bodapati
Posted by developer:
I was wrong, We do have a assertion to make sure trx->is_dd_trx is set to false before returning the trx to pool.

trx_free(trx_t*& trx)
  ut_ad(trx->is_dd_trx == false);

So there is some mystery around this
[23 Feb 2015 9:46] Naga Satyanarayana Bodapati
On Ctrl-C pquery is crashing frequently :)

(gdb) bt
#0  0x00007f5b490d4cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f5b490d80d8 in __GI_abort () at abort.c:89
#2  0x00007f5b49111f24 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7f5b492206c8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f5b4911e1fe in malloc_printerr (ptr=<optimized out>, str=0x7f5b492207b0 "double free or corruption (!prev)", action=1) at malloc.c:4996
#4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840
#5  0x00007f5b49cd6a3b in my_once_free () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#6  0x00007f5b49cd5ebd in my_end () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#7  0x00007f5b49cb1f07 in mysql_server_end () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#8  0x0000000000403674 in executor (number=261, qlist=...) at pquery.cpp:83
#9  0x00007f5b49a31bf0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007f5b48e88182 in start_thread (arg=0x7f5a4672c700) at pthread_create.c:312
#11 0x00007f5b4919900d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[24 Feb 2015 12:42] Naga Satyanarayana Bodapati
Posted by developer:
Am not able to reproduce :(
[24 Feb 2015 12:45] Naga Satyanarayana Bodapati
I have few questions to the bug reporter:

1) Is it possible to compile pquery with gcc-4.4?
2) Is it possible to fix the pquery crash(not important, it may help)
3) Is it possible to get minimum sequence of queries in .sql to reproduce the crash? (similar to RQG simplifier)
[24 Feb 2015 12:46] Naga Satyanarayana Bodapati
Setting the status to Need Feedback to get attention
[24 Feb 2015 18:19] Sveta Smirnova

I waited 3 hours for crash to occur.

Regarding to q.1 - yes, it is possible: just download sources from Launchpad repo, modify paths and compile the tool. Building is not automated, but it is possible. Ping me if you need help.
[25 Feb 2015 5:41] Naga Satyanarayana Bodapati

gcc-4.4 doesn't support the option -std=gnu++11

I've tried on my laptop and pquery completes its run successfully. At the end it prints summary

Also, the assert you have got is different from the assert reported by bug reporter.
[25 Mar 2015 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[27 Mar 2015 5:46] Naga Satyanarayana Bodapati
Using the sql provided and pumping that SQL using multiple threads, we are able to reproduce the bug.
[27 Mar 2015 6:13] Roel Van de Paar
Very nice Naga! Very appreciated too. 

We are in the process of building 3 new pquery binaries (oracle/maria/percona), so there is no more lib issues (and hopefully no more of the issues you saw). More on this on the Percona blog when it's released.
[1 Apr 2015 0:21] Roel Van de Paar
Small update, not directly related to bug, pquery binaries are as good as ready, and we also resolved a memory bug, which may have been the one you were seeing. So hopefully reproduction issues like the one seen in this bug are a thing of the past.
[20 May 2015 18:32] Daniel Price
Posted by developer:
Fixed as of the upcoming 5.7.8, 5.8.0 releases, and here's the changelog entry:

Attempting to kill statements that use attachable transactions caused
subsequent statements to function improperly, resulting in assertion
failures. A "max_statement_time" timeout on such statements could produce
a similar outcome. 

Thanks for the bug report.