MySQL Bugs: #85659: mysqld got signal 11

Bug #85659	mysqld got signal 11
Submitted:	27 Mar 2017 19:36	Modified:	24 Mar 2022 14:46
Reporter:	Pranab Bordoloi
Status:	Duplicate
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.7.15	OS:	Red Hat (Red Hat Enterprise Linux Server release 6.8 (Santiago))
Assigned to:		CPU Architecture:	Any

Description:
mysqld got signal 11 (detailed error log below). I had to restart with innodb_force_recovery=4 and found the following corruption report:
<schema>.<Table1>
Warning  : InnoDB: Index '<Table1>_TO_<table2>_FK' contains 4720 entries, should be 4721.
error    : Corrupt

I had to take a backup, drop the schema, and restore from the dump to get it working again. It crashed once before as well, with similar behavior, but mysqlcheck reported a different table and FK index that time.

Unfortunately, the general query log was not enabled, so I could not gather the exact query details. I have enabled it now and will update the bug if the issue recurs.

Please let me know if I can provide further details to help debug it. This is a UAT environment.

09:08:54 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=101
max_threads=151
thread_count=76
connection_count=76
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68190 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fe3fc2ccf10
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fe4075c4e28 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0xf40b55]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x7cdf64]
/lib64/libpthread.so.0(+0xf7e0)[0x7fe43928e7e0]
/usr/sbin/mysqld(_Z32btr_free_externally_stored_fieldP12dict_index_tPhPKhPKmP14page_zip_des_tmbP5mtr_t+0x66)[0x1146e66]
/usr/sbin/mysqld(_Z26btr_cur_pessimistic_updatemP9btr_cur_tPPmPP16mem_block_info_tS4_PP9big_rec_tP5upd_tmP9que_thr_tmP5mtr_t+0x2ef)[0x114aedf]
/usr/sbin/mysqld[0x1235ac4]
/usr/sbin/mysqld[0x1236121]
/usr/sbin/mysqld(_Z12row_undo_modP11undo_node_tP9que_thr_t+0x254)[0x1237ed4]
/usr/sbin/mysqld(_Z13row_undo_stepP9que_thr_t+0x6c)[0x10c6f6c]
/usr/sbin/mysqld(_Z15que_run_threadsP9que_thr_t+0x877)[0x1061b57]
/usr/sbin/mysqld[0x11087df]
/usr/sbin/mysqld[0x11092c5]
/usr/sbin/mysqld(_Z22trx_rollback_for_mysqlP5trx_t+0x4f)[0x110c49f]
/usr/sbin/mysqld[0xfc8497]
/usr/sbin/mysqld(_Z15ha_rollback_lowP3THDb+0xa7)[0x819407]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG8rollbackEP3THDb+0x6d)[0xee832d]
/usr/sbin/mysqld(_Z17ha_rollback_transP3THDb+0x8e)[0x81921e]
/usr/sbin/mysqld(_Z14trans_rollbackP3THD+0x3a)[0xdc686a]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x26c7)[0xd0bf27]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x3ed)[0xd0ed0d]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x111e)[0xd0fe8e]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x194)[0xd10af4]
/usr/sbin/mysqld(handle_connection+0x29c)[0xde2e4c]
/usr/sbin/mysqld(pfs_spawn_thread+0x174)[0x1252bf4]
/lib64/libpthread.so.0(+0x7aa1)[0x7fe439286aa1]
/lib64/libc.so.6(clone+0x6d)[0x7fe437de4aad]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fe3fc01ba40): is an invalid pointer
Connection ID (thread ID): 37338
Status: NOT_KILLED

How to repeat:
Unknown

Hi!

We have encountered many reports of this kind since the inception of the InnoDB storage engine.

In 95% of those cases, the problem was in the hardware. Please provide us with feedback so that we can proceed further.

Do you use ECC RAM modules (2-bit error detection, 1-bit error correction)? Do you use mirrored disks, or disks with some redundancy for parity checking, such as RAID?

If not, please analyze your system logs, which are readily available on Red Hat. Inspect all of them for any report of a hardware problem, whether in the CPU, caches, RAM, disk controllers, disks, or the caches on controllers or disks. Check whether any warnings or errors were reported for files, and whether the system ran out of space.

Please analyze them for the period of the last three months.

If you do not find anything, can you run a thorough hardware analysis? I assume that Red Hat 6.8 is a fully stable version?

Note also that commodity RAM without ECC can sometimes have glitches that change memory contents. Those glitches are rare and practically never repeat at the same addresses.

Thank you very much in advance.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Readable stack trace:
---------------------
Using mysqld from mysql-community-server-5.7.15-1.el6.x86_64.rpm

$ addr2line --addresses --inlines --pretty-print --basenames --functions --demangle --exe=./mysqld 0xf40b55 0x7cdf64 0x1146e66 0x114aedf 0x1235ac4 0x1236121 0x1237ed4 0x10c6f6c 0x1061b57 0x11087df 0x11092c5 0x110c49f 0xfc8497 0x819407 0xee832d 0x81921e 0xdc686a 0xd0bf27 0xd0ed0d 0xd0fe8e 0xd10af4 0xde2e4c 0x1252bf4

0x0000000000f40b55: my_print_stacktrace at stacktrace.c:225
0x00000000007cdf64: handle_fatal_signal at signal_handler.cc:150
0x0000000001146e66: mach_read_from_4 at mach0data.ic:194
 (inlined by) btr_free_externally_stored_field at btr0cur.cc:4381
0x0000000001235ac4: row_undo_mod_clust_low at row0umod.cc:147
0x0000000001236121: row_undo_mod_clust at row0umod.cc:320
0x0000000001237ed4: row_undo_mod at row0umod.cc:1235
0x00000000010c6f6c: row_undo at row0undo.cc:329
 (inlined by) row_undo_step at row0undo.cc:370
0x0000000001061b57: que_thr_step at que0que.cc:1054
 (inlined by) que_run_threads_low at que0que.cc:1118
 (inlined by) que_run_threads at que0que.cc:1158
0x00000000011087df: trx_rollback_to_savepoint_low at trx0roll.cc:122
0x00000000011092c5: trx_rollback_for_mysql_low at trx0roll.cc:184
 (inlined by) trx_rollback_low at trx0roll.cc:212
0x000000000110c49f: TrxInInnoDB::exit at trx0trx.h:1488
 (inlined by) ~TrxInInnoDB at trx0trx.h:1369
 (inlined by) trx_rollback_for_mysql at trx0roll.cc:289
0x0000000000fc8497: innobase_rollback at ha_innodb.cc:4424
0x0000000000819407: ha_rollback_low at handler.cc:1955
0x0000000000ee832d: MYSQL_BIN_LOG::rollback at binlog.cc:2044
0x000000000081921e: ha_rollback_trans at handler.cc:2031
0x0000000000dc686a: trans_rollback at transaction.cc:356
0x0000000000d0bf27: mysql_execute_command at sql_parse.cc:4280
0x0000000000d0ed0d: mysql_parse at sql_parse.cc:5559
0x0000000000d0fe8e: Parser_state::reset at sql_lex.h:3645
 (inlined by) dispatch_command at sql_parse.cc:1506
0x0000000000d10af4: do_command at sql_parse.cc:997
0x0000000000de2e4c: handle_connection at connection_handler_per_thread.cc:300
0x0000000001252bf4: pfs_spawn_thread at pfs.cc:2191
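
The address list passed to addr2line above can be collected from the raw backtrace by hand, or with a small helper. The following is a minimal Python sketch, not part of the original report: the function names are hypothetical, and it assumes the frame format shown in this thread (a path ending in `/mysqld`, an optional `(symbol+offset)` part, then the address in square brackets). Frames from libpthread and libc are skipped, because their addresses do not belong to the mysqld binary.

```python
import re

# Matches frames such as "/usr/sbin/mysqld(sym+0x35)[0xf40b55]" or
# "/usr/sbin/mysqld[0x1235ac4]"; frames from other objects are skipped.
FRAME_RE = re.compile(r"^\S*/mysqld(?:\([^)]*\))?\[(0x[0-9a-fA-F]+)\]\s*$")


def mysqld_frame_addresses(trace_text):
    """Return the bracketed addresses of mysqld frames, in trace order."""
    addresses = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            addresses.append(match.group(1))
    return addresses


def addr2line_command(trace_text, exe="./mysqld"):
    """Build an addr2line argument list using the flags quoted in this report."""
    return [
        "addr2line", "--addresses", "--inlines", "--pretty-print",
        "--basenames", "--functions", "--demangle", "--exe=" + exe,
    ] + mysqld_frame_addresses(trace_text)


sample = """\
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0xf40b55]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x7cdf64]
/lib64/libpthread.so.0(+0xf7e0)[0x7fe43928e7e0]
/usr/sbin/mysqld[0x1235ac4]
"""
print(" ".join(addr2line_command(sample)))
```

The output is only meaningful when the mysqld binary given to `--exe` is the exact build that crashed, with its debug symbols available.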

Shane,

Your stack trace tells us that the crash happened here:

ulint
mach_read_from_4(
/*=============*/
        const byte*     b)      /*!< in: pointer to four bytes */
{
        ut_ad(b);
        return( ((ulint)(b[0]) << 24)
                | ((ulint)(b[1]) << 16)
                | ((ulint)(b[2]) << 8)
                | (ulint)(b[3])
                );
}

So, either b equals 0 or we ran out of thread stack.

If b == 0, then this is a hardware problem, because that pointer is checked in the calling function.

Another possibility is that the thread stack is too small.

Last but not least, the code in 5.7.18 differs significantly, so if there is a bug, it has most likely been fixed already.
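
For readers unfamiliar with the function quoted in this comment: it assembles four consecutive bytes into an unsigned integer, most significant byte first (big-endian). The Python sketch below mirrors that arithmetic for illustration only. It is not InnoDB code and says nothing about the cause of the crash; in the C++ original, a null or invalid `b` faults on the first byte read, which is consistent with the signal 11 reported here.

```python
import struct


def mach_read_from_4(b):
    """Read a 32-bit unsigned integer stored most significant byte first."""
    return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]


data = bytes([0x00, 0x00, 0x12, 0x34])
value = mach_read_from_4(data)

# Both standard-library decoders agree with the manual shifts.
assert value == int.from_bytes(data, "big") == struct.unpack(">I", data)[0]
print(hex(value))  # 0x1234
```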

mysqld got signal 11 at 5.7.31

2022-03-21T14:20:08.309665+08:00 105 [Note] Start semi-sync binlog_dump to slave (server_id: -1019961774), pos(./mysql-bin.0075338, 4) binlog end pos(mysql-bin.007538, 613512)
06:20:10 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=76
max_threads=0
thread_count=75
connection_count=75
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 154509 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f3f0e0e9800
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f3efab09520 thread_stack 0x40000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x3c)[0xe4585c]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x509)[0x66d499]
/lib64/libpthread.so.0(+0xf700)[0x7f4165eac700]
/usr/local/mysql/bin/mysqld(_Z32btr_free_externally_stored_fieldP12dict_index_tPhPKhPKmP14page_zip_des_tmbP5mtr_t+0x5d)[0x1017544d]
/usr/local/mysql/bin/mysqld(_Z26btr_cur_pessimistic_updatemP9btr_cur_tPPmPP16mem_block_info_tS4_PP9big_rec_tP5upd_tmP9que_thr_tmmP5mtr_t+0x939)[0x1018e49]
/usr/local/mysql/bin/mysqld[0x110b6d7]
/usr/local/mysql/bin/mysqld[0x110bca6]
/usr/local/mysql/bin/mysqld(_Z12row_undo_modP11undo_node_tP9que_thr_t+0xb6d)[0x110ea5d]
/usr/local/mysql/bin/mysqld(_Z13row_undo_stepP9que_thr_t+0x74)[0xf993b4]
/usr/local/mysql/bin/mysqld(_Z15que_run_threadsP9que_thr_t+0x7e8)[0xf355a8]
/usr/local/mysql/bin/mysqld[0xfd1ebb]
/usr/local/mysql/bin/mysqld[0xfd474d]
/usr/local/mysql/bin/mysqld(_Z22trx_rollback_for_mysqlP5trx_t+0x318)[0xfd4b38]
/usr/local/mysql/bin/mysqld[0xe8eca0]
/usr/local/mysql/bin/mysqld(_Z15ha_rollback_lowP3THDb+0x94)[0x6bdd34]
/usr/local/mysql/bin/mysqld(_ZN13MYSQL_BIN_LOG8rollbackEP3THDb+0xef)[0xde3e1f]
/usr/local/mysql/bin/mysqld(_Z17ha_rollback_transP3THDb+0x79)[0x6bdec9]
/usr/local/mysql/bin/mysqld(_Z14trans_rollbackP3THD+0x2c)[0xc8e75c]
/usr/local/mysql/bin/mysqld(_Z21mysql_execute_commandP3THDb+0x3bc5)[0xbd71d5]
/usr/local/mysql/bin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x2ed)[0xbdac7d]
/usr/local/mysql/bin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x201b)[0xbdcebb]
/usr/local/mysql/bin/mysqld(_Z10do_commandP3THD+0x267)[0xbddc97]
/lib64/libc.so.6(clone+0x6d)[0x7f4164747f9d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f3f0e0ab038): is an invalid pointer
Connection ID (thread ID): 50
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Hi Mr. long,

Thank you for your comment.

However, our comments from 2017 still stand unchanged. This is most likely a transient hardware error that cannot be detected by hardware diagnostic tools. MySQL Server cannot fix problems in the hardware or in the operating system.

We do hope that you are using ECC RAM modules (2-bit error detection, 1-bit error correction). We also hope that you are using RAID disk arrays.

If this is indeed a bug, then please send us a fully repeatable test case. It should consist of the set of SQL statements that always leads to the crash that you reported.

We are not able to repeat the behaviour that you are reporting and, hence, we cannot process this report further.

I hit the same crash (signal 11) with MySQL 5.7.31 (compiled by myself), and the addr2line output is the same as in this issue (every line matches). It seems the value passed to mach_read_from_4 is not correct. I'm using a server with ECC memory and multi-copy storage, so a hardware problem is very unlikely.

It crashes on both the primary and the standby, but I can't find a test case that repeats it.

Sorry, Mr. Hua,

But we cannot repeat the behaviour without a test case.

Hi All,

We did some searching and found that this is a duplicate of a bug that is not accessible to the public.

That bug is fixed in the latest 8.0 and affects only debug builds.

I am pretty sure that I am using a release build (release with debug info). Can you describe in some detail how to avoid this problem? (BTW, it crashed again on MySQL 5.7.33.)

I think I found some more details:
1. A JSON column.
2. Virtual columns on fields inside the JSON document.
3. Keys on these virtual columns.
4. A lot of ROLLBACK statements, possibly in parallel.

Does this information help?
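
The four points above could be turned into a candidate test scenario along the following lines. This is a hypothetical sketch, not a verified reproducer: the table name, column names, JSON path, and document size are all assumptions, and nothing in this thread confirms that these statements trigger the crash. The Python helper only builds the SQL text; the statements would have to be run, possibly from several connections at once, against a disposable test instance.

```python
def candidate_repro_statements(doc_bytes=100_000):
    """Return SQL text for a hypothetical table matching the four points above."""
    # A large padding value, so that InnoDB may store the JSON document off-page.
    big_value = "x" * doc_bytes
    ddl = (
        "CREATE TABLE t_json_repro ("
        " id INT PRIMARY KEY,"
        " doc JSON,"                                   # point 1: JSON column
        " name VARCHAR(64) GENERATED ALWAYS AS"
        " (JSON_UNQUOTE(JSON_EXTRACT(doc, '$.name'))) VIRTUAL,"  # point 2
        " KEY idx_name (name)"                         # point 3: key on it
        ") ENGINE=InnoDB"
    )
    return [
        ddl,
        "INSERT INTO t_json_repro (id, doc) VALUES "
        "(1, JSON_OBJECT('name', 'a', 'pad', '" + big_value + "'))",
        "START TRANSACTION",
        "UPDATE t_json_repro SET doc = JSON_SET(doc, '$.name', 'b') WHERE id = 1",
        "ROLLBACK",                                    # point 4: rollback
    ]


for statement in candidate_repro_statements(doc_bytes=20):
    print(statement + ";")
```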

Hi!  Please try 8.0.28 and let us know.

It sounds like internal BUG 32529561 

which was fixed here:

https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-24.html

"InnoDB: Rollback of a transaction that modified generated columns raised an assertion failure. The failure occurred when attempting to free space occupied by externally stored columns. The update vector containing the externally stored columns did not account for the generated columns. (Bug #32529561)"

I don't see any fix for this in 5.7.

https://github.com/mysql/mysql-server/commit/c2183ffd319469685e0b2a9ff46f7522855b4136

This commit fixes the problem. When will it be fixed in 5.7?

Hi Mr. Bordoloi,

When a bug is fixed only in 8.0, it means that 5.7 does not have the necessary infrastructure for a fix to be implemented.

Hence, you should start planning to upgrade to 8.0.

Duplicate of an internal bug.