Description:
In production, running a large GROUP BY query involving many MAX(CONCAT(...)) columns caused mysqld to receive a SIGSEGV and crash the entire server (master outage). The crash stack (captured from the error log) shows memcmp in libc being invoked from the calc_row_difference path.
08:06:51 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0xfffaaaf756000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = fff940553010 thread_stack 0x46000
/mysql/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2c) [0x1c99324]
/mysql/bin/mysqld(handle_fatal_signal+0x284) [0xf5e07c]
linux-vdso.so.1(__kernel_rt_sigreturn+0) [0xfffc292f07e0]
/lib64/libc.so.6(memcmp+0x10c) [0xfffc28852f0c]
/mysql/bin/mysqld() [0x1d8c158]
/mysql/bin/mysqld(ha_innobase::update_row(unsigned char const*, unsigned char*)+0xb0) [0x1d8cd78]
/mysql/bin/mysqld(handler::ha_update_row(unsigned char const*, unsigned char*)+0x184) [0x103e114]
/mysql/bin/mysqld(TempTableAggregateIterator::Init()+0x28c) [0x125086c]
/mysql/bin/mysqld(filesort(THD*, Filesort*, RowIterator*, unsigned long, unsigned long long, Filesort_info*, Sort_result*, unsigned long long*)+0x408) [0x1023c30]
/mysql/bin/mysqld(SortingIterator::DoSort()+0x4c) [0xdbea1c]
/mysql/bin/mysqld(SortingIterator::Init()+0x1c) [0xdbea7c]
/mysql/bin/mysqld(Query_expression::ExecuteIteratorQuery(THD*)+0x2e4) [0xef6a1c]
/mysql/bin/mysqld(Query_expression::execute(THD*)+0x2c) [0xef6ccc]
/mysql/bin/mysqld(Sql_cmd_dml::execute_inner(THD*)+0x234) [0x9e9e21c]
/mysql/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x4b4) [0xea7054]
/mysql/bin/mysqld(mysql_execute_command(THD*, bool)+0xd4c) [0xe557a44]
/mysql/bin/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x324) [0xe5a4f4]
/mysql/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x12c4) [0xe5bb8c]
/mysql/bin/mysqld(do_command(THD*)+0x160) [0xe5cado]
/mysql/bin/mysqld() [0xf51428]
/mysql/bin/mysqld() [0x21f14f4]
/lib64/libpthread.so.0(+0x88cc) [0xfffc291788cc]
/lib64/libc.so.6(+0xda1ec) [0xfffc288aa1ec]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (fffaaf431828): is an invalid pointer
Connection ID (thread ID): 24189625
Status: NOT_KILLED
This is the SQL we ran in production. After running this query, the database immediately coredumped and exited.
The error log contains the stack trace shown above.
The table structure is similar to the one in production:
CREATE TABLE test_large_tmp (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
col_01 VARCHAR(64),
col_02 VARCHAR(64),
col_03 VARCHAR(64),
col_04 VARCHAR(64),
col_05 VARCHAR(64),
-- ... many similar columns omitted
col_99 VARCHAR(64),
col_100 VARCHAR(64),
grp_key1 VARCHAR(20),
grp_key2 VARCHAR(20),
grp_key3 VARCHAR(20),
grp_key4 VARCHAR(20),
grp_key5 VARCHAR(20)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
The SQL we ran in production:
-- Table size: approximately 5.15 million rows
SELECT
COUNT(1),
MAX(CONCAT(CONCAT('', ''), col_01)) AS col1,
MAX(CONCAT(CONCAT('', ''), col_02)) AS col2,
MAX(CONCAT(CONCAT('', ''), col_03)) AS col3,
MAX(CONCAT(CONCAT('', ''), col_04)) AS col4,
MAX(CONCAT(CONCAT('', ''), col_05)) AS col5,
-- ... many similar columns omitted
MAX(CONCAT(CONCAT('', ''), col_68)) AS col68,
MAX(CONCAT(CONCAT('', ''), col_69)) AS col69,
MAX(CONCAT(CONCAT('', ''), col_70)) AS col70
FROM test_large_tmp
GROUP BY grp_key1, grp_key2, grp_key3, grp_key4, grp_key5;
Why we think this is a bug:
By reviewing the source code of version 8.0.25, I inferred the related function call path:
TempTableAggregateIterator::Init() -> handler::ha_update_row() -> ha_innobase::update_row() -> calc_row_difference() -> memcmp()
We can confirm that the function at 0x1d8c158 is indeed calc_row_difference().
After analyzing this function, I found that it calls memcmp() in two places.
The relevant code is as follows:
/* for virtual columns, we only materialize
its index, and index field length would not
exceed max_field_len. So continue if the
first max_field_len bytes are matched up */
if (o_len != UNIV_SQL_NULL && n_len != UNIV_SQL_NULL &&
o_len >= max_field_len && n_len >= max_field_len &&
memcmp(o_ptr, n_ptr, max_field_len) == 0) {
dfield_t *vfield = dtuple_get_nth_v_field(uvect->old_vrow, num_v);
col->copy_type(dfield_get_type(vfield));
if (is_multi_value) {
innobase_get_multi_value(prebuilt->m_mysql_table, i, vfield, nullptr,
static_cast<uint>(old_row - new_row), comp,
uvect->heap);
} else {
buf = innodb_fill_old_vcol_val(prebuilt, vfield, o_len, col,
old_mysql_row_col, col_pack_len, buf);
}
num_v++;
continue;
}
}
// After analysis, I believe that this SQL should enter the branch below, rather than the one above, so the crash probably occurs in this branch:
if (o_len != n_len || (o_len != UNIV_SQL_NULL && o_len != 0 &&
0 != memcmp(o_ptr, n_ptr, o_len))) {
/* The field has changed */
bool multi_value_calc_by_diff = false;
dfield_t old_field, new_field;
ufield = uvect->fields + n_changed;
UNIV_MEM_INVALID(ufield, sizeof *ufield);
/* Let us use a dummy dfield to make the conversion
from the MySQL column format to the InnoDB format */
/* If the length of new geometry object is 0, means
this object is invalid geometry object, we need
to block it. */
if (DATA_GEOMETRY_MTYPE(col_type) && o_len != 0 && n_len == 0) {
return (DB_CANT_CREATE_GEOMETRY_OBJECT);
}
// rest of the code omitted
How to repeat:
I'm sorry that I currently cannot reproduce this bug reliably. I suspect that during execution, o_len might be assigned an incorrect value, causing memcmp to access out-of-bounds memory.
I have also reviewed the source code but have not found a complete chain of evidence to support this yet. I am continuing to investigate this bug.
If your team has any internal insights or ideas, I would greatly appreciate if you could share them.
I will continue to study and attempt to reproduce this bug. If I can reproduce it reliably, I will continue to update and refine the details here.