MySQL Bugs: #110343: Reproducible crash with assertion failure when updating

Bug #110343	Reproducible crash with assertion failure when updating
Submitted:	10 Mar 2023 22:00	Modified:	20 Aug 2024 15:27
Reporter:	Tobias Liefke
Status:	Duplicate
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	8.0.32-1.el8	OS:	Linux (Official MySQL Docker image)
Assigned to:		CPU Architecture:	x86

Description:
I've added a new bit column to an existing table.

To initialize the column, I run:
UPDATE MySchema.MyTable SET myColumn = 0

When the statement executes, the server crashes with:

2023-03-10T18:09:11.630668Z 10 [ERROR] [MY-011855] [InnoDB] Page old data size 13724 new data size 8356, page old max ins size 2522 new max ins size 7890
2023-03-10T18:09:11.630690Z 10 [ERROR] [MY-011856] [InnoDB] Submit a detailed bug report to http://bugs.mysql.com
2023-03-10T18:09:11.630702Z 10 [ERROR] [MY-013183] [InnoDB] Assertion failure: btr0cur.cc:3699:rec thread 140355438716672
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2023-03-10T18:09:11Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=6b049f17400f850658b2eb3ff165ec9a085d9655
Thread pointer: 0x7fa690000fe0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fa70c111c50 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x213de91]
/usr/sbin/mysqld(print_fatal_signal(int)+0x387) [0xfdeac7]
/usr/sbin/mysqld(my_server_abort()+0x7e) [0xfdec1e]
/usr/sbin/mysqld(my_abort()+0xe) [0x2137c7e]
/usr/sbin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x33a) [0x243b28a]
/usr/sbin/mysqld(btr_cur_optimistic_update(unsigned long, btr_cur_t*, unsigned long**, mem_block_info_t**, upd_t const*, unsigned long, que_thr_t*, unsigned long, mtr_t*)+0x91a) [0x247de9a]
/usr/sbin/mysqld() [0x23be33b]
/usr/sbin/mysqld() [0x23beead]
/usr/sbin/mysqld(row_upd_step(que_thr_t*)+0xb28) [0x23c3e98]
/usr/sbin/mysqld() [0x238bb11]
/usr/sbin/mysqld(ha_innobase::update_row(unsigned char const*, unsigned char*)+0x360) [0x224eeb0]
/usr/sbin/mysqld(handler::ha_update_row(unsigned char const*, unsigned char*)+0x20b) [0x10f444b]
/usr/sbin/mysqld(Sql_cmd_update::update_single_table(THD*)+0x1d05) [0xf52df5]
/usr/sbin/mysqld(Sql_cmd_dml::execute(THD*)+0x189) [0xed3799]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0xb92) [0xe70522]
/usr/sbin/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x4de) [0xe73f8e]
/usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0xd4c) [0xe7526c]
/usr/sbin/mysqld(do_command(THD*)+0x207) [0xe77517]
/usr/sbin/mysqld() [0xfcea40]
/usr/sbin/mysqld() [0x290c7f9]
/lib64/libpthread.so.0(+0x81da) [0x7fa71ebe11da]
/lib64/libc.so.6(clone+0x43) [0x7fa71d190e73]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fa690268670): UPDATE MySchema.MyTable SET MyColumn = 0
Connection ID (thread ID): 10
Status: NOT_KILLED

If I start the server with innodb_force_recovery = 1, it comes up and works normally.
If I run the same statement again, the server crashes.
If I remove the column, add it again, and set the value to 0, the server crashes.

On another host (with the same schema and nearly the same content), I ran the same statements without any problems.
I have a dump from which I could restart with a new tablespace, but I wanted to report this to prevent the same problem in the future.

How to repeat:
I don't know how to offer steps to repeat without providing my (private) database.

Hi Mr. Liefke,

Thank you for your bug report.

However, we cannot reproduce what you report.

We have used one of our tables, added a column of the BIT(8) type, updated it and had no problems.

Hence, what we need is a fully repeatable test case.

That means the CREATE TABLE statement, all INSERTs needed to recreate the crash, and the exact DDL that added the BIT column. We already have the UPDATE statement.

Once we have all of this from you, we shall try to repeat the behaviour.

Can't repeat.

When I rebuild the database from the backup, I can't reproduce the error either. So somehow the tablespace seems to be corrupt. But I had hoped that the stack trace would help to narrow the problem down.

Hi,

No, the stack trace cannot help, as it is unique. It could only help if we already had a verified bug with the exact (or a very similar) stack trace.

This could be the effect of a transient error on the HDD or in RAM, and do remember that transient errors are not detectable by diagnostic software.

This is one reason why many of us use ECC parity-checking RAM and RAID HDDs.

Thinking about it, I had a similar problem with another system two months ago. Unfortunately, I didn't keep the error log, so I can't tell whether it was exactly the same problem.

But both tablespaces were migrated from 5.7.38 to 8.0.31 not long ago and hit the error during schema updates. This leads me to believe that it has more to do with the migration procedure and is not a hardware problem.

This would also explain why I can't reproduce the problem after restoring the dump: the restore recreates the structures with version 8.0 instead of reusing the ones from 5.7.

That still makes it difficult to produce a test case.

Sinisa,
We see a nearly identical crash. We have completed a memtest86 run, and the server uses ECC memory, so it is not a hardware error.

2024-08-09T09:36:27.771135Z 7 [ERROR] [MY-011855] [InnoDB] Page old data size 12121 new data size 12097, page old max ins size 4116 new max ins size 4140
2024-08-09T09:36:27.771192Z 7 [ERROR] [MY-013183] [InnoDB] Assertion failure: btr0cur.cc:3758:rec thread 140001571804928
...
Thread pointer: 0x7f439840e000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f54a7f48d58 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2e) [0x560e362c7a1e]
/usr/sbin/mysqld(handle_fatal_signal+0x3b3) [0x560e355a88a3]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7f54b8193730]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b) [0x7f54b78a77bb]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x121) [0x7f54b7892535]
/usr/sbin/mysqld(+0xf1337f) [0x560e3511137f]
/usr/sbin/mysqld(btr_cur_optimistic_update(unsigned long, btr_cur_t*, unsigned long**, mem_block_info_t**, upd_t const*, unsigned long, que_thr_t*, unsigned long, mtr_t*)+0x6ff) [0x560e365cc14f]
/usr/sbin/mysqld(+0x230bd87) [0x560e36509d87]
/usr/sbin/mysqld(+0x23110a8) [0x560e3650f0a8]
/usr/sbin/mysqld(row_upd_step(que_thr_t*)+0xb9c) [0x560e3651089c]
/usr/sbin/mysqld(+0x22d0ded) [0x560e364ceded]
/usr/sbin/mysqld(row_update_for_mysql(unsigned char const*, row_prebuilt_t*)+0x30) [0x560e364d1bb0]
/usr/sbin/mysqld(ha_innobase::update_row(unsigned char const*, unsigned char*)+0x1d1) [0x560e363bfda1]
/usr/sbin/mysqld(handler::ha_update_row(unsigned char const*, unsigned char*)+0x1b3) [0x560e35184143]
/usr/sbin/mysqld(Update_rows_log_event::do_exec_row(Relay_log_info const*)+0xb3) [0x560e35f83433]
/usr/sbin/mysqld(Rows_log_event::do_apply_row(Relay_log_info const*)+0x26) [0x560e35f70266]
/usr/sbin/mysqld(Rows_log_event::do_index_scan_and_update(Relay_log_info const*)+0x1eb) [0x560e35f8367b]
/usr/sbin/mysqld(Rows_log_event::do_apply_event(Relay_log_info const*)+0x135b) [0x560e35f8ebfb]
/usr/sbin/mysqld(slave_worker_exec_job_group(Slave_worker*, Relay_log_info*)+0x161) [0x560e36008f21]
/usr/sbin/mysqld(+0x1e11de3) [0x560e3600fde3]
/usr/sbin/mysqld(+0x25b9884) [0x560e367b7884]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f54b8188fa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f54b7968eff]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 7
Status: NOT_KILLED

This server is a bit older, 8.0.27, but the record-size calculation issue is evidently still present in newer releases such as 8.0.32, and https://bugs.mysql.com/bug.php?id=109301 tells much the same story as this bug report: an upgrade from 5.7 to 8.0 causes corruption.
Also, both cases appear to involve compressed tables.
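
To compare the two crashes side by side, the numbers in the MY-011855 lines can be pulled out and summed. The following is only a minimal sketch: the helper name page_sizes and the derived field names are hypothetical and not part of MySQL, and the arithmetic is an observation about the logged values, not a diagnosis of the underlying bug.

```python
import re

# Matches the InnoDB MY-011855 message as it appears in the two logs above.
PAGE_MSG = re.compile(
    r"Page old data size (\d+) new data size (\d+), "
    r"page old max ins size (\d+) new max ins size (\d+)"
)


def page_sizes(line):
    """Return sums and deltas of the logged sizes, or None if no match.

    Hypothetical helper for illustration; the field names are assumptions.
    """
    match = PAGE_MSG.search(line)
    if match is None:
        return None
    old_data, new_data, old_ins, new_ins = map(int, match.groups())
    return {
        "old_total": old_data + old_ins,     # data size + max insert size, before
        "new_total": new_data + new_ins,     # data size + max insert size, after
        "data_shrink": old_data - new_data,  # how much the data size dropped
        "ins_growth": new_ins - old_ins,     # how much the max insert size grew
    }


# The two MY-011855 lines quoted in this report (8.0.32 and 8.0.27).
LINES = [
    "2023-03-10T18:09:11.630668Z 10 [ERROR] [MY-011855] [InnoDB] Page old data "
    "size 13724 new data size 8356, page old max ins size 2522 new max ins size 7890",
    "2024-08-09T09:36:27.771135Z 7 [ERROR] [MY-011855] [InnoDB] Page old data "
    "size 12121 new data size 12097, page old max ins size 4116 new max ins size 4140",
]

for line in LINES:
    print(page_sizes(line))
```

For both quoted lines, the sum of data size and max insert size is the same before and after (16246 in the first log, 16237 in the second), and the data size shrinks by exactly the amount the max insert size grows (5368 and 24 bytes).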

I fully understand that a bug without a reproducible test case is a weak bug, but crashes that repeat across a variety of systems and versions should not be disregarded.

Hi Mr. Liefke,

We have analysed your latest stack trace, and that assertion failure makes your bug report a duplicate of the following one:

https://bugs.mysql.com/bug.php?id=110771

All the essential information in that bug is hidden, because it could be used for malicious attacks.

Duplicate.