MySQL Bugs: #85327: MySQL crash on InnoDB: Failing assertion: btr_page_get_prev(get

Bug #85327	MySQL crash on InnoDB: Failing assertion: btr_page_get_prev(get_page, mtr) == bu
Submitted:	6 Mar 2017 18:09	Modified:	17 Feb 2022 13:40
Reporter:	Jose Chinchilla	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5	OS:	Linux (btr_page_get_prev and btr_page_get_next)
Assigned to:		CPU Architecture:	Any

Description:
I'm seeing the following assertion failures in the alert log; each one restarts the mysqld service:

InnoDB: Failing assertion: btr_page_get_next(get_page, mtr) == buf_frame_get_page_no(page)

InnoDB: Failing assertion: btr_page_get_prev(get_page, mtr) == buf_frame_get_page_no(page)

This is the complete alert log:

170304  8:54:50  InnoDB: Assertion failure in thread 140493445449472 in file btr/btr0cur.c line 178
InnoDB: Failing assertion: btr_page_get_next(get_page, mtr) == buf_frame_get_page_no(page)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
14:54:50 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

key_buffer_size=8384512
read_buffer_size=131072
max_used_connections=24
max_threads=1000
thread_count=23
connection_count=23
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2194578 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x34a8580
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fc72de8bd98 thread_stack 0x40000
/usr/libexec/mysqld(my_print_stacktrace+0x29) [0x84e0a9]
/usr/libexec/mysqld(handle_fatal_signal+0x483) [0x6a2743]
/lib64/libpthread.so.0() [0x399780f500]
/lib64/libc.so.6(gsignal+0x35) [0x39974328a5]
/lib64/libc.so.6(abort+0x175) [0x3997434085]
/usr/libexec/mysqld() [0x738842]
/usr/libexec/mysqld(btr_cur_search_to_nth_level+0xa95) [0x739755]
/usr/libexec/mysqld(row_ins_index_entry_low+0x13b) [0x7a711b]
/usr/libexec/mysqld(row_ins+0x143) [0x7a8623]
/usr/libexec/mysqld(row_ins_step+0x129) [0x7a8969]
/usr/libexec/mysqld(row_insert_for_mysql+0x236) [0x7aa4f6]
/usr/libexec/mysqld(ha_innobase::write_row(unsigned char*)+0xfb) [0x72fe9b]
/usr/libexec/mysqld(handler::ha_write_row(unsigned char*)+0x59) [0x698649]
/usr/libexec/mysqld(write_record(THD*, st_table*, st_copy_info*)+0x6f) [0x636e7f]
/usr/libexec/mysqld(mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool)+0xb37) [0x63b257]
/usr/libexec/mysqld(mysql_execute_command(THD*)+0xb89) [0x5c9c19]
/usr/libexec/mysqld(mysql_parse(THD*, char*, unsigned int, char const**)+0x2d3) [0x5ceaf3]
/usr/libexec/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xd15) [0x5d0b95]
/usr/libexec/mysqld(do_command(THD*)+0xea) [0x5d16ea]
/usr/libexec/mysqld(handle_one_connection+0x23e) [0x5c49ce]
/lib64/libpthread.so.0() [0x3997807851]
/lib64/libc.so.6(clone+0x6d) [0x39974e890d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fc6dc004a20): is an invalid pointer
Connection ID (thread ID): 19
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
170304 08:54:50 mysqld_safe Number of processes running now: 0
170304 08:54:50 mysqld_safe mysqld restarted

How to repeat:

17-03-04	2:30:48	Failing assertion: btr_page_get_prev(get_page, mtr) == buf_frame_get_page_no(page)
17-03-04	8:50:51	Failing assertion: btr_page_get_prev(get_page, mtr) == buf_frame_get_page_no(page)
17-03-04	8:54:50	Failing assertion: btr_page_get_next(get_page, mtr) == buf_frame_get_page_no(page)
17-03-04	8:59:35	Failing assertion: btr_page_get_prev(get_page, mtr) == buf_frame_get_page_no(page)
17-03-04	9:09:04	Failing assertion: btr_page_get_next(get_page, mtr) == buf_frame_get_page_no(page)
17-03-05	2:30:50	Failing assertion: btr_page_get_prev(get_page, mtr) == buf_frame_get_page_no(page)
17-03-05	2:30:54	Failing assertion: btr_page_get_prev(get_page, mtr) == buf_frame_get_page_no(page)
17-03-06	8:46:37	Failing assertion: btr_page_get_next(get_page, mtr) == buf_frame_get_page_no(page)
17-03-06	8:51:21	Failing assertion: btr_page_get_next(get_page, mtr) == buf_frame_get_page_no(page)
17-03-06	10:04:42	Failing assertion: btr_page_get_prev(get_page, mtr) == buf_frame_get_page_no(page)
17-03-06	10:05:38	Failing assertion: btr_page_get_prev(get_page, mtr) == buf_frame_get_page_no(page)
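
To see at a glance which of the two sibling-pointer checks fails more often, the occurrences listed above could be tallied with a small script. This is only a minimal sketch and not part of the original report: the function name `tally_assertions` is hypothetical, and it assumes nothing more than that each line contains the `Failing assertion:` text in the form shown in the log.

```python
import re
from collections import Counter

# Captures the function named in the failing assertion (prev or next sibling check).
ASSERT_RE = re.compile(r"Failing assertion: (btr_page_get_(?:prev|next))\(")


def tally_assertions(lines):
    """Count failing assertions by the sibling-pointer function they name."""
    counts = Counter()
    for line in lines:
        match = ASSERT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the list above; in practice, pass the lines of the error log.
    sample = [
        "17-03-04\t2:30:48\tFailing assertion: btr_page_get_prev(get_page, mtr) == buf_frame_get_page_no(page)",
        "17-03-04\t8:54:50\tFailing assertion: btr_page_get_next(get_page, mtr) == buf_frame_get_page_no(page)",
    ]
    for name, count in sorted(tally_assertions(sample).items()):
        print(name, count)
```

To run it against a real log, something like `tally_assertions(open(path))` should work, where `path` is wherever the server writes its error log.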

Hi!

What you see are the consequences of InnoDB page corruption. The crashes that you are experiencing are forced by InnoDB itself: it has to crash the server to prevent bad data from being written into the table(s). This is required for ACID compliance.

The corruption is discovered, most likely, because the root page of the clustered index of db1.table10 is full of zero bytes. This can be caused by errors in the RAM modules, in the disk system, in the disk controller, or in the cache on the controller or the disk.
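
Whether a given page really is zero-filled can be checked directly on a copy of the tablespace file, taken while the server is stopped. The following is a minimal sketch under stated assumptions, not an official tool: the helper name `page_is_all_zero` is hypothetical, the 16 KiB page size is InnoDB's default and may not match your configuration, and the page number of the index root is not given in this report, so you have to supply it yourself.

```python
import os
import tempfile

PAGE_SIZE = 16 * 1024  # assumption: InnoDB's default 16 KiB page size


def page_is_all_zero(path, page_no, page_size=PAGE_SIZE):
    """Return True if the given page of a tablespace file holds only zero bytes."""
    with open(path, "rb") as f:
        f.seek(page_no * page_size)
        data = f.read(page_size)
    if len(data) != page_size:
        raise ValueError("page %d is beyond the end of %s" % (page_no, path))
    return data == bytes(page_size)


if __name__ == "__main__":
    # Demonstration on a synthetic file: page 0 has data, page 1 is zero-filled.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x01" * PAGE_SIZE + b"\x00" * PAGE_SIZE)
    try:
        print(page_is_all_zero(tmp.name, 0))
        print(page_is_all_zero(tmp.name, 1))
    finally:
        os.unlink(tmp.name)
```

The function only reads the file, but it should still be pointed at a copy rather than at the live data directory.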

To locate the problem, you should first analyze your RAM modules. If you use ECC RAM that detects two-bit errors and corrects one-bit errors, then the problem is not there. If you do not use reliable ECC RAM, then you have to check your RAM very thoroughly with specialized tools, which will take lots of time.

Next on the list is checking the hard disks or SSDs. If you have mirrored disks, or disks with parity, then any errors should already have been logged. Otherwise, you are facing another round of extensive testing.

Checking of the dedicated caches should be done in accordance with the manufacturer's recommendations.

Last, but not least, check the system logs for the last couple of weeks for reports of any hardware problem. Pay particular attention to the logs around the times when the asserts occurred.
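
The last step, correlating system log entries with the assert times, could be sketched roughly as follows. Everything here is an assumption made for illustration: the function name `suspicious_entries`, the ten-minute window, and the keyword list are hypothetical, and the log messages in the demonstration are synthetic. The sketch expects the system log to have been parsed into (timestamp, message) pairs already, since log formats differ between systems.

```python
from datetime import datetime, timedelta

# Assumption: illustrative substrings only; real kernel and controller messages vary.
HARDWARE_HINTS = ("i/o error", "hardware error", "machine check", "edac")


def suspicious_entries(entries, assert_times, window=timedelta(minutes=10)):
    """Return entries that mention a hardware hint within `window` of any assert time.

    `entries` is an iterable of (datetime, message) pairs parsed from the system log.
    """
    hits = []
    for when, message in entries:
        text = message.lower()
        if not any(hint in text for hint in HARDWARE_HINTS):
            continue
        if any(abs(when - t) <= window for t in assert_times):
            hits.append((when, message))
    return hits


if __name__ == "__main__":
    # One assert time taken from the report; the log entries below are made up.
    asserts = [datetime(2017, 3, 4, 8, 54, 50)]
    log = [
        (datetime(2017, 3, 4, 8, 50, 0), "kernel: synthetic I/O error on a disk"),
        (datetime(2017, 3, 4, 12, 0, 0), "kernel: synthetic I/O error, hours later"),
        (datetime(2017, 3, 4, 8, 55, 0), "cron: routine job started"),
    ]
    for when, message in suspicious_entries(log, asserts):
        print(when.isoformat(), message)
```

An empty result does not rule out a hardware fault; it only means that none of the chosen keywords appeared near the assert times.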

Hi Sinisa, thank you very much for your answer. We will run our hardware diagnostic tool, and I'll be in contact if I find anything related.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

We have also hit this. Our slave database crashed.
 
2022-02-17T12:15:54.786989+08:00 5 [Note] Multi-threaded slave statistics for channel '': seconds elapsed = 130; events assigned = 11587807233; worker queues filled over overrun level = 706483; waited due a Worker queue full = 72764; waited due the total size = 12263875; waited at clock conflicts = 384564899141400 waited (count) when Workers occupied = 31291857 waited when Workers occupied = 380547160900
2022-02-17 12:16:52 0x7f9f48448700  InnoDB: Assertion failure in thread 140322088978176 in file btr0cur.cc line 325
InnoDB: Failing assertion: btr_page_get_next( latch_leaves.blocks[0]->frame, mtr) == page_get_page_no(page)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
04:16:52 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=2097152
read_buffer_size=1048576
max_used_connections=11
max_threads=214
thread_count=6
connection_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 443178 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f9f280008c0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f9f48447e30 thread_stack 0x40000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x2c)[0xeb750c]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x451)[0x7a5601]
/lib64/libpthread.so.0(+0xf100)[0x7fa05e3df100]
/lib64/libc.so.6(gsignal+0x37)[0x7fa05cfe05f7]
/lib64/libc.so.6(abort+0x148)[0x7fa05cfe1ce8]
/usr/local/mysql/bin/mysqld[0x775ee3]
/usr/local/mysql/bin/mysqld(_Z20btr_cur_latch_leavesP11buf_block_tRK9page_id_tRK11page_size_tmP9btr_cur_tP5mtr_t+0x8ac)[0x109790c]
/usr/local/mysql/bin/mysqld(_Z27btr_cur_search_to_nth_levelP12dict_index_tmPK8dtuple_t15page_cur_mode_tmP9btr_cur_tmPKcmP5mtr_t+0x19fc)[0x109e90c]
/usr/local/mysql/bin/mysqld(_Z30btr_pcur_restore_position_funcmP10btr_pcur_tPKcmP5mtr_t+0x249)[0x10a5459]
/usr/local/mysql/bin/mysqld[0xfeb230]
/usr/local/mysql/bin/mysqld(_Z14row_purge_stepP9que_thr_t+0x659)[0xfedb59]
/usr/local/mysql/bin/mysqld(_Z15que_run_threadsP9que_thr_t+0x83a)[0xf9d64a]
/usr/local/mysql/bin/mysqld(srv_worker_thread+0x255)[0x10236a5]
/lib64/libpthread.so.0(+0x7dc5)[0x7fa05e3d7dc5]
/lib64/libc.so.6(clone+0x6d)[0x7fa05d0a11cd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): Connection ID (thread ID): 0
Status: NOT_KILLED

Hi All,

Please read our earlier comments on the most probable cause of this assert.

Errors in hardware or the OS are either diagnosable or transient, and transient glitches cannot be diagnosed. But if you use reliable hardware, above all ECC RAM that detects two-bit errors and corrects one-bit errors, you will be much safer.

Otherwise, we cannot do anything without a proper and fully repeatable test case.