MySQL Bugs: #85585: InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush

Bug #85585	InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0
Submitted:	22 Mar 2017 21:36	Modified:	25 Nov 2020 3:39
Reporter:	Roel Van de Paar	Email Updates:
Status:	In progress	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S1 (Critical)
Version:	5.6.35-80.0-debug-log	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
This is a placeholder for a bug I cannot repeat at the moment. The output is from Percona Server, but I expect the same exists in MySQL.

2017-03-20 21:24:26 8253 [Note] /sda/PS200317-percona-server-5.6.35-80.0-linux-x86_64-debug/bin/mysqld: Normal shutdown

2017-03-20 21:24:26 8253 [Note] Giving 2 client threads a chance to die gracefully
2017-03-20 21:24:26 8253 [Note] Event Scheduler: Killing the scheduler thread, thread id 7
2017-03-20 21:24:26 8253 [Note] Event Scheduler: Waiting for the scheduler thread to reply
2017-03-20 21:24:26 8253 [Note] Event Scheduler: Stopped
2017-03-20 21:24:26 8253 [Note] Event Scheduler: Purging the queue. 0 events
2017-03-20 21:24:26 8253 [Note] Shutting down slave threads
2017-03-20 21:24:27 8253 [ERROR] Could not delete table /dev/shm/802831/148/tmp/#sql203d_5_48 because another transaction has accessed the table. To drop the table, make sure no transactions touch the table.
2017-03-20 21:24:27 8253 [Warning] Could not remove temporary table: '/dev/shm/802831/148/tmp/#sql203d_5_48', error: 0
2017-03-20 21:24:28 8253 [Note] Forcefully disconnecting 0 remaining clients
2017-03-20 21:24:28 8253 [Note] Binlog end
...
2017-03-20 21:24:30 8253 [Note] InnoDB: FTS optimize thread exiting.
2017-03-20 21:24:30 8253 [Note] InnoDB: Starting shutdown...
2017-03-20 21:24:31 8253 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
2017-03-20 21:24:31 7f00e3fff700  InnoDB: Assertion failure in thread 139641801930496 in file buf0flu.cc line 2807
InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0

Core was generated by `/sda/PS200317-percona-server-5.6.35-80.0-linux-x86_64-debug/bin/mysqld --no-def'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimised out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
62	../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7f00e3fff700 (LWP 8334))]
(gdb) bt
#0  __pthread_kill (threadid=<optimised out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x000055a0a0bd070b in my_write_core (sig=6) at /git/PS56_dbg/mysys/stacktrace.c:424
#2  0x000055a0a0811278 in handle_fatal_signal (sig=6) at /git/PS56_dbg/sql/signal_handler.cc:236
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#5  0x00007f00ffd9d3ea in __GI_abort () at abort.c:89
#6  0x000055a0a0e17752 in buf_flush_page_cleaner_thread (arg=0x0) at /git/PS56_dbg/storage/innobase/buf/buf0flu.cc:2807
#7  0x00007f01009dc6ca in start_thread (arg=0x7f00e3fff700) at pthread_create.c:333
#8  0x00007f00ffe6e0af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

How to repeat:
No testcase yet, feel free to leave bug as open/expired

Perhaps related to bug 58895

Please re-open when you will able to provide the test case. Thanks.

Uploaded a core dump (file: 'core') + mysqld (Percona Server) + ldd files. This should allow core analysis for triage.

Uploaded bug85585.tar.gz to //support/incoming/bug85585.tar.gz

Hi Roel,

I truly doubt that we will be able to repeat the bug without a proper test case. When can give it a try, however.

Still, couple of questions:

1. What exactly is the OS on which this happened, version and other detail

2. Has this ever occurred with 5.7 ???

Hi Sinisa,

Thank you! May be worthwile for a developer to analyze the core too, perhaps it is easy to see in combination with checking the code what the issue is.

1) Ubuntu 16.10, fully updated as of yesterday
2) I do not recall seeing it there yet, but the occurance rate of this bug is low so it may be in there too

One of our senior developers is also having a look at the core.

Thanks again

Hi Roel,

Sorry for disturbing you, but some things changed interim. Could you please, reload that file with a different name:

mysql-bug-data-85585.zip

that also means different format, but that is OK for me.

Thanks in advance.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

$ sftp -oPort=2021 -oUser=myemail@mydomain.com sftp.oracle.com:/support/incoming
ssh_dispatch_run_fatal: Connection to 141.146.1.165 port 2021: DH GEX group out of range
Couldn't read packet: Connection reset by peer

emailed the requested bug file to Sinisa Milivojevic

Just to confirm that I have got ZIP file.

Roel,

Please, do confirm that you have sent me what I see in that ZIP file. You have sent me mysqld binary , with core file and shared libraries that you used. 

Am I correct ???

I know that you are aware that we would much more prefer a repeatable test case, because this is post-mortem info, so its findings do not necessarily have to be revealing !!!

Hence, we might require a test case, after taking a look at the stacktrace.

Hi Roel,

This is just to inform you that this is, eventually, some possible race condition.

This will require lot's of time for analysis, while the results could still be inconclusive, since we have only post-mortem values.

We shall let you know of the result.

Sinisa, thank you! Yes, in this case I could not reproduce it. This happens very infrequently. In such cases, I try and provide a core dump and hopefully some developer or engineer can see what happened.

Feel free to make this public.  I wrote some GDB macros to print out info.

(gdb) bt
#0  0x00007f01009e36ff in pthread_kill () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000055a0a0bd070b in my_write_core (sig=6) at /git/PS56_dbg/mysys/stacktrace.c:424
#2  0x000055a0a0811278 in handle_fatal_signal (sig=6) at /git/PS56_dbg/sql/signal_handler.cc:236
#3  <signal handler called>
#4  0x00007f00ffd9b7ef in raise () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f00ffd9d3ea in abort () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libc.so.6
#6  0x000055a0a0e17752 in buf_flush_page_cleaner_thread (arg=0x0) at /git/PS56_dbg/storage/innobase/buf/buf0flu.cc:2807
#7  0x00007f01009dc6ca in start_thread () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libpthread.so.0
#8  0x00007f00ffe6e0af in clone () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libc.so.6
(gdb) frame 6
#6  0x000055a0a0e17752 in buf_flush_page_cleaner_thread (arg=0x0) at /git/PS56_dbg/storage/innobase/buf/buf0flu.cc:2807
2807                    ut_a(UT_LIST_GET_LEN(buf_pool->flush_list) == 0);
(gdb)

(gdb) print_innodb_flush_list56
------------------
InnoDB Flush Lists
------------------
Number of buffer pool instances: 1
Flush list in buffer pool instance 0 (print (buf_pool_t) 0x7f00fafc5878 )
  Count: 9 , Start: 0x7f00ef7d4ee0 , End: 0x7f00ef7fdf68
  Printing 9 buf_page_t Items...
---------------------------------------------------------------------------------------------------------------------
|      Print Structure           | Space | Offset | BufFixCnt | IOFix | State | FlushType | InFlushList | AccessTime |
---------------------------------------------------------------------------------------------------------------------
| p *(buf_page_t*)0x7f00ef7d4ee0 |  000  |  012   |    000    |  000  |  005  |    001    |     001     | 3946763205 |
| p *(buf_page_t*)0x7f00ef7fd170 |  000  |  006   |    000    |  000  |  005  |    001    |     001     | 3946762145 |
| p *(buf_page_t*)0x7f00ef7d0908 |  000  |  009   |    000    |  000  |  005  |    001    |     001     | 3946764754 |
| p *(buf_page_t*)0x7f00ef7d5830 |  000  |  008   |    000    |  000  |  005  |    001    |     001     | 3946762190 |
| p *(buf_page_t*)0x7f00ef7d5388 |  000  |  010   |    000    |  000  |  005  |    001    |     001     | 3946763204 |
| p *(buf_page_t*)0x7f00ef7cffb8 |  000  |  308   |    000    |  000  |  005  |    001    |     001     | 3946764755 |
| p *(buf_page_t*)0x7f00ef7d0460 |  000  |  307   |    000    |  000  |  005  |    001    |     001     | 3946764755 |
| p *(buf_page_t*)0x7f00ef79ecd0 |  000  |  411   |    000    |  000  |  005  |    001    |     001     | 3946765204 |
| p *(buf_page_t*)0x7f00ef7fdf68 |  000  |  011   |    000    |  000  |  005  |    001    |     001     | 3946762145 |
---------------------------------------------------------------------------------------------------------------------
(gdb)

Wonder if https://bugs.mysql.com/bug.php?id=77214 is related...

Shane,

Thanks a  lot. I have also got the same info from the core file and some more, which I might upload once we resolve this bug, one way or another.

Roel,

This bug is in queue to be analyzed by developers, so nothing can be reported now. When the data from core file are fully analyzed, we shall write a feedback here. We do not yet know whether core file contains info that would explain this race or not. 

What we can definitely do is let you know when the thorough analysis starts, as we are pretty booked up , right now.

Shane,

I analyzed #77214 and I do not think that it is the same problem.

Thanks for the info, though ...

Thank you Shane & Sinisa

Shane, great on the gdb info macro's - are you at liberty to share them?

Any updates?

No, not yet ......

This bug requires lots of time and effort, which is not available yet ......

I was able to reproduce it on MariaDB 10.4.10. Couple of times already. It doesn't always happen on our hosts, it happen once in 10 executions on average.

The way to reproduce on my end was creating 8 processes in python that will delete 100 records per each delete query in one table. 

This procedure is taking place when database is prepared for a shutdown, meaning replication is OFF and no traffic coming in or going out.

Additional parameters:
sql_log_bin=0 and innodb_flush_log_at_trx_commit=0

Next code is capable of deleting ~5 million records in 2 minutes:

```
...
    func_args = list()
        for i in range(from_id, to_id, 100000):
            last_id = (i + 100000) if (i + 100000) < to_id else to_id
            func_args.append((table_name, i, last_id))

    with multiprocessing.Pool(8) as p:
        p.starmap(self.delete_range, func_args)

    def delete_range(self, table_name, from_id, to_id):
        conn = mysql.connector.connect(**self.connect_args)
        cursor = conn.cursor()
        cursor.execute("SET sql_log_bin=0;")
        cursor.execute("SET GLOBAL innodb_flush_log_at_trx_commit=0;")

        for i in range(from_id, to_id, 100):
            last_id = (i + 100) if (i + 100) < to_id else to_id
            #  delete multiple ids at the time
            cursor.execute("DELETE FROM {} WHERE id IN {};".format(table_name, tuple(range(i, last_id))))

        cursor.close()
        conn.close()
```

On my end the bug is still for the same function: UT_LIST_GET_LEN but a different location

```
2023-05-31  8:11:21 0 [Note] InnoDB: Starting shutdown...
2023-05-31 08:11:28 0x7eb6d979d700  InnoDB: Assertion failure in file src/storage/innobase/buf/buf0flu.cc line 3353
InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
230531  8:11:28 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.4.10-MariaDB-log
key_buffer_size=8388608
read_buffer_size=16777216
max_used_connections=29
max_threads=65537
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1208067437 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
<Path>/bin/mysqld(my_print_stacktrace+0x29)[0x55d43315d7b9]
<Path>/bin/mysqld(handle_fatal_signal+0x4d0)[0x55d432c2e8f0]
/lib64/libpthread.so.0(+0x118e0)[0x7f01a7fc38e0]
/lib64/libc.so.6(gsignal+0x110)[0x7f01a715cca0]
/lib64/libc.so.6(abort+0x148)[0x7f01a715e148]
<Path>/bin/mysqld(+0x565705)[0x55d432965705]
<Path>/bin/mysqld(+0xb43a06)[0x55d432f43a06]
/lib64/libpthread.so.0(+0x744b)[0x7f01a7fb944b]
/lib64/libc.so.6(clone+0x3f)[0x7f01a721852f]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /mysql/data
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             10485760             bytes
Max core file size        0                    0                    bytes
Max resident set          unlimited            unlimited            bytes
Max processes             unlimited            unlimited            processes
Max open files            65535                65535                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       30446                30446                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: core
```

Hi Mr.  Vitaliy Zyukin,

You are on the wrong forum.

This is a forum for MySQL products, made by Oracle.