Bug #85585 InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0
Submitted: 22 Mar 2017 21:36 Modified: 27 Apr 2017 9:09
Reporter: Roel Van de Paar (OCA) Email Updates:
Status: In progress Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:5.6.35-80.0-debug-log OS:Any
Assigned to: CPU Architecture:Any

[22 Mar 2017 21:36] Roel Van de Paar
Description:
This is a placeholder for a bug I cannot repeat at the moment. The output is from Percona Server, but I expect the same exists in MySQL.

2017-03-20 21:24:26 8253 [Note] /sda/PS200317-percona-server-5.6.35-80.0-linux-x86_64-debug/bin/mysqld: Normal shutdown

2017-03-20 21:24:26 8253 [Note] Giving 2 client threads a chance to die gracefully
2017-03-20 21:24:26 8253 [Note] Event Scheduler: Killing the scheduler thread, thread id 7
2017-03-20 21:24:26 8253 [Note] Event Scheduler: Waiting for the scheduler thread to reply
2017-03-20 21:24:26 8253 [Note] Event Scheduler: Stopped
2017-03-20 21:24:26 8253 [Note] Event Scheduler: Purging the queue. 0 events
2017-03-20 21:24:26 8253 [Note] Shutting down slave threads
2017-03-20 21:24:27 8253 [ERROR] Could not delete table /dev/shm/802831/148/tmp/#sql203d_5_48 because another transaction has accessed the table. To drop the table, make sure no transactions touch the table.
2017-03-20 21:24:27 8253 [Warning] Could not remove temporary table: '/dev/shm/802831/148/tmp/#sql203d_5_48', error: 0
2017-03-20 21:24:28 8253 [Note] Forcefully disconnecting 0 remaining clients
2017-03-20 21:24:28 8253 [Note] Binlog end
...
2017-03-20 21:24:30 8253 [Note] InnoDB: FTS optimize thread exiting.
2017-03-20 21:24:30 8253 [Note] InnoDB: Starting shutdown...
2017-03-20 21:24:31 8253 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
2017-03-20 21:24:31 7f00e3fff700  InnoDB: Assertion failure in thread 139641801930496 in file buf0flu.cc line 2807
InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0

Core was generated by `/sda/PS200317-percona-server-5.6.35-80.0-linux-x86_64-debug/bin/mysqld --no-def'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimised out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
62	../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7f00e3fff700 (LWP 8334))]
(gdb) bt
#0  __pthread_kill (threadid=<optimised out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x000055a0a0bd070b in my_write_core (sig=6) at /git/PS56_dbg/mysys/stacktrace.c:424
#2  0x000055a0a0811278 in handle_fatal_signal (sig=6) at /git/PS56_dbg/sql/signal_handler.cc:236
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#5  0x00007f00ffd9d3ea in __GI_abort () at abort.c:89
#6  0x000055a0a0e17752 in buf_flush_page_cleaner_thread (arg=0x0) at /git/PS56_dbg/storage/innobase/buf/buf0flu.cc:2807
#7  0x00007f01009dc6ca in start_thread (arg=0x7f00e3fff700) at pthread_create.c:333
#8  0x00007f00ffe6e0af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

How to repeat:
No testcase yet, feel free to leave bug as open/expired
[22 Mar 2017 21:37] Roel Van de Paar
Perhaps related to bug 58895
[22 Mar 2017 21:43] Miguel Solorzano
Please re-open when you will able to provide the test case. Thanks.
[22 Mar 2017 22:11] Roel Van de Paar
Uploaded a core dump (file: 'core') + mysqld (Percona Server) + ldd files. This should allow core analysis for triage.

Uploaded bug85585.tar.gz to //support/incoming/bug85585.tar.gz
[23 Mar 2017 15:05] Sinisa Milivojevic
Hi Roel,

I truly doubt that we will be able to repeat the bug without a proper test case. When can give it a try, however.

Still, couple of questions:

1. What exactly is the OS on which this happened, version and other detail

2. Has this ever occurred with 5.7 ???
[24 Mar 2017 1:13] Roel Van de Paar
Hi Sinisa,

Thank you! May be worthwile for a developer to analyze the core too, perhaps it is easy to see in combination with checking the code what the issue is.

1) Ubuntu 16.10, fully updated as of yesterday
2) I do not recall seeing it there yet, but the occurance rate of this bug is low so it may be in there too

One of our senior developers is also having a look at the core.

Thanks again
[24 Mar 2017 17:21] Sinisa Milivojevic
Hi Roel,

Sorry for disturbing you, but some things changed interim. Could you please, reload that file with a different name:

mysql-bug-data-85585.zip

that also means different format, but that is OK for me.

Thanks in advance.
[25 Apr 2017 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[25 Apr 2017 6:40] Roel Van de Paar
$ sftp -oPort=2021 -oUser=myemail@mydomain.com sftp.oracle.com:/support/incoming
ssh_dispatch_run_fatal: Connection to 141.146.1.165 port 2021: DH GEX group out of range
Couldn't read packet: Connection reset by peer
[25 Apr 2017 6:43] Roel Van de Paar
emailed the requested bug file to Sinisa Milivojevic
[25 Apr 2017 14:42] Sinisa Milivojevic
Just to confirm that I have got ZIP file.
[25 Apr 2017 14:59] Sinisa Milivojevic
Roel,

Please, do confirm that you have sent me what I see in that ZIP file. You have sent me mysqld binary , with core file and shared libraries that you used. 

Am I correct ???

I know that you are aware that we would much more prefer a repeatable test case, because this is post-mortem info, so its findings do not necessarily have to be revealing !!!

Hence, we might require a test case, after taking a look at the stacktrace.
[25 Apr 2017 17:21] Sinisa Milivojevic
Hi Roel,

This is just to inform you that this is, eventually, some possible race condition.

This will require lot's of time for analysis, while the results could still be inconclusive, since we have only post-mortem values.

We shall let you know of the result.
[26 Apr 2017 1:16] Roel Van de Paar
Sinisa, thank you! Yes, in this case I could not reproduce it. This happens very infrequently. In such cases, I try and provide a core dump and hopefully some developer or engineer can see what happened.
[26 Apr 2017 7:15] Shane Bester
Feel free to make this public.  I wrote some GDB macros to print out info.

(gdb) bt
#0  0x00007f01009e36ff in pthread_kill () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000055a0a0bd070b in my_write_core (sig=6) at /git/PS56_dbg/mysys/stacktrace.c:424
#2  0x000055a0a0811278 in handle_fatal_signal (sig=6) at /git/PS56_dbg/sql/signal_handler.cc:236
#3  <signal handler called>
#4  0x00007f00ffd9b7ef in raise () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f00ffd9d3ea in abort () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libc.so.6
#6  0x000055a0a0e17752 in buf_flush_page_cleaner_thread (arg=0x0) at /git/PS56_dbg/storage/innobase/buf/buf0flu.cc:2807
#7  0x00007f01009dc6ca in start_thread () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libpthread.so.0
#8  0x00007f00ffe6e0af in clone () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libc.so.6
(gdb) frame 6
#6  0x000055a0a0e17752 in buf_flush_page_cleaner_thread (arg=0x0) at /git/PS56_dbg/storage/innobase/buf/buf0flu.cc:2807
2807                    ut_a(UT_LIST_GET_LEN(buf_pool->flush_list) == 0);
(gdb)

(gdb) print_innodb_flush_list56
------------------
InnoDB Flush Lists
------------------
Number of buffer pool instances: 1
Flush list in buffer pool instance 0 (print (buf_pool_t) 0x7f00fafc5878 )
  Count: 9 , Start: 0x7f00ef7d4ee0 , End: 0x7f00ef7fdf68
  Printing 9 buf_page_t Items...
---------------------------------------------------------------------------------------------------------------------
|      Print Structure           | Space | Offset | BufFixCnt | IOFix | State | FlushType | InFlushList | AccessTime |
---------------------------------------------------------------------------------------------------------------------
| p *(buf_page_t*)0x7f00ef7d4ee0 |  000  |  012   |    000    |  000  |  005  |    001    |     001     | 3946763205 |
| p *(buf_page_t*)0x7f00ef7fd170 |  000  |  006   |    000    |  000  |  005  |    001    |     001     | 3946762145 |
| p *(buf_page_t*)0x7f00ef7d0908 |  000  |  009   |    000    |  000  |  005  |    001    |     001     | 3946764754 |
| p *(buf_page_t*)0x7f00ef7d5830 |  000  |  008   |    000    |  000  |  005  |    001    |     001     | 3946762190 |
| p *(buf_page_t*)0x7f00ef7d5388 |  000  |  010   |    000    |  000  |  005  |    001    |     001     | 3946763204 |
| p *(buf_page_t*)0x7f00ef7cffb8 |  000  |  308   |    000    |  000  |  005  |    001    |     001     | 3946764755 |
| p *(buf_page_t*)0x7f00ef7d0460 |  000  |  307   |    000    |  000  |  005  |    001    |     001     | 3946764755 |
| p *(buf_page_t*)0x7f00ef79ecd0 |  000  |  411   |    000    |  000  |  005  |    001    |     001     | 3946765204 |
| p *(buf_page_t*)0x7f00ef7fdf68 |  000  |  011   |    000    |  000  |  005  |    001    |     001     | 3946762145 |
---------------------------------------------------------------------------------------------------------------------
(gdb)
[26 Apr 2017 7:42] Shane Bester
Wonder if https://bugs.mysql.com/bug.php?id=77214 is related...
[26 Apr 2017 15:09] Sinisa Milivojevic
Shane,

Thanks a  lot. I have also got the same info from the core file and some more, which I might upload once we resolve this bug, one way or another.

Roel,

This bug is in queue to be analyzed by developers, so nothing can be reported now. When the data from core file are fully analyzed, we shall write a feedback here. We do not yet know whether core file contains info that would explain this race or not. 

What we can definitely do is let you know when the thorough analysis starts, as we are pretty booked up , right now.
[26 Apr 2017 15:10] Sinisa Milivojevic
Shane,

I analyzed #77214 and I do not think that it is the same problem.

Thanks for the info, though ...
[27 Apr 2017 9:09] Roel Van de Paar
Thank you Shane & Sinisa
[27 Apr 2017 9:09] Roel Van de Paar
Shane, great on the gdb info macro's - are you at liberty to share them?