Bug #85585 | InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0 | ||
---|---|---|---|
Submitted: | 22 Mar 2017 21:36 | Modified: | 25 Nov 2020 3:39 |
Reporter: | Roel Van de Paar | Email Updates: | |
Status: | In progress | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S1 (Critical) |
Version: | 5.6.35-80.0-debug-log | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[22 Mar 2017 21:36]
Roel Van de Paar
[22 Mar 2017 21:37]
Roel Van de Paar
Perhaps related to bug 58895
[22 Mar 2017 21:43]
MySQL Verification Team
Please re-open when you will able to provide the test case. Thanks.
[22 Mar 2017 22:11]
Roel Van de Paar
Uploaded a core dump (file: 'core') + mysqld (Percona Server) + ldd files. This should allow core analysis for triage. Uploaded bug85585.tar.gz to //support/incoming/bug85585.tar.gz
[23 Mar 2017 15:05]
MySQL Verification Team
Hi Roel, I truly doubt that we will be able to repeat the bug without a proper test case. When can give it a try, however. Still, couple of questions: 1. What exactly is the OS on which this happened, version and other detail 2. Has this ever occurred with 5.7 ???
[24 Mar 2017 1:13]
Roel Van de Paar
Hi Sinisa, Thank you! May be worthwile for a developer to analyze the core too, perhaps it is easy to see in combination with checking the code what the issue is. 1) Ubuntu 16.10, fully updated as of yesterday 2) I do not recall seeing it there yet, but the occurance rate of this bug is low so it may be in there too One of our senior developers is also having a look at the core. Thanks again
[24 Mar 2017 17:21]
MySQL Verification Team
Hi Roel, Sorry for disturbing you, but some things changed interim. Could you please, reload that file with a different name: mysql-bug-data-85585.zip that also means different format, but that is OK for me. Thanks in advance.
[25 Apr 2017 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[25 Apr 2017 6:40]
Roel Van de Paar
$ sftp -oPort=2021 -oUser=myemail@mydomain.com sftp.oracle.com:/support/incoming ssh_dispatch_run_fatal: Connection to 141.146.1.165 port 2021: DH GEX group out of range Couldn't read packet: Connection reset by peer
[25 Apr 2017 6:43]
Roel Van de Paar
emailed the requested bug file to Sinisa Milivojevic
[25 Apr 2017 14:42]
MySQL Verification Team
Just to confirm that I have got ZIP file.
[25 Apr 2017 14:59]
MySQL Verification Team
Roel, Please, do confirm that you have sent me what I see in that ZIP file. You have sent me mysqld binary , with core file and shared libraries that you used. Am I correct ??? I know that you are aware that we would much more prefer a repeatable test case, because this is post-mortem info, so its findings do not necessarily have to be revealing !!! Hence, we might require a test case, after taking a look at the stacktrace.
[25 Apr 2017 17:21]
MySQL Verification Team
Hi Roel, This is just to inform you that this is, eventually, some possible race condition. This will require lot's of time for analysis, while the results could still be inconclusive, since we have only post-mortem values. We shall let you know of the result.
[26 Apr 2017 1:16]
Roel Van de Paar
Sinisa, thank you! Yes, in this case I could not reproduce it. This happens very infrequently. In such cases, I try and provide a core dump and hopefully some developer or engineer can see what happened.
[26 Apr 2017 7:15]
MySQL Verification Team
Feel free to make this public. I wrote some GDB macros to print out info. (gdb) bt #0 0x00007f01009e36ff in pthread_kill () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libpthread.so.0 #1 0x000055a0a0bd070b in my_write_core (sig=6) at /git/PS56_dbg/mysys/stacktrace.c:424 #2 0x000055a0a0811278 in handle_fatal_signal (sig=6) at /git/PS56_dbg/sql/signal_handler.cc:236 #3 <signal handler called> #4 0x00007f00ffd9b7ef in raise () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libc.so.6 #5 0x00007f00ffd9d3ea in abort () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libc.so.6 #6 0x000055a0a0e17752 in buf_flush_page_cleaner_thread (arg=0x0) at /git/PS56_dbg/storage/innobase/buf/buf0flu.cc:2807 #7 0x00007f01009dc6ca in start_thread () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libpthread.so.0 #8 0x00007f00ffe6e0af in clone () from /home/sbester/bug85585/lib/x86_64-linux-gnu/libc.so.6 (gdb) frame 6 #6 0x000055a0a0e17752 in buf_flush_page_cleaner_thread (arg=0x0) at /git/PS56_dbg/storage/innobase/buf/buf0flu.cc:2807 2807 ut_a(UT_LIST_GET_LEN(buf_pool->flush_list) == 0); (gdb) (gdb) print_innodb_flush_list56 ------------------ InnoDB Flush Lists ------------------ Number of buffer pool instances: 1 Flush list in buffer pool instance 0 (print (buf_pool_t) 0x7f00fafc5878 ) Count: 9 , Start: 0x7f00ef7d4ee0 , End: 0x7f00ef7fdf68 Printing 9 buf_page_t Items... --------------------------------------------------------------------------------------------------------------------- | Print Structure | Space | Offset | BufFixCnt | IOFix | State | FlushType | InFlushList | AccessTime | --------------------------------------------------------------------------------------------------------------------- | p *(buf_page_t*)0x7f00ef7d4ee0 | 000 | 012 | 000 | 000 | 005 | 001 | 001 | 3946763205 | | p *(buf_page_t*)0x7f00ef7fd170 | 000 | 006 | 000 | 000 | 005 | 001 | 001 | 3946762145 | | p *(buf_page_t*)0x7f00ef7d0908 | 000 | 009 | 000 | 000 | 005 | 001 | 001 | 3946764754 | | p *(buf_page_t*)0x7f00ef7d5830 | 000 | 008 | 000 | 000 | 005 | 001 | 001 | 3946762190 | | p *(buf_page_t*)0x7f00ef7d5388 | 000 | 010 | 000 | 000 | 005 | 001 | 001 | 3946763204 | | p *(buf_page_t*)0x7f00ef7cffb8 | 000 | 308 | 000 | 000 | 005 | 001 | 001 | 3946764755 | | p *(buf_page_t*)0x7f00ef7d0460 | 000 | 307 | 000 | 000 | 005 | 001 | 001 | 3946764755 | | p *(buf_page_t*)0x7f00ef79ecd0 | 000 | 411 | 000 | 000 | 005 | 001 | 001 | 3946765204 | | p *(buf_page_t*)0x7f00ef7fdf68 | 000 | 011 | 000 | 000 | 005 | 001 | 001 | 3946762145 | --------------------------------------------------------------------------------------------------------------------- (gdb)
[26 Apr 2017 7:42]
MySQL Verification Team
Wonder if https://bugs.mysql.com/bug.php?id=77214 is related...
[26 Apr 2017 15:09]
MySQL Verification Team
Shane, Thanks a lot. I have also got the same info from the core file and some more, which I might upload once we resolve this bug, one way or another. Roel, This bug is in queue to be analyzed by developers, so nothing can be reported now. When the data from core file are fully analyzed, we shall write a feedback here. We do not yet know whether core file contains info that would explain this race or not. What we can definitely do is let you know when the thorough analysis starts, as we are pretty booked up , right now.
[26 Apr 2017 15:10]
MySQL Verification Team
Shane, I analyzed #77214 and I do not think that it is the same problem. Thanks for the info, though ...
[27 Apr 2017 9:09]
Roel Van de Paar
Thank you Shane & Sinisa
[27 Apr 2017 9:09]
Roel Van de Paar
Shane, great on the gdb info macro's - are you at liberty to share them?
[25 Nov 2020 3:39]
Roel Van de Paar
Any updates?
[30 Nov 2020 14:36]
MySQL Verification Team
No, not yet ...... This bug requires lots of time and effort, which is not available yet ......
[31 May 2023 20:06]
Vitaliy Zyukin
I was able to reproduce it on MariaDB 10.4.10. Couple of times already. It doesn't always happen on our hosts, it happen once in 10 executions on average. The way to reproduce on my end was creating 8 processes in python that will delete 100 records per each delete query in one table. This procedure is taking place when database is prepared for a shutdown, meaning replication is OFF and no traffic coming in or going out. Additional parameters: sql_log_bin=0 and innodb_flush_log_at_trx_commit=0 Next code is capable of deleting ~5 million records in 2 minutes: ``` ... func_args = list() for i in range(from_id, to_id, 100000): last_id = (i + 100000) if (i + 100000) < to_id else to_id func_args.append((table_name, i, last_id)) with multiprocessing.Pool(8) as p: p.starmap(self.delete_range, func_args) def delete_range(self, table_name, from_id, to_id): conn = mysql.connector.connect(**self.connect_args) cursor = conn.cursor() cursor.execute("SET sql_log_bin=0;") cursor.execute("SET GLOBAL innodb_flush_log_at_trx_commit=0;") for i in range(from_id, to_id, 100): last_id = (i + 100) if (i + 100) < to_id else to_id # delete multiple ids at the time cursor.execute("DELETE FROM {} WHERE id IN {};".format(table_name, tuple(range(i, last_id)))) cursor.close() conn.close() ``` On my end the bug is still for the same function: UT_LIST_GET_LEN but a different location ``` 2023-05-31 8:11:21 0 [Note] InnoDB: Starting shutdown... 2023-05-31 08:11:28 0x7eb6d979d700 InnoDB: Assertion failure in file src/storage/innobase/buf/buf0flu.cc line 3353 InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to https://jira.mariadb.org/ InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/ InnoDB: about forcing recovery. 230531 8:11:28 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.4.10-MariaDB-log key_buffer_size=8388608 read_buffer_size=16777216 max_used_connections=29 max_threads=65537 thread_count=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1208067437 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x49000 <Path>/bin/mysqld(my_print_stacktrace+0x29)[0x55d43315d7b9] <Path>/bin/mysqld(handle_fatal_signal+0x4d0)[0x55d432c2e8f0] /lib64/libpthread.so.0(+0x118e0)[0x7f01a7fc38e0] /lib64/libc.so.6(gsignal+0x110)[0x7f01a715cca0] /lib64/libc.so.6(abort+0x148)[0x7f01a715e148] <Path>/bin/mysqld(+0x565705)[0x55d432965705] <Path>/bin/mysqld(+0xb43a06)[0x55d432f43a06] /lib64/libpthread.so.0(+0x744b)[0x7f01a7fb944b] /lib64/libc.so.6(clone+0x3f)[0x7f01a721852f] The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. Writing a core file... Working directory at /mysql/data Resource Limits: Limit Soft Limit Hard Limit Units Max cpu time unlimited unlimited seconds Max file size unlimited unlimited bytes Max data size unlimited unlimited bytes Max stack size 10485760 10485760 bytes Max core file size 0 0 bytes Max resident set unlimited unlimited bytes Max processes unlimited unlimited processes Max open files 65535 65535 files Max locked memory unlimited unlimited bytes Max address space unlimited unlimited bytes Max file locks unlimited unlimited locks Max pending signals 30446 30446 signals Max msgqueue size 819200 819200 bytes Max nice priority 0 0 Max realtime priority 0 0 Max realtime timeout unlimited unlimited us Core pattern: core ```
[1 Jun 2023 12:05]
MySQL Verification Team
Hi Mr. Vitaliy Zyukin, You are on the wrong forum. This is a forum for MySQL products, made by Oracle.