Bug #74003 | The server was crashed because of long semaphore wait | ||
---|---|---|---|
Submitted: | 22 Sep 2014 3:08 | Modified: | 25 Sep 2018 12:27 |
Reporter: | zhai weixiang (OCA) | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 5.6.16 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[22 Sep 2014 3:08]
zhai weixiang
[22 Sep 2014 3:26]
zhai weixiang
I think it would be a great help if mysqld could print backtraces of all threads when it crashes because of a long semaphore wait... :(
[24 Sep 2014 7:43]
zhai weixiang
We found the exact reason today:
1. We ran mysqldump on the slave. mysqldump opened a read view and started to `SELECT` from all tables. Because this instance is very large, backing up all the data takes a long time, so the read view became very old.
2. The worker threads frequently update the slave_worker_info table, so the undo list became very long and could not be purged because the data dump had not finished.
3. Finally, mysqldump began to dump the SLAVE_WORKER_INFO table and tried to build an old version of the record while holding an S lock on the block.
4. The worker threads tried to update SLAVE_WORKER_INFO but could not acquire the X lock on the same block.
5. A long semaphore wait occurred and the server crashed.

Some advice:
1. For mysqldump, dump the tables in the mysql database first, and then the others.
2. For InnoDB, release the S or X lock on the page when building an old version of the record takes too long, and then retry.
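The first piece of advice could be sketched roughly as follows. This is only an illustration of the ordering idea, not mysqldump code: `order_for_dump` is a hypothetical helper, and it assumes the list of database names has already been fetched.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of mysqldump: returns the database names
// reordered so that "mysql" (if present) comes first. The relative order
// of all other databases is preserved.
std::vector<std::string> order_for_dump(std::vector<std::string> dbs)
{
    std::stable_partition(
        dbs.begin(), dbs.end(),
        [](const std::string& name) { return name == "mysql"; });
    return dbs;
}
```

Dumping the mysql database while the read view is still young keeps the version chain that has to be walked for slave_worker_info short. It does nothing for other small, frequently updated tables that are dumped later.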
[25 Sep 2014 3:20]
liu hickey
Why not add backtrace dumping before aborting due to long semaphore waiting? Just like:

diff --git a/storage/innobase/srv/srv0srv.cc b/storage/innobase/srv/srv0srv.cc
index cf8b288..26d97ff 100644
--- a/storage/innobase/srv/srv0srv.cc
+++ b/storage/innobase/srv/srv0srv.cc
@@ -1661,6 +1661,25 @@ exit_func:
 }
 
 /*********************************************************************//**
+Dump the backtrace for self for debugging.
+*/
+UNIV_INTERN
+void dump_backtrace()
+{
+	pid_t pid;
+	char cmd[512];
+	pid = getpid();
+	snprintf(cmd, 512,
+		"gdb -ex 'set pagination 0' -ex 'thread apply all bt' -batch -p %d",
+		pid);
+	fprintf(stderr,
+		"Start dumping backtrace for self with pid=%d\n", pid);
+	system(cmd);
+	fprintf(stderr,
+		"Finish dumping backtrace\n");
+}
+
+/*********************************************************************//**
 A thread which prints warnings about semaphore waits which have lasted
 too long. These can be used to track bugs which cause hangs.
 @return a dummy parameter */
@@ -1739,7 +1758,6 @@ loop:
 		    && sema == old_sema && os_thread_eq(waiter, old_waiter)) {
 			fatal_cnt++;
 			if (fatal_cnt > 10) {
-
 				fprintf(stderr,
 					"InnoDB: Error: semaphore wait has lasted"
 					" > %lu seconds\n"
@@ -1747,6 +1765,8 @@ loop:
 					" because it appears to be hung.\n",
 					(ulong) srv_fatal_semaphore_wait_threshold);
 
+				dump_backtrace();
+
 				ut_error;
 			}
 		} else {
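A possible variant of the same idea, shown only as a sketch: it runs gdb with the same arguments as the patch above, but through fork() and execvp() instead of system(), so no shell is involved. The names `gdb_backtrace_args` and `dump_backtrace_no_shell` are hypothetical, and the sketch assumes a POSIX system with gdb on the PATH and permission to attach to the server process.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: the gdb arguments used in the patch, as an argv list.
std::vector<std::string> gdb_backtrace_args(pid_t pid)
{
    return {"gdb", "-ex", "set pagination 0",
            "-ex", "thread apply all bt",
            "-batch", "-p", std::to_string(pid)};
}

// Hypothetical variant of dump_backtrace(): fork, exec gdb directly, and
// wait for it. Returns gdb's exit status, or -1 on failure.
int dump_backtrace_no_shell()
{
    // Built before fork(), so getpid() is the pid of the process to inspect.
    const std::vector<std::string> args = gdb_backtrace_args(getpid());
    std::vector<char*> argv;
    for (const std::string& a : args) {
        argv.push_back(const_cast<char*>(a.c_str()));
    }
    argv.push_back(nullptr);

    const pid_t child = fork();
    if (child < 0) {
        return -1;
    }
    if (child == 0) {
        execvp(argv[0], argv.data());
        _exit(127);  // exec failed, for example because gdb is missing
    }
    int status = 0;
    if (waitpid(child, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

As in the patch, the calling thread blocks until gdb exits, so the abort is delayed by however long gdb needs to walk all threads.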
[10 Oct 2014 5:23]
zhai weixiang
At least I think mysqldump can be modified to always dump the mysql database first to avoid this problem.
[10 Oct 2014 6:19]
MySQL Verification Team
I think dumping the mysql database first would help prevent this specific issue. But what if some user is doing millions of updates to the tables in his zzz database? Then I suppose you'd see similar symptoms. (Also note http://bugs.mysql.com/bug.php?id=54455 )
[10 Oct 2014 6:49]
zhai weixiang
Both have the same root cause. Modifying mysqldump is simple. Fixing the root cause is (possibly) complicated :) This is a real-life problem. We back up data using mysqldump on the slave, and the dataset per instance is usually very, very large (so a backup may need several days to finish).
[21 Aug 2018 15:31]
MySQL Verification Team
Hi, Thank you for your bug report. First of all, long semaphore waits are quite common. In some cases they are due to a bug, but in other cases the problem is a misconfiguration. What is most important is that you have reported this against a very, very old version / release. Can you please tell us whether you observe the same behaviour with the latest 5.7 or 8.0? Thanks in advance.
[21 Aug 2018 18:38]
Mark Callaghan
I assume any table that is small and frequently updated has a chance of reproducing this problem. Dumping the mysql tables first helps but won't avoid the problems with user tables.
[22 Aug 2018 13:20]
MySQL Verification Team
Hi, Besides the other info that I requested, we would be grateful to have a repeatable test case. It could be a descriptive one, but if we fail to repeat it, then we would need a full-fledged test case.
[22 Sep 2018 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".