| Bug #74003 | The server was crashed because of long semaphore wait | ||
|---|---|---|---|
| Submitted: | 22 Sep 2014 3:08 | Modified: | 25 Sep 2018 12:27 |
| Reporter: | zhai weixiang (OCA) | Email Updates: | |
| Status: | Can't repeat | Impact on me: | |
| Category: | MySQL Server | Severity: | S3 (Non-critical) |
| Version: | 5.6.16 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[22 Sep 2014 3:08]
zhai weixiang
[22 Sep 2014 3:26]
zhai weixiang
I think it will be a great help if the mysqld can print backtraces of all threads while crashed because of long semaphore wait... :(
[24 Sep 2014 7:43]
zhai weixiang
We found the exact reason toady: 1. Run mysqldump on slave. mysqldump opened a readview and start to `SELECT` all tables. But this instance is very large. It need a long time to backup all datasets. The readview became very old. 2.The worker threads frequently update the slave_worker_info table. So the undo list became very long and can't be purged because there the data dump is not finished. 3.Finally mysqldump began to dump SLAVE_WORKER_INFO table, and tried to built old version of the record, HOLDING "S" lock ON the block. 4. The worker threads tried to update SLAVE_WORKER_INFO , but can not acquire X LOCK of the related block. 5. A long semaphore waiting happened and the server was crashed... Some advice: 1. For mysqldump, dump tables in mysql database first and then others.. 2. For innodb, release S or X lock on the page while taking too long time to build old version of the record and then retry..
[25 Sep 2014 3:20]
liu hickey
Why not add backtrace dumping before aborting due to long semaphore waiting? Just like:
diff --git a/storage/innobase/srv/srv0srv.cc b/storage/innobase/srv/srv0srv.cc
index cf8b288..26d97ff 100644
--- a/storage/innobase/srv/srv0srv.cc
+++ b/storage/innobase/srv/srv0srv.cc
@@ -1661,6 +1661,25 @@ exit_func:
}
/*********************************************************************//**
+Dump the backtrace for self for debugging.
+*/
+UNIV_INTERN
+void dump_backtrace()
+{
+ pid_t pid;
+ char cmd[512];
+ pid = getpid();
+ snprintf(cmd, 512,
+ "gdb -ex 'set pagination 0' -ex 'thread apply all bt' -batch -p %d",
+ pid);
+ fprintf(stderr,
+ "Start dumping backtrace for self with pid=%d\n", pid);
+ system(cmd);
+ fprintf(stderr,
+ "Finish dumping backtrace\n");
+}
+
+/*********************************************************************//**
A thread which prints warnings about semaphore waits which have lasted
too long. These can be used to track bugs which cause hangs.
@return a dummy parameter */
@@ -1739,7 +1758,6 @@ loop:
&& sema == old_sema && os_thread_eq(waiter, old_waiter)) {
fatal_cnt++;
if (fatal_cnt > 10) {
-
fprintf(stderr,
"InnoDB: Error: semaphore wait has lasted"
" > %lu seconds\n"
@@ -1747,6 +1765,8 @@ loop:
" because it appears to be hung.\n",
(ulong) srv_fatal_semaphore_wait_threshold);
+ dump_backtrace();
+
ut_error;
}
} else {
[10 Oct 2014 5:23]
zhai weixiang
At least I think mysqldump can be modified to always dump mysql database first to avoid this problem.
[10 Oct 2014 6:19]
MySQL Verification Team
I think dumping mysql database first would help prevent this specific issue. But what if some user is doing millions of updates to his zzz database tables, then i suppose you'd meet similar symptoms? (also note http://bugs.mysql.com/bug.php?id=54455 )
[10 Oct 2014 6:49]
zhai weixiang
Both have the same root cause. Modifying mysqldump is simple. Fixing the root cause is complicate (possibly) :) This is a real life problem. We backup data using mysqldump on slave and the dataset per instance is usually very very large (so may need several days to finish the backup) .
[21 Aug 2018 15:31]
MySQL Verification Team
Hi, Thank you for your bug report. First of all, long semaphore waits are quite common. In some cases it is due to a bug, but in some other cases it is a problem with misconfiguration. What is most important is that you have reported this with one very, very old version / release. Can you please tell us whether you observe the same behaviour with latest 5.7 or 8.0 ??? Thanks in advance.
[21 Aug 2018 18:38]
Mark Callaghan
I assume any table that is small and frequently updated has a chance of reproducing this problem. Dumping the mysql tables first helps but won't avoid the problems with user tables.
[22 Aug 2018 13:20]
MySQL Verification Team
Hi, Beside other info that I requested, we would be grateful to have a repeatable test case. It could be a descriptive one, but if we fail to repeat it, then we would need a full-fledged test case.
[22 Sep 2018 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
