Bug #112532 | Innodb_row_lock_current_waits values show non-zero when there is no lock | ||
---|---|---|---|
Submitted: | 27 Sep 2023 3:14 | Modified: | 14 Dec 2023 21:19 |
Reporter: | bin wang (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 8.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | Contribution |
[27 Sep 2023 3:14]
bin wang
[27 Sep 2023 3:15]
bin wang
suggested add patch
Attachment: patch (application/octet-stream, text), 545 bytes.
[27 Sep 2023 3:16]
bin wang
suggested add patch (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: patch (application/octet-stream, text), 545 bytes.
[27 Sep 2023 6:47]
MySQL Verification Team
Hello bin wang, Thank you for the report and contribution. regards, Umesh
[28 Sep 2023 9:44]
Jakub Lopuszanski
The `show status like '%Innodb_row_lock_current_waits%'` seems to map to: ``` export_vars.innodb_row_lock_current_waits = srv_stats.n_lock_wait_current_count; ``` There's also ``` {"lock_row_lock_current_waits", "lock", "Number of row locks currently being waited for" " (innodb_row_lock_current_waits)", static_cast<monitor_type_t>(MONITOR_EXISTING | MONITOR_DEFAULT_ON), MONITOR_DEFAULT_START, MONITOR_OVLD_ROW_LOCK_CURRENT_WAIT}, ``` exposed by ``` SELECT * FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME="lock_row_lock_current_waits" ``` which also maps to the same C++ counter: ``` /* innodb_row_lock_current_waits */ case MONITOR_OVLD_ROW_LOCK_CURRENT_WAIT: value = srv_stats.n_lock_wait_current_count; break; ``` The `srv_stats.n_lock_wait_current_count` is declared as `lint_ctr_1_t` which is an alias for a fuzzy counter: ``` typedef ib_counter_t<lint, 1, single_indexer_t> lint_ctr_1_t; ``` This declaration means that we use a non-atomic integer (`lint`), which isn't even sharded (1 instance). This means that multiple threads will try to modify the same memory location without any atomic guarantees, so of course the value of the counter will drift. (The increment actually happens under lock_sys->wait_mutex, but the decrement doesn't) The simple workaround is to look at another counter: ``` {"lock_threads_waiting", "lock", "Number of query threads sleeping waiting for a lock", static_cast<monitor_type_t>(MONITOR_DEFAULT_ON | MONITOR_DISPLAY_CURRENT), MONITOR_DEFAULT_START, MONITOR_LOCK_THREADS_WAITING}, ``` which you can get via: ``` SELECT * FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME="lock_threads_waiting" ``` This one is set to the right value ``` MONITOR_SET(MONITOR_LOCK_THREADS_WAITING, n); ``` at most one per second by `lock_wait_timeout_thread()` (provided that innobase_deadlock_detect is ON, which is the default, and recommended setting). Also, 1Hz is the worst case - it's updated in each iteration of the thread's loop which might be triggered by various events related to locking, such as: - a thread has started a wait - a thread which was already waiting changed the reason it was waiting (it is still blocked but by a different transaction) It is *not* (AFAICT) triggered by a transaction finishing a wait, so the value can be a bit too high for 1 second, but it should converge to zero at idle time after at most a second. You can monitor how many iterations per second this thread does by looking at: ``` {"lock_deadlock_rounds", "lock", "Number of times a wait-for graph was scanned in search for deadlocks", MONITOR_DEFAULT_ON, MONITOR_DEFAULT_START, MONITOR_DEADLOCK_ROUNDS}, ``` I think the easiest way to fix this bug, is to simply replace the data source for this counter. I mean something as simple as: ``` diff --git a/storage/innobase/include/srv0srv.h b/storage/innobase/include/srv0srv.h index 0147852aa24..fc51583a95d 100644 --- a/storage/innobase/include/srv0srv.h +++ b/storage/innobase/include/srv0srv.h @@ -124,9 +124,6 @@ struct srv_stats_t { /** Number of database lock waits */ ulint_ctr_1_t n_lock_wait_count; - /** Number of threads currently waiting on database locks */ - lint_ctr_1_t n_lock_wait_current_count; - /** Number of rows read. */ ulint_ctr_64_t n_rows_read; diff --git a/storage/innobase/lock/lock0wait.cc b/storage/innobase/lock/lock0wait.cc index 60168ce5c21..0ef044f56ea 100644 --- a/storage/innobase/lock/lock0wait.cc +++ b/storage/innobase/lock/lock0wait.cc @@ -241,7 +241,6 @@ void lock_wait_suspend_thread(que_thr_t *thr) { if (thr->lock_state == QUE_THR_LOCK_ROW) { srv_stats.n_lock_wait_count.inc(); - srv_stats.n_lock_wait_current_count.inc(); start_time = std::chrono::steady_clock::now(); } @@ -321,7 +320,6 @@ void lock_wait_suspend_thread(que_thr_t *thr) { if (thr->lock_state == QUE_THR_LOCK_ROW) { const auto diff_time = std::chrono::steady_clock::now() - start_time; - srv_stats.n_lock_wait_current_count.dec(); srv_stats.n_lock_wait_time.add( std::chrono::duration_cast<std::chrono::microseconds>(diff_time) .count()); diff --git a/storage/innobase/srv/srv0mon.cc b/storage/innobase/srv/srv0mon.cc index 5e458ac3a20..8b33aac0052 100644 --- a/storage/innobase/srv/srv0mon.cc +++ b/storage/innobase/srv/srv0mon.cc @@ -1867,7 +1867,7 @@ void srv_mon_process_existing_counter( /* innodb_row_lock_current_waits */ case MONITOR_OVLD_ROW_LOCK_CURRENT_WAIT: - value = srv_stats.n_lock_wait_current_count; + value = MONITOR_VALUE(MONITOR_LOCK_THREADS_WAITING); break; /* innodb_row_lock_time */ diff --git a/storage/innobase/srv/srv0srv.cc b/storage/innobase/srv/srv0srv.cc index 63d2c16e0ff..fd39a8f880f 100644 --- a/storage/innobase/srv/srv0srv.cc +++ b/storage/innobase/srv/srv0srv.cc @@ -1681,7 +1681,7 @@ void srv_export_innodb_status(void) { export_vars.innodb_row_lock_waits = srv_stats.n_lock_wait_count; export_vars.innodb_row_lock_current_waits = - srv_stats.n_lock_wait_current_count; + MONITOR_VALUE(MONITOR_LOCK_THREADS_WAITING); export_vars.innodb_row_lock_time = srv_stats.n_lock_wait_time / 1000; ```
[28 Sep 2023 15:10]
Jakub Lopuszanski
Posted by developer: Ah.. one thing I've missed is that the definition of these two counters is slightly different: "lock_row_lock_current_waits" is counting only threads which are waiting for a lock on a row, not on a table! "lock_threads_waiting" is counting all threads. So, it's not so simple to replace one with the other :(
[14 Dec 2023 21:19]
Philip Olson
Fixed as of the upcoming MySQL Server 8.3.0 release, and here's the proposed changelog entry from the documentation team: -- The Innodb_row_lock_current_waits variable could show a non-zero value when there were no row locks. Our thanks to Bin Wang for the contribution. -- Thank you for the bug report and contribution.