Bug #72123 Spurious lock_wait_timeout_thread wakeup in lock_wait_suspend_thread()
Submitted: 25 Mar 2014 14:12  Modified: 27 Mar 2014 17:32
Reporter: Inaam Rana (OCA)
Status: Verified
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 5.6, 5.6.18
OS: Any
CPU Architecture: Any
Triage: Needs Triage: D3 (Medium)

[25 Mar 2014 14:12] Inaam Rana
Description:
When a query thread has to wait on a record or table lock, it goes to sleep by calling lock_wait_suspend_thread(). In this function, before starting the wait, we do the following:

        /* Wake the lock timeout monitor thread, if it is suspended */

        os_event_set(lock_sys->timeout_event);

This is totally unnecessary because the lock timeout monitor thread wakes up every second to release waiting threads. There is no need to wake it up when we are about to start a wait.

On a heavily contended system we can have hundreds of threads waiting for locks. This extra wakeup call means that whenever a thread is about to start a wait, it forces the lock monitor out of sleep, and the monitor then scans the list of waiting threads checking for timeouts. This scan is done under lock_wait_mutex, which is already heavily contended under such workloads. There is a significant performance gain to be had for such workloads by removing this spurious wakeup call.

On a related note, there are various other wakeup calls for the lock timeout monitor thread in the code. These calls are actually meant for srv_monitor_thread, as they relate to printing monitor output; the lock timeout monitor has nothing to do with printing.

irana@tw-mbp-irana:~/workspace/mysql-5.6/storage/innobase$ grep -riIn timeout_event .
./buf/buf0lru.cc:1213:			os_event_set(lock_sys->timeout_event);
./buf/buf0lru.cc:1341:		os_event_set(lock_sys->timeout_event);
./handler/ha_innodb.cc:15732:	os_event_set(lock_sys->timeout_event);
./include/lock0lock.h:941:	os_event_t	timeout_event;		/*!< Set to the event that is
./lock/lock0lock.cc:614:	lock_sys->timeout_event = os_event_create();
./lock/lock0wait.cc:264:	os_event_set(lock_sys->timeout_event);
./lock/lock0wait.cc:487:	os_event_t	event = lock_sys->timeout_event;
./row/row0mysql.cc:2196:		os_event_set(lock_sys->timeout_event);
./row/row0mysql.cc:2202:		os_event_set(lock_sys->timeout_event);
./row/row0mysql.cc:2207:		os_event_set(lock_sys->timeout_event);
./row/row0mysql.cc:2212:		os_event_set(lock_sys->timeout_event);
./srv/srv0srv.cc:1931:	os_event_set(lock_sys->timeout_event);
./srv/srv0start.cc:2913:			os_event_set(lock_sys->timeout_event);
./sync/sync0arr.cc:1014:		os_event_set(lock_sys->timeout_event);

How to repeat:
Create a heavy workload contending on record locks. The easiest way may be to have hundreds of threads updating the same row. Note the os waits on the lock wait mutex.
Comment out the os_event_set(lock_sys->timeout_event) call in lock_wait_suspend_thread() and rerun the test. Note the decrease in os waits on the lock wait mutex.

Suggested fix:
Don't make the spurious call in lock_wait_suspend_thread(). Also fix the other calls whose actual target should have been srv_monitor_thread. I'll upload a patch for this.
[25 Mar 2014 20:34] Sveta Smirnova
Thank you for the report.

I cannot repeat the described behavior:

I created a table: create table t1(f1 int not null auto_increment primary key, f2 varchar(255), f3 timestamp) engine=innodb; inserted 10 rows into it, then updated the row with f1=5 from 50 threads.

Results without change:

select EVENT_NAME, COUNT_STAR from events_waits_summary_by_instance where event_name like '%lock_wait_mutex'

1001596

Results with change:

select EVENT_NAME, COUNT_STAR from events_waits_summary_by_instance where event_name like '%lock_wait_mutex'

1065390

I ran the query
select EVENT_NAME, COUNT_STAR from events_waits_summary_by_instance where event_name like '%lock_wait_mutex' before each test, and COUNT_STAR showed negligible values: 42 before the test without the change and 175 before the test with the change.

Please compare my test with yours and let me know if I missed something.
[25 Mar 2014 20:46] Inaam Rana
Sveta,

In my test I was using 512 or 1024 threads, all attempting to update the same row: a really heavy hotspot. Though I don't think it matters, I was not using auto-increment. Just to make sure we are on the same page: the update is to a non-primary-key int column, i.e. an update in place. Finally, I was using show engine innodb mutex to monitor os waits (perf schema was disabled at compile time).
[26 Mar 2014 18:04] Inaam Rana
Sveta,

I should have been clearer about the details of what I tested. One of the prime differences was that I had the deadlock detection code disabled. This option, I believe, is not present in the official MySQL release. With the deadlock detection code enabled, the system gets so bogged down that the difference in performance is not as marked, but it is still visible. Here are the details:

root@localhost[test]mysql>show create table sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=latin1 MAX_ROWS=1000000
1 row in set (0.00 sec)

root@localhost[test]mysql>select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (8.53 sec)

Workload: "UPDATE sbtest1 SET k = k + 1 WHERE id = 33333"

Number of threads: 512 all doing update to the same row.

Duration: 180 sec.

code: base is Twitter 5.6, but it should behave in the same manner as stock 5.6 in this case. The patched version has the wakeup call commented out.

I ran four configs:
base + dde = base with deadlock detection enabled (default)
base + ddd = base with deadlock detection disabled

patch + dde = patched with deadlock detection enabled
patch + ddd = patched with deadlock detection disabled

For all four configs I noted three values:
-- os waits (as reported by show engine innodb mutex) for lock_sys_wait_mutex
-- os waits (as reported by show engine innodb mutex) for lock_sys_mutex
-- QPS

build         QPS   lock_sys_mutex waits  lock_sys_wait_mutex waits
base+dde      341   521341                100362
base+ddd      2922  2261839               2244693
patched+dde   425   567961                13977
patched+ddd   5747  7163614               143945
[27 Mar 2014 17:32] Sveta Smirnova
Thank you for the feedback.

Verified as described. I still queried P_S, but now I had 1085475 (not patched) vs 838323 (patched) lock waits. Also, some queries for the not-patched version died with "Lock wait timeout exceeded", while this did not happen for the patched version.

How I tested:

1. Created query file for Gypsy:

i|1|drop table if exists t1
i|1|create table t1(f1 int not null auto_increment primary key, f2 varchar(255), f3 timestamp) engine=innodb
i|10|insert into t1(f2) values(md5(?))|int
n|1000|update t1 set f2=md5(?) where f1=5|int

2. Patched the server so the deadlock detection code is disabled:

=== modified file 'storage/innobase/lock/lock0lock.cc'
--- storage/innobase/lock/lock0lock.cc	2014-02-24 14:58:04 +0000
+++ storage/innobase/lock/lock0lock.cc	2014-03-27 16:17:28 +0000
@@ -1952,7 +1952,7 @@
 	(it is not currently suspended because of a lock wait),
 	its state can only be changed by this thread, which is
 	currently associated with the transaction. */
-
+/*
 	trx_mutex_exit(trx);
 
 	victim_trx_id = lock_deadlock_check_and_resolve(lock, trx);
@@ -1969,14 +1969,14 @@
 		return(DB_DEADLOCK);
 
 	} else if (trx->lock.wait_lock == NULL) {
-
+*/
 		/* If there was a deadlock but we chose another
 		transaction as a victim, it is possible that we
 		already have the lock now granted! */
-
+/*
 		return(DB_SUCCESS_LOCKED_REC);
 	}
-
+*/
 	trx->lock.que_state = TRX_QUE_LOCK_WAIT;
 
 	trx->lock.was_chosen_as_deadlock_victim = FALSE;

3. Started MTR: ./mtr --start innodb &

4. Run test:

mysql -h127.0.0.1 -P13000 -uroot performance_schema -e "select * from events_waits_summary_by_instance where event_name like '%lock_wait_mutex'" >> bug72123.log; gypsy --host=127.0.0.1:13000 --user=root --queryfile=bug72123.query --threads=50 --duration=1000 ; mysql -h127.0.0.1 -P13000 -uroot performance_schema -e "select * from events_waits_summary_by_instance where event_name like '%lock_wait_mutex'" >> bug72123.log

5. Stopped MTR

6. Made changes as described in this report:

=== modified file 'storage/innobase/lock/lock0wait.cc'
--- storage/innobase/lock/lock0wait.cc	2013-04-18 14:25:27 +0000
+++ storage/innobase/lock/lock0wait.cc	2014-03-25 19:37:18 +0000
@@ -261,7 +261,7 @@
 
 	/* Wake the lock timeout monitor thread, if it is suspended */
 
-	os_event_set(lock_sys->timeout_event);
+	//os_event_set(lock_sys->timeout_event);
 
 	lock_wait_mutex_exit();
 	trx_mutex_exit(trx);

7. Repeated steps 3 and 4.

8. Compared the number of events_waits for the patched and not-patched versions, as well as Gypsy's output, which showed some performance gain for the patched version and no queries failing because of lock wait timeout.
[29 Feb 2016 1:07] Daniel Black
Inaam's patch (which I couldn't see referenced here):
https://github.com/facebook/mysql-5.6/commit/6e06bbfa315ffb97d713dd6e672d6054036ddc21
[4 Oct 2017 3:52] Laurynas Biveinis
It looks like this has been fixed in 8.0.3 by the VATS scheduling patch

fb056f442a96114c74d291302e8c4406c8c8e1af
Author:     Sunny Bains <Sunny.Bains@oracle.com>
AuthorDate: Fri Jul 14 09:15:44 2017 +1000
Commit:     Sunny Bains <Sunny.Bains@oracle.com>
CommitDate: Fri Jul 14 09:17:41 2017 +1000

Merged:     5.5 5.6 mysql-5.7.7
Containing: 8.0
Follows:    mysql-8.0.1 (2280)
Precedes:   mysql-8.0.3 (1059)

WL#10793 - InnoDB: Use VATS for scheduling lock release under high load

Variance-Aware Transaction Scheduling (VATS) is described in
https://arxiv.org/pdf/1602.01871.pdf. The patch was contributed by
Jiamin Huang <jiamin@umich.edu> and Barzan Mozafari <mozafari@umich.edu>
from the University of Michigan.

When granting locks the grant strategy used depends on the number of waiting
locks. If it is >= 32 then VATS will be used otherwise the original first come
first served (FCFS) strategy will be used. The threshold of 32 was chosen
after empirical verification on a supra class machine. Ideally we would
like to default to VATS for all loads but a small regression was seen on
the lower loads.

Reviewed-by: Jimmy <jimmy.yang@oracle.com>
Reviewed-by: Deb <debarun.banerjee@oracle.com>
RB:16290