Bug #72123 | Spurious lock_wait_timeout_thread wakeup in lock_wait_suspend_thread() | |
---|---|---|---
Submitted: | 25 Mar 2014 14:12 | Modified: | 27 Mar 2014 17:32 |
Reporter: | Inaam Rana (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 5.6, 5.6.18 | OS: | Any |
Assigned to: | | CPU Architecture: | Any
[25 Mar 2014 14:12]
Inaam Rana
[25 Mar 2014 20:34]
Sveta Smirnova
Thank you for the report. I cannot repeat the described behavior. I created a table:

create table t1(f1 int not null auto_increment primary key, f2 varchar(255), f3 timestamp) engine=innodb;

inserted 10 rows into it, then updated the row with f1=5 in 50 threads, measuring with:

select EVENT_NAME, COUNT_STAR from events_waits_summary_by_instance where event_name like '%lock_wait_mutex';

Results without the change: COUNT_STAR = 1001596. Results with the change: COUNT_STAR = 1065390. I also ran the same query before each test, and COUNT_STAR showed negligible values: 42 for the test without the change and 175 for the test with the change. Please compare my test with yours and let me know if I missed something.
[25 Mar 2014 20:46]
Inaam Rana
Sveta, in my test I was using 512 or 1024 threads, all attempting to update the same row: a really heavy hotspot workload. Though I don't think it matters, I was not using auto-increment. Just to make sure we are on the same page: the update is to a non-primary-key int column, i.e. an update in place. Finally, I was using show engine innodb mutex to monitor os waits (perf schema was disabled at compile time).
[26 Mar 2014 18:04]
Inaam Rana
Sveta, I should have been clearer about the details of what I tested. One of the prime differences was that I had the deadlock detection code disabled, an option which I don't think is present in the official MySQL release. With deadlock detection enabled the system gets so bogged down that the difference in performance is not very pronounced, but it is still visible. Here are the details:

root@localhost[test]mysql>show create table sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=latin1 MAX_ROWS=1000000
1 row in set (0.00 sec)

root@localhost[test]mysql>select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (8.53 sec)

Workload: "UPDATE sbtest1 SET k = k + 1 WHERE id = 33333"
Number of threads: 512, all updating the same row.
Duration: 180 sec.
Code: base is Twitter 5.6, but it should behave the same as stock 5.6 in this case. The patched version has the wakeup call commented out.

I ran four configs:
base + dde = base with deadlock detection enabled (default)
base + ddd = base with deadlock detection disabled
patch + dde = patched with deadlock detection enabled
patch + ddd = patched with deadlock detection disabled

For all four configs I noted three values:
-- os waits (as reported by show engine innodb mutex) for lock_sys_wait_mutex
-- os waits (as reported by show engine innodb mutex) for lock_sys_mutex
-- QPS

build | QPS | lock_sys_mutex waits | lock_sys_wait_mutex waits
---|---|---|---
base+dde | 341 | 521341 | 100362
base+ddd | 2922 | 2261839 | 2244693
patched+dde | 425 | 567961 | 13977
patched+ddd | 5747 | 7163614 | 143945
[27 Mar 2014 17:32]
Sveta Smirnova
Thank you for the feedback. Verified as described. I still queried P_S, but now I had 1085475 (not patched) vs 838323 (patched) lock waits. Also, some queries for the not-patched version died with "Lock wait timeout exceeded", while this did not happen for the patched version.

How I tested:

1. Created a query file for Gypsy:

i|1|drop table if exists t1
i|1|create table t1(f1 int not null auto_increment primary key, f2 varchar(255), f3 timestamp) engine=innodb
i|10|insert into t1(f2) values(md5(?))|int
n|1000|update t1 set f2=md5(?) where f1=5|int

2. Patched the server so the deadlock detection code is removed:

=== modified file 'storage/innobase/lock/lock0lock.cc'
--- storage/innobase/lock/lock0lock.cc  2014-02-24 14:58:04 +0000
+++ storage/innobase/lock/lock0lock.cc  2014-03-27 16:17:28 +0000
@@ -1952,7 +1952,7 @@
 	(it is not currently suspended because of a lock wait),
 	its state can only be changed by this thread, which is
 	currently associated with the transaction. */
-
+/*
 	trx_mutex_exit(trx);

 	victim_trx_id = lock_deadlock_check_and_resolve(lock, trx);
@@ -1969,14 +1969,14 @@
 		return(DB_DEADLOCK);

 	} else if (trx->lock.wait_lock == NULL) {
-
+*/
 		/* If there was a deadlock but we chose another
 		transaction as a victim, it is possible that we
 		already have the lock now granted! */
-
+/*
 		return(DB_SUCCESS_LOCKED_REC);
 	}
-
+*/
 	trx->lock.que_state = TRX_QUE_LOCK_WAIT;

 	trx->lock.was_chosen_as_deadlock_victim = FALSE;

3. Started MTR: ./mtr --start innodb &

4. Ran the test:

mysql -h127.0.0.1 -P13000 -uroot performance_schema -e "select * from events_waits_summary_by_instance where event_name like '%lock_wait_mutex'" >> bug72123.log; gypsy --host=127.0.0.1:13000 --user=root --queryfile=bug72123.query --threads=50 --duration=1000 ; mysql -h127.0.0.1 -P13000 -uroot performance_schema -e "select * from events_waits_summary_by_instance where event_name like '%lock_wait_mutex'" >> bug72123.log

5. Stopped MTR.

6. Made the change described in this report:

=== modified file 'storage/innobase/lock/lock0wait.cc'
--- storage/innobase/lock/lock0wait.cc  2013-04-18 14:25:27 +0000
+++ storage/innobase/lock/lock0wait.cc  2014-03-25 19:37:18 +0000
@@ -261,7 +261,7 @@
 	/* Wake the lock timeout monitor thread, if it is suspended */

-	os_event_set(lock_sys->timeout_event);
+	//os_event_set(lock_sys->timeout_event);

 	lock_wait_mutex_exit();
 	trx_mutex_exit(trx);

7. Repeated steps 3 and 4.

8. Compared the number of events_waits for both patched and not-patched versions, as well as Gypsy's output, which showed some performance gain for the patched version and no queries failed because of lock wait timeout.
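To make the pattern that the step-6 diff removes concrete, here is a minimal, self-contained C++ model. It is an illustration only, not InnoDB code: the monitor stands in for lock_wait_timeout_thread(), which in stock 5.6 also wakes on a roughly one-second tick of its own, and the unconditional notify_one() stands in for the os_event_set(lock_sys->timeout_event) call that the patch comments out.

```cpp
// Minimal standalone model of the reported wakeup pattern; NOT InnoDB code.
// Assumes the timeout monitor also wakes on its own ~1 second tick, so the
// per-suspension signal is largely redundant under a hotspot workload.
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

std::mutex              wait_mutex;     // stands in for lock_sys->wait_mutex
std::condition_variable timeout_event;  // stands in for lock_sys->timeout_event
std::atomic<long>       monitor_wakeups{0};

// Stands in for lock_wait_timeout_thread(): scan for expired waits once per
// second, or earlier whenever a suspending thread signals timeout_event.
void timeout_monitor(std::atomic<bool>& running) {
    std::unique_lock<std::mutex> lk(wait_mutex);
    while (running) {
        timeout_event.wait_for(lk, std::chrono::seconds(1));
        ++monitor_wakeups;  // one pass over the wait slots
    }
}

// Stands in for lock_wait_suspend_thread(): every suspension takes the wait
// mutex; the unpatched build also signals the monitor on every suspension.
void suspend_thread(bool unpatched) {
    std::lock_guard<std::mutex> lk(wait_mutex);  // reserve a wait slot
    if (unpatched)
        timeout_event.notify_one();  // the os_event_set() the patch removes
}

int main() {
    std::atomic<bool> running{true};
    std::thread monitor(timeout_monitor, std::ref(running));

    std::vector<std::thread> waiters;
    for (int i = 0; i < 512; ++i)
        waiters.emplace_back(suspend_thread, true);  // false = patched build
    for (auto& t : waiters) t.join();

    std::this_thread::sleep_for(std::chrono::seconds(2));
    running = false;
    timeout_event.notify_one();
    monitor.join();
    std::printf("monitor wakeups: %ld\n", monitor_wakeups.load());
}
```

With the flag set to false the monitor wakes only on its periodic tick; with it set to true, each of the 512 suspensions can force an extra monitor pass plus an extra round of contention on the wait mutex, which mirrors the lock_sys_wait_mutex os-wait differences reported earlier in this bug.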
[29 Feb 2016 1:07]
Daniel Black
Inaam's patch (which I couldn't see referenced here): https://github.com/facebook/mysql-5.6/commit/6e06bbfa315ffb97d713dd6e672d6054036ddc21
[4 Oct 2017 3:52]
Laurynas Biveinis
It looks like this has been fixed in 8.0.3 by the VATS scheduling patch:

commit fb056f442a96114c74d291302e8c4406c8c8e1af
Author:     Sunny Bains <Sunny.Bains@oracle.com>
AuthorDate: Fri Jul 14 09:15:44 2017 +1000
Commit:     Sunny Bains <Sunny.Bains@oracle.com>
CommitDate: Fri Jul 14 09:17:41 2017 +1000
Merged: 5.5 5.6 mysql-5.7.7
Containing: 8.0
Follows: mysql-8.0.1 (2280)
Precedes: mysql-8.0.3 (1059)

WL#10793 - InnoDB: Use VATS for scheduling lock release under high load

Variance-Aware Transaction Scheduling (VATS) is described in https://arxiv.org/pdf/1602.01871.pdf. The patch was contributed by Jiamin Huang <jiamin@umich.edu> and Barzan Mozafari <mozafari@umich.edu> from the University of Michigan.

When granting locks, the grant strategy used depends on the number of waiting locks. If it is >= 32 then VATS will be used, otherwise the original first come first served (FCFS) strategy will be used. The threshold of 32 was chosen after empirical verification on a supra class machine. Ideally we would like to default to VATS for all loads but a small regression was seen on the lower loads.

Reviewed-by: Jimmy <jimmy.yang@oracle.com>
Reviewed-by: Deb <debarun.banerjee@oracle.com>
RB:16290
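As a rough sketch of the strategy switch the commit message describes: under light contention, grants stay first come, first served, and once the number of waiting locks reaches the threshold, the grant goes to the waiter with the highest weight. The WaitingLock fields and the weight definition below are simplified stand-ins invented for this illustration; the server's actual bookkeeping lives in lock0lock.cc.

```cpp
// Hedged sketch of FCFS-vs-VATS grant selection; a simplified model only.
#include <algorithm>
#include <cstdint>
#include <vector>

struct WaitingLock {
    uint64_t trx_weight;   // e.g. age, or how much work this trx unblocks
    uint64_t arrival_seq;  // order in which the lock request was enqueued
};

constexpr size_t VATS_THRESHOLD = 32;  // threshold named in the commit message

// Pick the next waiter to grant: FCFS while the queue is short,
// highest-weight-first (VATS-style) once the queue is long.
const WaitingLock* next_grant(const std::vector<WaitingLock>& waiters) {
    if (waiters.empty()) return nullptr;
    if (waiters.size() < VATS_THRESHOLD) {
        // FCFS: oldest request first
        return &*std::min_element(waiters.begin(), waiters.end(),
            [](const WaitingLock& a, const WaitingLock& b) {
                return a.arrival_seq < b.arrival_seq;
            });
    }
    // VATS-style: favour the transaction that unblocks the most work
    return &*std::max_element(waiters.begin(), waiters.end(),
        [](const WaitingLock& a, const WaitingLock& b) {
            return a.trx_weight < b.trx_weight;
        });
}
```

Under this model, a long queue on a hot row like the one in this bug is no longer drained strictly in arrival order, which is the intuition behind the throughput gains claimed for high-contention loads.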