Bug #21579 innodb_concurrent random failures with varying differences
Submitted: 11 Aug 2006 8:18 Modified: 15 Sep 2008 18:29
Reporter: Matthias Leich Email Updates:
Status: Closed Impact on me:
None 
Category:Tests: Engine Severity:S3 (Non-critical)
Version:5.1 OS:Any
Assigned to: Ingo Strüwing CPU Architecture:Any

[11 Aug 2006 8:18] Matthias Leich
Description:
The problems are around the functions get_lock and release_lock,
the default value of innodb_lock_wait_timeout and the overall
load on the box.

The problem is currently analyzed and more informations will be
provided soon.

My environment:
   - PC(Pentium M, x86-32Bit) with Linux(SuSE 9.3)
   - MySQL compiled (compile-pentium-debug-max) from source
        Version 5.1 ChangeSet@1.2278, 2006-08-03

How to repeat:
Please run
cd mysql-test
./mysql-test-run.pl --big innodb_concurrent
[21 Sep 2006 8:01] Ingo Strüwing
Funnily enough. Pushbuild (5.1 main) claims that I pushed something for it to 5.1.12.
[20 Mar 2007 10:28] Jani Tolonen
A note: Test concurrent_innodb has been disabled in 5.1 due to this bug (in disabled.def). Discussed with Matthias Leich, this is being processed.
[4 Dec 2007 14:04] Magnus Blåudd
Test does not work. If useful it should be fixed and run, otherwise just scrap it.
[28 Mar 2008 18:32] Ingo Strüwing
This is a "big-test". It is in bad shape. After enabling it failed due to a missing CREATE USER. After adding it, it succeeded on a statement, which (due to the test case) should fail with a lock timeout. It failed that way in four of four runs on a little loaded smp machine. It does not look like a random problem.
[5 May 2008 18:39] Kristofer Pettersson
Progress report: 

The problematic test case can be extracted and made to work if executed like this:

grant usage on test.* to mysqltest@localhost;
-- source include/have_innodb.inc

#
# Check and select innodb lock type
#
#
#set global innodb_table_locks=1;
#set global TRANSACTION ISOLATION LEVEL SERIALIZABLE;
#
# one UPDATE not changing result set and SELECT ... FOR UPDATE
#
connect (thread1, localhost, mysqltest,,);
connection thread1;
set autocommit=1;
--echo ** THREAD 1
create table t1(eta int(11) not null, tipo int(11), c varchar(255)) engine=innodb;
insert into t1 values (7,7, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
insert into t1 values (8,8, "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb");
insert into t1 values (10,1,"ccccccccccccccccccccccccccccccccccccccccccc");
insert into t1 values (20,2,"ddddddddddddddddddddddddddddddddddddddddddd");
insert into t1 values (30,1,"eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee");
insert into t1 values (40,2,"fffffffffffffffffffffffffffffffffffffffffff");
insert into t1 values (50,1,"ggggggggggggggggggggggggggggggggggggggggggg");
insert into t1 values (60,2,"hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh");
insert into t1 values (70,1,"iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii");
insert into t1 values (80,22,"jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj");
insert into t1 values (90,11,"kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk");
select get_lock("hello",10);
connect (thread2, localhost, mysqltest,,);
connection thread2;
set autocommit=1;
--echo ** THREAD 2
--echo **   +2 begin transaction
begin;
select * from t1 where tipo=2 FOR UPDATE;
connection thread1;
--echo
--echo ** THREAD 1
--echo **   +1 begin transaction
begin;
select release_lock("hello");
--echo      UPDATE operation hangs because since there are no index on tipo a full table scan of 't1' needs to be done. The scan stop on the first locked row held by SELECT... FOR UPDATE. 
--error 1205
update t1 set tipo=11+get_lock("hello",10)*0 where tipo=22;
---echo     Image of table t1 is now changed after the update.
#select * from t1;
connection thread2;
--echo
--echo ** THREAD 2
--echo      Image of table t1 should not have changed.
	#select * from t1;
--echo **    +2 commit (will hang because of thread 1 got inside thread 2 transaction)
send commit;
connection thread1;
--echo
--echo ** THREAD 1
--echo **    +1 commit (will release thread 2)
commit;
connection thread2;
--echo
--echo ** THREAD 2
--echo **     +2 Collect result of transaction
reap;
--echo   Since transaction for thread 1 failed the result set should not have been changed. 
select * from t1;
connection thread1;
--echo
--echo ** THREAD 1
--echo   Since transaction for thread 1 failed the result set is rolled back.
select * from t1;
connection default;
--echo
--echo ** DEFAULT ROOT THREAD
drop table t1;

Moving this test case back into the include/concurrent.inc file will fail because the statement: 
update t1 set tipo=11+get_lock("hello",10)*0 where tipo=22;

..won't cause error "1205 lock time out" any more.

I have yet understood why that is. The test environment looks fairly isolated although a 'hello' lock might be lingering from previous test with no significant importance. InnoDB isolation level is 'repeatable-read'. Is further more still unclear to me what this test is showing and what significance is attributed get_lock. Comments are welcome.
[16 May 2008 8:19] Vasil Dimov
Hi,

First of all I applied this patch to get rid of the access denied error:

--- cut ---
--- include/concurrent.inc.orig 2008-05-16 11:11:33.000000000 +0300
+++ include/concurrent.inc      2008-05-16 10:56:44.000000000 +0300
@@ -22,6 +22,7 @@
 drop table if exists t1;
 --enable_warnings
 create table t1(eta int(11) not null, tipo int(11), c varchar(255));
+grant usage on test.* to mysqltest@localhost;
 connect (thread1, localhost, mysqltest,,);
 connection thread1;
 eval SET SESSION STORAGE_ENGINE = $engine_type;
--- cut ---

Then the test failed like this:

--- cut ---
main.concurrent_innodb         [ fail ]

mysqltest: In included file "./include/concurrent.inc": At line 212: query 'update t1 set tipo=11+get_lock("hello",10)*0 where tipo=22' succeeded - should have failed with errno 1205
--- cut ---

This is the relevant part of the test:

--- cut ---
186 # one UPDATE not changing result set and SELECT ... FOR UPDATE
187 #
188 #connect (thread1, localhost, mysqltest,,);
189 connection thread1;
190 create table t1(eta int(11) not null, tipo int(11), c varchar(255));
191 insert into t1 values (7,7, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
192 insert into t1 values (8,8, "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb");
193 insert into t1 values (10,1,"ccccccccccccccccccccccccccccccccccccccccccc");
194 insert into t1 values (20,2,"ddddddddddddddddddddddddddddddddddddddddddd");
195 insert into t1 values (30,1,"eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee");
196 insert into t1 values (40,2,"fffffffffffffffffffffffffffffffffffffffffff");
197 insert into t1 values (50,1,"ggggggggggggggggggggggggggggggggggggggggggg");
198 insert into t1 values (60,2,"hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh");
199 insert into t1 values (70,1,"iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii");
200 insert into t1 values (80,22,"jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj");
201 insert into t1 values (90,11,"kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk");
202 select get_lock("hello",10);
203 #connect (thread2, localhost, mysqltest,,);
204 connection thread2;
205 begin;
206 send select * from t1 where tipo=2 FOR UPDATE;
207 sleep 1;
208 connection thread1;
209 begin;
210 select release_lock("hello");
211 --error 1205
212 update t1 set tipo=11+get_lock("hello",10)*0 where tipo=22;
--- cut ---

I don't see why the query on line 212 would fail with error 1205 (ER_LOCK_WAIT_TIMEOUT). Noone is holding a lock on t1 where tipo=22, maybe the query on line 206 should be tipo=22 instead of tipo=2?

Please search the repository history to see who wrote this test or who changed it (broke it) and ask them what is the intention here.
[20 May 2008 14:09] Kristofer Pettersson
Progress report: According to Sinisa the test was added because InnoDB used to lock all rows when it did a table scan whether or not the rows were marked for an update. In 5.1 this behavior has changed and the rows are no longer locked. The test can therefore be changed to be executed without the time out parameter.
[22 May 2008 12:40] Heikki Tuuri
Kristofer,

are you running on READ COMMITTED or --innodb_locks_unsafe_for_binlog ?

On those levels, MySQL/InnoDB releases locks on non-matching rows.

I tested on REPEATABLE READ, and found that the table scan ... FOR UPDATE does set X-locks on all rows. Then there is a lock wait.

--Heikki
[22 May 2008 14:08] Heikki Tuuri
Vasil said that HE was running with --innodb_locks_unsafe_for_binlog. That explain why MySQL/InnoDB releases the locks on non-matching rows.
[5 Jun 2008 9:59] Kristofer Pettersson
Heikki: You were correct about the innodb_locks_unsafe_for_binlog flag.

However, removing this flag and ensuring REPEATABLE READ isolation level caused errors to other parts of the test.

Sinisa: I suggest that the test is perhaps broken down into three parts:
1. On part runs tests with get_lock/release_lock where it works in a deterministic way, in "REPEATABLE READ + innodb_locks_unsafe_for_binlog"-isolation level (=READ COMMITTED ?)
2. One part runs tests suitable for "REPEATABLE READ" isolation level.
3. One part runs in debug mode and uses DEBUG_EXECUTE to capture concurrency issues in row level locking. This is needed for testing a DELETE transaction running concurrently with an UPDATE transaction. Currently this test fails because of a 'lock time out'. The time out might be by design though. What happens is that the DELETE transaction succeeds with the commit and the UPDATE fails because it did semi-consistent reads on the rows which were removed. The UPDATE thread will then hang on a wait condition, presumably waiting for a rollback to complete. When this condition isn't fulfilled the error code gets propagated in the call stacks and transforms into a 'lock timeout'.
[5 Jun 2008 10:15] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47471
[5 Jun 2008 10:22] Kristofer Pettersson
Tracing concurrent_innodb.test

Attachment: trace.tgz (application/x-compressed-tar, text), 5.92 KiB.

[5 Jun 2008 10:25] Kristofer Pettersson
Trace on DELETE- and UPDATE-transactions running concurrently.

Attachment: bug21579-thoughts2.html (text/html), 26.12 KiB.

[17 Jun 2008 15:41] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48018

2656 Ingo Struewing	2008-06-17
      Bug#21579 - innodb_concurrent random failures with varying differences
      
      Not to be pushed. Just an experiment showing how one could fix it.
      
      Added comments, beautified tests (to my taste).
      Added --innodb_lock_wait_timeout=1 to make the test faster.
      Added conditional execution based on $expect_timeout.
      Removed unneeded get_lock()/Release_lock() calls.
[18 Jun 2008 20:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48124

2650 Kristofer Pettersson	2008-06-18
      Bug#21579 innodb_concurrent random failures with varying differences
        
      The concurrent_innodb.test test case failed. This patchs refactors the test by
      separating into three parts.
[20 Jun 2008 10:53] Ingo Strüwing
Requested more comments or rework of the patch. Please see email.
[15 Jul 2008 8:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49734

2650 Kristofer Pettersson	2008-07-15
      Bug#21579 innodb_concurrent random failures with varying differences
              
      The concurrent_innodb.test test case failed. This patchs refactors the test by
      separating into two parts.
[19 Jul 2008 19:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50079

2703 Ingo Struewing	2008-07-19
      Bug#21579 - innodb_concurrent random failures with varying differences
      
      Due to unknown changes the test failed in some ways.
      
      Fixed by checking the test case in detail, commenting the expected behavior,
      and fixing error directives.
      
      In the course of the analyze unneeded get_lock()/release_lock() use,
      unneeded send/reap use, and unneeded sleeps were removed. The lock wait
      timeout was reduced to 1 second, so that this is no big-test any more.
      
      The test was split into two parts, one running the tests with
      --innodb_locks_unsafe_for_binlog, the other part without.
      The main part (include/concurrent.inc) conditionally expects
      lock wait timeouts based on the value of the system variable
      innodb_locks_unsafe_for_binlog.
      
      The major part of the patch comes from Kristofer Pettersson.
[28 Jul 2008 13:41] Kristofer Pettersson
Patch approved with comments.
[29 Jul 2008 14:17] Ingo Strüwing
Patch approved by both reviewers. Canging state to "Patch approved".
[15 Aug 2008 18:48] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51768

2705 Chad MILLER	2008-08-15
      Bug#21579 - innodb_concurrent random failures with varying differences
      
      Due to unknown changes the test failed in some ways.
      
      Fixed by checking the test case in detail, commenting the expected behavior,
      and fixing error directives.
      
      In the course of the analyze unneeded get_lock()/release_lock() use,
      unneeded send/reap use, and unneeded sleeps were removed. The lock wait
      timeout was reduced to 1 second, so that this is no big-test any more.
      
      The test was split into two parts, one running the tests with
      --innodb_locks_unsafe_for_binlog, the other part without.
      The main part (include/concurrent.inc) conditionally expects
      lock wait timeouts based on the value of the system variable
      innodb_locks_unsafe_for_binlog.
      
      The major part of the patch comes from Kristofer Pettersson.
      
      (Chad queues this patch on demand by Trudy/Davi.)
[15 Aug 2008 18:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51770

2705 Chad MILLER	2008-08-15
      Bug#21579 - innodb_concurrent random failures with varying differences
      
      Due to unknown changes the test failed in some ways.
      
      Fixed by checking the test case in detail, commenting the expected behavior,
      and fixing error directives.
      
      In the course of the analyze unneeded get_lock()/release_lock() use,
      unneeded send/reap use, and unneeded sleeps were removed. The lock wait
      timeout was reduced to 1 second, so that this is no big-test any more.
      
      The test was split into two parts, one running the tests with
      --innodb_locks_unsafe_for_binlog, the other part without.
      The main part (include/concurrent.inc) conditionally expects
      lock wait timeouts based on the value of the system variable
      innodb_locks_unsafe_for_binlog.
      
      The major part of the patch comes from Kristofer Pettersson.
      
      (Chad queues this patch on demand by Trudy/Davi.)
[15 Aug 2008 18:53] Chad MILLER
Queued to 5.1-bugteam and 6.0-bugteam.
[21 Aug 2008 18:18] Bugs System
Pushed into 5.1.28  (revid:chad@mysql.com-20080815184814-yomuz9e9srbhzlra) (version source revid:sergefp@mysql.com-20080819132519-eimtstp3bx89ya9d) (pib:3)
[27 Aug 2008 2:34] Paul DuBois
Test case changes. No changelog entry needed.

Setting report to NDI pending push into 6.0.x.
[12 Sep 2008 1:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53909

2682 He Zhenxing	2008-09-12 [merge]
      Auto merge
      Update plugin.h.pp for WL#4398
[13 Sep 2008 23:13] Bugs System
Pushed into 6.0.6-alpha  (revid:chad@mysql.com-20080815184814-yomuz9e9srbhzlra) (version source revid:klewis@mysql.com-20080715185946-x4iqy1c9qzyqtx0g) (pib:3)
[15 Sep 2008 18:29] Paul DuBois
Test case changes. No changelog entry needed.
[30 Sep 2008 13:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/54774

2667 He Zhenxing	2008-09-29 [merge]
      Auto Merge