Bug #29049 lock_multi fails in rare case
Submitted: 12 Jun 2007 12:30 Modified: 11 Aug 2007 10:12
Reporter: Matthias Leich
Status: Closed
Category: MySQL Server: Tests Severity: S3 (Non-critical)
Version: OS: Any
Assigned to: Konstantin Osipov CPU Architecture: Any

[12 Jun 2007 12:30] Matthias Leich
Description:
Full regression test run on:
- custom build of MySQL 5.036
  (some changes around connection-pool)
  32 Bit release
- uname -a
  SunOS sol10-sparc-a 5.10 Generic sun4u sparc
  SUNW,Sun-Fire-V240
  SUN SPARC 64 Bit
- mysqld was started with the option
  --thread-handling=pool-of-threads
- There was a high load on the testing box
  (tests running in parallel, plus very likely a compile).

The testcase lock_multi failed.
Test script interleaved with the file of expected results:
connection locker;
create table t1(n int);
insert into t1 values (1);
lock tables t1 write;
connection writer;
send update low_priority t1 set n = 4;
connection reader;
--sleep 2
send select n from t1;
connection locker;
--sleep 2
unlock tables;
connection writer;
reap;
connection reader;
reap;
n
4  <--- We got a "1" here instead.
drop table t1;

1. I was unable to reproduce this test case failure.
   Conclusion: It is rare.
2. There is no evidence of a clear server bug such as a
   crash, a nonsensical warning, or an error message.
   There is also a time frame in which a result set
   with n = 1 was valid.
3. mysqltest does not enforce a strict order of
   command processing.
   Hint: send <command> ->
         Send the command to the server, but do not
         wait until the response comes in.
         Simply proceed with sending the next commands.
4. The presence of the "sleep" commands shows that the
   test tries to enforce some order in the processing of
   the commands on the server side.
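
The send/reap behaviour described in point 3 can be pictured with a small
Python analogy. This is only a sketch and not mysqltest itself: send, reap
and fake_server_query are hypothetical names, a one-thread pool stands in
for a client connection, and an Event stands in for the table lock.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-in for one client connection: a single worker thread.
pool = ThreadPoolExecutor(max_workers=1)

# Plays the role of the table lock held by the "locker" connection.
lock_released = threading.Event()


def fake_server_query() -> int:
    """Pretend statement that blocks until the lock is released."""
    lock_released.wait()
    return 4


def send(query) -> Future:
    """Like 'send': hand the statement over and return immediately."""
    return pool.submit(query)


def reap(pending: Future, timeout: float = 5.0) -> int:
    """Like 'reap': block until the result of the earlier statement arrives."""
    return pending.result(timeout=timeout)


pending = send(fake_server_query)
# The client has already moved on although the statement is still blocked.
still_running = not pending.done()

lock_released.set()  # analogous to 'unlock tables' on the locker connection
result = reap(pending)
pool.shutdown()
```

The client learns nothing about server-side progress between send and reap,
which is why the test falls back on sleeps to guess at it.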

I assume that the "sleep 2" between
  send update low_priority t1 set n = 4;
and
  send select n from t1;
was simply too short (heavy parallel load on the testing box)
to ensure that the update is the first statement executed
after the unlock tables.

Therefore I think that the server works correctly, but
the test case is weak.

I also guess that this problem does not depend on
MySQL version, OS or hardware.

How to repeat:
Just run lock_multi under high parallel load
until you get the same failure.

Suggested fix:
Try replacing the "sleep 2" with a routine
that checks that the server has made sufficient
progress in processing the update command.

I volunteer to attempt writing such a routine.
Note: We have some SHOW commands that give
      information about the current state of
      sessions. At the moment I do not know whether this
      information is fine-grained enough to detect
      the intended command processing state.
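
The kind of routine suggested above could look roughly like the following
Python sketch: poll a condition until it holds or a timeout expires, instead
of sleeping for a fixed time. All names here (wait_until, update_is_waiting)
are hypothetical. In a real test the condition would be a query against the
session state, and which state value signals a waiting update is not
established in this report.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 30.0,
               interval: float = 0.1) -> bool:
    """Poll condition() until it returns True or the timeout expires.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical stand-in for "the update is now waiting for the table lock":
# it becomes true on the third poll.
polls = {"count": 0}


def update_is_waiting() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


reached = wait_until(update_is_waiting, timeout=5.0, interval=0.01)

# A condition that never holds ends with a timeout instead of hanging.
timed_out = wait_until(lambda: False, timeout=0.05, interval=0.01)
```

Unlike a fixed sleep, the wait ends as soon as the condition holds, and a
loaded machine only makes it take longer rather than changing the outcome.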
[11 Aug 2007 10:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/32420

ChangeSet@1.2564, 2007-08-11 14:07:49+04:00, kostja@bodhi.(none) +2 -0
  A fix for Bug#29049 lock_multi fails in rare case.
  The patch changes the test case only.
  The fix is to replace all 'sleep's with wait_condition. This makes
  the test deterministic and also ~300 times faster.
[11 Aug 2007 10:10] Konstantin Osipov
Queued in 5.1-runtime
[11 Aug 2007 10:12] Konstantin Osipov
Pushed into 5.1-target-5.1.22.
A test case change, no documentation entry is needed.
[21 Aug 2007 23:21] Bugs System
Pushed into 5.1.22-beta