Bug #40990 Maria: failure of maria.test & maria_notembedded in deadlock detection
Submitted: 24 Nov 2008 19:52 Modified: 10 Mar 14:52
Reporter: Guilhem Bichot
Status: Closed
Category: Server: Maria
Severity: S3 (Non-critical)
Version: 5.1-maria, 6.0-maria
OS: Sun Solaris (sparc64)
Assigned to: Sergei Golubchik
Target Version: 6.0-beta
Tags: pushbuild, sporadic, test failure
Triage: Triaged: D2 (Serious) / R1 (None/Negligible) / E2 (Low)

[24 Nov 2008 19:52] Guilhem Bichot
Description:
I have not checked 5.1-maria.
Solaris 10 Sparc 64 debug_max build:

Running:
mysql-test-run.pl --timer --force --comment=ps_stm_threadpool --ps-protocol
--mysqld=--binlog-format=statement --mysqld=--thread-handling=pool-of-threads
...
maria.maria [ fail ]

mysqltest: At line 1493: query 'reap' failed with wrong errno 1213: 'Deadlock found when
trying to get lock; try restarting transaction', instead of 1062...

How to repeat:
log into the machine I guess
[6 Dec 2008 13:30] Guilhem Bichot
Now that the relevant piece of maria.test has moved to maria_notembedded.test, it is that
test which fails:
test-max-sol10-sparc64
guilhem@mysql.co...
2008-12-05 22:42:49 
maria.maria_notembedded [ fail ]

mysqltest: At line 50: query 'reap' failed with wrong errno 1205: 'Lock wait timeout
exceeded; try restarting transaction', instead of 1062...
[16 Dec 2008 10:26] Guilhem Bichot
Sanja ran the test on the failing machine in a loop for 14 hours, using pushbuild2
binaries of Dec 14, with no failure.
According to xref, all 5 failures (4 in 6.0-maria, 1 in 5.1-maria, all on Solaris 10
sparc64) occurred between Nov 22 and Dec 5. Shortly after the last failing push, Monty and
Serg pushed fixes for memory corruption bugs related to versioning and the transaction
manager, which may be why the problem is gone.
We are closing this as "can't repeat" and will reopen it if the test fails again.
[16 Dec 2008 13:10] Alexander Nozdrin
Happened again:

https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0&order=107

Symptoms:
mysqltest: At line 46: query 'insert t1 values (3)' failed with wrong errno 1205: 'Lock
wait timeout exceeded; try restarting transaction', instead of 1213...
[17 Dec 2008 23:23] Guilhem Bichot
Sent some ideas to Sanja and Serg.
[22 Dec 2008 19:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62218

2707 Sergei Golubchik	2008-12-22
      Bug#40990 Maria: failure of maria.test & maria_notembedded in deadlock detection
      detect a case when a blocker has removed itself and signalled after the condition
      timed out but before it (cond_wait) acquired the mutex back
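
The race named in the commit message can be illustrated with a minimal sketch. This is
not the Maria code: Resource, wait_naive, wait_checked, LateSignalCondition and simulate
are hypothetical names, and Python's threading.Condition stands in for the server's
condition variable. The sketch assumes the fix amounts to re-checking the shared state
after a timed wait reports a timeout, since the blocker may have removed itself between
the timeout and the moment the waiter gets the mutex back.

```python
import threading
import time


class Resource:
    """Hypothetical shared state: one waiter blocked on one blocker."""

    def __init__(self):
        self.cond = threading.Condition()
        self.blocker_present = True

    def release(self):
        # The blocker removes itself and signals the waiter.
        with self.cond:
            self.blocker_present = False
            self.cond.notify_all()


def wait_naive(res, timeout):
    # Trusts the return value of the timed wait alone.
    with res.cond:
        if not res.blocker_present:
            return "granted"
        signalled = res.cond.wait(timeout)
        return "granted" if signalled else "timeout"


def wait_checked(res, timeout):
    # Re-checks the shared state after every wakeup, including a timeout.
    deadline = time.monotonic() + timeout
    with res.cond:
        while res.blocker_present:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return "timeout"
            res.cond.wait(remaining)
        return "granted"


class LateSignalCondition(threading.Condition):
    """Test double: wait() reports a timeout, but the blocker has already
    removed itself by the time the waiter holds the mutex again."""

    def __init__(self, on_timeout):
        super().__init__()
        self.on_timeout = on_timeout

    def wait(self, timeout=None):
        self.on_timeout()   # blocker leaves in the window after the timeout
        return False        # the wait itself still reports "timed out"


def simulate(waiter):
    res = Resource()

    def blocker_leaves():
        res.blocker_present = False

    res.cond = LateSignalCondition(blocker_leaves)
    return waiter(res, 0.05)


print(simulate(wait_naive))    # timeout
print(simulate(wait_checked))  # granted
```

In this sketch the naive waiter reports a lock wait timeout even though nothing blocks it
any more, which resembles the unexpected errno 1205 seen in the test; the checked waiter
decides from the shared state instead of from the wait's return value.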
[7 Jan 22:53] Guilhem Bichot
We cannot be sure that the problem fixed by this patch is the cause of the observed
symptoms (they are rare and timing-dependent, hence hard to repeat), but it is quite
possible.
[17 Feb 12:47] Bugs System
Pushed into 6.0.10-alpha (revid:serg@mysql.com-20090217113558-vpsqsyjule7nz0gk) (version
source revid:guilhem@mysql.com-20090213163054-rsg204z5qzcekbfe) (merge vers:
6.0.10-alpha) (pib:6)