Bug #46400 RQG falcon_recovery fails due to fake deadlock
Submitted: 27 Jul 2009 12:16
Modified: 26 May 2010 17:48
Reporter: Olav Sandstå
Status: Unsupported
Category: MySQL Server: Falcon storage engine
Severity: S3 (Non-critical)
Version: 6.0.12-alpha
OS: Any
Assigned to: John Embretsen
CPU Architecture: Any
Tags: F_TEST
Triage: Triaged: D3 (Medium)

[27 Jul 2009 12:16] Olav Sandstå
Description:
The random query generator test falcon_recovery fails with the following error:

# 23:01:36 10 stalled queries detected, declaring deadlock at DSN dbi:mysql:host=127.0.0.1:port=19306:user=root:database=test.

This indicates that there is a deadlock in the server. This was reported and investigated earlier as a runtime bug; see Bug#39193.

That investigation revealed that the apparent deadlock was caused by the test creating a long-running transaction that involved a table that was later used in an ALTER operation. The ALTER was stalled "forever", and the test declared the lack of progress a likely deadlock.
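
To illustrate why a stalled ALTER can look like a deadlock, here is a minimal sketch of a stall-count heuristic of the kind the error message suggests. It is not the RQG implementation: the names `RunningQuery` and `looks_deadlocked` and the 600-second threshold are assumptions made for illustration. Only the limit of 10 stalled queries comes from the error message above.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen for illustration; the real RQG value may differ.
STALL_SECONDS = 600
# Taken from the error message: "10 stalled queries detected".
STALLED_QUERY_LIMIT = 10


@dataclass
class RunningQuery:
    connection_id: int
    seconds_running: int
    statement: str


def looks_deadlocked(queries, stall_seconds=STALL_SECONDS, limit=STALLED_QUERY_LIMIT):
    """Return True when at least `limit` queries have run for `stall_seconds` or more."""
    stalled = [q for q in queries if q.seconds_running >= stall_seconds]
    return len(stalled) >= limit


# One long-running transaction sleeps while ten ALTER statements wait behind it.
blocker = RunningQuery(10, 1800, "SELECT SLEEP(1800)")
waiters = [RunningQuery(100 + i, 900, "ALTER TABLE t1") for i in range(10)]

print(looks_deadlocked([blocker] + waiters))
```

A heuristic like this sees only that many queries have made no progress. It cannot tell a true deadlock from statements that are simply waiting for a long-running transaction to finish, which is the "fake deadlock" described here.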

How to repeat:
Run the RQG falcon_recovery test.

Suggested fix:
This is a test issue. We need to prevent the ALTER operation from being stalled by the long-running transaction.

Philip Stoev has proposed the following alternatives for how to solve this:

A. Disable the stall_serial_log_rotation rule from the grammar and leave the rest of the test to do recovery.

B. Make sure that the second mechanism used by this test (disabling checkpoints by setting an impossible checkpoint schedule) does indeed cause long recovery logs.

C. Make sure that the DML performed in stall_serial_log_rotation does not involve any of the tables that are subject to ALTER.
[30 Jul 2009 12:14] John Embretsen
A patch has been committed for this bug, to the mysql-test-extra-6.0 branch on 2009-07-30:

    ------------------------------------------------------------
    revno: 979.1.3
    revision-id: jembretsen@nehalem-1-20090730105608-s9fyxcddlu0tnx03
    parent: victor.kirkebo@sun.com-20090730104141-8a7kduhbgw9tk92t
    committer: John Embretsen <jembretsen@nehalem-1>
    branch nick: mysql-test-extra-6.0
    timestamp: Thu 2009-07-30 12:56:08 +0200
    message:
      Fix for Bug#46400 - RQG falcon_recovery fails due to fake deadlock:
       - Replaces reference to regular tables (_table) in INSERT SELECT statement with simply _digit, to avoid locking of tables leading to hangs or "fake deadlocks"
       - With this fix "deadlock" is still seen on some host (nehalem-1) with current falcon-team branch, but it is not seen on other hosts. Will see how it develops in PB2.

=== modified file 'mysql-test/gentest/conf/falcon_recovery.yy'
--- mysql-test/gentest/conf/falcon_recovery.yy  2008-10-29 13:36:18 +0000
+++ mysql-test/gentest/conf/falcon_recovery.yy  2009-07-30 10:47:50 +0000
@@ -11,7 +11,7 @@
 #

 stall_serial_log_rotation:
-       START TRANSACTION ; CREATE TEMPORARY TABLE IF NOT EXISTS stall ( `f1` INTEGER , `connection_id` INTEGER ) ENGINE = Falcon ; INSERT IGNORE INTO stall SELECT `int`, CONNECTION_ID() FROM _table LIMIT _digit ; UPDATE stall SET f1 = f1 + 1 WHERE connection_id = CONNECTION_ID() ; SELECT IF( CONNECTION_ID() = 10 , SLEEP(1800) , 1 ) ;
+       START TRANSACTION ; CREATE TEMPORARY TABLE IF NOT EXISTS stall ( `f1` INTEGER , `connection_id` INTEGER ) ENGINE = Falcon ; INSERT INTO stall VALUES (_digit, CONNECTION_ID()) ; UPDATE stall SET f1 = f1 + 1 WHERE connection_id = CONNECTION_ID() ; SELECT IF( CONNECTION_ID() = 10 , SLEEP(1800) , 1 ) ;

 serial_log_event:
        blob_delete |

---------------------------------------------

In plainer terms:

With this change we avoid referencing tables used by other transactions (that are not designed to stall serial log rotation) by replacing 

INSERT IGNORE INTO stall SELECT `int`, CONNECTION_ID() FROM _table LIMIT _digit ;

with

INSERT INTO stall VALUES (_digit, CONNECTION_ID()) ;

(i.e. insert a random digit instead of the contents of another table)
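
The effect of the change can be sketched as a small placeholder expansion. This is only an illustration under stated assumptions, not how RQG itself expands grammars: the `expand` and `touches_shared_table` helpers and the table names in `SHARED_TABLES` are hypothetical, and `_digit` is assumed to yield a single digit from 0 to 9.

```python
import random
import re

# Hypothetical table names standing in for the tables the test ALTERs.
SHARED_TABLES = ["A", "B", "C"]

# The two statement variants quoted in the comment above.
OLD_RULE = "INSERT IGNORE INTO stall SELECT `int`, CONNECTION_ID() FROM _table LIMIT _digit ;"
NEW_RULE = "INSERT INTO stall VALUES (_digit, CONNECTION_ID()) ;"


def expand(rule, rng):
    """Substitute the _table and _digit placeholders with random choices."""
    rule = re.sub(r"\b_table\b", lambda m: rng.choice(SHARED_TABLES), rule)
    return re.sub(r"\b_digit\b", lambda m: str(rng.randint(0, 9)), rule)


def touches_shared_table(statement):
    """Return True if the expanded statement reads from one of the shared tables."""
    match = re.search(r"\bFROM\s+(\w+)", statement)
    return match is not None and match.group(1) in SHARED_TABLES


rng = random.Random(0)
print(touches_shared_table(expand(OLD_RULE, rng)))  # old rule reads a shared table
print(touches_shared_table(expand(NEW_RULE, rng)))  # new rule touches only `stall`
```

Because the new statement never expands `_table`, the stalling transaction no longer reads from any table that a concurrent ALTER might need.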

Thanks to Olav for trying out the patch and giving the OK to push.
Thanks to Philip for adding a workaround for Bug#46433 to the RQG deadlock detector.
[30 Jul 2009 12:22] John Embretsen
Patch pushed to mysql-test-extra-6.0 repository with revision jembretsen@nehalem-1-20090730105608-s9fyxcddlu0tnx03, 2009-07-30.

Visible in Pushbuild 2, mysql-6.0-falcon-team branch, as of push of revision john.embretsen@sun.com-20090730080940-6h0gpy6b31ctiq20 (pushed 2009-07-30 13:56 CEST).
[30 Jul 2009 12:27] Olav Sandstå
Verified that with this fix the test runs without failing due to a "false deadlock". I have also verified that recovery processes a large number of log records from the serial log (about 250,000 to 400,000), so the new "serial log stall" transactions seem to have their intended effect on the test.