Bug #65037 ndbd can't start while old transactions are kept open
Submitted: 19 Apr 2012 15:19
Modified: 30 Sep 2015 12:50
Reporter: Hartmut Holzgraefe
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 7.1.x, 7.2.12
OS: Linux
Assigned to:
CPU Architecture: Any

[19 Apr 2012 15:19] Hartmut Holzgraefe
Description:
A starting data node seems to have to lock all rows for a short period of time during start phase 5. 

If a long-running transaction holds row locks that the starting node needs, the node remains in phase 5 until that transaction ends, without giving any hint about what it is waiting for.

How to repeat:
* create a test table like:

   CREATE TABLE t1(id int primary key, val int);
   INSERT INTO t1 VALUES(1,1);

* stop one data node

* update the test table row in a loop:

   BEGIN;                -- the transaction is never committed
   while (true)
      UPDATE t1 SET val = val + 1 WHERE id = 1;
      sleep 1

* restart the stopped node, see how it hangs in phase 5

* kill the test transaction

* see how the node restart finally continues
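
A minimal sketch of the update loop above in Python, assuming any DB-API 2.0 connection with autocommit disabled. This is not the attached test.php; the function name hold_row_lock and its parameters are made up for illustration only.

```python
import time


def hold_row_lock(conn, iterations=None, delay=1.0):
    """Update the same row repeatedly without ever committing.

    conn is assumed to be a DB-API 2.0 connection with autocommit off,
    so the first UPDATE opens a transaction that stays open (and keeps
    its row lock) across all iterations.
    iterations=None loops until interrupted, like while(true) above.
    Returns the number of UPDATE statements issued.
    """
    cur = conn.cursor()
    done = 0
    while iterations is None or done < iterations:
        cur.execute("UPDATE t1 SET val = val + 1 WHERE id = 1")
        done += 1
        time.sleep(delay)
    # Deliberately no conn.commit(): the open transaction is the point.
    return done
```

To run it against the cluster, pass in a connection from a MySQL driver, for example one returned by mysql.connector.connect(host="localhost", user="root", database="test") if MySQL Connector/Python is installed, and interrupt the script to end the transaction.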

Suggested fix:
minimum: have the starting node report at regular intervals, e.g. once per minute, that it is waiting for locks held by another node's transactions

preferred, if possible: after a certain grace period (configurable?) force a timeout of the offending transaction, so that the node restart can proceed and availability is not at risk for as long as the long-running transaction is still humming along ...

making this situation clearer in the documentation would already help a bit for now ;)
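
The two suggestions above (periodic reporting, then a forced timeout after a grace period) could be combined roughly as in the following Python sketch. It only illustrates the control flow and is not NDB kernel code; wait_for_row_locks and all of its parameters (try_lock, report, abort_blockers, report_interval, grace_period, poll_interval) are hypothetical names.

```python
import time


def wait_for_row_locks(try_lock, report, abort_blockers,
                       report_interval=60.0, grace_period=None,
                       poll_interval=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll try_lock() until it returns True.

    While waiting, call report(seconds_waited) every report_interval
    seconds. If grace_period is set, call abort_blockers() once after
    that many seconds, then keep polling until the locks are free.
    All callbacks are hypothetical hooks supplied by the caller.
    Returns the total number of seconds waited.
    """
    start = clock()
    next_report = start + report_interval
    aborted = False
    while not try_lock():
        now = clock()
        if grace_period is not None and not aborted \
                and now - start >= grace_period:
            abort_blockers()          # force out the offending transaction
            aborted = True
        elif now >= next_report:
            report(now - start)       # e.g. log "waiting for row locks"
            next_report = now + report_interval
        sleep(poll_interval)
    return clock() - start
```

With grace_period=None this covers only the "minimum" variant: the node keeps waiting but says so once per report interval.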
[10 Jan 2013 10:31] MySQL Verification Team
Hello Hartmut,

Thank you for the report.
I cannot repeat the described behavior with the provided test case on mysql-5.1.63 ndb-7.1.24.

Could you please tell me the version in which you experienced this issue?

Regards,
Umesh
[11 Feb 2013 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[17 May 2013 18:09] Hartmut Holzgraefe
test code

Attachment: test.php (application/x-php, text), 271 bytes.

[17 May 2013 18:10] Hartmut Holzgraefe
test schema / data

Attachment: schema.sql (text/x-sql), 76 bytes.

[17 May 2013 18:14] Hartmut Holzgraefe
I can still reproduce this easily on 7.1.24 and 7.2.12 using the attached schema/data and php code file:

* create a 2 node cluster (in this case with everything on localhost, but that shouldn't really matter)
* mysql ... test < schema.sql
* ndb_mgm -e "3 stop" # assuming data nodes have ids 2 and 3
* wait for the data node to stop
* run "php test.php" (assuming everything is on localhost and mysql "root" has no password set, else modify the php code accordingly)
* start the 2nd data node again
* watch how it never gets beyond start phase 4 ...
* stop the php script
* see how the 2nd data node completes the remaining start phases
[25 May 2013 15:11] MySQL Verification Team
Hello Hartmut,

Thank you for the report.
Verified as described.

Just a correction to the test case (so as not to confuse the 2nd data node with nodeid 2):

* create a 2 node cluster (in this case with everything on localhost, but that shouldn't really matter)
* mysql ... test < schema.sql
* ndb_mgm -e "3 stop" # assuming data nodes have ids 2 and 3
* wait for the data node to stop
* run "php test.php" (assuming everything is on localhost and mysql "root" has no password set, else modify the php code accordingly)
* start the data node again (nodeid: 3, which was stopped in step 3)
* watch how it never gets beyond start phase 4 ...
* stop the php script
* see how the data node completes the remaining start phases (nodeid: 3)

###

ndb_mgm> 3 status
Node 3: starting (Last completed phase 4) (mysql-5.5.30 ndb-7.2.12)
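
Since the node gives no hint of its own, a stuck restart can only be noticed from outside by polling this status line. A rough Python sketch, assuming the line format shown above; the helper names last_completed_phase and looks_stalled and the min_samples threshold are illustrative, not part of any MySQL tool.

```python
import re

# Matches a status line such as:
#   Node 3: starting (Last completed phase 4) (mysql-5.5.30 ndb-7.2.12)
STATUS_RE = re.compile(
    r"Node (?P<node>\d+): starting \(Last completed phase (?P<phase>\d+)\)"
)


def last_completed_phase(status_line):
    """Return the last completed start phase as an int, or None if the
    line does not describe a starting node in the format above."""
    m = STATUS_RE.search(status_line)
    return int(m.group("phase")) if m else None


def looks_stalled(phases, min_samples=3):
    """True if the most recent min_samples polls all reported the same
    completed phase, i.e. the node made no visible progress."""
    recent = phases[-min_samples:]
    return (len(recent) == min_samples
            and recent[0] is not None
            and all(p == recent[0] for p in recent))
```

The status lines themselves could be collected by running ndb_mgm -e "3 status" at a fixed interval and feeding each output line to last_completed_phase.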

Thanks,
Umesh
[30 Sep 2015 12:50] Hartmut Holzgraefe
Any news here?