Bug #28622 REPLACE INTO cause DELETE to time out on lock
Submitted: 23 May 2007 15:03 Modified: 30 Apr 2010 13:26
Reporter: Geert Vanderkelen Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:mysql-5.0 OS:Linux
Assigned to: Martin Skold CPU Architecture:Any
Tags: 5.0.41, delete, MySQL, ndb, replace into

[23 May 2007 15:03] Geert Vanderkelen
Description:
Doing a DELETE with a range scan while there is REPLACE INTO running makes the DELETE operation error with a 'lock wait timeout exceeded'.

It is easier to reproduce when deleting data from the same table than the replace statement is running on, but it can be any table.

How to repeat:
Three tables, all having the same DDL:

CREATE TABLE `tab1` (
  `id` char(12) NOT NULL,
  `timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`)
) ENGINE=ndbcluster DEFAULT CHARSET=ascii;

CREATE TABLE `tab1_newdata` (
  `id` char(12) NOT NULL,
  `timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`)
) ENGINE=ndbcluster DEFAULT CHARSET=ascii;

CREATE TABLE `tab2` (
  `id` char(12) NOT NULL,
  `timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`)
) ENGINE=ndbcluster DEFAULT CHARSET=ascii;

The small C application populates the tables is attached to this bug report.

tab1 and tab1_newdata need to be populated with a good amount of rows. In my test I had tab1_newdata with about 50000 rows. tab1 about the same, tab1 about the same, it could grow doing the REPLACE INTO. tab2 has also about 50000 rows.
Just need to have some data so the REPLACE INTO takes a while to complete.

Launch in two sessions a REPLACE INTO and a DELETE. You probably need to try a few times to get the error using tab2. With tab1 you should have all the time.

mysql_A> REPLACE INTO tab1 SELECT * FROM tab1_newdata;

mysql_B> DELETE from tab2 where timestamp<(CURDATE()-1);
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Doing the same on tab1, which is REPLACE INTO is updating makes it return with an error quicker:

mysql_B> DELETE from tab1 where timestamp<(CURDATE()-1);
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Odd thing is that it sometimes works! Following was a very long REPLACE INTO and the DELETE eventually succeeded.

mysql_B> DELETE from tab2 where timestamp<(CURDATE()-1);
Query OK, 0 rows affected (35.43 sec)

And sometimes/most of the time it just works fine on tab2.
[23 May 2007 15:06] Geert Vanderkelen
Reported on 5.0.38, and verified using 5.0.38 and 5.0.41.
[23 May 2007 15:08] Geert Vanderkelen
For creating and populating the tables.

Attachment: bug28622_testcase.c (text/plain), 6.91 KiB.

[23 May 2007 15:17] Geert Vanderkelen
Configuration for cluster is quite conservative with 2 data nodes:

[NDBD]
NoOfReplicas=2
DataMemory=400M
IndexMemory=20M

MaxNoOfConcurrentOperations=500000
MaxNoOfConcurrentTransactions=20000
[23 May 2007 15:19] Geert Vanderkelen
Verified using MySQL 5.1.19 (build from BK).

The REPLACE INTO is almost twice as slow, and I managed to crash a data node first doing it (was temporary error though). Didn't manage to repeat that crash till now.
[18 Jul 2008 10:48] Geert Vanderkelen
(Maybe related to bugs #11667 and #9961?)

Using MySQL Cluster 6.3bzr.

I could repeat twice a node crash with the test case for bug #28622: doing a REPLACE in one session, and a DELETE in the other (on another table). The lock time outs are still a problem, and I wonder if they they actually don't lead to the node crash (sometimes).

To help, I put the DELETE in a loop:

while true; do
   /data1/mysql/ndb-6.3bzr/bin/mysql -BN test -e "SELECT NOW(); DELETE from tab2 where timestamp<(CURDATE()-1); SELECT NOW();"; echo "--";
   sleep 1;
done

Two times the following error happend when session A ended. MySQL session B then exited with error ERROR 1297.

Time: Friday 18 July 2008 - 12:25:07
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, p
lease report a bug)
Error: 2303
Error data: Node 3 killed this node because GCP stop was detected
Error object: NDBCNTR (Line: 249) 0x0000000a
Program: /data1/mysql/ndb-6.3bzr/libexec/ndbd
Pid: 23527
Trace: /data2/users/geert/cluster/master/ndb_3_trace.log.1
Version: mysql-5.1.24 ndb-6.3.16-R

Session A
==============
> SELECT NOW(); REPLACE INTO tab1 SELECT * FROM tab1_newdata; SELECT NOW();
+---------------------+
| NOW()               |
+---------------------+
| 2008-07-18 12:29:41 |
+---------------------+
1 row in set (0.00 sec)

Query OK, 49348 rows affected (1 min 11.12 sec)
Records: 49348  Duplicates: 0  Warnings: 0

+---------------------+
| NOW()               |
+---------------------+
| 2008-07-18 12:30:52 |
+---------------------+
1 row in set (0.00 sec)

Session B
==============
--
2008-07-18 12:30:30
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
--
....
--
2008-07-18 12:30:56
ERROR 1297 (HY000) at line 1: Got temporary error 270 'Transaction aborted due to node shutdown' from NDBCLUSTER
--

The odd thing is that at the near the end of the REPLACE, the lock wait timeout errors start to happen.

Session A:
===============
...
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
+---------------------+
| NOW()               |
+---------------------+
| 2008-07-18 12:37:50 |
+---------------------+
1 row in set (0.00 sec)

Session B:
===============

2008-07-18 12:37:31
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
--
2008-07-18 12:37:37
2008-07-18 12:37:43
--
2008-07-18 12:37:44
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
--
2008-07-18 12:37:51
2008-07-18 12:37:51
--
2008-07-18 12:37:52
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
--
[8 Dec 2008 10:32] Geert Vanderkelen
Crash reported in bug #41292.

Lock timeouts still happening in MySQL Cluster 6.3bzr (latest pull).
[30 Apr 2010 13:26] Geert Vanderkelen
I can't repeat this anymore using MySQL Cluster 7.0.14. I suspect it got fixed along the way.