Bug #57466 Transaction Isolation problem via Duplicate Keys with impossible error data.
Submitted: 14 Oct 2010 18:50 Modified: 22 Oct 2010 23:37
Reporter: Daniel Smythe Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:6.3.26 + OS:Any
Assigned to: Jonas Oreland CPU Architecture:Any
Tags: cluster, duplicate keys, transaction isolation

[14 Oct 2010 18:50] Daniel Smythe
Description:
When two transactions act on the same table, each attempting to delete and then insert data, duplicate key errors are occasionally thrown, even though each transaction begins with a DELETE to empty the table and the values being inserted contain no duplicates.

How to repeat:
Create our tables:

CREATE TABLE `static_values` (
  `dsg` int(11) NOT NULL,
  `node` int(11) NOT NULL,
  `prio` int(11) NOT NULL DEFAULT '-1',
  PRIMARY KEY (`dsg`,`node`),
  UNIQUE KEY `static_values_pk2` (`dsg`,`prio`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1

CREATE TABLE `changing_data` (
  `toe` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `dsg` int(11) NOT NULL,
  `node` int(11) NOT NULL,
  PRIMARY KEY (`toe`,`dsg`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1

Add some values:

INSERT INTO static_values VALUES (0, 15, 15), (0, 16, 16), (1, 15, 15), (1, 16, 16), (1, 20, 20), (2, 15, 15), (2, 16, 16), (2, 20, 20);

Create two identical scripts which do the following:

- START TRANSACTION;
- DELETE FROM changing_data;
- INSERT INTO changing_data ( dsg, node, toe ) SELECT dsg, node, NOW() FROM static_values WHERE CONCAT('d', dsg, '@n', node) IN ( 'd0@n15', 'd1@n16', 'd2@n15');
- COMMIT;

We understand that regardless of which script is running the transaction, each transaction should, as a unit, clear the table of data and then put the same data back in, with new timestamps.

I cannot determine why the error messages are sporadic; they are always one of:

Duplicate entry '1-20' for key 'PRIMARY'
Duplicate entry '1-15' for key 'PRIMARY'

Yet there are no values of '1' for the timestamp column, and a '20' value is never returned by the INSERT ... SELECT. Furthermore, the data should never produce a duplicate key error: no duplicates are ever being inserted, and the table should in theory always be empty prior to the INSERT ... SELECT, since the DELETE is part of the same transaction.

Here is the test script I am using to reproduce this:

<?php

$db = mysql_connect('localhost:/tmp/mysql.sock', '', '');
mysql_select_db('test');

$start  = "START TRANSACTION";
$commit = "COMMIT";
$delete = "DELETE FROM changing_data";
$insert = "INSERT INTO changing_data ( dsg, node, toe ) SELECT dsg, node, NOW() FROM static_values WHERE CONCAT('d', dsg, '@n', node) IN ( 'd0@n15', 'd1@n16', 'd2@n15');";

while (true) {
  echo "Starting: " . microtime(true) . "\n";
  mysql_query($start);
  echo "Finished, Deleting: " . microtime(true) . "\n";
  mysql_query($delete);
  echo "Deleted, Checking Error: " . microtime(true) . "\n";
  if ( mysql_error() )
    echo mysql_error() . "\nOn: " . $delete . "\n";
  echo "Inserting: " . microtime(true) . "\n";
  mysql_query($insert);
  echo "Finished, Checking Error: " . microtime(true) . "\n";
  if ( mysql_error() )
    echo mysql_error() . "\nOn: " . $insert . "\n";
  echo "Committing: " . microtime(true) . "\n";
  mysql_query($commit);
  echo "Committed: " . microtime(true) . "\n";
}

?>

Simply create the schema and starting data, and run two of these scripts simultaneously. Here are raw logs from the output of mine:

RAW LOGS FROM TEST:
==Process 1
Starting: 1287078892.9959
Finished, Deleting: 1287078892.9959
Deleted, Checking Error: 1287078893.016
Inserting: 1287078893.0161
Finished, Checking Error: 1287078893.017
Duplicate entry '1-20' for key 'PRIMARY'
On: INSERT INTO changing_data ( dsg, node, toe ) SELECT dsg, node, NOW() FROM static_values WHERE CONCAT('d', dsg, '@n', node) IN ( 'd0@n15', 'd1@n16', 'd2@n15');
Committing: 1287078893.0171
Committed: 1287078893.0171

==Process 2
Starting: 1287078892.9756
Finished, Deleting: 1287078892.9757
Deleted, Checking Error: 1287078893.0148
Inserting: 1287078893.0148
Finished, Checking Error: 1287078893.0153
Committing: 1287078893.0153
Committed: 1287078893.0155

======Combined=======
P2 - Starting: 1287078892.9756
P2 - Finished, Deleting: 1287078892.9757
P1 - Starting: 1287078892.9959
P1 - Finished, Deleting: 1287078892.9959
P2 - Deleted, Checking Error: 1287078893.0148
P2 - Inserting: 1287078893.0148
P2 - Finished, Checking Error: 1287078893.0153
P2 - Committing: 1287078893.0153
P2 - Committed: 1287078893.0155
P1 - Deleted, Checking Error: 1287078893.016
P1 - Inserting: 1287078893.0161
P1 - Finished, Checking Error: 1287078893.017
Duplicate entry '1-20' for key 'PRIMARY'
On: INSERT INTO changing_data ( dsg, node, toe ) SELECT dsg, node, NOW() FROM static_values WHERE CONCAT('d', dsg, '@n', node) IN ( 'd0@n15', 'd1@n16', 'd2@n15');
P1 - Committing: 1287078893.0171
P1 - Committed: 1287078893.0171

You can see that P1's DELETE was delayed while it waited for P2's COMMIT; yet once P1's DELETE completed, the following INSERT received a duplicate key violation, which should be impossible (and, furthermore, on a value that was not being inserted).

Suggested fix:
There are a few distinct problems here.

1: Transaction isolation does not appear to be behaving appropriately.
2: The error message does not report valid data for the violated key. (In other words, the error is for a key on a timestamp and an INT, but the message reports an INT for the timestamp column and a value that is impossible for the INT.)
[15 Oct 2010 7:05] Jonas Oreland
This is not a bug... look at this possible "scheduling" of
the 2 transactions T1/T2:

T1> delete from changing_data;   // no tuples deleted
T2> delete from changing_data;   // no tuples deleted
T1> insert some values;          // insert performed but not committed
T2> insert some values;          // insert hanging on locks held by T1
T1> commit;                      // ok
T2> duplicate key;               // ok

---

The difference from, e.g., InnoDB is that in NDB the DELETE does not lock records that are not present in the table. InnoDB does this by using next-key locking (which, incidentally, for a DELETE without a WHERE clause will lock the entire table).
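To illustrate the contrast (a sketch added for clarity, not part of the original comment; the table name is illustrative), the same pattern against an InnoDB table under the default REPEATABLE READ level blocks rather than races:

```sql
-- Hypothetical illustration: same schema, InnoDB instead of NDB.
CREATE TABLE changing_data_innodb (
  toe TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  dsg INT NOT NULL,
  node INT NOT NULL,
  PRIMARY KEY (toe, dsg)
) ENGINE=InnoDB;

-- Session T1:
START TRANSACTION;
DELETE FROM changing_data_innodb;
-- Next-key locks now cover the scanned range, including the "gaps"
-- where no rows currently exist.

-- Session T2 (blocks until T1 commits or rolls back):
START TRANSACTION;
INSERT INTO changing_data_innodb (dsg, node) VALUES (0, 15);
```

Because T2's INSERT waits for T1's gap locks, the two transactions serialize and the duplicate-key race described above cannot occur.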

---

A solution to the customer's problem is to serialize the two transactions by keeping a lock record somewhere, and issuing an UPDATE of that record first in each transaction.
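That lock-record pattern might look like the following sketch (the `app_lock` table and its column names are assumptions for illustration, not from this report):

```sql
-- One-row "mutex" table (name is illustrative).
CREATE TABLE app_lock (
  id INT NOT NULL PRIMARY KEY,
  bumped INT NOT NULL DEFAULT 0
) ENGINE=ndbcluster;
INSERT INTO app_lock VALUES (1, 0);

-- Each script then starts its transaction by taking an exclusive
-- row lock on the mutex row, serializing the two transactions:
START TRANSACTION;
UPDATE app_lock SET bumped = bumped + 1 WHERE id = 1;  -- second script blocks here
DELETE FROM changing_data;
INSERT INTO changing_data (dsg, node, toe)
  SELECT dsg, node, NOW() FROM static_values
  WHERE CONCAT('d', dsg, '@n', node) IN ('d0@n15', 'd1@n16', 'd2@n15');
COMMIT;  -- releases the row lock; the other transaction proceeds
```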

/Jonas

The "static_values" table is not relevant at all...
replace the "insert into select from static_values" with a multi-valued insert and you get the same result.
[15 Oct 2010 13:51] Axel Schwenke
Jonas,

this explains nicely why the transactions could collide. Still, it does not explain why the failed transaction reports bogus values for the violated UNIQUE constraint.

The PRIMARY KEY is on (TIMESTAMP, INT), and the message claims the violating tuple was (1, 20). But 1 was never used as a timestamp, nor was 20 ever used for the int column.
[15 Oct 2010 16:01] Daniel Smythe
Jonas, I understand your scenario, however this is not the behavior I am seeing. In this log snippet, you can see that P1 and P2 both start transactions and issue deletes against the table. P1's delete gets delayed until P2 has completely committed, though. This means that P1's delete should have deleted the tuples that P2 added, leaving an empty table to insert into. Instead, the insert fails with a duplicate key error, even though the table was deleted /after/ P2's commit.

I will work on a test case for the other issue, which is the key violation reporting incorrect values. This is similar to Bug #21072, but I will investigate more.

======Combined=======
P2 - Starting: 1287078892.9756
P2 - Finished, Deleting: 1287078892.9757 <------- P2 issues DELETE
P1 - Starting: 1287078892.9959
P1 - Finished, Deleting: 1287078892.9959 <------- P1 issues DELETE
P2 - Deleted, Checking Error: 1287078893.0148 <-- P2 DELETE finishes..
P2 - Inserting: 1287078893.0148
P2 - Finished, Checking Error: 1287078893.0153
P2 - Committing: 1287078893.0153
P2 - Committed: 1287078893.0155  <--------------- P2 COMMITS
P1 - Deleted, Checking Error: 1287078893.016 <--- P1's DELETE finishes.
P1 - Inserting: 1287078893.0161 <---------------- P1 INSERTS
P1 - Finished, Checking Error: 1287078893.017 <-- P1 gets duplicate key.
Duplicate entry '1-20' for key 'PRIMARY'
On: INSERT INTO changing_data ( dsg, node, toe ) SELECT dsg, node, NOW() FROM
static_values WHERE CONCAT('d', dsg, '@n', node) IN ( 'd0@n15', 'd1@n16', 'd2@n15');
P1 - Committing: 1287078893.0171
P1 - Committed: 1287078893.0171
[15 Oct 2010 16:16] Jonas Oreland
The printouts prove nothing.
I'll add a possible scenario inline with your "combined" log.
The summary is that the only way P1's DELETE is guaranteed to find all rows
  inserted by P2 is if it starts after P2 has committed.
  Otherwise there is a race...

======Combined=======
P2 - Starting: 1287078892.9756
P2 - Finished, Deleting: 1287078892.9757 <------- P2 issues DELETE
P1 - Starting: 1287078892.9959
P1 - Finished, Deleting: 1287078892.9959 <------- P1 issues DELETE

> p1: start scan table pos = 0;
> p1: scan position 0 -> 10, pauses at position p = 10; 

P2 - Deleted, Checking Error: 1287078893.0148 <-- P2 DELETE finishes..
P2 - Inserting: 1287078893.0148

> p2: insert pos = 0;

P2 - Finished, Checking Error: 1287078893.0153
P2 - Committing: 1287078893.0153
P2 - Committed: 1287078893.0155  <--------------- P2 COMMITS

> p2: commit row inserted at pos = 0;

P1 - Deleted, Checking Error: 1287078893.016 <--- P1's DELETE finishes.

> p1: scan missed row committed at pos = 0; found nothing

P1 - Inserting: 1287078893.0161 <---------------- P1 INSERTS
P1 - Finished, Checking Error: 1287078893.017 <-- P1 gets duplicate key.
Duplicate entry '1-20' for key 'PRIMARY'
On: INSERT INTO changing_data ( dsg, node, toe ) SELECT dsg, node, NOW() FROM
static_values WHERE CONCAT('d', dsg, '@n', node) IN ( 'd0@n15', 'd1@n16', 'd2@n15');
P1 - Committing: 1287078893.0171
P1 - Committed: 1287078893.0171
[15 Oct 2010 17:50] Daniel Smythe
Simpler test case, with lock table/row in transaction, table schema, test php script, and sample combined output of 2 events.

Attachment: bug57466.txt (text/plain), 6.05 KiB.

[20 Oct 2010 10:45] Jonas Oreland
Hi,

I can't repeat this.
I don't have PHP installed, so I made what I think are the equivalent mysqltest scripts (as attached to this bug).

Can you test "my version"?
What cluster config do you use?

Setting status to Need Feedback.
/Jonas
[20 Oct 2010 10:46] Jonas Oreland
"my version"

Attachment: bug57466.tgz (application/x-compressed-tar, text), 695 bytes.

[22 Oct 2010 23:37] Daniel Smythe
I'm not able to reproduce the problem with the test scripts you've created, Jonas.

It appears that PHP's mysql extension does not allow "SET AUTOCOMMIT" commands to actually take effect; it still autocommits.

I can confirm this by using PHP's mysqli extension, which does support disabling autocommit, in which case I can no longer reproduce the issue using my test.
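For reference, a minimal mysqli sketch (connection parameters are placeholders, not from the report) showing autocommit genuinely disabled:

```php
<?php
// Sketch only: mysqli supports real transaction control,
// unlike the behavior observed with the old mysql_* extension here.
$db = new mysqli('localhost', 'user', 'password', 'test');

$db->autocommit(false);  // genuinely disables autocommit
$db->query("DELETE FROM changing_data");
$db->query("INSERT INTO changing_data (dsg, node, toe)
            SELECT dsg, node, NOW() FROM static_values
            WHERE CONCAT('d', dsg, '@n', node)
              IN ('d0@n15', 'd1@n16', 'd2@n15')");
$db->commit();           // explicit commit required
?>
```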

This certainly can be closed as !bug.
[23 Oct 2010 5:24] Shane Bester
I had seen this when I tested with a C client. Anyway, I'm not going to argue with the cluster experts on this....

i|1|drop table if exists static_values,changing_data;
i|1|create table `static_values` (`dsg` int,`node` int,  primary key (`dsg`,`node`)) engine=ndbcluster
i|1|create table `changing_data` (a int,`toe` timestamp,`dsg` int, primary key (`toe`,`dsg`)) engine=ndbcluster
i|1|insert into static_values values (0,0);
i|1|set global table_open_cache=0
n|1|set transaction isolation level read committed
n|1|start transaction;
n|1|delete from changing_data ;
n|1|insert into changing_data (dsg) select dsg from static_values where dsg = 0|
n|1|commit

"i" lines get run once, before anything else.
"n" lines get run in order, top to bottom, again and again in 5 threads.