Bug #32416 Foreign key constraint violation under concurrent load
Submitted: 15 Nov 2007 17:15 Modified: 24 Jul 2008 11:57
Reporter: Fadi Bitar Email Updates:
Status: Not a Bug Impact on me: None
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:5.0.27, 5.0.37, 5.1.22 OS:Linux
Assigned to: Vasil Dimov CPU Architecture:Any
Tags: concurrent, foreign key, insert

[15 Nov 2007 17:15] Fadi Bitar
Description:
This bug came up while performing load tests on our product.
The client program is a company-developed Java product. The connector used is mysql-connector-java-5.0.4-bin.jar.

The following is the table structure of the db:
-----------------------------------------
-- Table structure for table 'document'
-----------------------------------------
CREATE TABLE DOCUMENT (
  docid 				int(10) unsigned 	NOT NULL auto_increment,
  dockey 				varchar(32) 		NOT NULL,
  ancestorkey			varchar(32)			default NULL,
  parentkey				varchar(32)			default NULL,				
  childkey				varchar(32)			default NULL,
  form 					varchar(128) 		NOT NULL,
  creator 				varchar(128) 		NOT NULL,
  creation_date 		bigint				default NULL,
  workflow_state 		varchar(128) 		default NULL,
  last_modified_by 		varchar(128) 		default NULL,
  last_modified_date 	bigint				default NULL,
  timer_date			bigint				default NULL,
  timer_processed		bigint				default NULL,
  timer_step			varchar(128)		default NULL,  
  PRIMARY KEY 				(docid),
  KEY dockey_index 			(dockey),
  KEY ancestorkey_index 	(ancestorkey),
  KEY parentkey_index   	(parentkey),
  KEY childkey_index    	(childkey),
  KEY timer_date_index		(timer_date),
  KEY timer_processed_index	(timer_processed)
) DEFAULT CHARACTER SET utf8 
  TYPE=InnoDB;

-----------------------------------------
-- Table structure for table 'field'
-----------------------------------------
CREATE TABLE FIELD (
  docid 			int(10) unsigned 	NOT NULL,
  field_name 		varchar(128) 		NOT NULL,
  field_type 		int(1) unsigned 	NOT NULL,
  field_string 		varchar(255)  		NOT NULL,
  field_numeric 	double 				default NULL,
  field_date 		bigint 				default NULL,
  field_text 		text				default NULL,
  field_position	int unsigned		NOT NULL,
  PRIMARY KEY  				(docid,field_name,field_position),
  FOREIGN KEY (docid) REFERENCES DOCUMENT(docid) ON DELETE CASCADE,
  KEY docid_index 			(docid),
  KEY field_name_index 		(field_name),
  KEY field_type_index 		(field_type),
  KEY field_string_index 	(field_string),
  KEY field_numeric_index 	(field_numeric),
  KEY field_date_index 		(field_date)
) DEFAULT CHARACTER SET utf8 
  TYPE=InnoDB;

In our code, the following statements are being executed:

Connection conn = requestContext.getRequestTxnConnection();

prepStmt = conn.prepareStatement("insert into " + TableType.METADATA.getTableName() + " ("
	                    + MetadataType.WORKFLOWSTATE.getAlias() + ", " + MetadataType.LASTMODIFIEDBY.getAlias() + ", "
	                    + MetadataType.LASTMODIFIEDDATE.getAlias() + ", " + MetadataType.TIMERSTEP.getAlias() + ", "
	                    + MetadataType.TIMERDATE.getAlias() + ", " + MetadataType.TIMERPROCESSED.getAlias() + ", "
	                    + MetadataType.DOCKEY.getAlias() + ", " + MetadataType.FORM.getAlias() + ", "
	                    + MetadataType.CREATOR.getAlias() + ", " + MetadataType.CREATIONDATE.getAlias() + ", "
	                    + MetadataType.ANCESTORKEY.getAlias() + ", " + MetadataType.PARENTKEY.getAlias() + ", "
	                    + MetadataType.CHILDKEY.getAlias() + ") values (?,?,?,?,?,?,?,?,?,?,?,?,?)",
	                    Statement.NO_GENERATED_KEYS);
----------------------------------------------------------------------------------
--after going through all the prepStmt.setXXX(..) methods and replacing all '?'
----------------------------------------------------------------------------------
prepStmt.executeUpdate();
prepStmt.close();

----------------------------------------------------------------------------------
--now we retrieve the docId through a select statement (this is done explicitly through a select in order to be compliant with other database servers in which this is necessary)
----------------------------------------------------------------------------------
sql = "select " + MetadataType.DOCID.getAlias() + " from "
	            	+ TableType.METADATA.getTableName() + " where " + MetadataType.DOCKEY.getAlias() + " = '" + dockey + "'";

stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(sql);
rs.next();
docId = rs.getLong(MetadataType.DOCID.getAlias());
document.setValue(Document.DOCID, new Float(docId));
stmt.close();
----------------------------------------------------------------------------------
at this point, we try to insert into the FIELD table, using the retrieved docId, and get the attached log. Note that this has only been observed under concurrent load, and that not all inserts into the FIELD table fail, but about half of them. Also note that the same behavior has been reproduced under Windows XP.
----------------------------------------------------------------------------------

How to repeat:
The application we are developing, in which we caught the bug, is huge, so it is impossible to send the whole code, but it is likely that code of a similar structure, run under concurrent load, would reproduce the bug.
[15 Nov 2007 17:16] Fadi Bitar
Log trace

Attachment: trace.txt (text/plain), 13.85 KiB.

[15 Nov 2007 17:47] Fadi Bitar
Just for the sake of clarity, examples of the SQL statements executed are listed below:

insert into DOCUMENT (workflow_state, last_modified_by, last_modified_date, timer_step, timer_date, timer_processed, dockey, form, creator, creation_date, ancestorkey, parentkey, childkey) values ('listUpdated','admin',1195148671886,'',0,0,'34595D7A8F0DD1B456DF37A06EBB5FB6','articleCategories','admin',1195148671886,'34595D7A8F0DD1B456DF37A06EBB5FB6',NULL,NULL)

select docid from DOCUMENT where dockey = '34595D7A8F0DD1B456DF37A06EBB5FB6';

insert into FIELD (field_string, field_numeric, field_date, field_text, field_type, docid, field_name, field_position) values ('Test Article Category (34595D7A8F0DD1B456DF37A06EBB5FB6)',0.0,0,NULL,1,43,'sectionA.publishType',0);
insert into FIELD (field_string, field_numeric, field_date, field_text, field_type, docid, field_name, field_position) values ('15/11/2007',0.0,1195148671719,NULL,3,43,'sectionA.firstEditDate',0);
insert into FIELD (field_string, field_numeric, field_date, field_text, field_type, docid, field_name, field_position) values ('admin',0.0,0,NULL,1,43,'sectionA.writeACL',0);
insert into FIELD (field_string, field_numeric, field_date, field_text, field_type, docid, field_name, field_position) values ('all',0.0,0,NULL,1,43,'sectionA.readACL',0);
[15 Nov 2007 18:04] Valeriy Kravchuk
Thank you for the problem report. Please specify the exact version of the server used, 5.0.x.
[16 Nov 2007 9:12] Fadi Bitar
Updated the versions of the servers used: the behavior was observed on 5.0.27 and 5.0.37, and we were also able to reproduce it on 5.1.22.
[16 Nov 2007 12:47] Heikki Tuuri
Fadi,

please print SHOW INNODB STATUS after a strange foreign key error. It shows a detailed explanation of the error.

Regards,

Heikki
[17 Nov 2007 14:32] Ryan Murray
innodb status after foreign key error

Attachment: inndb status output.txt (text/plain), 7.29 KiB.

[17 Nov 2007 14:37] Ryan Murray
Heikki-

Attached is a trace from the foreign key error on a Windows box.  Looking at the transaction details I see there was a deadlock detected and a transaction rolled back.  I presume this is a case of the famous next-key locking causing a problem, leading to the insert failing.  If this is the cause, I don't understand why within a single transaction I would get this foreign key failure instead of the deadlock failure; in our code we never get a deadlock exception on any connection.

Best,
Ryan Murray
[17 Nov 2007 15:04] Ryan Murray
I noticed in the InnoDB documentation on locking that foreign key constraints cause locks to be set (although, as I understood it, this would be on the parent table), so just as a curiosity I removed the constraint.

After removing the foreign key constraint, I now have threads getting exceptions from the deadlocks.  But I still don't understand why the FIELD table is getting conflicting locks now.

Just to be clear: yes, my concurrent threads are inserting sequential ids.
[23 Nov 2007 11:42] Fadi Bitar
Ryan Murray has updated with all the necessary info.
[13 Dec 2007 12:55] Heikki Tuuri
Vasil,

please look at this.

--Heikki
[24 Jan 2008 15:40] Vasil Dimov
Hi,

I have 4 questions:

So you do:

insert into DOCUMENT (..., dockey, ...) values
(..., '34595D7A8F0DD1B456DF37A06EBB5FB6', ...)

select docid from DOCUMENT where dockey = '34595D7A8F0DD1B456DF37A06EBB5FB6';

insert into FIELD (..., docid, ...) values (..., 43, ...);
insert into FIELD (..., docid, ...) values (..., 43, ...);
insert into FIELD (..., docid, ...) values (..., 43, ...);
insert into FIELD (..., docid, ...) values (..., 43, ...);
insert into FIELD (..., docid, ...) values (..., 43, ...);

1. What is the value of autocommit? I.e. are you executing the above statements in a single transaction?

2. Is dockey unique?
It is not declared as unique in the document table definition.

If it is not unique then the select could return more than 1 row and the retrieved row (the first one presumably) could be rolled back before the insert into field.

Example:

TX1: insert into document; -- (docid, dockey)=(1, 'a')
TX2: insert into document; -- (docid, dockey)=(2, 'a')
TX2: select * from document where dockey = 'a';
-- 2 rows are returned; the application selects the first one, docid=1
TX1: rollback;
TX2: insert into field (docid) values (1); -- foreign key error

2(b). What transaction isolation level are you using?

3. Do you execute exactly 5 inserts into field for each docid?

Thanks!
[25 Jan 2008 14:05] Vasil Dimov
5. How many threads are you running concurrently?

6. How much time does it take for this error to appear?

(I am trying to reproduce this, no luck so far)
[28 Jan 2008 20:59] Ryan Murray
Hi Vasil-

In response to your questions:

1. autocommit - no we are not using autocommit.  The inserts to the DOCUMENT & FIELD tables happen in a transaction.

2. The dockey is a UUID generated by the application and is always unique.  The column in the db could be declared unique.  That said, the application is not rolling back before the insert into the field table.  It appears from the errors we see that MySQL is detecting a deadlock during the insert ONLY during concurrent inserts (when I remove the FOREIGN KEY constraint these deadlocks become apparent), which effectively causes the parent record to no longer be available.  The issue is why the deadlocks occur.

2 (b).  TXN ISOLATION: We have tried with REPEATABLE READ and READ COMMITTED with the same result.

3. The FIELD table is a key-value table.  The number of inserts can be variable although in the simple case which reproduces the error the number of inserts is usually around 5.

5. Even 2 concurrent inserts can give the error.  Our load tests have generated between 2 and 5 concurrent inserts and in all scenarios the problem can occur.

6. The error appears to happen immediately, with relatively few records in the DOCUMENT and FIELD tables (i.e. DOCUMENT < 100 records, FIELD < 500 records).

As I mentioned in a previous comment, the issue appears to be index locking on the FIELD table causing an "implicit" rollback of the DOCUMENT insert. The questions are:

1) Why does this concurrent insert lead to deadlocks?  This is obviously the core problem for my application.
2) In the case of such a deadlock, why doesn't the error lead to an _immediate_ failure of the transaction, rather than having the inserts into the FIELD table fail with a FOREIGN KEY violation?  This is what made it difficult to pinpoint the _real_ problem at first. The deadlock error gets propagated to our Java code executing stmt.executeBatch(), where the batch contains the inserts into the FIELD table; the FOREIGN KEY error is propagated to the same statement.
[25 Feb 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[25 Feb 2008 12:43] Fadi Bitar
Any progress in reproducing the bug?
[26 Feb 2008 13:18] Vasil Dimov
Fadi,

unfortunately I am still unable to reproduce this. I wrote this perl script:

--- cut ---
sub dbh_do_or_die {
        $dbh = shift;
        $sql = shift;
        if (!$dbh->do($sql)) {
                die("SQL DO ERROR: $sql: " . $dbh->errstr);
        }
}

my $dbh = $server->connect();

#dbh_do_or_die($dbh, "DROP TABLE IF EXISTS field");
#dbh_do_or_die($dbh, "DROP TABLE IF EXISTS document");
#
#dbh_do_or_die($dbh,
#"CREATE TABLE document (
#       docid INT(10) UNSIGNED NOT NULL auto_increment,
#       dockey VARCHAR(32) NOT NULL,
#       PRIMARY KEY (docid),
#       KEY dockey_index (dockey)
#) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB");
#
#dbh_do_or_die($dbh,
#"CREATE TABLE field (
#       docid INT(10) UNSIGNED NOT NULL,
#       field_name VARCHAR(128) NOT NULL,
#       field_position INT UNSIGNED NOT NULL,
#       PRIMARY KEY (docid, field_name, field_position),
#       FOREIGN KEY (docid) REFERENCES document(docid) ON DELETE CASCADE,
#       KEY docid_index (docid),
#       KEY field_name_index (field_name)
#) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB");

dbh_do_or_die($dbh, "SET AUTOCOMMIT = 0");

my $dockey = $ARGV[0] || die("Usage: ibtest-bug32416 uniqdockey");

for (my $j = 0; $j < 1000; $j++) {

        dbh_do_or_die($dbh, "INSERT INTO document (dockey) VALUES ('$dockey')");

        my $sth = $dbh->prepare("SELECT docid FROM document WHERE dockey = '$dockey'")
                or die("SQL PREPARE ERROR: " . $dbh->errstr);

        $sth->execute
                or die($DBI::errstr);

        my @row = $sth->fetchrow_array
                or die($DBI::errstr);

        $sth->finish;

        my $docid = $row[0];

        for (my $i = 0; $i < 5; $i++) {
                # (this statement was truncated in the original report; below is a
                # minimal reconstruction consistent with the field table schema)
                dbh_do_or_die($dbh,
                        "INSERT INTO field (docid, field_name, field_position) "
                        . "VALUES ($docid, 'f', $i)");
        }

        dbh_do_or_die($dbh, "COMMIT");
}

$dbh->disconnect;
--- cut ---

Running 20 instances of this script at the same time, none of them "dies".

I am testing the latest 5.1 sources from the BitKeeper repository (5.1.24-rc).

Do you think that this script does the same as what your application does?
Anything I might have missed?
[4 Mar 2008 17:21] Heikki Tuuri
Vasil cannot repeat this.
[10 Apr 2008 12:16] Fadi Bitar
Code to reproduce the bug

Attachment: mysqlbug.zip (application/x-zip-compressed, text), 2.30 KiB.

[10 Apr 2008 12:17] Fadi Bitar
Documented reproduction procedure

Attachment: deadlock-sql.doc (application/msword, text), 30.50 KiB.

[10 Apr 2008 12:18] Fadi Bitar
I just attached some code and a word document describing how to deterministically reproduce the bug.

Hope this helps
[10 Apr 2008 12:44] Heikki Tuuri
Vasil,

please look at this again.

Heikki
[10 Apr 2008 12:53] Vasil Dimov
Fadi, thanks! I will take a look in the next few days...
[28 May 2008 14:50] Vasil Dimov
Executing the commands from the .doc file:

CREATE TABLE document (
        docid INT(10) UNSIGNED KEY AUTO_INCREMENT, 
        dockey VARCHAR(32) NOT NULL
) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB;

CREATE TABLE field (
        docid INT(10) UNSIGNED NOT NULL, 
        field_name VARCHAR(128) NOT NULL, 
        field_string VARCHAR(255) NOT NULL, 
        PRIMARY KEY (docid,field_name),
        FOREIGN KEY (docid) REFERENCES document(docid) ON DELETE CASCADE
) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB;

17:34:32 TRX1> start transaction;                  
Query OK, 0 rows affected (0.00 sec)

17:34:59 TRX2> start transaction;
Query OK, 0 rows affected (0.00 sec)

17:34:45 TRX1> insert into document (dockey) values ('Doc1');
Query OK, 1 row affected (0.00 sec)

17:35:51 TRX1> select docid from document where dockey='Doc1';
+-------+
| docid |
+-------+
|     1 | 
+-------+
1 row in set (0.00 sec)

17:36:08 TRX1> delete from field where docid=1;
Query OK, 0 rows affected (0.00 sec)

17:35:07 TRX2> insert into document (dockey) values ('Doc2');
Query OK, 1 row affected (0.00 sec)

17:36:48 TRX2> select docid from document where dockey='Doc2';
+-------+
| docid |
+-------+
|     2 | 
+-------+
1 row in set (0.00 sec)

17:36:56 TRX2> delete from field where docid=2;
Query OK, 0 rows affected (0.00 sec)

17:37:08 TRX2> insert into field (docid,field_name,field_string) values (2,'f1','v1');
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
17:38:10 TRX2>
[28 May 2008 16:22] Vasil Dimov
Fadi, what I observe is not a bug. Here is output from the I_S (INFORMATION_SCHEMA) tables that are provided in the InnoDB Plugin:

+--------+-----------+-----------------------+------------------------------------------------------------------------+
| trx_id | trx_state | trx_requested_lock_id | trx_query                                                              |
+--------+-----------+-----------------------+------------------------------------------------------------------------+
| 303    | RUNNING   | NULL                  | NULL                                                                   | 
| 304    | LOCK WAIT | 304:0:51:1            | insert into field (docid,field_name,field_string) values (2,'f1','v1') | 
+--------+-----------+-----------------------+------------------------------------------------------------------------+

+------------+-------------+-----------+-----------+----------------+------------------------+
| lock_id    | lock_trx_id | lock_mode | lock_type | lock_table     | lock_data              |
+------------+-------------+-----------+-----------+----------------+------------------------+
| 303:0:51:1 | 303         | X         | RECORD    | `test`.`field` | supremum pseudo-record | 
| 304:0:51:1 | 304         | X         | RECORD    | `test`.`field` | supremum pseudo-record | 
+------------+-------------+-----------+-----------+----------------+------------------------+
[30 May 2008 15:18] Ryan Murray
Hi Vasil-

The Java code we attached should reproduce the foreign key issue, which is clearly a bug, even if the CLI version doesn't.

Regarding the timeout, I guess that given the InnoDB index locking model you would consider this normal, but a deadlock in this case, where there is no logical conflict, is IMHO simply too pessimistic an approach, particularly if there is no other "pessimism" mode I can choose (like solidDB has). This code runs without deadlock on solidDB, Oracle, MS SQL, Postgres and a number of other databases.

In any case, we've already had to find a workaround for our needs.
[31 May 2008 13:11] Fadi Bitar
Ryan's comment pretty much covers it.
[3 Jun 2008 7:38] Vasil Dimov
Ryan, Fadi,

What I observe from the .doc (and I guess you do too) is not a deadlock, it is a lock wait timeout: TRX2 waits for TRX1 to complete; when TRX1 commits or rolls back, TRX2 will continue executing. Indeed, a deadlock occurs if I execute:

  insert into field (docid,field_name,field_string) values (1,'fx','vx');

from TRX1 after TRX2 is hung, waiting for TRX1 to complete.

Here is a rough explanation of why it happens (next-key locking):

1.
  delete from field where docid=1;

from TRX1 locks the supremum record in the field table:
http://dev.mysql.com/doc/refman/5.1/en/innodb-locks-set.html
"DELETE FROM ... WHERE ... sets an exclusive next-key lock on every record the search encounters."
and the supremum record is next to docid=1.

2.
from
http://dev.mysql.com/doc/refman/5.1/en/innodb-next-key-locking.html
"If a user has a shared or exclusive lock on record R in an index, another user cannot insert a new index record immediately before R in the index order."
and R=supremum record in our case. This is why TRX2's insert is blocked - it is also trying to lock R=supremum record.

3.
It is now obvious why TRX1, when it tries to insert, is rolled back with a deadlock error.
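Steps 1-3 above can be traced through a toy lock-compatibility model (a deliberate simplification for illustration, not InnoDB's actual lock manager; the names LockManager, X_GAP and INSERT_INTENTION are invented):

```python
# Toy model of the record locks on the supremum record of the empty
# `field` table. It only encodes two facts: (a) gap-type X locks on the
# same record are mutually compatible, and (b) an insert intention lock
# must wait for a gap lock held by another transaction.

class LockManager:
    def __init__(self):
        self.holders = {}    # resource -> set of (trx, mode) granted
        self.waits_for = {}  # blocked trx -> trx it waits for

    def acquire(self, trx, resource, mode):
        for holder, held_mode in self.holders.get(resource, set()):
            if holder != trx and self._conflicts(held_mode, mode):
                self.waits_for[trx] = holder
                return "WAIT"
        self.holders.setdefault(resource, set()).add((trx, mode))
        return "GRANTED"

    @staticmethod
    def _conflicts(held, requested):
        # Gap locks do not block each other, but they block inserts.
        return requested == "INSERT_INTENTION" and held == "X_GAP"

    def deadlocked(self, trx):
        # Follow the waits-for chain; reaching trx again is a cycle.
        cur, seen = trx, set()
        while cur in self.waits_for:
            cur = self.waits_for[cur]
            if cur == trx:
                return True
            if cur in seen:
                return False
            seen.add(cur)
        return False

lm = LockManager()
# Step 1: both DELETE FROM field WHERE docid=... statements hit only the
# supremum record (the table is empty) and lock it; the locks coexist:
assert lm.acquire("TRX1", "field.supremum", "X_GAP") == "GRANTED"
assert lm.acquire("TRX2", "field.supremum", "X_GAP") == "GRANTED"
# Step 2: TRX2's INSERT needs an insert intention lock just before the
# supremum, which conflicts with TRX1's gap lock, so TRX2 waits:
assert lm.acquire("TRX2", "field.supremum", "INSERT_INTENTION") == "WAIT"
# Step 3: TRX1's own INSERT then waits for TRX2's gap lock, closing the
# waits-for cycle that InnoDB reports as a deadlock:
assert lm.acquire("TRX1", "field.supremum", "INSERT_INTENTION") == "WAIT"
assert lm.deadlocked("TRX1") and lm.deadlocked("TRX2")
```

The sketch shows why neither transaction can proceed: each one's insert intention lock queues behind the other's gap lock on the same supremum record, which is exactly the cycle visible in the SHOW INNODB STATUS output attached to this report.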

About the Java program, I get:

java.sql.BatchUpdateException: Cannot add or update a child row: a foreign key constraint fails (`test`.`field`, CONSTRAINT `field_ibfk_1` FOREIGN KEY (`docid`) REFERENCES `document` (`docid`) ON DELETE CASCADE)

I will now investigate...
[3 Jun 2008 10:34] Vasil Dimov
Fadi, Ryan,

Look what I found:

after reproducing this FK error with the java programs you uploaded, SHOW INNODB STATUS looks like this:

--- cut ---
------------------------
LATEST FOREIGN KEY ERROR
------------------------
080603 11:53:52 Transaction:
TRANSACTION 110E, ACTIVE 0 sec, OS thread id 13661440 inserting, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
3 lock struct(s), heap size 488, 1 row lock(s)
MySQL thread id 3, query id 78 localhost 127.0.0.1 root update
INSERT INTO field (docid,field_name,field_string) VALUES (5,'field5','value5')
Foreign key constraint fails for table `test`.`field`:
,
  CONSTRAINT `field_ibfk_1` FOREIGN KEY (`docid`) REFERENCES `document` (`docid`) ON DELETE CASCADE
Trying to add in child table, in index `PRIMARY` tuple:
DATA TUPLE: 5 fields;
 0: len 4; hex 00000005; asc     ;; 1: len 6; hex 6669656c6435; asc field5;; 2: len 6; hex 00000000110e; asc       ;; 3: len 7; hex 800000002d0168; asc     - h;; 4: len 6; hex 76616c756535; asc value5;;

But in parent table `test`.`document`, in index `PRIMARY`,
the closest match we can find is record:
PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 00000004; asc     ;; 1: len 6; hex 00000000110c; asc       ;; 2: len 7; hex 800000002d0110; asc     -  ;; 3: len 6; hex 5431446f6334; asc T1Doc4;;

------------------------
LATEST DETECTED DEADLOCK
------------------------
080603 11:53:52
*** (1) TRANSACTION:
TRANSACTION 110C, ACTIVE 0 sec, OS thread id 13661056 inserting
mysql tables in use 1, locked 1
LOCK WAIT 6 lock struct(s), heap size 2512, 6 row lock(s), undo log entries 4
MySQL thread id 2, query id 68 localhost 127.0.0.1 root update
INSERT INTO field (docid,field_name,field_string) VALUES (4,'field4','value4')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 51 n bits 88 index `PRIMARY` of table `test`.`field` trx id 110C lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) TRANSACTION:
TRANSACTION 110D, ACTIVE 0 sec, OS thread id 13661440 inserting, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
5 lock struct(s), heap size 2512, 3 row lock(s), undo log entries 1
MySQL thread id 3, query id 70 localhost 127.0.0.1 root update
INSERT INTO field (docid,field_name,field_string) VALUES (5,'field1','value1')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 51 n bits 88 index `PRIMARY` of table `test`.`field` trx id 110D lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 51 n bits 88 index `PRIMARY` of table `test`.`field` trx id 110D lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** WE ROLL BACK TRANSACTION (2)
--- cut ---

(transaction ids are in hex here, instead of 2 decimal numbers, but it does not matter)

Indeed the FK error looks strange, because we have just inserted this row in the parent table and now it reports that it is missing.

Notice that there should be only one connection/transaction/thread trying to INSERT INTO field (docid=5), but there are 2: 110E reported in the FK error and 110D reported in the deadlock error. This is to show that the following java code:

 52           psfield.setInt(1, docid);     
 53           for(int k = 1; k <= 5; k++) { 
 54             psfield.setString(2, "field" + k);  
 55             psfield.setString(3, "value" + k);  
 56             psfield.addBatch();                 
 57           }                             
 58           psfield.executeBatch();       

has generated new connections to the database. What happens is:

1. 110D inserts the parent row in the document table (does not commit)
2. new connections are fired because of the above code, 110E one of them
3. 110D is rolled back because of a deadlock with 110C, the deadlock is described in my previous comment to this bug report and in the SHOW INNODB STATUS too, this deadlock is not a bug
4. the parent row in document table disappears because the transaction that created it is rolled back
5. 110E gets the FK error because indeed the parent row is absent

I rewrote the above code to

 52           for(int k = 1; k <= 5; k++) {
 53             psfield.setInt(1, docid);
 54             psfield.setString(2, "field" + k);
 55             psfield.setString(3, "value" + k);
 56             psfield.executeUpdate();
 57           }
 58         } else {

and indeed then I do not get the FK error, but only the deadlock one.

To avoid this "FK error"/"new connections are created by batch" confusion
one could:

* avoid using batch, as above
* try to tune the jdbc driver not to fire new connections
* commit the row in the parent table before starting the batch

but beware that this program will still be subject to deadlocks.
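The failure sequence above, and the "commit the parent row before starting the batch" workaround, can be illustrated with a minimal in-memory sketch (the Store class and its methods are invented for illustration; it models only which parent rows a foreign key check can see under rollback, not InnoDB's locking or MVCC):

```python
# Toy model: a parent row inserted in an uncommitted transaction
# disappears when a deadlock rolls that transaction back, so the rest
# of the batch, running in a fresh transaction, hits an FK error.

class ForeignKeyError(Exception):
    pass

class Store:
    def __init__(self):
        self.document = set()  # committed parent docids
        self.pending = {}      # trx -> uncommitted parent docids

    def begin(self, trx):
        self.pending[trx] = set()

    def insert_document(self, trx, docid):
        self.pending[trx].add(docid)

    def insert_field(self, trx, docid):
        # FK check: the parent row must be committed, or pending in the
        # same transaction.
        if docid not in self.document and docid not in self.pending.get(trx, set()):
            raise ForeignKeyError("parent docid=%d is missing" % docid)

    def commit(self, trx):
        self.document |= self.pending.pop(trx)

    def rollback(self, trx):
        self.pending.pop(trx, None)  # uncommitted parent rows vanish

# Failure path: 110D inserts the parent row but becomes the deadlock
# victim; the rest of the batch runs in a new transaction (110E) that
# can no longer see the parent row.
s = Store()
s.begin("110D")
s.insert_document("110D", 5)
s.insert_field("110D", 5)   # first batched INSERT: FK check passes
s.rollback("110D")          # deadlock: whole transaction rolled back
s.begin("110E")             # implicit new transaction
try:
    s.insert_field("110E", 5)
    fk_error = False
except ForeignKeyError:
    fk_error = True         # the FK error from this bug report
assert fk_error

# Workaround: commit the parent row before starting the batch, so a
# rolled-back child transaction cannot take the parent row with it.
s2 = Store()
s2.begin("T1")
s2.insert_document("T1", 5)
s2.commit("T1")             # parent row is now durable
s2.begin("T2")
s2.insert_field("T2", 5)    # FK check passes even if T2 later restarts
```

The caveat still applies: committing the parent row first removes the confusing FK error, but the child inserts themselves can still deadlock.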

Hope this helps!
[3 Jun 2008 16:03] Mark Matthews
Hi Vasil, I'm the developer of the JDBC driver for MySQL, and wanted to add one comment:

> Notice that there should be only one connection/transacion/thread
> trying to INSERT INTO field (docid=5), but there are 2: 110E reported
> in the FK error and 110D reported in the deadlock error. This is to
> show that the following java code:
>
> 52           psfield.setInt(1, docid);     
> 53           for(int k = 1; k <= 5; k++) { 
> 54             psfield.setString(2, "field" + k);  
> 55             psfield.setString(3, "value" + k);  
> 56             psfield.addBatch();                 
> 57           }                             
> 58           psfield.executeBatch();     
>
> has generated new connections to the database. What happens is:

No, it hasn't, or at least it shouldn't, and I don't see how it would be possible, given that the only time and way the JDBC driver can generate new connections to the database "transparently" is if it is killing a query on behalf of the user, either via a method call or timeout expiration.

What this code *might* cause the JDBC driver to do is to prepare another statement. That shouldn't cause another transaction to be spun up though.

  -Mark
[4 Jun 2008 12:58] Ryan Murray
Hi Vasil. Thanks for the effort.  As I said before, as soon as we discovered that the root cause was in the next-key locking, we had to work around it in another way, since this fairly pessimistic strategy was going to cause us a problem even if the foreign key issue were resolved.  Let's say an option for more optimistic locking could be a feature request; as far as the bug goes, you can downgrade the criticality of the issue if you want, as its resolution wouldn't change anything in practice.
[4 Jun 2008 13:46] Vasil Dimov
Mark,

What has been observed is that Java code like this:

      PreparedStatement psfield = c.prepareStatement(
                      "INSERT INTO field (docid,field_name,field_string) VALUES (?,?,?)");
...
          psfield.setInt(1, docid);
          for(int k = 1; k <= 5; k++) {
            psfield.setString(2, "field" + k);
            psfield.setString(3, "value" + k);
            psfield.addBatch();
          }
          psfield.executeBatch();

results in at least 2 different transactions inside InnoDB (and 2 MySQL threads) executing the above INSERT INTO field (docid, ...) VALUES (5, ...). I am not sure what the reason for this is, but if the code is rewritten as in one of my previous comments, not to use a batch, then this does not happen.
[4 Jun 2008 14:05] Vasil Dimov
Oops, correction: there are 2 different InnoDB transactions (110E and 110D) but only one MySQL thread, "mysql thread id 3".

This shows that a new InnoDB transaction is started within the same MySQL thread. I suspect the following happens:

1. autocommit=0
2. InnoDB TRX 110D inserts the parent row
3. a batch of 5 inserts is executed in the application, no commit yet
4. 110D executes the first statement and is rolled back because of a deadlock
5. new transaction 110E is created (automatically by MySQL) that tries to continue executing the remaining 4 statements, it gets FK error.

I guess that batch mode does not handle failing inserts. If an insert fails, it should return an error and stop executing the rest.

Mark, is this correct?
[4 Jun 2008 14:13] Vasil Dimov
Fadi, Ryan,

A possible workaround for the deadlock problem is to run the DELETE in a separate transaction and handle duplicate key errors in the INSERT, i.e. repeat DELETE; COMMIT; until the INSERT does not return a duplicate key error.
[4 Jun 2008 16:15] Mark Matthews
Vasil,

There's a conflict with the JDBC specification here, in that batches *can* be executed in full, with only the statements that fail getting rolled back. InnoDB has switched from time to time whether statements are rolled back, or whether the entire transaction rolls back, on things like constraint violations.

Do you know if there's any way to tell, from an error message/code from InnoDB, which is the case? Unfortunately I don't think there is, and if that's the case we'll have to document that you need to choose the more pessimistic batch execution mode, which fails the entire batch on any individual statement failure.

  -Mark
[5 Jun 2008 13:26] Heikki Tuuri
HA_ERR_LOCK_DEADLOCK rolls back the entire transaction.
HA_ERR_LOCK_TABLE_FULL rolls back the entire transaction.

Starting from 5.0.13, HA_ERR_LOCK_WAIT_TIMEOUT lets MySQL just roll back the
latest SQL statement in a lock wait timeout. Previously, we rolled back the whole transaction.

All other errors just roll back the statement.
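Heikki's rules can be summarized in a small lookup table (a sketch of the comment above; rollback_scope is an invented helper, not a MySQL or InnoDB API). It also explains the behavior in this report: the deadlock rolls back the whole transaction, and the driver's remaining batched statements then run in a fresh one.

```python
# Rollback scope per InnoDB handler error, as described above
# (behavior for HA_ERR_LOCK_WAIT_TIMEOUT applies to MySQL >= 5.0.13).
ROLLBACK_SCOPE = {
    "HA_ERR_LOCK_DEADLOCK": "transaction",
    "HA_ERR_LOCK_TABLE_FULL": "transaction",
    "HA_ERR_LOCK_WAIT_TIMEOUT": "statement",  # whole transaction before 5.0.13
}

def rollback_scope(error):
    # All other errors roll back only the failing statement.
    return ROLLBACK_SCOPE.get(error, "statement")

assert rollback_scope("HA_ERR_LOCK_DEADLOCK") == "transaction"
assert rollback_scope("HA_ERR_LOCK_WAIT_TIMEOUT") == "statement"
```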
[24 Jul 2008 11:57] Vasil Dimov
This is not a bug. What happens is described in my comment from "[4 Jun 14:05]". Roughly it is the following:

(autocommit is off)

* application inserts a row in the parent table (does not commit)

* application executes a batch of 5 INSERTs into the child table; one of those INSERTs causes a deadlock with some other transaction, and thus the whole transaction is rolled back, including the not-yet-committed row in the parent table.

* the batch continues to execute the remaining INSERTs (a new transaction is started automatically after the previous one is rolled back), but they give an FK error because the row in the parent table is gone (rolled back)