MySQL Bugs: #61213: ON DUPLICATE KEY UPDATE breaks generated key list when extended INSERT is used

Bug #61213	ON DUPLICATE KEY UPDATE breaks generated key list when extended INSERT is used
Submitted:	18 May 2011 9:49	Modified:	22 May 2014 17:30
Reporter:	Borislav Andruschuk	Email Updates:
Status:	Closed	Impact on me:	None
Category:	Connector / J	Severity:	S2 (Serious)
Version:	5.1.16	OS:	Any
Assigned to:	Alexander Soklakov	CPU Architecture:	Any
Tags:	extended insert, generated keys, java, jdbc, ON DUPLICATE KEY UPDATE

Description:
When extended INSERT (insert with several tuples in one query) is used with ON DUPLICATE KEY UPDATE clause and there're unique key conflict the Connector/J's Statement will return incorrect list of generated identifiers that cannot be matched with tuples from INSERT query at all (for more details see 'How to repeat' section). The only workaround is to use separate query to retrive generated identifiers to reduce count of Data Base access and I don't want to perform this extended insert in separate queries.

I look thru the code in com.mysql.jdbc.StatementImpl and find that you retrive count of affected rows and just increment the value of LAST_RETURN_ID for each update and get sequential list of identifiers. But in fact it can be nonsequential (see 'Suggested fix' section).

Mysql returns unexpected count of updated rows here because of this behavior http://dev.mysql.com/doc/refman/5.5/en/mysql-affected-rows.html:

For INSERT ... ON DUPLICATE KEY UPDATE statements, the affected-rows value is 1 if the row is inserted as a new row and 2 if an existing row is updated.

and the discussion about this problem is here http://bugs.mysql.com/bug.php?id=46675

How to repeat:
1. create a table with auto increment primary key: 

create table TEST_TABLE(
        id int auto_increment, 
        test_field varchar(255), 
        primary key (id), 
        unique key(test_field)
) engine=InnoDB;

2. insert the following row:

insert into TEST_TABLE(id, test_field) values(100, 'test1');

3. perform an extended insert with ON DUPLICATE KEY UPDATE clause into a table with auto increment primary key via JDBC driver:

insert into TABLE(test_field) values('test2'),('test1'),('test3') on duplicate key update id=LAST_INSERT_ID(id);

and check returned generated keys: 

at first, you'll get 4 values instead of 3
at second, there's no correct id for test1 row in returned list

Thus only for the first row returned id will be correct, if test1 will be first in a query whole list will be incorrect and cannot be matched with inserted rows.

Suggested fix:
Actually I expect that driver will return correct number of generated keys with correct values. For my case I expect the result set: 1, 100, 3 <--- 3 is here because of the 2nd value, or even 1, NULL, 3 will be normal. But current result: 1, 2, 3, 4 is very confused and cannot be matched with inserted rows.

This is more a "can't" fix, the database itself doesn't return the information in any way that's possible to solve this generically, that I can tell. What really needs to happen is that the server needs the protocol extended to return the *actual* keys involved, whether generated or not, for an INSERT. The only thing that is returned at a protocol level now is a "row count" (used loosely, because you have to divine what it means) and the *first* value.

Applications which have knowledge of the statement (i.e. the original keys), can craft SELECTs to figure 
out which rows have retained keys, but the driver itself doesn't really *parse* the statement and has no metadata available (easily) to determine what bound values (if any) are related to the primary (or unique) keys in the INSERT statement.

Ok, I have the workaround now: I select identifiers in second separate query. But does MySQL server team already have this issue - to extend protocol for returning the list of involved identifiers? I'd like to vote for it but I didn't find such corresponding reported issue.

We really can't fix this in a proper way without MySQL protocol change, but we can introduce some improvements in c/J behavior, please see Bug#68562.

The previous approach in a fix for Bug#68562, "Combination rewriteBatchedStatements and useAffectedRows not working as expected" introduced the next logic: return the value received from server as a result of each statement in this batch so user receive correct result if total affected_rows=0 and can make some suggestions if total affected_rows > 0.

We got some complaints and the last one states that we introduced a breaking change for Hibernate.

My proposal now is to return Statement.SUCCESS_NO_INFO in case when number of rewritten statements > 1 and returned value for them > 0. This is compliant with JDBC spec and Hibernate knows about such possible result.

Added the following entry into the Connector/J 5.1.31 changelog:

"When rewriteBatchedStatements=true and useAffectedRows=true were set, executeBatch() returned the number of affected rows as given by the server, which was not a reliable value unless it was a “0”. That behavior was due to the fix for Bug#68562, which was implemented in Connector/J 5.1.27 (see the changelog for the release), and has been breaking a number of applications that use Connector/J. In this new fix, Connector/J returns Statement.SUCCESS_NO_INFO as a result for each statement in the batch as long as there are more than one batched statement and the returned value for the affected rows from the server is greater than “0”. Connector/J returns a “0” otherwise, which indicates that no rows have been affected."

This change in behavior does break our application. We have strong checks on row counts returned from 'executeBatch' and all those assertions now break because we get back SUCCESS_NO_INFO instead.

Hi Diwaker,

We are sorry the change is causing your code to break but it was necessary because we had a bug in processing ON DUPLICATE KEY UPDATE with rewriteBatchedStatements=true.

Also it was never possible to have correct strong checks on row counts returned from 'executeBatch' if use with rewriteBatchedStatements=true because server provides only one total count for all statements in the batch, all previous versions of c/J returned wrong values in this case: with hardcoded "1" for each statement it wasn't possible to distinguish real inserts and updates, with "total count" for each statement we had a JDBC non compliance affecting frameworks. 

If you need exact counts for each statement in the batch then you should use rewriteBatchedStatements=false. Or if you need rewriteBatchedStatements=true then you shouldn't use strong checks, this is a server limitation we can't work around.