MySQL Bugs: #68562: Combination rewriteBatchedStatements and useAffectedRows not working as expected

Bug #68562	Combination rewriteBatchedStatements and useAffectedRows not working as expected
Submitted:	4 Mar 2013 14:35	Modified:	8 Nov 2013 7:16
Reporter:	Pavel Cibulka	Email Updates:
Status:	Closed	Impact on me:	None
Category:	Connector / J	Severity:	S3 (Non-critical)
Version:	5.1.23	OS:	Any
Assigned to:	Alexander Soklakov	CPU Architecture:	Any
Tags:	ON DUPLICATE KEY, REWRITEBATCHEDSTATEMENTS, useAffectedRows

Description:
This bug is only in executeBatch function. This function should return number of row found (or affected if you set useAffectedRows=true). This works.

Problem is when you set rewriteBatchedStatements=true and useAffectedRows=true. It should now return number of affected (changed) rows, same as executeUpdate does. But it will return row found instead.

How to repeat:
Set:
rewriteBatchedStatements=true
useAffectedRows=true

Run query batch:
INSERT INTO table_name (id, more_data) VALUES(1, "data") ON DUPLICATE KEY UPDATE more_data=VALUES(more_data)

If data aren't changed (same data already in table), executeBatch should return array full of 0. But it returns array full of 1 (row found?).

Hello Pavel,

It's known server bug, in the case where you set a field's value to itself, then it is returning affected rows=1 (meaning insert) rather than 2 (meaning update). 

This problem appears to exist in all servers >= 5.5.16, server team work is in progress.

I'll mark your report as verified to track the problem.

Really this is a duplicate of Bug#61213. Server returns only one number - a sum of all affected rows (or found rows if useAffectedRows=false) for the whole rewritten statement, so it isn't possible for Connector/J to return correct number for each statement in batch.

Bug#61213 marked as "To be fixed later ", however we can do some steps to improve the situation a bit:

1) make an explicit note in servers documentation http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html about returning value for insert with several tuples in one query

2) change the Connector/J returning values for rewritten statements, now they are hardcoded to "1" which is completely useless; we can return the value received from server as a result of each statement in this batch so user receive correct result if total affected_rows=0 and can make some suggestions if total affected_rows > 0

3) make an explicit note in Connector/J documentation that rewriteBatchedStatements=true usage does not guarantee correct result with INSERT ... ON DUPLICATE statements

I don't mark this bug report as a duplicate of Bug#61213 to keep it active.

Hello Alexander,
your suggestion to improve this situation are very good.

Only thing I really need is make expression (affected_rows > 0) working. So I would know, if something has bean changed and I need to increase version id. Knowing exact number of changes seems to me like really corner case. It can be fixed later, if possible.

Added the following entry into the Connector/J 5.1.27 changelog:

"When rewriteBatchedStatements=true and useAffectedRows=true were set, executeBatch() did not return the number of affected rows as expected."

The new behavior confuses me.  Consider this test:

    public void testExecuteBatchCounts() throws Exception {
        Connection conn = null;
        try {
            conn = dataSource.getConnection();
            try (PreparedStatement stmt = conn.prepareStatement(
                    "CREATE TABLE t (" +
                    "x BIGINT NOT NULL" +
                    ", PRIMARY KEY (x))" +
                    " ENGINE=InnoDB")) { 
                stmt.execute();
            }   
            try (PreparedStatement stmt = conn.prepareStatement(
                    "INSERT INTO t (x) VALUES (?)")) {
                for (int i = 0; i < 3; ++i) {
                    stmt.setLong(1, i);
                    stmt.addBatch();
                }   
                int[] rowCounts = stmt.executeBatch();
                for (int i = 0; i < rowCounts.length; ++i) {
                    logger.info("row[{}] = {}", i, rowCounts[i]);
                }   
            }   
            conn.commit();
        } catch (SQLException sqle) {
            conn.rollback();
            throw sqle;
        } finally {
            conn.close();
        }   
    }

With 5.1.26 logs contain:

I 11-07 14:59:25.185 pool-1-thread-1 c.m.r.m.dbfs.DatabaseTest:639 ::] row[0] = 1
I 11-07 14:59:25.185 pool-1-thread-1 c.m.r.m.dbfs.DatabaseTest:639 ::] row[1] = 1
I 11-07 14:59:25.186 pool-1-thread-1 c.m.r.m.dbfs.DatabaseTest:639 ::] row[2] = 1

With 5.1.27 logs contain:

I 11-07 15:00:04.682 pool-1-thread-1 c.m.r.m.dbfs.DatabaseTest:639 ::] row[0] = 3
I 11-07 15:00:04.683 pool-1-thread-1 c.m.r.m.dbfs.DatabaseTest:639 ::] row[1] = 3
I 11-07 15:00:04.683 pool-1-thread-1 c.m.r.m.dbfs.DatabaseTest:639 ::] row[2] = 3

Returning {3, 3, 3} implies that we updated 9 rows, not 3.  The original behavior returning {1, 1, 1} matches my intuition, although understood that these were previously bogus values.  To avoid confusion, perhaps Connector/J should return {1, 1, 1}, {3, 0, 0}, or {3}?

This is not possible right now. If 2 rows out of 3 would be changed, connector doesn't know which one. Current return is better than last one, which was just a constant.

Only other option is to return {0,0,0,0, ...} if return is 0 and {1,1,1,1, ...} if return is > 0.

Hi Andrew,

As I told this is a kind of approximation, the only correct rowcount you can get with rewriteBatchedStatements=true is zero.

But there is another point. With rewriteBatchedStatements=true driver rewrite statements only if it's possibl, say when all batched statements are identical. Application isn't aware of whether certain batch was rewritten or not. And if batch wasn't actually rewritten then driver returns correct rowcounts for each statement in batch.

So we could go further and give a clue to application:
1) if batch wasn't rewritten then return actual values for each batched statement;
2) if batch was rewritten and result is 0 then return 0s for each batched statement;
3) if batch was rewritten and result isn't 0 then return SUCCESS_NO_INFO for each batched statement (JDBC API: "A value of SUCCESS_NO_INFO -- indicates that the command was processed successfully but that the number of rows affected is unknown");

Please reopen this bug report or create the new one if you think we should do this change.

This is a fairly large breaking change that effects a lot of existing code which expects the counts to be "1"

Would it be possible go back to the previous behavior by default and add a config option to use the new interpretation? 

Seems like a lot of code is being broken for little gain.

Also this doesn't return a "1" if "useAffectedRows" is unset (default is false). Should that be the case?

Hi Cody,

You wrote:
"This is a breaking change for Hibernate which is a stable/mature and
widely deployed ORM. Even if they were to modify their code to
accommodate this change it would be years before people are able to
get upgrade to a version which supports batching due to this change.

I would strongly consider either reverting the behavior or allowing a
config property to return "1" which was the previous behavior so that
the change isn't so impactful.

I can put you in contact with someone on the Hibernate team if you
wanted to discuss the change with them as well."

I see no sense to propose such changes to Hibernate because our functionality here isn't JDBC compliant and tends to stay poor implemented because server improvements of batched statements move in different direction.

However I guess we must change our output here to 0 - with no changes and SUCCESS_NO_INFO in other case, this is a known value for Hibernate and is a JDBC compliant.

Lets proceed in Bug#61213 for this fix.

Bug#72246 was marked as duplicate of this one.

Edited the entry in the 5.1.27 changelog to clarify on what had happened with this old fix:

"When rewriteBatchedStatements=true and useAffectedRows=true were set, executeBatch() did not return the number of affected rows as expected, but always returned a “1,” even if no rows were affected (for example, when an ON DUPLICATE KEY UPDATE clause was applicable but the data needed no changes). The “1” was actually a hardwired value, a workaround for a known issue on the server side, which does not return the number of affected rows in such cases. This fix makes Connector/J just return the value given by the server anyway, which means the returned number of affected rows is reliable now if Connector/J returns a “0” [no rows affected]; non-zero values returned should not be trusted though."

Note that the behavior has been changed in release 5.1.31. See comments in the ticket for Bug 61213.