Bug #1355 LOAD DATA INFILE REPLACE has RowsAffected incorrectly calculated
Submitted: 19 Sep 2003 14:55 Modified: 25 Sep 2003 23:50
Reporter: Mike MySQLAB Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: MyISAM storage engine Severity:S2 (Serious)
Version:4.1.0 alpha-max-nt OS:Windows (Win2k)
Assigned to: CPU Architecture:Any

[19 Sep 2003 14:55] Mike MySQLAB
Description:
re: LOAD DATA INFILE REPLACE
The Rows Affected value is incorrect because it counts replaced rows twice! It counts the row when it is deleted and once again when the row is imported. This is very annoying when trying to verify the proper # of rows  imported. If I know the text file has 100 rows, and if the rows don't exist in the table then Rows Affected is 100 and is correct. If I repeat the same LOAD DATA INFILE REPLACE, it now has Rows Affected set to 200 because it had to delete 100 rows before adding the 100 rows. And of course if 10 of the 100 rows were missing and 90 already existed in the table, then Rows Affected would be set to 190. Counting existing rows twice makes little sense and hampers the developers ability to verify that the LOAD DATA INFILE REPLACE command has worked properly. Rows Affected should only count the # of rows that were updated or added.

MYISAM Table. I haven't tried it with InnoDb.

How to repeat:
Run LOAD DATA INFILE REPLACE ... twice. First on an empty table and then again on the full table. The RowsAffected should increase the second time it is run.

Suggested fix:
Don't count the deleted rows. 
RowsAffected should only count the # of rows that were updated or added.
[25 Sep 2003 23:50] Alexander Keremidarski
This is docummented behaviour of REPLACE syntax.

I can't agree that Affected is incorrctly calculated. 
Note that terms used in this message is Affected rows. Not Inserted or Replaced.

This makes message very clear as it gives ability to check how much rows were actualy inserted.

In example you give when inserting 100 rows and MySQL returns 190 Rows Affected this is clear indication that there were 90 Replaced rows and 10 Inserted rows.

Your proposal is that inserting 100 rows always returns 100 rows affected.
I don't see any reason with it as this mean reducing of above information.
In this case it will be impossible to say how much rows were newly inserted and how much were old and replaced.

If your application does not need this information it is easy to skip it. Other applications may rely on it.
[26 Sep 2003 8:56] Mike MySQLAB
:In example you give when inserting 100 rows and MySQL returns 190 Rows Affected
:this is clear indication that there were 90 Replaced rows and 10 Inserted rows.

You're assuming all rows were accepted. If the input file has 100 rows and RowsAffected returns 190 after the LOAD DATA INFILE is executed, how do I know how many rows were accepted and how many were rejected out of that 100? 

I would rather know the total # of rows that were accepted/rejected instead of inserted/updated.

If you want to keep the functionality of RowsAffected the same, then perhaps MySQL should have a "RowsSkipped" or "RowsRejected" property that tells us how many rows could NOT be added/updated. We can then check that property to determine if the LOAD went correctly. This property could also be used on UPDATE, DELETE and INSERT statements as well. If the row could not be updated, deleted or inserted (because of locks, RI, or duplicated index etc.), then the counter gets incremented by 1. If the counter is 0 after the SQL commmand has finished, then we know everything went ok. Otherwise we know we have 'n' rows that weren't updated properly and the program can then take the proper action if necessary.