Description:
I believe this is a documentation bug. In the LOAD DATA statement, LOCAL and REPLACE work together to allow replacing existing rows with duplicate unique keys. The documentation is unclear about this, and implies they are exclusive. Here are some areas it could be clarified:
https://dev.mysql.com/doc/refman/8.0/en/load-data.html#load-data-local
"It has the same effect as the IGNORE modifier on the interpretation of input file contents and error handling; see Duplicate-Key and Error Handling, and Column Value Assignment."
This would be true only if REPLACE is not specified.
https://dev.mysql.com/doc/refman/8.0/en/load-data.html#load-data-error-handling
"The LOCAL modifier has the same effect as IGNORE. This occurs because the server has no way to stop transmission of the file in the middle of the operation."
This is not quite true because IGNORE cannot be used together with REPLACE, but LOCAL and REPLACE can be used together, and the behavior described is not true if REPLACE is also specified.
"With IGNORE or LOCAL, data-interpretation errors become warnings and the load operation continues, even if the SQL mode is restrictive. For examples, see Column Value Assignment."
This also is not true if LOCAL is used with REPLACE, in which case data interpretation errors are reported as errors and cause the load to be rolled back.
How to repeat:
To test the behavior, create a table with a unique key and load a file containing duplicates on the unique key. Example:
File contents "data":
---
a,1
b,2
c,3
---
create table test (id varchar(10) primary key, value int);
insert into test values ('b', 0);
load data local infile '/data' replace into table test fields terminated by ',' (id, value);
Query OK, 4 rows affected (0.01 sec)
Records: 3 Deleted: 1 Skipped: 0 Warnings: 0
select * from test;
+----+-------+
| id | value |
+----+-------+
| a | 1 |
| b | 2 |
| c | 3 |
+----+-------+
This shows replacement of the existing row "b".
To test the data interpretation behavior, consider a file with incorrect data types:
File contents "data.bad":
---
a,1
b,bad
c,3
---
truncate test;
load data local infile '/data.bad' replace into table test fields terminated by ',' (id, value);
ERROR 1366 (HY000): Incorrect integer value: 'bad' for column 'value' at row 2
select * from test;
Empty set (0.00 sec)
The entire load is rolled back due to the mismatched data type. Row "a", which appears before the error in the input, is not present.
For contrast, without REPLACE, the behavior is as described; namely, duplicate rows are skipped and data interpretation errors are reported as warnings:
truncate test;
insert into test values ('a', 7);
load data local infile '/data.bad' into table test fields terminated by ',' (id, value);
Query OK, 2 rows affected, 2 warnings (0.00 sec)
Records: 3 Deleted: 0 Skipped: 1 Warnings: 2
select * from test;
+----+-------+
| id | value |
+----+-------+
| a | 7 |
| b | 0 |
| c | 3 |
+----+-------+
Duplicate row "a" is skipped, the bad value in row "b" is defaulted to 0, and the load continues with row "c". Without REPLACE, the behavior is as documented.
Suggested fix:
Update the cited documentation sections to make it clear REPLACE can be used with LOCAL, with the behavior:
* Rows duplicating a unique key will be replaced via delete and insert.
* Data interpretation is restrictive and invalid data will result in errors which cause the entire load to be rolled back.
Better yet, as all the permutations of local, non-local, replace, and ignore result in many different behaviors for duplicate handling and data interpretation, create a table that clearly shows the behavior of each permutation. The existing explanations are too verbose and a table would be clearer I believe.