Bug #70202 REPLACE working differently for a table having PK or UK only versus both PK+UK
Submitted: 30 Aug 2013 15:36 Modified: 2 Sep 2013 11:51
Reporter: Przemyslaw Malkowski Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:5.5, 5.6 OS:Any
Assigned to: CPU Architecture:Any
Tags: innodb, REPLACE

[30 Aug 2013 15:36] Przemyslaw Malkowski
Description:
Seems like InnoDB engine has an optimization that allows using update logic when replacing existing unique values for a row instead of delete+insert. Unfortunately this optimization does not work for table having both primary and unique keys. It also does not work for MyISAM tables. 
As a result you may expect significant performance regression for large LOAD DATA INFILE ... REPLACE INTO TABLE.., statements if you add second unique key to a table.

How to repeat:
Note the "x rows affected" for REPLACE statement in each of following examples:

CREATE TABLE `uk_pk2` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > insert into uk_pk2 values (1),(2),(3);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql [localhost] {msandbox} (test) > select * from uk_pk2;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.00 sec)

mysql [localhost] {msandbox} (test) > replace into uk_pk2 values (1);
Query OK, 2 rows affected (0.00 sec)

------------------------------------------------------------------------

mysql [localhost] {msandbox} (test) > CREATE TABLE pk2 (id int primary key);
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {msandbox} (test) > insert into pk2 values (1),(2),(3);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql [localhost] {msandbox} (test) > select * from uk_pk2;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.00 sec)

mysql [localhost] {msandbox} (test) > replace into pk2 values (1);
Query OK, 1 row affected (0.00 sec)

------------------------------------------------------------------------

mysql [localhost] {msandbox} (test) > CREATE TABLE uk2 (id int, unique key(id));
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {msandbox} (test) > insert into uk2 values (1),(2),(3);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql [localhost] {msandbox} (test) > replace into uk2 values (1);
Query OK, 1 row affected (0.01 sec)

Suggested fix:
If it's possible - make this optimization work also for tables having two unique keys. 
Otherwise, the documentation needs clarifying. In http://dev.mysql.com/doc/refman/5.6/en/replace.html we can see:

"It is possible that in the case of a duplicate-key error, a storage engine may perform the REPLACE as an update rather than a delete plus insert, but the semantics are the same. There are no user-visible effects other than a possible difference in how the storage engine increments Handler_xxx status variables."

While this is not true - there is significant performance difference! I get much worse results for LOAD DATA INFILE ... REPLACE when two unique keys are present.
[2 Sep 2013 11:51] Umesh Shastry
Hello Przemyslaw,

Thank you for the bug report and the test case. 
Verified as described.

Thanks,
Umesh