Bug #71507 innodb is excessively slow to replace duplicate values in a table
Submitted: 29 Jan 2014 12:51
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any

[29 Jan 2014 12:51] Shane Bester
Description:
Affects all versions of InnoDB (4.1+). Replacing duplicates in innodb is too slow. This is one place MyISAM is superior.  MyISAM took 0.05 seconds.  

The testcase is not extravagant, it replaces 5000 tinyint into a unique index. InnoDB timings:

4.1.25: 44.82 sec
5.0.96: 1 min 40.59 sec
5.1.73: 2 min 7.91 sec  [builtin]
5.5.35: 1 min 33.97 sec
5.6.17: 57.93 sec
5.7.4:  1 min 3.84 sec

How to repeat:
drop table if exists `t1`,`t2`,`seed`;
create table `t1`(`id` int auto_increment primary key,`a` tinyint,unique key(`a`))engine=innodb;
create table `t2`(`id` int auto_increment primary key,`a` tinyint,unique key(`a`))engine=myisam;
create table `seed` (id int auto_increment primary key,a tinyint)engine=myisam;

insert into `seed`(`a`) values (7),(7),(7),(7),(7),(7),(7),(7);
insert into `seed`(`a`) select `seed`.`a` from `seed`, `seed` `t2`, `seed` `t3`,`seed` `t4`,`seed` `t5` limit 5000;

replace into `t1`(`a`) select `a` from `seed`; #slow
replace into `t2`(`a`) select `a` from `seed`; #fast

select version();

Suggested fix:
FR: please improve speed :)
[29 Jan 2014 13:01] MySQL Verification Team
5.7.4 overview of hotspots.

Attachment: bug71507_vtune_hotpots_overview.png (image/png, text), 93.73 KiB.

[16 Jun 2014 13:54] MySQL Verification Team
current mysql-trunk speed (datadir on ramdisk):

mysql>
mysql> replace into `t1`(`a`) select `a` from `seed`; #slow
Query OK, 10015 rows affected (57.01 sec)
Records: 5008  Duplicates: 5007  Warnings: 0

mysql> replace into `t2`(`a`) select `a` from `seed`; #fast
Query OK, 10015 rows affected (0.04 sec)
Records: 5008  Duplicates: 5007  Warnings: 0

mysql>
mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.5-m15 |
+-----------+
1 row in set (0.00 sec)
[29 Sep 2014 11:30] Marko Mäkelä
Posted by developer:
 
The duplicate key error handling in InnoDB relies on the undo log and statement rollback.

The test case is updating the PRIMARY KEY to a constant value 7.
InnoDB will map UPDATE of a key to a DELETE and INSERT.
DELETE will be like an update that updates the delete-mark bit.
Actual deletion would be performed by the purge subsystem.

I believe that we are doing the following:

1. Lock the record.
2. Write update_undo log for delete-marking the record.
3. Write insert_undo log for inserting the record.
4. Attempt to insert the new record.
5. Notice the duplicate.
6. Roll back the undo log written in steps 3 and 2.

We could probably do a shortcut, while keeping the same file format:

1. Lock the record.
2. Check if the new record can be inserted.
3. Write update_undo log for delete-marking the record.
4. Delete-mark the old record.
5. Write insert_undo log for inserting the record.
6. Insert the record.

The duplicate would be detected already at step 2. This would avoid any rollback in this case.
[3 May 2016 13:36] MySQL Verification Team
current mysql-trunk (5.8.0) takes:

mysql> replace into `t1`(`a`) select `a` from `seed`; #slow
Query OK, 10015 rows affected (1 min 23.15 sec)
Records: 5008  Duplicates: 5007  Warnings: 0
[27 Aug 2016 17:24] MySQL Verification Team
Current situation:
-------------------
Version: '8.0.1-dmr'  socket: ''  port: 3306  (Built on 2016/08/18)

mysql> replace into `t1`(`a`) select `a` from `seed`; #slow
Query OK, 10015 rows affected (1 min 26.11 sec)
Records: 5008  Duplicates: 5007  Warnings: 0
[17 Sep 2016 18:07] Mark Callaghan
I have been curious about a possibly related case. Performance for insert ... on duplicate key update when most rows will get the on dup key error. Hopefully we will have perf results to share soon.
[12 Oct 2016 13:33] Justin Swanhart
@Mark - a simple demonstration of this is trying to create a list of unique URL from the wikipedia logs. You pretty much can't do it.  I've always had  construct a bloom filter in front of it to avoid inserting values already in the table, and just taking the hit when there is a false negative.
[16 Jan 2018 11:09] MySQL Verification Team
5.7.21 results
..............

mysql>
mysql> replace into `t1`(`a`) select `a` from `seed`; #slow
Query OK, 10015 rows affected (1 min 34.75 sec)
Records: 5008  Duplicates: 5007  Warnings: 0

mysql> replace into `t2`(`a`) select `a` from `seed`; #fast
Query OK, 10015 rows affected (0.07 sec)
Records: 5008  Duplicates: 5007  Warnings: 0

mysql>
mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.21    |
+-----------+
1 row in set (0.00 sec)
[14 Nov 2019 20:47] MySQL Verification Team
On a different disk (SSD) on linux,  some new results:

5.7.28:
--------
mysql> 
mysql> replace into `t1`(`a`) select `a` from `seed`; #slow
Query OK, 10015 rows affected (1 min 10.16 sec)
Records: 5008  Duplicates: 5007  Warnings: 0

mysql> replace into `t2`(`a`) select `a` from `seed`; #fast
Query OK, 10015 rows affected (0.03 sec)
Records: 5008  Duplicates: 5007  Warnings: 0

mysql> 
mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.28    |
+-----------+
1 row in set (0.00 sec)
--------------------

8.0.18:
--------------

mysql> 
mysql> replace into `t1`(`a`) select `a` from `seed`; #slow
Query OK, 10015 rows affected (1 min 34.85 sec)
Records: 5008  Duplicates: 5007  Warnings: 0

mysql> replace into `t2`(`a`) select `a` from `seed`; #fast
Query OK, 10015 rows affected (0.05 sec)
Records: 5008  Duplicates: 5007  Warnings: 0

mysql> 
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.18    |
+-----------+
1 row in set (0.00 sec)

------------
Both servers run with:

./bin/mysqld --no-defaults --basedir=. --datadir=./data \
--innodb-flush-log-at-trx-commit=0 --innodb-buffer-pool-size=4G \
--innodb-buffer-pool-instances=3 --core-file --innodb-lock-wait-timeout=1 --lock-wait-timeout=1  --innodb_flush_method=O_DIRECT_NO_FSYNC \
--innodb-purge-threads=4 --innodb_flush_neighbors=0 --skip-log-bin
[21 Jul 2020 7:31] MySQL Verification Team
8.0.21:
----------
mysql> replace into `t1`(`a`) select `a` from `seed`; #slow
Query OK, 10015 rows affected (1 min 22.34 sec)
Records: 5008  Duplicates: 5007  Warnings: 0

mysql> replace into `t2`(`a`) select `a` from `seed`; #fast
Query OK, 10015 rows affected (0.03 sec)
Records: 5008  Duplicates: 5007  Warnings: 0

mysql> 
mysql> select version();
+-------------------+
| version()         |
+-------------------+
| 8.0.21-commercial |
+-------------------+
1 row in set (0.00 sec)

=================
I also tried:  alter instance disable INNODB REDO_LOG;

not much difference:

mysql> replace into `t1`(`a`) select `a` from `seed`; #slow
Query OK, 10015 rows affected (1 min 19.40 sec)
Records: 5008  Duplicates: 5007  Warnings: 0

mysql> replace into `t2`(`a`) select `a` from `seed`; #fast
Query OK, 10015 rows affected (0.03 sec)
Records: 5008  Duplicates: 5007  Warnings: 0