Bug #76551 Query hangs on INSERT INTO...SELECT... ON DUPLICATE KEY UPDATE
Submitted: 31 Mar 2015 22:31 Modified: 1 Apr 2015 17:15
Reporter: Tianshi Wang Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:5.6 OS:Linux (centos)
Assigned to: CPU Architecture:Any
Tags: hang, ON DUPLICATE KEY UPDATE

[31 Mar 2015 22:31] Tianshi Wang
Description:
When I insert into a target table using INSERT INTO...SELECT... ON DUPLICATE KEY UPDATE, the query would hang if the source table has the same record on unique key column over a million rows.

How to repeat:
-- Create two tables, source and target. We will select from source table then insert into target table.
create table target (
id int unsigned NOT NULL AUTO_INCREMENT primary key COMMENT 'Primary key\n ',
unqkey varchar(100) not null,
rvalue varchar(100) not null,
created timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
unique(unqkey)
) engine=innodb default charset=utf8;

create table source (
id int NOT NULL AUTO_INCREMENT primary key,
unqkey varchar(100) not null,
rvalue varchar(100) not null
) engine=innodb default charset=utf8;

-- use a script to loop the following insert query two million times. 
-- So the source table has two million records.
insert into source (unqkey, rvalue) select 'UNKNOWN', SUBSTRING(MD5(RAND()) FROM 1 FOR 10);

mysql> select * from source limit 5;
+----+---------+------------+
| id | unqkey  | rvalue     |
+----+---------+------------+
|  1 | UNKNOWN | a769905ce5 |
|  2 | UNKNOWN | 7d2107c2f0 |
|  3 | UNKNOWN | 7ee879bf4a |
|  4 | UNKNOWN | 0f41913474 |
|  5 | UNKNOWN | b628d62597 |
+----+---------+------------+
5 rows in set (0.00 sec)

-- Run the following query. This query will hang
INSERT INTO target (unqkey, rvalue)
SELECT unqkey, rvalue
FROM source temp 
ON DUPLICATE KEY UPDATE unqkey = temp.unqkey,
rvalue = temp.rvalue,
created = now();

-- I have reduced the source table to 100k rows. It still hangs.

mysql> show full processlist;
+------+-------------+-----------+--------+---------+----------+----------------------------------+-------------------------------------+
| Id   | User        | Host      | db     | Command | Time     | State                            | Info                                |
+------+-------------+-----------+--------+---------+----------+----------------------------------+-------------------------------------+
| 6480 | root        | localhost | mytest | Query   |    28222 | Sending data                     | INSERT INTO target (unqkey, rvalue)
SELECT unqkey, rvalue
FROM source temp
ON DUPLICATE KEY UPDATE unqkey = temp.unqkey,
rvalue = temp.rvalue,
created = now() |
| 6481 | root        | localhost | mytest | Query   |        0 | init                             | show full processlist                |
+------+-------------+-----------+--------+---------+----------+----------------------------------+--------------------------------------+

-- kill the insert statement, there is no rollback process. 

mysql> kill 6480;
Query OK, 0 rows affected (0.00 sec)

mysql> show full processlist;
+------+-------------+-----------+--------+---------+----------+----------------------------------+-----------------------+
| Id   | User        | Host      | db     | Command | Time     | State                            | Info                  |
+------+-------------+-----------+--------+---------+----------+----------------------------------+-----------------------+
| 6481 | root        | localhost | mytest | Query   |        0 | init                             | show full processlist |
+------+-------------+-----------+--------+---------+----------+----------------------------------+-----------------------+
2 rows in set (0.00 sec)

mysql> select * from target;
Empty set (0.00 sec)

Suggested fix:
None.
[1 Apr 2015 5:31] MySQL Verification Team
My testcase:
------------
drop table if exists t2,t1;
create table t2 (id serial,a varchar(100),b varchar(100),created timestamp,unique(a)) engine=innodb default charset=utf8;
create table t1 (id serial,a varchar(100),b varchar(100)) engine=innodb default charset=utf8;
insert into t1 (a, b) values('a', rand()),('a', rand()),('a', rand()),('a', rand()),('a', rand());
insert into t1 (a, b) select 'a', rand() from t1 a,t1 t1,t1 t2,t1 t3,t1 t4,t1 t5;
select count(*)from t1;
insert into t2 (a, b) select a, b from t1 q on duplicate key update a = q.a, b = q.b,created = now();
--------

Do you think this might be the same problem as described in
http://bugs.mysql.com/bug.php?id=71507 ?
[1 Apr 2015 5:33] MySQL Verification Team
in that bug report I had concluded it takes about 1 minute to replace 5000 duplicate rows in innodb.
[1 Apr 2015 14:26] Tianshi Wang
Hi Shane, 

Thanks for quickly reply!

I have noticed that the small batches dataset would get through. Nonetheless, the large batch take much longer than 5000 records/min.  In other word, the performance is not linear when working with large batches. In my 1 million record case, I have to kill it after 8 hour overnight run.

Other than the large batch test we had, I think this is the same bug as you had.
http://bugs.mysql.com/bug.php?id=71507
[1 Apr 2015 17:15] Tianshi Wang
Here is what I observed. Apparently the performance degrades exponentially while the source data increases.

Source table records vs the time completed.
Rows	Time (seconds)
1K :	1.87
2K : 	5.30
5K :	31.16
10K:    132.24
20K:    472
[16 Jan 2016 13:44] Zach Brown
I think we are hitting this same issue on 5.6. I'm not totally convinced this is a duplicate of that other bug.

We are doing insert/select/on duplicate key update with 8+ million rows with a similar key setup. We've let it run over 13 hours and never had it complete.

What is then curious about this is that if we run just the select portion of the insert/select it completes in a reasonable amount of time. (I assume it's priming the cache at this point maybe?)

Then, if we then re-run the exact same insert/select/on duplicate key update it will complete in a matter of minutes.
[16 Jan 2016 20:29] Zach Brown
Upgrading to 5.7 resolves the issue.
[16 Jan 2016 20:31] Zach Brown
I should extrapolate... 

Upgrading to: mysqld  Ver 5.7.10 for Linux on x86_64 (MySQL Community Server (GPL)) resolved the issue.