Bug #10886 INSERT ... SELECT ... ON DUPLICATE KEY UPDATE produces bad results
Submitted: 26 May 2005 14:59 Modified: 22 Jun 2005 2:36
Reporter: Mark Modrall Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:4.1.12 OS:Linux (Linux)
Assigned to: Evgeny Potemkin CPU Architecture:Any

[26 May 2005 14:59] Mark Modrall
Description:
I was trying to use ON DUPLICATE KEY UPDATE to calculate and save the results from a standard SELECT x, count(x) as hits FROM y GROUP BY x;.  In other words, I wanted to persist that into a table and to be able to accrete results from other tables/times onto that result.

I figured if I made a table with an x as a unique key and had ON DUPLICATE KEY UPDATE hits=hits+1 that I could keep a persistent running total (kind of like a group by that keeps on giving).

I found that if I define the holding table with hits INT DEFAULT 1 and leave hits out of the select, the total numbers produced by the insert are wildly wrong.  Maybe 3% of the resulting rows have accurate counts by 97% are off by hundreds or thousands.

How to repeat:
Produces bad results:
CREATE TABLE totals (query TEXT NOT NULL, UNIQUE (query(330)), hits INT DEFAULT 1) CHARACTER SET UTF8;

INSERT INTO totals (query)  SELECT query from raw_data where whatever=whatever ON DUPLICATE KEY UPDATE hits=hits+1;

SELECT query, hits FROM totals ORDER BY hits desc;

------- Compared to -----
SELECT * FROM (
    SELECT query, count(query) as hits FROM (
         SELECT query FROM raw_data WHERE whatever=whatever) a
    GROUP BY query ) b
ORDER BY hits desc;

---------------------------
These result sets should be equivalent but the ON DUPLICATE KEY UPDATE one is wildly off.

Suggested fix:
The work around(s) I found were to not use DEFAULT initialization on hits and to explicily include it in the SELECT statement e.g.

INSERT INTO totals SELECT query, 1 FROM raw_data WHERE whatever=whatever ON DUPLICATE KEY UPDATE hits=hits+1;

or (tried to see if it helped performance; didn't seem to make a huge difference)

INSERT INTO totals SELECT query, h FROM
   (SELECT query, count(query) as h FROM
      (SELECT query FROM raw_data WHERE whatever=whatever) a 
   GROUP BY query) b
ON DUPLICATE KEY UPDATE totals.hits=totals.hits+VALUES(hits);
[28 May 2005 15:15] Jorge del Conde
Hi,

Thanks for your bug report.  I was able to reproduce this behaviour using 4.1.13 from bk:

mysql> CREATE TABLE totals (query TEXT NOT NULL, UNIQUE (query(330)), hits INT DEFAULT
    -> 1) CHARACTER SET UTF8;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE raw_data (query TEXT NOT NULL, UNIQUE (query(330)), whatever INT DEFAULT 1) CHARACTER SET UTF8; 
Query OK, 0 rows affected (0.00 sec)

mysql> insert into raw_data values ('test1', 1);                                                         Query OK, 1 row affected (0.00 sec)

mysql> insert into raw_data values ('test2', 2);                                                         Query OK, 1 row affected (0.00 sec)

mysql> insert into raw_data values ('test2', 2);                                                         Query OK, 1 row affected (0.00 sec)

mysql> insert into raw_data values ('test3', 3);                                                         Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM (     SELECT query, count(query) as hits FROM (          SELECT query FROM raw_data WHERE whatever=whatever) a     GROUP BY query ) b ORDER BY hits desc;
+-------+------+
| query | hits |
+-------+------+
| test2 |    2 |
| test1 |    1 |
| test3 |    1 |
+-------+------+
3 rows in set (0.00 sec)

mysql> INSERT INTO totals (query)  SELECT query from raw_data where whatever=whatever ON DUPLICATE KEY UPDATE hits=hits+1;
Query OK, 5 rows affected (0.00 sec)
Records: 4  Duplicates: 1  Warnings: 0

mysql> select * from totals;
+-------+------+
| query | hits |
+-------+------+
| test1 |    1 |
| test2 |    2 |
| test3 |    2 |
+-------+------+
3 rows in set (0.01 sec)
[15 Jun 2005 14:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/26026
[15 Jun 2005 14:35] Evgeny Potemkin
Temporary field wasn't restored to default values after ON DUPLICATE KEY
  UPDATE event, which results in wrong data being inserted in new record.
[15 Jun 2005 16:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/26030
[21 Jun 2005 14:26] Evgeny Potemkin
Fixed in 4.1.13, cset 1.2293
[22 Jun 2005 2:36] Jon Stephens
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented bugfix in 4.1.13 Change History; marked closed.
[22 Jun 2005 13:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/26308