Bug #10886 INSERT ... SELECT ... ON DUPLICATE KEY UPDATE produces bad results
Submitted: 26 May 2005 16:59 Modified: 22 Jun 2005 4:36
Reporter: Mark Modrall
Status: Closed
Category:Server: Optimizer Severity:S2 (Serious)
Version:4.1.12 OS:Linux (Linux)
Assigned to: Evgeny Potemkin Target Version:

[26 May 2005 16:59] Mark Modrall
Description:
I was trying to use ON DUPLICATE KEY UPDATE to calculate and save the results from a
standard SELECT x, count(x) as hits FROM y GROUP BY x;.  In other words, I wanted to
persist that into a table and to be able to accrete results from other tables/times onto
that result.

I figured if I made a table with an x as a unique key and had ON DUPLICATE KEY UPDATE
hits=hits+1 that I could keep a persistent running total (kind of like a group by that
keeps on giving).

I found that if I define the holding table with hits INT DEFAULT 1 and leave hits out of
the select, the total numbers produced by the insert are wildly wrong.  Maybe 3% of the
resulting rows have accurate counts by 97% are off by hundreds or thousands.

How to repeat:
Produces bad results:
CREATE TABLE totals (query TEXT NOT NULL, UNIQUE (query(330)), hits INT DEFAULT 1)
CHARACTER SET UTF8;

INSERT INTO totals (query)  SELECT query from raw_data where whatever=whatever ON
DUPLICATE KEY UPDATE hits=hits+1;

SELECT query, hits FROM totals ORDER BY hits desc;

------- Compared to -----
SELECT * FROM (
    SELECT query, count(query) as hits FROM (
         SELECT query FROM raw_data WHERE whatever=whatever) a
    GROUP BY query ) b
ORDER BY hits desc;

---------------------------
These result sets should be equivalent but the ON DUPLICATE KEY UPDATE one is wildly off.

Suggested fix:
The work around(s) I found were to not use DEFAULT initialization on hits and to
explicily include it in the SELECT statement e.g.

INSERT INTO totals SELECT query, 1 FROM raw_data WHERE whatever=whatever ON DUPLICATE KEY
UPDATE hits=hits+1;

or (tried to see if it helped performance; didn't seem to make a huge difference)

INSERT INTO totals SELECT query, h FROM
   (SELECT query, count(query) as h FROM
      (SELECT query FROM raw_data WHERE whatever=whatever) a 
   GROUP BY query) b
ON DUPLICATE KEY UPDATE totals.hits=totals.hits+VALUES(hits);
[28 May 2005 17:15] Jorge del Conde
Hi,

Thanks for your bug report.  I was able to reproduce this behaviour using 4.1.13 from
bk:

mysql> CREATE TABLE totals (query TEXT NOT NULL, UNIQUE (query(330)), hits INT DEFAULT
    -> 1) CHARACTER SET UTF8;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE raw_data (query TEXT NOT NULL, UNIQUE (query(330)), whatever INT
DEFAULT 1) CHARACTER SET UTF8; 
Query OK, 0 rows affected (0.00 sec)

mysql> insert into raw_data values ('test1', 1);                                         
               Query OK, 1 row affected (0.00 sec)

mysql> insert into raw_data values ('test2', 2);                                         
               Query OK, 1 row affected (0.00 sec)

mysql> insert into raw_data values ('test2', 2);                                         
               Query OK, 1 row affected (0.00 sec)

mysql> insert into raw_data values ('test3', 3);                                         
               Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM (     SELECT query, count(query) as hits FROM (          SELECT
query FROM raw_data WHERE whatever=whatever) a     GROUP BY query ) b ORDER BY hits
desc;
+-------+------+
| query | hits |
+-------+------+
| test2 |    2 |
| test1 |    1 |
| test3 |    1 |
+-------+------+
3 rows in set (0.00 sec)

mysql> INSERT INTO totals (query)  SELECT query from raw_data where whatever=whatever ON
DUPLICATE KEY UPDATE hits=hits+1;
Query OK, 5 rows affected (0.00 sec)
Records: 4  Duplicates: 1  Warnings: 0

mysql> select * from totals;
+-------+------+
| query | hits |
+-------+------+
| test1 |    1 |
| test2 |    2 |
| test3 |    2 |
+-------+------+
3 rows in set (0.01 sec)
[15 Jun 2005 16:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/26026
[15 Jun 2005 16:35] Evgeny Potemkin
Temporary field wasn't restored to default values after ON DUPLICATE KEY
  UPDATE event, which results in wrong data being inserted in new record.
[15 Jun 2005 18:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/26030
[21 Jun 2005 16:26] Evgeny Potemkin
Fixed in 4.1.13, cset 1.2293
[22 Jun 2005 4:36] Jon Stephens
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented bugfix in 4.1.13 Change History; marked closed.
[22 Jun 2005 15:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/26308