MySQL Bugs: #19243: wrong LAST_INSERT_ID() after ON DUPLICATE KEY UPDATE

Bug #19243	wrong LAST_INSERT_ID() after ON DUPLICATE KEY UPDATE
Submitted:	21 Apr 2006 4:45	Modified:	13 Sep 2006 17:45
Reporter:	Nathan Cheng	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.0.21-BK, 4.1.7	OS:	Linux (Linux, Windows)
Assigned to:	Guilhem Bichot	CPU Architecture:	Any

Description:
When INSERT ... ON DUPLICATE KEY UPDATE executes the UPDATE statement instead of the INSERT statement, LAST_INSERT_ID() gets incremented as if a row was added to the table, even though no such thing happened.

There are three problems with this:

1) It gives the false impression that a row was inserted, and returns an invalid nonexistent id, which leads to buggy applications.

2) It limits the usefulness of the ON DUPLICATE KEY UPDATE functionality, since there is no way to tell which row was updated.

3) It limits the usefulness of the ON DUPLICATE KEY UPDATE functionality, since there is no way to tell whether or not the command inserted a row or updated one.

The problem exists at least for both INNODB engine and MyISAM engine.

How to repeat:
mysql> CREATE TABLE `d` (
    ->   `k` int(11) NOT NULL auto_increment,
    ->   `a` int(11) default NULL,
    ->   `c` int(11) default NULL,
    ->   PRIMARY KEY  (`k`),
    ->   UNIQUE KEY `idx_1` (`a`)
    -> ) ENGINE=INNODB DEFAULT CHARSET=latin1
    -> ;
Query OK, 0 rows affected (0.25 sec)

mysql> insert into d ( a ) values ( 6 ) on duplicate key update c = ifnull( c, 0 ) + 1;
Query OK, 1 row affected (0.12 sec)

mysql> select last_insert_id();
+------------------+
| last_insert_id() |
+------------------+
|                1 |
+------------------+
1 row in set (0.00 sec)

mysql> select * from d;
+---+------+------+
| k | a    | c       |
+---+------+------+
| 1 |    6 |    NULL |
+---+------+------+
1 row in set (0.00 sec)

mysql> insert into d ( a ) values ( 6 ) on duplicate key update c = ifnull( c, 0 ) + 1;
Query OK, 2 rows affected (0.11 sec)

mysql> select last_insert_id();
+------------------+
| last_insert_id() |
+------------------+
|                2 |
+------------------+
1 row in set (0.00 sec)

mysql> select * from d;
+---+------+------+
| k | a    | c    |
+---+------+------+
| 1 |    6 |    1 |
+---+------+------+
1 row in set (0.00 sec)

Suggested fix:
In the case where INSERT...ON DUPLICATE KEY UPDATE... results in an update, LAST_INSERT_ID() should return the id of the updated row, instead of the id that would have been created.

Doing this simple fix will fix the first 2 problems listed in the Description.

Downgrading to S2, since the workaround is just not use INSERT...ON DUPLICATE KEY UPDATE..., but rather just do it the old cumbersome way with an INSERT, catch the duplicate key error if there is one, etc.

Verified just as described on 5.0.21-BK build on Linux:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.0.21    |
+-----------+
1 row in set (0.00 sec)

mysql> create table d(k int not null auto_increment,
    -> a int, c int, primary key(k), unique key idx_1(a)) engine=InnoDB;
Query OK, 0 rows affected (0.03 sec)

mysql> insert into d ( a ) values ( 6 ) on duplicate key update c = ifnull( c,
0 ) + 1;
Query OK, 1 row affected (0.02 sec)

mysql> select last_insert_id();
+------------------+
| last_insert_id() |
+------------------+
|                1 |
+------------------+
1 row in set (0.01 sec)

mysql> select * from d;
+---+------+------+
| k | a    | c    |
+---+------+------+
| 1 |    6 | NULL |
+---+------+------+
1 row in set (0.00 sec)

mysql> insert into d ( a ) values ( 6 ) on duplicate key update c = ifnull( c,
0 ) + 1;
Query OK, 2 rows affected (0.01 sec) <--- this 2 rows is problem #1

mysql> select last_insert_id();
+------------------+
| last_insert_id() |
+------------------+
|                2 |                  <---- and this is problem #2 
+------------------+
1 row in set (0.00 sec)

mysql> select * from d;
+---+------+------+
| k | a    | c    |
+---+------+------+
| 1 |    6 |    1 |
+---+------+------+
1 row in set (0.01 sec)

I see no explanation for this behaviour of last_insert_id() at http://dev.mysql.com/doc/refman/5.0/en/information-functions.html. So, this is either a bug or a documentation request.

This bug is also present in 4.1.18, on both platforms I just tested it on.  Mac OS X, and Linux.  Both problem #1 and #2 are as shown above.

Same here, where it breaks an interesting (ab)use of on duplicate key update.  I need a function that inserts a row and return the new id, or if the row already exists, returns the existing id.  I'm behind odbc, so I cannot use mysql_insert_id() from the C API.
I came up with this:

  set @id:=-1; insert into pseudoana (lat, lon, ident) values (2, 2, 'foo') ON DUPLICATE KEY UPDATE id = (@id:=values(id)); select if(@id = -1,last_insert_id(),@id);

Which has this unexpected weird result:
 - the first time, it inserts the row and returns the last_insert_id(), say 1
 - the second time, it updates the id with a new value, say 2, and returns it
 - all the following times, it returns 2

It's quite frustrating to have no way of doing this simple thing atomically.  And I can't use InnoDB tables because it is very important for the usage patterns of our database to be able to shrink data files when they are empty, which InnoDB cannot do.

I'm sorry, it was the wrong query.  values(id) was taking the value from the input values, which indeed was the autogenerated ID.

The correct version is this one, that works:

set @id:=-1; insert into pseudoana (lat, lon, ident) values (2, 2, 'foo') ON DUPLICATE KEY UPDATE id = (@id:=id); select if(@id = -1,last_insert_id(),@id);

However, it's not that efficient, as it would do an update for something that could be a select.  frustrating that there's no efficient way to handle this common case.  Looks like I'll have to do a select + insert inside a transaction.

Another testcase based on the same ideas (but without ON DUPLICATE KEY UPDATE):
CREATE TABLE `d` (
  `k` int(11) NOT NULL auto_increment,
  `a` int(11) default NULL,
  `c` int(11) default NULL,
  PRIMARY KEY  (`k`),
  UNIQUE KEY `idx_1` (`a`)
) ENGINE=InnoDB;
insert into d values(null,4,2);
insert into d values(null,4,2); select last_insert_id(); select * from d;
ERROR 1062 (23000): Duplicate entry '4' for key 2
+------------------+
| last_insert_id() |
+------------------+
|                0 |
+------------------+
insert into d values(null,5,2),(null,4,2); select last_insert_id(); select * from d;
ERROR 1062 (23000): Duplicate entry '4' for key 2
+------------------+
| last_insert_id() |
+------------------+
|                2 |
+------------------+
1 row in set (0.00 sec)

+---+------+------+
| k | a    | c    |
+---+------+------+
| 1 |    4 |    2 |
+---+------+------+
The first LAST_INSERT_ID() returned "0" while the second LAST_INSERT_ID() returned "2" which is a number bigger than what is in the table in the end.

Sorry, my testcase is bad; it's known that when the INSERT statement gives an error, LAST_INSERT_ID() is undefined.

This bug is fixed in 5.1 version.
As this is a change in behavior 4.1 and 5.0 are left as they are.

If you use 

  on duplicate key update id = LAST_INSERT_ID(id)

then

  select LAST_INSERT_ID();

will return either the new id or the updated id.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/11198

ChangeSet@1.2291, 2006-08-31 21:54:52+02:00, guilhem@gbichot3.local +3 -0
  New way to fix BUG#19243 "wrong LAST_INSERT_ID() after ON DUPLICATE KEY UPDATE".
  This bug report was two problems:
  1) LAST_INSERT_ID() returns a value which does not exist in the table
  2) the reporter would want it to return the autoinc id of the updated
  row. 1) is a real bug, 2) is a feature request.
  In July I implemented 2) in 5.1 (which automatically fixes 1).
  This has not yet been documented or released, so is changeable.
  Precisely, recently Paul and a user found an easy workaround to give
  2), which works in 4.1-5.0-5.1. So I can revert my code for 2),
  because it's not needed, that's what I do here;
  we forget about 2) (we will document the workaround). 
  But when I revert my code for 2), 1) comes back. We solve 1) by saying
  that if INSERT ON DUPLICATE KEY UPDATE updates a row, it's like a
  regular UPDATE: LAST_INSERT_ID() should not be affected (instead of
  returning a non-existent value).
  So note: no behaviour change compared to the last released 5.1; just
  a bugfix for 1).

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/11460

ChangeSet@1.2310, 2006-09-06 12:50:42+02:00, guilhem@gbichot3.local +3 -0
  New way to fix BUG#19243 "wrong LAST_INSERT_ID() after ON DUPLICATE KEY UPDATE".
  This bug report was two problems:
  1) LAST_INSERT_ID() returns a value which does not exist in the table
  2) the reporter would want it to return the autoinc id of the updated
  row.
  1) is a real bug, 2) is a feature request.
  In July I implemented 2) in 5.1 (which automatically fixes 1).
  This has not yet been documented or released, so is changeable.
  Precisely, recently Paul and a user found an easy workaround to give
  2), which works in 4.1-5.0-5.1. So I can revert my code for 2),
  because it's not needed, that's what I do here;
  we forget about 2) (we will document the workaround).
  But when I revert my code for 2), 1) comes back. We solve 1) by saying
  that if INSERT ON DUPLICATE KEY UPDATE updates a row, it's like a
  regular UPDATE: LAST_INSERT_ID() should not be affected (instead of
  returning a non-existent value).
  So note: no behaviour change compared to the last released 5.1; just
  a bugfix for 1).

LAST_INSERT_ID() is not meaningful when INSERT ... UPDATE
updates a row. However, to work around this so that LAST_INSERT_ID()
is meaningful whether the statement inserts or updates a row,
use LAST_INSERT_ID(expr):

INSERT INTO t (col_list) VALUES(value_list)
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id), <other assignments here>;

(This assumes that id is the table's AUTO_INCREMENT column.)

I'll add this workaround to the manual.

Merged into 5.1.12

Noted in 5.1.12 changelog.

After an INSERT ... ON DUPLICATE KEY UPDATE statement that updated an existing row, LAST_INSERT_ID() could return a value not in the table.

Bug #38900 was marked as duplicate of this one.