MySQL Bugs: #34319: last_insert_id() inaccurate when inserting multiple rows

Bug #34319	last_insert_id() inaccurate when inserting multiple rows
Submitted:	5 Feb 2008 18:21	Modified:	6 Oct 2008 21:56
Reporter:	Kevin Benton (Candidate Quality Contributor)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: DML	Severity:	S4 (Feature request)
Version:	5.1.22-rc, 5.0.45, 4.1.22	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	qc

Description:
LAST_INSERT_ID() returns the first new ID of the last insert rather than the last new ID of the last insert.  This bug shows up when doing multi-row inserts.

How to repeat:
mysql> use test;
Database changed
mysql> create temporary table t1 (id mediumint auto_increment not null primary key, b2 char not null default ' ');
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1 (b2) values ('a'), ('b');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
|                1 |
+------------------+
1 row in set (0.00 sec)

mysql> insert into t1 (b2) values ('c'), ('d');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
|                3 |
+------------------+
1 row in set (0.00 sec)

mysql> insert into t1 (b2) values ('e');
Query OK, 1 row affected (0.00 sec)

mysql> select LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
|                5 |
+------------------+
1 row in set (0.00 sec)

mysql>

Suggested fix:
Make LAST_INSERT_ID() return the value of the last ID inserted whether or not the insert was multi-row.

Thank you for the bug report. Currently that is the expected behavior:

http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html

"Note

For a multiple-row insert, LAST_INSERT_ID() and mysql_insert_id() actually return the AUTO_INCREMENT key from the first of the inserted rows. This allows multiple-row inserts to be reproduced correctly on other servers in a replication setup."

I read the documentation before filing the bug and I still think this is a bug.

I can see the arguments coming - "if you want the real last_insert_id, don't do multi-row inserts" or "use the expr as documented in the help" - the problem is that the name of the function misleads users into thinking that they're really going to get the last insert ID, not the first ID of the last insert.

Here's the help from 4.1.22:

mysql> \h LAST_INSERT_ID
Name: 'LAST_INSERT_ID'
Description:
Syntax:
LAST_INSERT_ID(), LAST_INSERT_ID(expr)

LAST_INSERT_ID() (with no argument) returns the first automatically
generated value that was set for an AUTO_INCREMENT column by the most
recently executed INSERT or UPDATE statement to affect such a column.
For example, after inserting a row that generates an AUTO_INCREMENT
value, you can get the value like this:

mysql> SELECT LAST_INSERT_ID();
        -> 195

The currently executing statement does not affect the value of
LAST_INSERT_ID(). Suppose that you generate an AUTO_INCREMENT value
with one statement, and then refer to LAST_INSERT_ID() in a
multiple-row INSERT statement that inserts rows into a table with its
own AUTO_INCREMENT column. The value of LAST_INSERT_ID() will remain
stable in the second statement; its value for the second and later rows
is not affected by the earlier row insertions. (However, if you mix
references to LAST_INSERT_ID() and LAST_INSERT_ID(expr), the effect is
undefined.)

If the previous statement returned an error, the value of
LAST_INSERT_ID() is undefined. For transactional tables, if the
statement is rolled back due to an error, the value of LAST_INSERT_ID()
is left undefined. For manual ROLLBACK, the value of LAST_INSERT_ID()
is not restored to that before the transaction; it remains as it was at
the point of the ROLLBACK.

Notice that here, expr is not explained at all.

I suggest one of the following in order of preference:

a) create a new function that does what this says it should do named so that it says what it really does (maybe FIRST_OF_LAST_INSERT_IDS()) and make this function do what it's name suggests,
b) do the following as a way to give users access to the list of last ID's inserted:
    mysql> SELECT ID FROM INFORMATION_SCHEMA.SESSION_LAST_INSERT_ID;
    -> 21124195
    -> 21124196
    -> 21124198
    mysql> SELECT ID FROM INFORMATION_SCHEMA.SESSION_LAST_INSERT_ID LIMIT 1;
    -> 21124195
c) change the documentation of this function and the functionality if appropriate so that users have access to the real last insert ID or,
d) create a new function called "REAL_LAST_INSERT_ID()" that does all of the above except that it always reports the session's last insert ID even if the last insert was a multi-row insert.

Regardless, I think the current implementation misleads users simply due to the name of the function.  I believe that the LAST_INSERT_ID() function should only return the most current ID regardless of whether the last insert was multi-row or not.

Thank you for the feedback. Since the current behavior is by design I am
changing to feature request. A work-around with the sample provide could be:

c:\dbs>5.1\bin\mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.24-rc-nt-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create temporary table t1 (id mediumint auto_increment not null primary key, b2
    -> char not null default ' ');
Query OK, 0 rows affected (0.05 sec)

mysql> insert into t1 (b2) values ('a'), ('b');
Query OK, 2 rows affected (0.03 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select row_count() + last_insert_id()-1;
+----------------------------------+
| row_count() + last_insert_id()-1 |
+----------------------------------+
|                                2 |
+----------------------------------+
1 row in set (0.03 sec)

mysql> insert into t1 (b2) values ('c'), ('d');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select row_count() + last_insert_id()-1;
+----------------------------------+
| row_count() + last_insert_id()-1 |
+----------------------------------+
|                                4 |
+----------------------------------+
1 row in set (0.00 sec)

Any update on this?

Thinking through this some more, the example listed above should be fine with MyISAM, but it may not work for row-level-locking tables (InnoDB, NDB, ...) where it's possible to have concurrent simultaneous inserts running in different sessions.

My point may be moot, however, if COMMIT guarantees that inserts will always have consecutive auto_increment ID's.  I'm curious if this could be an issue for the ARCHIVE storage engine as well.  Anyway, back at you guys.

> b) do the following as a way to give users access to the list of last ID's 
>  inserted:
>    mysql> SELECT ID FROM INFORMATION_SCHEMA.SESSION_LAST_INSERT_ID;
>    -> 21124195
>    -> 21124196
>    -> 21124198
>    mysql> SELECT ID FROM INFORMATION_SCHEMA.SESSION_LAST_INSERT_ID LIMIT 1;
>    -> 21124195

I vote for something along this line of thinking (i.e. exposing all of the IDs, whether as a memory table or otherwise) as a replacement for the last_insert_id() function, unless functions will be extended to return multiple rows anytime soon. An ordered set of ALL the IDs of the last insertion is obviously much more generally helpful than either the first OR last of that set. In my case, it would allow more efficient, mass insertions/updates of linked data into multiple tables with one statement per table (without an extra select for each making redundant trips through the indexes).

If the id of the first row of the last insertion is needed for replication, it's readily obtainable from the set.