MySQL Bugs: #70692: auto_increment wrong value when using insert into... select from

Bug #70692	auto_increment wrong value when using insert into... select from
Submitted:	23 Oct 2013 3:46	Modified:	23 Oct 2013 6:21
Reporter:	ricky leung	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: DML	Severity:	S2 (Serious)
Version:	5.6.12,5.6.14,5.5.34,5.1.72	OS:	Linux
Assigned to:		CPU Architecture:	Any

Description:
When copying into a table with a primary key with auto_increment, using insert into...select from, the auto_increment value of the table becomes wrong.  Subsequent inserts into the table will use the wrong value

How to repeat:
1. Create any table and insert some values:
create table t1(name varchar(10));
insert into t1 values ('abc');
insert into t1 values ('abcd');
insert into t1 values ('ghj');
insert into t1 values ('ghjd');
insert into t1 values ('ghjd oop');

2. Create a table with auto_increment field:
create table t2 (ID int not null primary key auto_increment, NAME varchar(10) not null);
alter table t2 add unique(name);

3. Populate the table t2 with values from t1:
insert into t2(name) select name from t1;

4. Verify the values from t2, everything seems fine up to now:
mysql> select * from t2;
+----+----------+
| ID | NAME     |
+----+----------+
|  1 | abc      |
|  2 | abcd     |
|  3 | ghj      |
|  4 | ghjd     |
|  5 | ghjd oop |
+----+----------+
5 rows in set (0.00 sec)

5. Check the table t2 you will see the auto_increment value not correct:
mysql> show create table t2;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                        |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `NAME` varchar(10) NOT NULL,
  PRIMARY KEY (`ID`),
  UNIQUE KEY `NAME` (`NAME`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8 |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

6. Now insert some more values into t2, the auto_increment field will be wrong:
mysql> insert into t2(name) values ('new');
Query OK, 1 row affected (0.00 sec)

mysql> select * from t2;
+----+----------+
| ID | NAME     |
+----+----------+
|  1 | abc      |
|  2 | abcd     |
|  3 | ghj      |
|  4 | ghjd     |
|  5 | ghjd oop |
|  8 | new      |
+----+----------+
6 rows in set (0.00 sec)

Suggested fix:
Please fix the wrong auto_increment symptom when using insert into...select from syntax

Hello ricky,

Thank you for the bug report.
Verified as described.

Thanks,
Umesh

There have been lots of similar discussion here.

In my understanding this is expected behavior with innodb_autoinc_lock_mode variable set to "1".

Peter
(not a MySQL/Oracle person)

According to the source code https://github.com/mysql/mysql-server/blob/8.0/sql/handler.cc#L3820, for statements like INSERT INTO ... SELECT, where the exact number of inserted rows is unknown in advance, the SQL handler reserves intervals of auto-increment values while processing the records, every time reserving twice as larger an interval as the previous one as soon as the number of rows reaches the highest value of the previous interval. For example, when inserting 4 records, first, an interval of 1 gets reserved, afterwards an interval of 2, then 4, and in the end 8. In this particular case, the next auto-increment value is going to be 8 instead of 5, and as a result, a gap between 4 and 8 appears. With larger numbers, the gap gets much more significant, for instance, while processing data in bulk say by 5000, the next auto-increment will be the current column value + 8192 instead of the current column value + 5000 + 1. As a workaround to this, data could be processed in batches equal to the value of 2^n-1. However, even with this, gaps can't be avoided at all but at least could be reduced.