Bug #114250 Wrong auto_increment set when using INSERT SELECT with ORDER RAND()
Submitted: 6 Mar 2024 18:15 Modified: 7 Mar 2024 12:24
Reporter: Renan Benedicto Pereira Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:8.0.31-google OS:Linux (Cloud SQL)
Assigned to: CPU Architecture:Other
Tags: auto_increment, insert select, order by rand

[6 Mar 2024 18:15] Renan Benedicto Pereira
Description:
The server defines wrong auto_increment when batch inserting into a table using a INSERT SELECT ... ORDER BY RAND() statement.

Analysing the query plan, it shows that the engine uses temporary table and file sort. What let me think the final table inheriting the temporary auto_increment column what is reaching the datatype limit to handle IDs in random order.

When the SELECT statement returns 50000 rows, the auto_increment must be set to 50001, but it is set to 65536 (smallint limit +1).
Why smallint? Maybe internally chosen based on the list size (50000).

How to repeat:
Execute: 

CREATE TABLE `matrix` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `matrixTypeId` int NOT NULL,
  `number` int unsigned NOT NULL,
  `dozens` varchar(40) DEFAULT NULL,
  `dozensSet` set('01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41','42','43','44','45','46','47','48','49','50','51','52','53','54','55','56','57','58','59','60') DEFAULT NULL,
  `createdAt` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `updatedAt` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`,`matrixTypeId`),
  KEY `fk_matrix_matrixtype` (`matrixTypeId`)
) ENGINE=InnoDB AUTO_INCREMENT=72152597 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY LIST (`matrixTypeId`)
(PARTITION p1 VALUES IN (1) ENGINE = InnoDB,
 PARTITION p2 VALUES IN (2) ENGINE = InnoDB,
 PARTITION p3 VALUES IN (3) ENGINE = InnoDB,
 PARTITION p4 VALUES IN (4) ENGINE = InnoDB,
 PARTITION p5 VALUES IN (5) ENGINE = InnoDB,
 PARTITION p6 VALUES IN (6) ENGINE = InnoDB,
 PARTITION p7 VALUES IN (7) ENGINE = InnoDB,
 PARTITION p8 VALUES IN (8) ENGINE = InnoDB,
 PARTITION p9 VALUES IN (9) ENGINE = InnoDB,
 PARTITION p10 VALUES IN (10) ENGINE = InnoDB) */;

Insert data in table `matrix` to have the scenario of the SELECT statement.

CREATE TABLE `matrixPromotion667` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `matrixId` int unsigned NOT NULL,
  `couponNumber` int unsigned NOT NULL,
  `dozensSet` set('01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41','42','43','44','45','46','47','48','49','50','51','52','53','54','55','56','57','58','59','60') NOT NULL,
  `reservedPersonId` int DEFAULT NULL,
  `reservationExpirationTime` timestamp NULL DEFAULT NULL,
  `status` enum('available','chosen','used') NOT NULL DEFAULT 'available',
  `updatedAt` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `createdAt` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `matrixPromotion667` (id, matrixId, couponNumber, dozensSet)
  SELECT id as matrixId, number as couponNumber, dozensSet
  FROM matrix
  WHERE matrixTypeId = 7 AND number BETWEEN 1 AND 50000
  ORDER BY RAND();

The statement bellow returns 50000:
SELECT MAX(id) FROM `matrixPromotion667`;

The statement bellow still returns 1:
SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_NAME = 'matrixPromotion667';

Executing the statements bellow will show the correct values:
ANALYZE TABLE matrixPromotion667 ;
SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_NAME = 'matrixPromotion667';

But note the AUTO_INCREMENT returned is not 50000, it was set to 65536.
[7 Mar 2024 11:25] MySQL Verification Team
Hi Mr. Pereira,

Thank you for your bug report.

However, what you describe is expected behaviour. Simply, that is how InnoDB is designed to work with INSERT .... SELECT .....

Some of us have been part of that design. This has also to do with the fact that InnoDB is a transactional engine. As such, it has to have auto-increment key values reserved PRIOR to the start of the INSERT statement. Hence, InnoDB tries to calculate how many rows will SELECT return. Meanwhile, other transactions may run other DML statements, which might interfere with with a running INSERT and change the number of rows that is reserved. 

That is how InnoDB deals with INSERT from SELECT since 2003.

The same is valid for INSERT with multi-row INSERT.

Hence, this is the expected behaviour and it is here to stay.

Some of this is explained in the Reference Manual, which would be read :

https://dev.mysql.com/doc/refman/8.0/en/innodb-auto-increment-handling.html

Not a bug.
[7 Mar 2024 12:24] Renan Benedicto Pereira
Thank you for fast answer!

So it compromises the ACID, may I have to hope for future changes about it?

The workaround I used here is to create a "row_id" in the SELECT statement to hard load this column instead let auto_increment take the decision. 
Whatever it still have the behavior that you mentioned, reserving some others ID "slots". 
But what paid me attention for, was the exactly size of SMALLINT was used.

Great thanks everyone that works hard on MySQL development and improvements !
[7 Mar 2024 13:05] MySQL Verification Team
Hi,

Currently there are no plans about changing this behaviour.

It is very, very rare that ACID is compromised. Also, do remember that neither SQL nor ACID allow for the auto-increment type, at all.

Thank you for your kind words.

We wish you the best.