Bug #107128 InnoDB table: gaps in auto increment doing LOAD DATA o INSERT INTO ... SELECT
Submitted: 26 Apr 2022 10:13 Modified: 26 Apr 2022 13:54
Reporter: Agostino di Salle Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.28 OS:Red Hat
Assigned to: MySQL Verification Team CPU Architecture:Any

[26 Apr 2022 10:13] Agostino di Salle
Description:
Using InnoDB table with auto increment we noted gaps in auto increment when inserting rows using LOAD DATA or INSERT INTO ... SELECT statements.

Approximately we see around 30% "lost" auto increment id.
Doing extensive test to understand the amount of the "lost" auto increment id we see this doin LOAD DATA:

- file rows: 1, allocated auto_increment_id: 1
- file rows between 2 and 3, allocated auto_increment_id: 3
- file rows between 4 and 7, allocated auto_increment_id: 7
- file rows between 8 and 15, allocated auto_increment_id: 15
- file rows between 8 and 15, allocated auto_increment_id: 15
- file rows between 15 and 31, allocated auto_increment_id: 31
- file rows between 512 and 1023, allocated auto_increment_id: 1023
- file rows between 1024 and 2047, allocated auto_increment_id: 2047

it's like the server get number of rows to be inserted from the file to be loaded (or from the SELECT) and allocate a block of auto increment using this formula:

n = number of rows
b = number of bits to represents n
allocated auto increment: (2 ^ log2(b)) - 1 

6 rows files -> 3 bits to repsent 6 -> allocated auto increment: 2^3 -1 = 7
255 rows files -> 8 bits to represent 255 -> allocated auto increment: 2^8 -1 = 255
256 rows files -> 9 bits to represent 256 -> allocated auto increment: 2^9 -1 = 511
33000 rows files -> 16 bits to represent 33000 -> allocated auto increment 2?16 -1 = 65535

in this moment our workaround to avoid this issue is to slit insert into small batch that don't create gaps, for example:
- to load 512 record with do two separate insert, one of 511 rows and one of 1 row.
- to load 491 record we do 5 separate insert of 255,127,63,31,15 rows

is it possible to fix the server in order to allocate a range of auto increment that it's equal to the number of rows that are being inserted ? Doing so gaps will be present only in case of duplicate unique keys.

regards,
Agostino

How to repeat:
create a table with an auo increment primary key and check value of next auto increment after a LOAD DATA tryng to load different number of rows.

Suggested fix:
allocate an auto increment range = number of rows that are being inserted
[26 Apr 2022 13:17] MySQL Verification Team
Hi,

> with auto increment we noted gaps in auto increment when inserting rows

SQL standard does not promise you there will be no gaps nor that id's will be in order. This is not a bug, this is by design.

Thanks for your report
[26 Apr 2022 13:24] Agostino di Salle
Maybe I don't understand but if I'm inserting 512 or 513 or 600 rows why the server need always to allocate 1023 ids ? The issue is not about out of order but about the fact that the sw is always allocating a predictable range of id generally greater that needed.

Why do not allocate exactly the same like the number of rows being inserted? Like it's done by INSERT INRTO tables VALUES (...), (...), (...) ?

Regards,
Agostino
[26 Apr 2022 13:39] MySQL Verification Team
Hi Agostino,

It is done because of the speed as multiple threads can insert rows in the same time; so every thread will allocate some number of ID's. If each thread would have to allocate id for every row it would cause high contention and would severely limit performance. As I already stated, SQL standard allows for this so holes in auto increment fields are normal and expected. You only do not have these holes in table lock storage engines like myisam. For some storage engines (e.g. ndbcluster) the amount of id's being allocated is tunable, in others it is hardcoded (and can be changed without warning through versions).

kind regards
Bogdan Kecman
[26 Apr 2022 13:54] Agostino di Salle
Hi Bodgan, just a real example:
- LOAD DATA of 511 rows... the server always allocate exactly 511 ids.
- LOAD DATA of 512 rows... the server always allocate exactly 1023 ids.

What I don't understand is why in the second example the server can't allocate just 512 ids instead of 1023.

Agostino
[26 Apr 2022 14:01] MySQL Verification Team
Hi,

This is now turning into "how MySQL Server works" and for that there is no place in bugs system. As I mentioned, this is not a bug and is how SQL standard allows and most SQL servers implement the feature. For why/how please either contact our MySQL Support system or use free https://forums.mysql.com/

kind regards