Bug #88827: InnoDB uses too much space in the PK for concurrent inserts into the same table
Submitted: 8 Dec 2017 1:30    Modified: 2 Apr 2018 4:34
Reporter: Mark Callaghan
Status: Verified
Category: MySQL Server: InnoDB storage engine    Severity: S3 (Non-critical)
Version: 5.6.35, 5.6.38, 5.7.20, 8.0.4    OS: Any
Assigned to:    CPU Architecture: Any

[8 Dec 2017 1:30] Mark Callaghan
Description:
This reproduces for InnoDB in MySQL 5.6.35, 5.7.17 and 8.0.3

The test case is the insert benchmark using:
* 16 clients and 1 table, and then 16 clients and 16 tables
* database larger than RAM -- in my case 2B rows inserted, 50g RAM on server

After the load finishes:
* data_length in SHOW TABLE STATUS is ~2X larger for the 1 table test
* from iostat output captured during a full scan of the PK there is ~2X more data read from storage in the 1 table test -- 205gb vs 125gb

The PK uses an auto-inc column and multi-row inserts are done.

How to repeat:
I explain how to run the insert benchmark here:
http://smalldatum.blogspot.com/2017/06/the-insert-benchmark.html

I will update this with a blog post that has more details. I am not sure my reproduction details will ever satisfy Sinisa, but I don't mind if you don't fix this, because I care more about MyRocks today and this bug makes MyRocks look better.

Suggested fix:
I don't know the root cause.
[8 Dec 2017 2:09] Mark Callaghan
For more details on the reproduction:
* this explains how I run the insert benchmark - http://smalldatum.blogspot.com/2017/06/the-insert-benchmark.html
* this is the insert benchmark - https://github.com/mdcallag/mytools/blob/master/bench/ibench/iibench.py
* mostly accurate my.cnf files for MySQL 5.6, 5.7 and 8.0 are here - https://github.com/mdcallag/mytools/tree/master/bench/conf/wrk

To save you from running the scripts, this command line probably works for the 16 table test, but I haven't run this exact script, so there may be bugs:

seed=$( date +%s )

for tnum in $( seq 1 16 ); do

iibench.py --db_name=ib --rows_per_report=100000 --db_host=127.0.0.1 --db_user=root --db_password=pw --engine=innodb --engine_options= --unique_checks=1 --bulk_load=0 --max_rows=125000000 --table_name=pi${tnum} --setup --num_secondary_indexes=3 --data_length_min=10 --data_length_max=20 --rows_per_commit=100 --inserts_per_second=0 --query_threads=0 --seed=$(( $seed + $tnum )) &

done

And this might work for the 1 table test. The difference is that only one client should use --setup for the 1 table test:

seed=$( date +%s )

iibench.py --db_name=ib --rows_per_report=100000 --db_host=127.0.0.1 --db_user=root --db_password=pw --engine=innodb --engine_options= --unique_checks=1 --bulk_load=0 --max_rows=125000000 --table_name=pi1 --setup --num_secondary_indexes=3 --data_length_min=10 --data_length_max=20 --rows_per_commit=100 --inserts_per_second=0 --query_threads=0 --seed=$(( $seed + 1 )) &

for tnum in $( seq 2 16 ); do

iibench.py --db_name=ib --rows_per_report=100000 --db_host=127.0.0.1 --db_user=root --db_password=pw --engine=innodb --engine_options= --unique_checks=1 --bulk_load=0 --max_rows=125000000 --table_name=pi1 --num_secondary_indexes=3 --data_length_min=10 --data_length_max=20 --rows_per_commit=100 --inserts_per_second=0 --query_threads=0 --seed=$(( $seed + $tnum )) &

done
[8 Dec 2017 18:57] Mark Callaghan
Just a reminder, the problem (extra space) is much worse when database size is much greater than the buffer pool size.
[28 Dec 2017 13:51] MySQL Verification Team
Hi!

Thank you for your report. I will try to reproduce the behaviour with all of the scripts that you have provided. If I run into any problems due to the insufficient data, I will let you know.
[28 Dec 2017 15:36] MySQL Verification Team
Hi Mark,

Can you please just confirm that, in order to repeat your results, we need to run only iibench.py and not iq.sh?

Needless to say, data_length should be retrieved after both runs.
[28 Dec 2017 15:40] MySQL Verification Team
Hi Mark,

I could be wrong, but maybe iq.sh should be run before iibench.py, as it seems to prepare the benchmark.

Sorry for so many questions, but I have been using only sysbench for the tests like this one.
[28 Dec 2017 22:37] Mark Callaghan
Repeated today on my home servers using 5M rows/user and 16 users -> 80M rows table.
* database directory size is 15gb with 16 tables and 22gb with 1 table
* data_length/index_length are about 5gb/9gb for 16 tables and about 10gb/9gb for 1 table

Shell script that I used is called ibug.sh and is here: https://gist.github.com/mdcallag/0843d07f83938108e180f0f0bcbda1fe

The command lines below assume the db login is root/pw and the schema is "ib". The script doesn't create the schema and assumes the tables don't exist when it starts:
* 1 table -> bash ibug.sh 16 yes 5000000 root pw ib
* 16 tables -> bash ibug.sh 16 no 5000000 root pw ib

Script uses iibench.py from https://github.com/mdcallag/mytools/blob/master/bench/ibench/iibench.py
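For reference, a minimal dry-run sketch of what ibug.sh does with those arguments (an assumption based on the command lines above and the earlier scripts; the real script is at the gist). It prints the per-client iibench.py command instead of running it:

```shell
# dry-run sketch of ibug.sh: args are <clients> <one_table:yes|no> <rows_per_client> <user> <pw> <schema>
ibug_sketch() {
  clients=$1; one_table=$2; rows=$3; user=$4; pw=$5; db=$6
  seed=12345   # the real script would use: seed=$( date +%s )
  for tnum in $( seq 1 $clients ); do
    # all clients share pi1 for the 1 table test, else each client gets its own table
    if [ "$one_table" = "yes" ]; then tname=pi1; else tname=pi${tnum}; fi
    # only one client per table should pass --setup
    if [ "$one_table" = "no" ] || [ "$tnum" -eq 1 ]; then setup=" --setup"; else setup=""; fi
    echo "iibench.py --db_name=$db --db_user=$user --db_password=$pw" \
         "--max_rows=$rows --table_name=${tname}${setup} --seed=$(( seed + tnum )) &"
  done
}

ibug_sketch 16 yes 5000000 root pw ib   # 1 table:  16 clients on pi1, one --setup
ibug_sketch 16 no  5000000 root pw ib   # 16 tables: pi1..pi16, each with --setup
```

The essential point is that exactly one client per table runs --setup.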

For 16 tables:
mysql> select count(*), sum(data_length)/(1024*1024) as dl_mb, sum(index_length)/(1024*1024) as il_mb, table_rows from tables where table_schema = 'ib'\G
*************************** 1. row ***************************
  count(*): 16
     dl_mb: 5102.2969
     il_mb: 8949.2813
table_rows: 4860194

For 1 table:
mysql> select count(*), sum(data_length)/(1024*1024) as dl_mb, sum(index_length)/(1024*1024) as il_mb, table_rows from tables where table_schema = 'ib'\G
*************************** 1. row ***************************
  count(*): 1
     dl_mb: 10034.0000
     il_mb: 9432.8438
table_rows: 79899463
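As a sanity check on the sums above (a back-of-the-envelope computation; the denominator is the ~80M rows actually inserted, not the table_rows estimate), the clustered index really is close to 2X larger per row in the 1 table case:

```shell
# bytes per row in the clustered index, from the dl_mb sums reported above
echo "16 tables: $(( 5102 * 1024 * 1024 / 80000000 )) bytes/row"   # ~66
echo "1 table:   $(( 10034 * 1024 * 1024 / 80000000 )) bytes/row"  # ~131
```

These match the Avg_row_length column (64-71 vs 131) in the SHOW TABLE STATUS output in the next comment.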
[28 Dec 2017 22:38] Mark Callaghan
Full show table status output for 16 tables:
+------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation         | Checksum | Create_options | Comment |
+------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| pi1  | InnoDB |      10 | Dynamic    | 4860194 |             68 |   333332480 |               0 |    584876032 |   7340032 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:30 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi10 | InnoDB |      10 | Dynamic    | 4861308 |             71 |   348028928 |               0 |    618414080 |   7340032 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:26 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi11 | InnoDB |      10 | Dynamic    | 4860880 |             69 |   339640320 |               0 |    595345408 |   6291456 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:48 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi12 | InnoDB |      10 | Dynamic    | 4870633 |             65 |   317587456 |               0 |    547110912 |   5242880 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:11 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi13 | InnoDB |      10 | Dynamic    | 4865543 |             67 |   328089600 |               0 |    570195968 |   7340032 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:28 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi14 | InnoDB |      10 | Dynamic    | 4865207 |             68 |   331235328 |               0 |    578568192 |   4194304 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:13 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi15 | InnoDB |      10 | Dynamic    | 4858500 |             70 |   342786048 |               0 |    605847552 |   5242880 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:56 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi16 | InnoDB |      10 | Dynamic    | 4861615 |             66 |   322830336 |               0 |    556564480 |   7340032 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:16 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi2  | InnoDB |      10 | Dynamic    | 4859740 |             71 |   346980352 |               0 |    622575616 |   6291456 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:36 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi3  | InnoDB |      10 | Dynamic    | 4853867 |             70 |   343834624 |               0 |    608976896 |   6291456 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:29 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi4  | InnoDB |      10 | Dynamic    | 4858277 |             69 |   338575360 |               0 |    595345408 |   4194304 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:25 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi5  | InnoDB |      10 | Dynamic    | 4870581 |             65 |   317587456 |               0 |    552353792 |   4194304 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:16 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi6  | InnoDB |      10 | Dynamic    | 4864090 |             68 |   333332480 |               0 |    585908224 |   4194304 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:56 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi7  | InnoDB |      10 | Dynamic    | 4871965 |             64 |   316538880 |               0 |    543965184 |   7340032 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:20 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi8  | InnoDB |      10 | Dynamic    | 4847681 |             71 |   345931776 |               0 |    611074048 |   6291456 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:30 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi9  | InnoDB |      10 | Dynamic    | 4853834 |             70 |   343834624 |               0 |    606879744 |   4194304 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:18 | NULL       | latin1_swedish_ci |     NULL |                |         |
+------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+

And full show table status output for 1 table:
+------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows     | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation         | Checksum | Create_options | Comment |
+------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| pi1  | InnoDB |      10 | Dynamic    | 79899463 |            131 | 10521411584 |               0 |   9891053568 |   4194304 |       80000001 | 2017-12-28 10:17:56 | 2017-12-28 12:49:57 | NULL       | latin1_swedish_ci |     NULL |                |         |
+------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
[28 Dec 2017 22:48] Mark Callaghan
I wrote above "80M rows table". That is confusing. The example used 80M rows total. With 1 table there are 80M rows in 1 table. With 16 tables there are 5M rows/table. In both cases there were 16 concurrent connections doing the inserts.
[29 Dec 2017 12:34] MySQL Verification Team
Hi Mark,

Thank you for your feedback. That clears things up, so far. I will be running these scripts today.
[30 Dec 2017 17:06] Mark Callaghan
Great. Just a reminder that reproduction requires database >> buffer pool. In my test I used a 1gb buffer pool with 80M rows, and the database ended up > 10gb.
[5 Jan 2018 4:32] MySQL Verification Team
Thank you Mark, for the detailed steps and explanation.

Regards,
Umesh
[5 Jan 2018 4:36] MySQL Verification Team
Test results - 5.6.35, 5.6.38, 5.7.20 and 8.0.4

Attachment: 88827.results (application/octet-stream, text), 45.91 KiB.

[27 Mar 2018 6:16] Aakanksha Verma
Observations while reproducing the bug on 5.6.

For 1 table / 16 connections on 5.6:
mysql> select count(*), sum(data_length)/(1024*1024) as dl_mb, sum(index_length)/(1024*1024) as il_mb, table_rows from tables where table_schema = 'ib'\G
*************************** 1. row ***************************
  count(*): 1
     dl_mb: 9779.9844
     il_mb: 9765.8750
table_rows: 86241038
1 row in set (0.00 sec)

For 16 tables / 16 connections on 5.6:
mysql> select count(*), sum(data_length)/(1024*1024) as dl_mb, sum(index_length)/(1024*1024) as il_mb, table_rows from tables where table_schema = 'ib'\G
*************************** 1. row ***************************
  count(*): 16
     dl_mb: 5125.3438
     il_mb: 9003.3750
table_rows: 4863024
1 row in set (0.05 sec)

What I am seeing is that even though the space occupied by the clustered index is almost twice as large for 1 table / 16 connections as for 16 tables / 16 connections, the reported number of rows for 1 table / 16 connections (86241038) is far more than that for 16 tables / 16 connections (4863024).
Can this still be considered a bug, since the row counts differ greatly even though the scripts try to insert 80M rows in both cases?
[27 Mar 2018 22:36] Mark Callaghan
It would not be a bug if the row count sum for the 16 tables didn't match the row count for 1 table. But you measured the estimated row count, not the actual row count from the tables.
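For anyone re-checking this, the estimate-vs-actual distinction looks like this (a sketch assuming the "ib" schema and "pi1" table from the runs above; it needs a live server, and the numbers will differ per run):

```sql
-- table_rows in information_schema is an estimate based on index sampling
SELECT table_rows FROM information_schema.tables
 WHERE table_schema = 'ib' AND table_name = 'pi1';

-- the actual row count requires a full scan
SELECT COUNT(*) FROM ib.pi1;
```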