Bug #88827 innodb uses too much space in the PK for concurrent inserts into the same table
Submitted: 8 Dec 2017 1:30    Modified: 2 Apr 4:34
Reporter: Mark Callaghan
Status: Verified
Category: MySQL Server: InnoDB storage engine    Severity: S3 (Non-critical)
Version: 5.6.35, 5.6.38, 5.7.20, 8.0.4    OS: Any
Assigned to:    CPU Architecture: Any

[8 Dec 2017 1:30] Mark Callaghan
Description:
This reproduces for InnoDB in MySQL 5.6.35, 5.7.17 and 8.0.3

The test case is the insert benchmark using:
* 16 clients and 1 table, and then 16 clients and 16 tables
* database larger than RAM -- in my case 2B rows inserted, 50g RAM on server

After the load finishes:
* data_length in SHOW TABLE STATUS is ~2X larger for the 1 table test
* from iostat output captured during a full scan of the PK there is ~1.6X more data read from storage in the 1 table test -- 205gb vs 125gb

The PK uses an auto-inc column and multi-row inserts are done.
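The insert pattern can be sketched as follows. This is illustrative only -- the column names and helper are hypothetical, not the exact iibench schema or code:

```python
# Minimal sketch of the multi-row insert pattern: the auto-inc PK column is
# omitted from the column list so InnoDB assigns consecutive values, and each
# commit covers rows_per_commit rows in a single statement.
# (Illustrative only -- not the exact iibench schema or code.)
def build_multirow_insert(table, rows):
    values = ", ".join("(%d, '%s')" % (price, data) for price, data in rows)
    return "INSERT INTO %s (price, data) VALUES %s" % (table, values)

sql = build_multirow_insert("pi1", [(10, "aaa"), (20, "bbb")])
# sql == "INSERT INTO pi1 (price, data) VALUES (10, 'aaa'), (20, 'bbb')"
```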

How to repeat:
I explain how to run the insert benchmark here:
http://smalldatum.blogspot.com/2017/06/the-insert-benchmark.html

I will update this with a blog post that has more details. I am not sure my reproduction details will ever satisfy Sinisa but I don't mind if you don't fix this because I care more about MyRocks today and this bug makes MyRocks look better.

Suggested fix:
I don't know the root cause.
[8 Dec 2017 2:09] Mark Callaghan
For more details on the reproduction:
* this explains how I run the insert benchmark - http://smalldatum.blogspot.com/2017/06/the-insert-benchmark.html
* this is the insert benchmark - https://github.com/mdcallag/mytools/blob/master/bench/ibench/iibench.py
* mostly accurate my.cnf files for MySQL 5.6, 5.7 and 8.0 are here - https://github.com/mdcallag/mytools/tree/master/bench/conf/wrk

To save you from running the scripts, this command line probably works for the 16 table test, but I haven't run it, so there may be bugs:

seed=$( date +%s )

for tnum in $( seq 1 16 ); do
  iibench.py --db_name=ib --rows_per_report=100000 --db_host=127.0.0.1 \
    --db_user=root --db_password=pw --engine=innodb --engine_options= \
    --unique_checks=1 --bulk_load=0 --max_rows=125000000 \
    --table_name=pi${tnum} --setup --num_secondary_indexes=3 \
    --data_length_min=10 --data_length_max=20 --rows_per_commit=100 \
    --inserts_per_second=0 --query_threads=0 --seed=$(( $seed + $tnum )) &
done
wait  # all 16 clients run concurrently; wait for them to finish

And this might work for the 1 table test. The difference is that only one client should use --setup for the 1 table test:

seed=$( date +%s )

iibench.py --db_name=ib --rows_per_report=100000 --db_host=127.0.0.1 \
  --db_user=root --db_password=pw --engine=innodb --engine_options= \
  --unique_checks=1 --bulk_load=0 --max_rows=125000000 \
  --table_name=pi1 --setup --num_secondary_indexes=3 \
  --data_length_min=10 --data_length_max=20 --rows_per_commit=100 \
  --inserts_per_second=0 --query_threads=0 --seed=$(( $seed + 1 )) &

for tnum in $( seq 2 16 ); do
  iibench.py --db_name=ib --rows_per_report=100000 --db_host=127.0.0.1 \
    --db_user=root --db_password=pw --engine=innodb --engine_options= \
    --unique_checks=1 --bulk_load=0 --max_rows=125000000 \
    --table_name=pi1 --num_secondary_indexes=3 \
    --data_length_min=10 --data_length_max=20 --rows_per_commit=100 \
    --inserts_per_second=0 --query_threads=0 --seed=$(( $seed + $tnum )) &
done
wait  # all 16 clients insert into the same table; wait for them to finish
[8 Dec 2017 18:57] Mark Callaghan
Just a reminder, the problem (extra space) is much worse when database size is much greater than the buffer pool size.
[28 Dec 2017 13:51] Sinisa Milivojevic
Hi!

Thank you for your report. I will try to reproduce the behaviour with all of the scripts that you have provided. If I run into any problems due to the insufficient data, I will let you know.
[28 Dec 2017 15:36] Sinisa Milivojevic
Hi Mark,

Can you please just confirm that, in order to repeat your results, we need to run only iibench.py and not iq.sh?

Needless to say, data_length should be retrieved after both runs.
[28 Dec 2017 15:40] Sinisa Milivojevic
Hi Mark,

I could be wrong, but maybe iq.sh should be run before iibench.py, as it seems to prepare the benchmark.

Sorry for so many questions, but I have been using only sysbench for the tests like this one.
[28 Dec 2017 22:37] Mark Callaghan
Repeated today on my home servers using 5M rows/user and 16 users -> 80M rows table.
* database directory size is 15gb with 16 tables and 22gb with 1 table
* data_length/index_length are about 5gb/9gb for 16 tables and 10gb/9gb for 1 table

Shell script that I used is called ibug.sh and is here: https://gist.github.com/mdcallag/0843d07f83938108e180f0f0bcbda1fe

The command lines assume the db login is root/pw and the schema is "ib". The script doesn't create the schema and assumes the tables don't exist when it starts:
* 1 table -> bash ibug.sh 16 yes 5000000 root pw ib
* 16 tables -> bash ibug.sh 16 no 5000000 root pw ib

Script uses iibench.py from https://github.com/mdcallag/mytools/blob/master/bench/ibench/iibench.py

For 16 tables:
mysql> select count(*), sum(data_length)/(1024*1024) as dl_mb, sum(index_length)/(1024*1024) as il_mb, table_rows from tables where table_schema = 'ib'\G
*************************** 1. row ***************************
  count(*): 16
     dl_mb: 5102.2969
     il_mb: 8949.2813
table_rows: 4860194

For 1 table:
mysql> select count(*), sum(data_length)/(1024*1024) as dl_mb, sum(index_length)/(1024*1024) as il_mb, table_rows from tables where table_schema = 'ib'\G
*************************** 1. row ***************************
  count(*): 1
     dl_mb: 10034.0000
     il_mb: 9432.8438
table_rows: 79899463
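From the numbers above, the space amplification in the PK can be computed directly. A quick check using the reported dl_mb values:

```python
# data_length for the 1 table run vs. the sum over the 16 table run,
# taken from the information_schema output above
dl_1_table = 10034.0000
dl_16_tables = 5102.2969
ratio = dl_1_table / dl_16_tables
print(round(ratio, 2))  # -> 1.97, i.e. the PK is about 2X larger
```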
[28 Dec 2017 22:38] Mark Callaghan
Full show table status output for 16 tables:
+------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation         | Checksum | Create_options | Comment |
+------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| pi1  | InnoDB |      10 | Dynamic    | 4860194 |             68 |   333332480 |               0 |    584876032 |   7340032 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:30 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi10 | InnoDB |      10 | Dynamic    | 4861308 |             71 |   348028928 |               0 |    618414080 |   7340032 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:26 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi11 | InnoDB |      10 | Dynamic    | 4860880 |             69 |   339640320 |               0 |    595345408 |   6291456 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:48 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi12 | InnoDB |      10 | Dynamic    | 4870633 |             65 |   317587456 |               0 |    547110912 |   5242880 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:11 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi13 | InnoDB |      10 | Dynamic    | 4865543 |             67 |   328089600 |               0 |    570195968 |   7340032 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:28 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi14 | InnoDB |      10 | Dynamic    | 4865207 |             68 |   331235328 |               0 |    578568192 |   4194304 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:13 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi15 | InnoDB |      10 | Dynamic    | 4858500 |             70 |   342786048 |               0 |    605847552 |   5242880 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:56 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi16 | InnoDB |      10 | Dynamic    | 4861615 |             66 |   322830336 |               0 |    556564480 |   7340032 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:16 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi2  | InnoDB |      10 | Dynamic    | 4859740 |             71 |   346980352 |               0 |    622575616 |   6291456 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:36 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi3  | InnoDB |      10 | Dynamic    | 4853867 |             70 |   343834624 |               0 |    608976896 |   6291456 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:29 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi4  | InnoDB |      10 | Dynamic    | 4858277 |             69 |   338575360 |               0 |    595345408 |   4194304 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:25 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi5  | InnoDB |      10 | Dynamic    | 4870581 |             65 |   317587456 |               0 |    552353792 |   4194304 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:16 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi6  | InnoDB |      10 | Dynamic    | 4864090 |             68 |   333332480 |               0 |    585908224 |   4194304 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:56 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi7  | InnoDB |      10 | Dynamic    | 4871965 |             64 |   316538880 |               0 |    543965184 |   7340032 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:50:20 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi8  | InnoDB |      10 | Dynamic    | 4847681 |             71 |   345931776 |               0 |    611074048 |   6291456 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:30 | NULL       | latin1_swedish_ci |     NULL |                |         |
| pi9  | InnoDB |      10 | Dynamic    | 4853834 |             70 |   343834624 |               0 |    606879744 |   4194304 |        5000001 | 2017-12-28 10:18:17 | 2017-12-28 12:49:18 | NULL       | latin1_swedish_ci |     NULL |                |         |
+------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+

And full show table status output for 1 table:
+------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows     | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation         | Checksum | Create_options | Comment |
+------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| pi1  | InnoDB |      10 | Dynamic    | 79899463 |            131 | 10521411584 |               0 |   9891053568 |   4194304 |       80000001 | 2017-12-28 10:17:56 | 2017-12-28 12:49:57 | NULL       | latin1_swedish_ci |     NULL |                |         |
+------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
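Avg_row_length in SHOW TABLE STATUS is just Data_length divided by the estimated Rows (integer division), which is easy to confirm from the output above:

```python
# 1 table case: Data_length / Rows for pi1 from the SHOW TABLE STATUS above
assert 10521411584 // 79899463 == 131  # matches Avg_row_length = 131

# 16 table case: the pi1 row from the 16-table output, for comparison
assert 333332480 // 4860194 == 68      # matches Avg_row_length = 68
```

So the ~2X data_length difference shows up directly as roughly double the bytes per row in the clustered index.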
[28 Dec 2017 22:48] Mark Callaghan
I wrote above "80M rows table". That is confusing. The example used 80M rows total. With 1 table there are 80M rows in 1 table. With 16 tables there are 5M rows/table. In both cases there were 16 concurrent connections doing the inserts.
[29 Dec 2017 12:34] Sinisa Milivojevic
Hi Mark,

Thank you for your feedback. That clears things up, so far. I will be running these scripts today.
[30 Dec 2017 17:06] Mark Callaghan
Great. Just a reminder that reproduction requires database >> buffer pool. In my test I used 1gb buffer pool with 80M rows and database ended up >> 10gb
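To make the "database >> buffer pool" condition concrete, a minimal my.cnf fragment for this setup (assuming defaults elsewhere; see the linked my.cnf files for the actual configs used):

```ini
[mysqld]
# keep the buffer pool much smaller than the ~10gb+ of data
innodb_buffer_pool_size = 1G
```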
[5 Jan 4:32] Umesh Shastry
Thank you Mark, for the detailed steps and explanation.

Regards,
Umesh
[5 Jan 4:36] Umesh Shastry
Test results - 5.6.35, 5.6.38, 5.7.20 and 8.0.4

Attachment: 88827.results (application/octet-stream, text), 45.91 KiB.

[27 Mar 6:16] Aakanksha Verma
Observations while reproducing the bug on 5.6.
For 1 table 16 conn on 5.6:
Database changed
mysql> select count(*), sum(data_length)/(1024*1024) as dl_mb, sum(index_length)/(1024*1024) as il_mb, table_rows from tables where table_schema = 'ib'\G
*************************** 1. row ***************************
  count(*): 1
     dl_mb: 9779.9844
     il_mb: 9765.8750
table_rows: 86241038
1 row in set (0.00 sec)
For 16 tables 16 conn on 5.6:
mysql> select count(*), sum(data_length)/(1024*1024) as dl_mb, sum(index_length)/(1024*1024) as il_mb, table_rows from tables where table_schema = 'ib'\G
*************************** 1. row ***************************
  count(*): 16
     dl_mb: 5125.3438
     il_mb: 9003.3750
table_rows: 4863024
1 row in set (0.05 sec)

What I am seeing is that even though the clustered index occupies almost twice the space for 1 table/16 conn as for 16 tables/16 conn, the number of rows for 1 table/16 conn (86241038) is far larger than for 16 tables/16 conn (4863024). Can this still be considered a bug, given that the row counts differ so much even though the script tries to insert 80M rows in both cases?
[27 Mar 22:36] Mark Callaghan
It is not a bug if the row count sum for the 16 tables doesn't match the row count for the 1 table. But you measured the estimated row count, not the actual row count from the tables.
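For what it's worth, the table_rows shown for the 16-table run appears to be a single table's estimate (it matches the per-table values seen earlier, and table_rows in information_schema is an estimate for InnoDB). Multiplied out, it is roughly consistent with the ~80M rows loaded in total:

```python
# one table's estimated row count from the 16-table run, times 16 tables
per_table_estimate = 4863024
total_estimate = 16 * per_table_estimate
print(total_estimate)  # -> 77808384, in the same ballpark as ~80M rows
```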