Bug #79025 Tablespace traversal/validation on recovery: performance regression 5.7 vs 5.6
Submitted: 29 Oct 2015 22:14 Modified: 19 Dec 2015 14:24
Reporter: Szymon Komendera Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any

[29 Oct 2015 22:14] Szymon Komendera
Description:
This is an observation related to behavior/performance of MySQL 5.7 recovery procedures. You may not consider it to be a bug per se, however it does look like a regression, especially for MySQL customers relying entirely on MySQL documentation as the source of knowledge.

#### BACKGROUND:

- MySQL <5.7.5 the infamous tablespace discovery is a brute-force operation that scans all IBDs to map space IDs to names for the purposes of crash recovery. This impacts recovery times in databases with large number of tables.
- MySQL >=5.7.5 promises improved recovery speeds by not having to scan the entire *.IBD file base. The docs do mention some additional scans required to perform recovery and the text does not explicitly say the recovery is quicker. Still, after working with 5.6 and reading the document [1], a MySQL customer would not expect performance degradation here.

References:

[1] https://dev.mysql.com/doc/refman/5.7/en/innodb-recovery-tablespace-discovery.html

#### SHORT ISSUE DESCRIPTION:

For the sake of brevity, I'm only comparing latest versions here i.e. MySQL 5.6.27 and 5.7.9. 

# Recovery behavior in MySQL 5.6.27:

- Tablespace information is loaded by "fil_load_single_table_tablespaces" called from "recv_init_crash_recovery" (storage/innobase/log/log0recv.cc). This is the most time-consuming step during the entire recovery phase.
- Then tablespaces are checked by "dict_check_tablespaces_and_store_max_id" called with DICT_CHECK_ALL_LOADED flag. This is relatively quick because - per code comments - the function merely compares dictionary data "to what we already read with fil_load_single_table_tablespaces".

# Recovery behavior in MySQL 5.7.9:

- We do not have to open all IBD files to build the "space_id-filename" map. Indeed, you no longer see the "Reading tablespace information from the .ibd files" message in the logs.
- However, we still have to call "dict_check_tablespaces_and_store_max_id" (storare/innobase/srv/srv0start.cc) with a flag that causes brute-force validation of tablespaces. When in recovery, the function is called with "validate" flag that forces the underlying code to physically read the first page of every IBD that hasn't been scanned earlier by other recovery steps.

#### RECOVERY PERFORMANCE

- Tested on multiple installations of 5.6.27 and 5.7.9 (source WITH_DEBUG, source with no debug, binary on Debian 7)
- Servers used: Amazon EC2 machines (m4.xlarge, 500GB, 5000 provisioned iops). Also tested on a bare-metal SSD-backed sandbox but I screwed up that one and didn't save the exact performance figures (5.6 was still faster though).
- See below for testing methodology

# Binary distribution (DEB packages for Debian 7)

5.7.9 total startup time with recovery: 5m 53s
5.6.27 total startup time with recovery: 3m 5s

# Source distribution (non-debug)

5.7.9 total startup time with recovery: 3m 25s
5.6.27 total startup time with recovery: 2m 57s

# Summary:

- Total startup time (including recovery) is significantly longer on 5.7.9
- Information gathered with pstack/gdb/perf during recovery confirms that tablespace validation (and resulting IO traffic) is indeed the main factor contributing to recovery time. Here's a sample output from stack:

0x7fa5580dda43: _fini + 0x7fa4ffffdd03 (7ffc45766968, a0a511b68, 7ffc45766af0, ea64000, a, 0) + 190
0x010370be: _ZL17os_file_read_pageR9IORequestiPvmmPmb.constprop.124 + 0x8e (7ffc00000000, 1)
0x01037564: _Z35os_file_read_no_error_handling_funcR9IORequestiPvmmPm + 0x14 (bd7e9ced, a, 0, 117312e, 2100000200, ffffffff00000000) + 1a0
0x011fa1ea: _ZN8Datafile15read_first_pageEb + 0x5ea (eac7310, 176d2e8, 0, 7fa55850c1c8, 0, 7ffc45766cf0) + 1a0
0x011fac02: _ZN8Datafile19validate_first_pageEPm + 0x652 (7ffc45767001, 0, 100000200, 7ffc00000000, 38d88a8, 53d9588) + 160
0x011fb505: _ZN8Datafile14validate_to_ddEmm + 0x55 (fffffffffffffffe, 23e8000, 1, fffffffffffffffe, 1, ffffffffffffffff) + bf0
0x011e227a: _Z12fil_ibd_openbb10fil_type_tmmPKcS1_ + 0x60a (a68c658, 7ffc457692b0, 1007ffc45767e90, 1, 0, 186c2) + 1b20
0x011c5332: _Z39dict_check_tablespaces_and_store_max_idb + 0x1482 (0, 2, 4010000000000000, 4060000000000000, 0, 0) + 74c0
0x010e6900: _Z34innobase_start_or_create_for_mysqlv + 0x40c0 (c2c5c8896cb13356, b8, 7f0100000000, f6ded7, 19, 1fc) + 160
0x00fc2cc4: _ZL13innobase_initPv + 0xba4 (2, 38866b0, 4, 3874800)
0x00935b3e: _Z24ha_initialize_handlertonP13st_plugin_int + 0x4e (7ffc45770fc0, 1e6e560, 7ffc45770fc0, 3874800)
0x00db193c: _ZL17plugin_initializeP13st_plugin_int + 0x4c (0, daf9c0, 0, 7ffc00000001, 7fa500000000, 7fa558501410) + 3a0
0x00db64e8: _Z11plugin_initPiPPci + 0x758 (7fa5585029a0, 7fa5582e5f28, 7ffc457716d0, 7fa5582f410a, 37651b0, f45771350) + 540
0x008e8e65: _Z11mysqld_mainiPPc + 0x8f5 (0, 7ffc457718f8, 200000000, 8c61e0, 0, 482ec90af4e549ce) + ffff8003ba88e7b0 

How to repeat:
1. Begin with a clean installation of MySQL with no user-created tables. Go with default file-per-table settings (1).
2. Create 200K tables of arbitrarily simple structure. What I used was:

mysql -e "create schema discovery;"

for i in `seq 1 200000`; do echo "create table discovery.table_$i (id int primary key auto_increment);" >> create_tables.sql; done;

mysql < create_tables.sql

3. Start MysQL and simulate a crash while leaving some unprocessed data in the logs. The procedure I used was:

for i in `seq 1 100`; do echo "insert into discovery.table_$i values (null);" >> insert_100.sql; done;

(stop mysql)
(flush OS buffers and caches)
(start mysql)

mysql < insert_100.sql && pkill -9 mysqld

4. Start MySQL again and note the total startup time

Suggested fix:
- I would kindly request that the documentation of recovery procedures be improved. Current documentation suggest that 5.7 comes with improvements around recovery time, which doesn't seem to be the case (unless I'm missing something obvious which of course is possible).
- If possible, consider using lazy/deferred tablespace validation so that the server becomes available for queries without the need to check all IBDs first.
[16 Dec 2015 9:43] MySQL Verification Team
Hello Szymon,

Thank you for the report and test case.
I'm not seeing that much difference at my end when comparing 5.6.28 with 5.7.10 start up time(binary tarball used on OL7 with default conf). Could you please try with latest GA, and provide test results along with configuration file used for tests?

-- my local test results with default conf 

// 5.6.28

2015-12-16 08:57:41 17932 [Note] Plugin 'FEDERATED' is disabled.
.
2015-12-16 08:57:41 17932 [Note] InnoDB: Starting crash recovery.
2015-12-16 08:57:41 17932 [Note] InnoDB: Reading tablespace information from the .ibd files...
2015-12-16 08:58:13 17932 [Note] InnoDB: Restoring possible half-written data pages
2015-12-16 08:58:13 17932 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Doing recovery: scanned up to log sequence number 937716651
2015-12-16 08:58:13 17932 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
2015-12-16 08:58:15 17932 [Note] InnoDB: 128 rollback segment(s) are active.
2015-12-16 08:58:15 17932 [Note] InnoDB: Waiting for purge to start
2015-12-16 08:58:15 17932 [Note] InnoDB: 5.6.28 started; log sequence number 937716651
.
Version: '5.6.28-enterprise-commercial-advanced'  socket: '/tmp/mysql_ushastry.sock'  port: 15000  MySQL Enterprise Server - Advanced Edition (Commercial)

// 5.7.10

.
2015-12-16T08:15:51.237413Z 0 [Note] InnoDB: Starting crash recovery.
2015-12-16T08:15:51.244103Z 0 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
2015-12-16T08:15:51.751037Z 0 [Note] InnoDB: Apply batch completed
2015-12-16T08:16:05.174772Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2015-12-16T08:16:05.174809Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
.
.
2015-12-16T08:16:05.485620Z 0 [Note] bin/mysqld: ready for connections.
Version: '5.7.10-enterprise-commercial-advanced'  socket: '/tmp/mysql_ushastry.sock'  port: 15000  MySQL Enterprise Server - Advanced Edition (Commercial)
2015-12-16T08:16:06.458616Z 0 [Note] InnoDB: Buffer pool(s) load completed at 151216  9:16:06

Thanks,
Umesh
[19 Dec 2015 14:24] Szymon Komendera
Hi Umesh,

Thanks for spending your time on the repro. I re-ran the tests against 5.6.27, 5.7.9 and 5.7.10 (source distribution, compiled without debugging).

- More powerful instance type: EC2 m4.2xlarge, 500GB / 5000 IOPS
- Same exact OS installation as the last time (Debian Wheezy 7.9)

I'm attaching a file "Bug79025_iteration2.zip" with all the data you should need.

Unfortunately I've no access to a powerful enough bare-metal machine but let me know if you want me to use another OS or MySQL distribution on EC2. I did try out the binary distribution during previous iteration but I'm happy to do it again (whatever you need to avoid unnecessary legwork on your side).

-
Szymon
[19 Dec 2015 14:25] Szymon Komendera
Bug79025_iteration2

Attachment: Bug79025_iteration2.zip (application/zip, text), 550.53 KiB.