Bug #72303 | Improve tablespace import logging, documentation (and speed) | | |
---|---|---|---|
Submitted: | 10 Apr 2014 10:31 | Modified: | 15 Apr 2014 9:46 |
Reporter: | Simon Mudd (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 5.6.15 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | import, innodb, Tablespace |
[10 Apr 2014 10:31]
Simon Mudd
[14 Apr 2014 8:17]
MySQL Verification Team
Hello Simon,

Thank you for the report. Verified as described.

Thanks,
Umesh
[15 Apr 2014 1:24]
Sunny Bains
1. I agree that the tablespace/table name in the messages will make them more useful.
2. "Phase II - Purge records from index …" -- this will be printed only if there are records to purge.
3. Nothing has been done in 5.7 to speed things up.
4. Can you profile the 4 hour scan and let me know where the cost is during the import?
5. Along with the file name, do you think the .ibd file size should also be printed, plus a short summary on completion: number of records scanned/purged, number of indexes, etc.?
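(For reference against item 5: the per-table figures such a completion summary might include -- row count, data and index bytes -- are currently only retrievable by querying information_schema manually after the import; Simon's reply below pastes exactly this kind of output. A minimal sketch of such a query, using the placeholder names `myschema`/`mytable` that appear later in this thread:

    SELECT TABLE_NAME,
           TABLE_ROWS,
           ROUND(DATA_LENGTH  / POW(1024, 3), 1) AS data_gb,
           ROUND(INDEX_LENGTH / POW(1024, 3), 1) AS index_gb
      FROM information_schema.TABLES
     WHERE TABLE_SCHEMA = 'myschema'
       AND TABLE_NAME   = 'mytable';

Note that TABLE_ROWS is only an estimate for InnoDB, and neither length column reflects the on-disk .ibd size exactly, particularly for compressed tables -- which is part of why printing the file size at import time would help.)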
[15 Apr 2014 9:46]
Simon Mudd
Hi,

You ask if I can profile this. I'm not quite sure how that needs to be done, but I can try to take a look, though this import was a one-off migration. I can probably simulate it in a dev environment if you can provide me some pointers.

Note: several tables were involved. The largest, 59G, had this structure:

    CREATE TABLE `mytable` (
      `date` date NOT NULL DEFAULT '0000-00-00',
      `id1` int(10) unsigned NOT NULL DEFAULT '0',
      `field1` int(11) NOT NULL DEFAULT '0',
      `field2` char(2) NOT NULL DEFAULT '',
      `field3` enum('enum1','enum2','enum3','enum4') NOT NULL,
      `id2` int(10) unsigned NOT NULL DEFAULT '0',
      `id3` int(10) unsigned NOT NULL DEFAULT '0',
      `field4` int(10) unsigned NOT NULL DEFAULT '0',
      `field5` int(10) unsigned NOT NULL DEFAULT '0',
      `field6` int(10) unsigned NOT NULL DEFAULT '0',
      `field7` int(10) unsigned NOT NULL DEFAULT '0',
      `field8` int(10) unsigned NOT NULL DEFAULT '0',
      `field9` double NOT NULL DEFAULT '0',
      `field10` double NOT NULL DEFAULT '0',
      `field11` double NOT NULL DEFAULT '0',
      `field12` double NOT NULL DEFAULT '0',
      `field13` double NOT NULL DEFAULT '0',
      `field14` double NOT NULL DEFAULT '0',
      `field15` double NOT NULL DEFAULT '0',
      `field16` double NOT NULL DEFAULT '0',
      `field17` int(10) unsigned NOT NULL DEFAULT '0',
      `field18` int(10) unsigned NOT NULL DEFAULT '0',
      `field19` int(10) unsigned NOT NULL DEFAULT '0',
      `field20` int(10) unsigned NOT NULL DEFAULT '0',
      `field21` int(10) unsigned NOT NULL DEFAULT '0',
      `field22` double NOT NULL DEFAULT '0',
      `field23` double NOT NULL DEFAULT '0',
      `field24` double NOT NULL DEFAULT '0',
      `field25` double NOT NULL DEFAULT '0',
      `field26` double NOT NULL DEFAULT '0',
      `field27` double NOT NULL DEFAULT '0',
      `field28` double NOT NULL DEFAULT '0',
      `field29` double NOT NULL DEFAULT '0',
      `last_changed` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`date`,`id1`,`field1`,`field2`,`field3`),
      KEY `id1` (`id1`,`field2`,`date`),
      KEY `id1_3` (`id1`,`date`),
      KEY `id3` (`id3`,`field2`,`date`),
      KEY `id3_2` (`id3`,`date`),
      KEY `id2_field2_date` (`id2`,`field2`,`date`),
      KEY `id2_date` (`id2`,`date`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8

and I_S says:

    TABLE_CATALOG: def
     TABLE_SCHEMA: myschema
       TABLE_NAME: mytable
       TABLE_TYPE: BASE TABLE
           ENGINE: InnoDB
          VERSION: 10
       ROW_FORMAT: Compressed
       TABLE_ROWS: 248043278
   AVG_ROW_LENGTH: 147
      DATA_LENGTH: 36602118144
  MAX_DATA_LENGTH: 0
     INDEX_LENGTH: 24467931136
        DATA_FREE: 4194304
   AUTO_INCREMENT: NULL
      CREATE_TIME: 2014-04-07 18:17:43
      UPDATE_TIME: NULL
       CHECK_TIME: NULL
  TABLE_COLLATION: latin1_swedish_ci
         CHECKSUM: NULL
   CREATE_OPTIONS: row_format=COMPRESSED KEY_BLOCK_SIZE=8
    TABLE_COMMENT:

Other tables (statistics summaries) are defined similarly.

I think it's useful to provide some more statistics on completion. I'm not sure how much this feature is being used at the moment, but I suspect it will be used more frequently, and consequently providing brief (1-line) information seems useful, especially as this import process takes some time.
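(For context, the import being timed here is MySQL 5.6's transportable tablespace procedure; the "Phase I/II" messages discussed earlier in the thread are logged during the final IMPORT TABLESPACE step. A minimal sketch of the full round trip, assuming the placeholder names `myschema`/`mytable` from the schema above and an identical table definition already present on the destination server:

    USE myschema;

    -- Source server: quiesce the table; InnoDB writes mytable.cfg next to mytable.ibd
    FLUSH TABLES mytable FOR EXPORT;
    -- (copy mytable.ibd and mytable.cfg out of the source datadir while the lock is held)
    UNLOCK TABLES;

    -- Destination server: detach the placeholder tablespace, copy the files into the
    -- datadir, then import -- this last statement is the long-running step whose
    -- logging this report asks to improve
    ALTER TABLE mytable DISCARD TABLESPACE;
    ALTER TABLE mytable IMPORT TABLESPACE;

)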
[15 Apr 2014 23:42]
Sunny Bains
Thanks for the schema. There are (potentially) two parts to this problem:

1. I want to know where the CPU bottlenecks are and what we can do about it.
2. IO issues around compressed tables.

The compressed table IMPORT is done on a page-by-page basis. For uncompressed tables we use 4MB buffers for reading and writing. I was always a little concerned about the compressed table IMPORT even when I implemented this functionality. Looks like we may have to revisit this issue.

FWIW, this is the comment from when this functionality was implemented, see fil0fil.cc. I hope the "trivially" still holds :-) and actually fixes this issue.

    /********************************************************************//**
    TODO: This can be made parallel trivially by chunking up the file and
    creating a callback per thread. Main benefit will be to use multiple CPUs
    for checksums and compressed tables. We have to do compressed tables block
    by block right now. Secondly we need to decompress/compress and copy too
    much of data. These are CPU intensive.

    Iterate over all the pages in the tablespace.
    @param iter Tablespace iterator
    @param block block to use for IO
    @param callback Callback to inspect and update page contents
    @retval DB_SUCCESS or error code */
    static dberr_t fil_iterate(
[15 Apr 2014 23:44]
Sunny Bains
I forgot: for the CPU profiling, I was mainly interested in the output of "perf top -z" during the longer phase of the import, and/or "perf record -a -g -F1999; perf report", on Linux.