Bug #43818 | LOJ on I_S too slow | ||
---|---|---|---|
Submitted: | 24 Mar 2009 4:05 | Modified: | 27 Feb 2011 3:11 |
Reporter: | Jared S (Silver Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Information schema | Severity: | S5 (Performance) |
Version: | 5.1.32 | OS: | Any (MS Windows, Linux) |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
Tags: | I_S, loj, qc |
[24 Mar 2009 4:05]
Jared S
[25 Mar 2009 11:11]
Valeriy Kravchuk
Thank you for te problem report. Please, send the results of EXPLAIN for the query in your case. Read http://dev.mysql.com/doc/refman/5.1/en/information-schema-optimization.html for some new optimizations available. In my case I've got: mysql> explain SELECT t1.referenced_table_name, t1.referenced_column_name, t0.OR DINAL_POSITION, -> t0.COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE, EXTRA, COLUMN_DEFAULT, COLUMN_C OMMENT -> FROM INFORMATION_SCHEMA.`COLUMNS` as t0 -> left outer JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS t1 ON t1.TABLE_SCH EMA = -> t0.TABLE_SCHEMA AND t1.TABLE_NAME = t0.TABLE_NAME AND t1.COLUMN_NAME = t0 .COLUMN_NAME -> WHERE t0.TABLE_SCHEMA = schema() AND t0.TABLE_NAME = 't2' -> ORDER BY t0.TABLE_NAME, t0.ORDINAL_POSITION, t0.COLUMN_NAME, DATA_TYPE, -> CHARACTER_MAXIMUM_LENGTH\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t0 type: ALL possible_keys: NULL key: TABLE_SCHEMA,TABLE_NAME key_len: NULL ref: NULL rows: NULL Extra: Using where; Open_frm_only; Scanned 0 databases; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: t1 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: 2 rows in set (0.02 sec) and the query was fast enough: mysql> SELECT t1.referenced_table_name, t1.referenced_column_name, t0.ORDINAL_PO SITION, -> t0.COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE, EXTRA, COLUMN_DEFAULT, COLUMN_C OMMENT -> FROM INFORMATION_SCHEMA.`COLUMNS` as t0 -> left outer JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS t1 ON t1.TABLE_SCH EMA = -> t0.TABLE_SCHEMA AND t1.TABLE_NAME = t0.TABLE_NAME AND t1.COLUMN_NAME = t0 .COLUMN_NAME -> WHERE t0.TABLE_SCHEMA = schema() AND t0.TABLE_NAME = 't2' -> ORDER BY t0.TABLE_NAME, t0.ORDINAL_POSITION, t0.COLUMN_NAME, DATA_TYPE, -> CHARACTER_MAXIMUM_LENGTH\G *************************** 1. row *************************** referenced_table_name: NULL referenced_column_name: NULL ORDINAL_POSITION: 1 COLUMN_NAME: c1 COLUMN_TYPE: varchar(2) IS_NULLABLE: YES EXTRA: COLUMN_DEFAULT: NULL COLUMN_COMMENT: 1 row in set (0.14 sec) mysql> select version(); +------------------+ | version() | +------------------+ | 5.1.32-community | +------------------+ 1 row in set (0.01 sec)
[29 Mar 2009 22:28]
Jared S
Please specify a Table with foriegn key in it. This will increase your time of (0.14 sec) to (0.72 sec). Maybe you have fully linked InnoDB for testing. I have seperated these querys in my app for performance reasons. But since I have 'implemented a workaround' I thought you may wish to have S5 against this issue. Also I have performaed a test on 2 normal postcodes tables and when I perform a select LOJ on their suburb name(which is a varchar field), performance is exremely fast (0.02 secs). I_S displays slow performance when joining varchar fields!
[23 Sep 2009 8:51]
Sveta Smirnova
Thank you for the report. Verified as described.
[23 Sep 2009 8:52]
Sveta Smirnova
test case
Attachment: bug43818.test (application/octet-stream, text), 2.08 KiB.
[23 Sep 2009 8:53]
Sveta Smirnova
option file
Attachment: bug43818-master.opt (application/octet-stream, text), 31 bytes.
[23 Sep 2009 8:53]
Sveta Smirnova
In my case query on information_schema took 4 seconds while query on similar tables in test schema took less than 1 second.
[10 Nov 2009 10:36]
Oli Sennhauser
This problem also pops up when MEM agent is used against a database with many tables (30k). See also bug #48660. Create 30k tables (script attached). explain SELECT GROUP_CONCAT('\\\\\n*', t.table_schema, '.', t.table_name) AS table_list FROM information_schema.tables t LEFT JOIN information_schema.table_constraints c ON (t.table_schema = c.table_schema AND t.table_name = c.table_name AND c.constraint_type IN ('PRIMARY KEY','UNIQUE')) WHERE t.table_schema NOT IN ('mysql','information_schema') AND t.engine NOT IN ('ARCHIVE','FEDERATED') AND c.table_name IS NULL ; +----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------------------------+ | 1 | SIMPLE | t | ALL | NULL | NULL | NULL | NULL | NULL | Using where; Open_frm_only; Scanned all databases | | 1 | SIMPLE | c | ALL | NULL | NULL | NULL | NULL | NULL | Using where; Not exists | +----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------------------------+ I assume the Open_frm_only "optimization is the reason for the slow down! See iostat: Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 49.50 2.97 405.94 0.99 6534.65 31.68 16.14 5.84 14.32 2.41 98.22 sda 144.00 0.00 496.00 0.00 7288.00 0.00 14.69 4.56 9.31 2.01 99.60 sda 20.20 167.68 512.12 17.17 5616.16 210.10 11.01 30.26 8.36 1.85 98.18 sda 0.00 0.00 294.12 53.92 2352.94 1670.59 11.56 14.39 113.43 2.77 96.47 Execution time takes: 1 row in set, 1 warning (1 min 23.71 sec) 1 row in set, 1 warning (1 min 12.94 sec) 1 row in set, 1 warning (1 min 4.36 sec) Setting open_files_limit to 50000 does not help much: 1 row in set, 1 warning (1 min 17.55 sec) 1 row in set, 1 warning (1 min 5.76 sec) 1 row in set, 1 warning (1 min 0.19 sec) Next idea was to do the same query on "normal" MEMORY tables: set session max_heap_table_size = 128*1024*1024; show table status; +-------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+ | Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment | +-------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+ | table_constraints | MEMORY | 10 | Fixed | 24 | 2504 | 130416 | 134001560 | 0 | 0 | NULL | NULL | NULL | NULL | utf8_general_ci | NULL | | | | tables | MEMORY | 10 | Fixed | 30052 | 3545 | 107146204 | 134103805 | 0 | 0 | NULL | NULL | NULL | NULL | utf8_general_ci | NULL | | | +-------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+ Execution plan looks different and performance is much better: +----+-------------+-------+------+---------------+------+---------+------+-------+-------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+-------+-------------------------+ | 1 | SIMPLE | t | ALL | NULL | NULL | NULL | NULL | 30052 | Using where | | 1 | SIMPLE | c | ALL | NULL | NULL | NULL | NULL | 24 | Using where; Not exists | +----+-------------+-------+------+---------------+------+---------+------+-------+-------------------------+ 1 row in set, 1 warning (0.83 sec) 1 row in set, 1 warning (0.86 sec) 1 row in set, 1 warning (0.88 sec) Adding an index helps again: alter table table_constraints add index (table_schema, table_name); +----+-------------+-------+------+---------------+--------------+---------+---------------------------------------+-------+-------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+--------------+---------+---------------------------------------+-------+-------------------------+ | 1 | SIMPLE | t | ALL | TABLE_SCHEMA | NULL | NULL | NULL | 30052 | Using where | | 1 | SIMPLE | c | ref | TABLE_SCHEMA | TABLE_SCHEMA | 388 | test.t.TABLE_SCHEMA,test.t.TABLE_NAME | 2 | Using where; Not exists | +----+-------------+-------+------+---------------+--------------+---------+---------------------------------------+-------+-------------------------+ 1 row in set, 1 warning (0.19 sec) 1 row in set, 1 warning (0.19 sec) 1 row in set, 1 warning (0.19 sec) Impact: Huge memory consumption.
[10 Nov 2009 10:37]
Oli Sennhauser
Skript to create 30k tables within 200 schemata
Attachment: create_30k_tables.php (application/x-httpd-php, text), 975 bytes.
[28 Jul 2010 12:37]
Simon Mudd
Another problem situation is if multiple clients query I_S in parallel looking for tables that match certain patterns. This can be very expensive on some of our servers which have over 2000 tables split over a large number of databases. Other RDBMSes handle this much better but that's because they don't have to search for the information by looking around the filesystems and each table's definition each time. They already store and keep this information in a format suitable for fast querying. If you are used to querying the metadata inside the DB it becomes rather worrying that it is actually quicker to run scripts outside the database.
[5 Aug 2010 20:36]
Kevin Lewis
Straight SQL to build the database and test it.
Attachment: bug43818.sql (application/octet-stream, text), 3.18 KiB.
[5 Aug 2010 20:37]
Kevin Lewis
An analysis by Kevin Lewis of where the bulk of time is being spent.
Attachment: bug43818.analysis.txt (text/plain), 3.27 KiB.
[5 Aug 2010 21:16]
Kevin Lewis
I analyzed where the time was being spent in this query and found that the great bulk of the time was in InnoDB when the information_schema code requests that it recalculate statistics for each file. The optimizer can do nothing about this. See the attached file called bug43818.analysis.txt. It shows that if the files being accessed are created in another engine such as MyISAM, then the slow performance does not occur. But that is not generally an option. For a test where 600 files are created in InnoDB, the query on my test machine took about 10 seconds. Almost all of that time was spent in ha_innobase.cc; ha_innobase::info(uint flag). In particular, whenever the HA_STATUS_TIME flag is used, this function uses a lot of time to update the statistics about the database. It probably should not be done for every row of output in information_schema.key_column_usage. I am looking for a way to avoid calling handler::info(HA_STATUS_TIME) too often.
[20 Aug 2010 15:12]
Kevin Lewis
The bulk of extra time taken during these queries is in an InnoDB function named dict_update_statistics(table). This must be done each time a table is opened for the first time, but it does not seem to be necessary when information_schema is querying TABLE_CONSTRANTS, KEY_COLUMN_USAGE, or REFERENTIAL_CONSTRAINTS. In this case, the query into INFORMATION_SCHEMA.KEY_COLUMN_USAGE will open every possible table that the MySQL instance knows about and update the statistics for each one. Each table in InnoDB will have its statistics updated, which I do not think is necessary. Note that the first time this query is run, it must initially open all the tables, which will initialize these statistics at least the first time. The proposed change (see attached file Bug43818.no-info.diff) does not fix the slowness of the query the first time it is run on a newly started server, but every subsequent query will rum many times faster, depending on the number of InnoDB tables in the system.
[20 Aug 2010 15:13]
Kevin Lewis
Proposed fix
Attachment: Bug43818.no-info.diff (application/octet-stream, text), 1.76 KiB.
[20 Aug 2010 15:19]
Kevin Lewis
Reviewers; Please determine if the lack of newly updated statistics should be a problem for these three information schema tables. Also, please determine if fixing the performance of all but the first query (before opening any InnoDB tables) will be sufficient.
[8 Sep 2010 18:23]
Vasil Dimov
Hello, I assume ::info() is called for each row from INFORMATION_SCHEMA.KEY_COLUMN_USAGE. http://dev.mysql.com/doc/refman/5.1/en/key-column-usage-table.html This means that when a SELECT is made from that table, ::info() is called once for each table for each index in that table, for each column in that index that has a constraint on it. This looks like an overkill to me.
[8 Sep 2010 18:30]
Vasil Dimov
I mean: for each table for each index in that table for each column in that index that has a constraint on it
[9 Sep 2010 19:19]
Kevin Lewis
Sergey Glukhov wrote in an email; KEY_COLUMN_USAGE, TABLE_CONSTRAINTS & REFERENTIAL_CONTRAINTS retrieve only DDL(or metadata) information, there are no fields which hold statistical information. We use handler::get_foreign_key_list for retrieving necessary information and ::info() call does not affect get_foreign_key_list() behaviour. So ::info() call is unnecessary atm. Why was it added? I suppose It was added for TABLE_CONSTRAINTS by mistake and erroneously propagated to KEY_COLUMN_USAGE & REFERENTIAL_CONTRAINTS tables later. ::info() call could be useful if we have any plans to add statistical fields into these tables but imho it's not good to have a mix of metadata & statistical info in one table. We have such tables(I_S.TABLES& I_S.STATISTICS) but these tables were implemented such way by historical reasons to be old SHOW commands compatible. Taking into account that ::info() call does not affect metadata retrieving I think :info() should be removed. Vasil Dimov wrote; [...] Seconded.
[9 Sep 2010 19:20]
Kevin Lewis
Pushed to mysql-next-mr-bugfixing.
[2 Oct 2010 18:14]
Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101002181053-6iotvl26uurcoryp) (version source revid:alexander.nozdrin@oracle.com-20101002180917-h0n62akupm3z20nt) (pib:21)
[13 Nov 2010 16:22]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[22 Nov 2010 14:30]
Kevin Lewis
James, very good description. Thank you.
[14 Dec 2010 23:41]
Paul DuBois
Setting report to Need Merge pending push into 5.1.x, 5.5.x, if it will go into those trees. If not, just close the report, it does not need a 5.6.x changelog entry.
[23 Dec 2010 22:59]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127568 3671 kevin.lewis@oracle.com 2010-12-23 Bug#43818 - Patch for mysql-5.1-innodb Avoid handler::info() call for three Information Schema tables; TABLE_CONSTRAINTS, KEY_COLUMN_USAGE, & REFERENTIAL_CONTRAINTS
[4 Jan 2011 20:50]
Kevin Lewis
Patch pushed to mysql-5.1-innodb
[27 Feb 2011 3:11]
Paul DuBois
Noted in 5.1.55, 5.5.9 changelogs. Queries involving InnoDB tables in the INFORMATON_SCHEMA tables TABLE_CONSTRANTS, KEY_COLUMN_USAGE, or REFERENTIAL_CONSTRAINTS were slower than necessary because statistics were rechecked more often than required, even more so when many foreign keys were present. The improvement to this may be of particular benefit to users of MySQL Enterprise Monitor with many monitored servers or tens of thousands of tables. CHANGESET - http://lists.mysql.com/commits/127568