MySQL Bugs: #43818: LOJ on I

Bug #43818	LOJ on I_S too slow
Submitted:	24 Mar 2009 4:05	Modified:	27 Feb 2011 3:11
Reporter:	Jared S (Silver Quality Contributor)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Information schema	Severity:	S5 (Performance)
Version:	5.1.32	OS:	Any (MS Windows, Linux)
Assigned to:	Kevin Lewis	CPU Architecture:	Any
Tags:	I_S, loj, qc

Description:
I_S script takes up to .5 second to perform execution

How to repeat:
-- swap 'your table' with a valid table name of selected schema

SELECT t1.referenced_table_name, t1.referenced_column_name, t0.ORDINAL_POSITION, t0.COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE, EXTRA, COLUMN_DEFAULT, COLUMN_COMMENT
FROM INFORMATION_SCHEMA.`COLUMNS` as t0
left outer JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS t1 ON t1.TABLE_SCHEMA = t0.TABLE_SCHEMA AND t1.TABLE_NAME = t0.TABLE_NAME AND t1.COLUMN_NAME = t0.COLUMN_NAME
WHERE t0.TABLE_SCHEMA = schema() AND t0.TABLE_NAME = 'your_table'
ORDER BY t0.TABLE_NAME, t0.ORDINAL_POSITION, t0.COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH

Suggested fix:
engineer steps through and adjusts optimizer.

Thank you for te problem report. Please, send the results of EXPLAIN for the query in your case. Read http://dev.mysql.com/doc/refman/5.1/en/information-schema-optimization.html for some new optimizations available.

In my case I've got:

mysql> explain SELECT t1.referenced_table_name, t1.referenced_column_name, t0.OR
DINAL_POSITION,
    -> t0.COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE, EXTRA, COLUMN_DEFAULT, COLUMN_C
OMMENT
    -> FROM INFORMATION_SCHEMA.`COLUMNS` as t0
    -> left outer JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS t1 ON t1.TABLE_SCH
EMA =
    -> t0.TABLE_SCHEMA AND t1.TABLE_NAME = t0.TABLE_NAME AND t1.COLUMN_NAME = t0
.COLUMN_NAME
    -> WHERE t0.TABLE_SCHEMA = schema() AND t0.TABLE_NAME = 't2'
    -> ORDER BY t0.TABLE_NAME, t0.ORDINAL_POSITION, t0.COLUMN_NAME, DATA_TYPE,
    -> CHARACTER_MAXIMUM_LENGTH\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t0
         type: ALL
possible_keys: NULL
          key: TABLE_SCHEMA,TABLE_NAME
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using where; Open_frm_only; Scanned 0 databases; Using temporary;
 Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra:
2 rows in set (0.02 sec)

and the query was fast enough:

mysql> SELECT t1.referenced_table_name, t1.referenced_column_name, t0.ORDINAL_PO
SITION,
    -> t0.COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE, EXTRA, COLUMN_DEFAULT, COLUMN_C
OMMENT
    -> FROM INFORMATION_SCHEMA.`COLUMNS` as t0
    -> left outer JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS t1 ON t1.TABLE_SCH
EMA =
    -> t0.TABLE_SCHEMA AND t1.TABLE_NAME = t0.TABLE_NAME AND t1.COLUMN_NAME = t0
.COLUMN_NAME
    -> WHERE t0.TABLE_SCHEMA = schema() AND t0.TABLE_NAME = 't2'
    -> ORDER BY t0.TABLE_NAME, t0.ORDINAL_POSITION, t0.COLUMN_NAME, DATA_TYPE,
    -> CHARACTER_MAXIMUM_LENGTH\G
*************************** 1. row ***************************
 referenced_table_name: NULL
referenced_column_name: NULL
      ORDINAL_POSITION: 1
           COLUMN_NAME: c1
           COLUMN_TYPE: varchar(2)
           IS_NULLABLE: YES
                 EXTRA:
        COLUMN_DEFAULT: NULL
        COLUMN_COMMENT:
1 row in set (0.14 sec)

mysql> select version();
+------------------+
| version()        |
+------------------+
| 5.1.32-community |
+------------------+
1 row in set (0.01 sec)

Please specify a Table with foriegn key in it.  This will increase your time of (0.14 sec) to (0.72 sec).

Maybe you have fully linked InnoDB for testing.  I have seperated these querys in my app for performance reasons.  But since I have 'implemented a workaround' I thought you may wish to have S5 against this issue.

Also I have performaed a test on 2 normal postcodes tables and when I perform a select LOJ on their suburb name(which is a varchar field), performance is exremely fast (0.02 secs).

I_S displays slow performance when joining varchar fields!

Thank you for the report.

Verified as described.

test case

Attachment: bug43818.test (application/octet-stream, text), 2.08 KiB.

option file

Attachment: bug43818-master.opt (application/octet-stream, text), 31 bytes.

In my case query on information_schema took 4 seconds while query on similar tables in test schema took less than 1 second.

This problem also pops up when MEM agent is used against a database with many tables (30k). See also bug #48660.

Create 30k tables (script attached).

explain
SELECT GROUP_CONCAT('\\\\\n*', t.table_schema, '.', t.table_name) AS table_list
  FROM information_schema.tables t
  LEFT JOIN information_schema.table_constraints c ON (t.table_schema = c.table_schema
                                                   AND t.table_name = c.table_name
                                                   AND c.constraint_type IN ('PRIMARY KEY','UNIQUE'))
 WHERE t.table_schema NOT IN ('mysql','information_schema')
   AND t.engine NOT IN ('ARCHIVE','FEDERATED')
   AND c.table_name IS NULL
;

+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                             |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------------------------+
|  1 | SIMPLE      | t     | ALL  | NULL          | NULL | NULL    | NULL | NULL | Using where; Open_frm_only; Scanned all databases |
|  1 | SIMPLE      | c     | ALL  | NULL          | NULL | NULL    | NULL | NULL | Using where; Not exists                           |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------------------------+

I assume the Open_frm_only "optimization is the reason for the slow down! See iostat:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda              49.50     2.97  405.94    0.99  6534.65    31.68    16.14     5.84   14.32   2.41  98.22
sda             144.00     0.00  496.00    0.00  7288.00     0.00    14.69     4.56    9.31   2.01  99.60
sda              20.20   167.68  512.12   17.17  5616.16   210.10    11.01    30.26    8.36   1.85  98.18
sda               0.00     0.00  294.12   53.92  2352.94  1670.59    11.56    14.39  113.43   2.77  96.47

Execution time takes:

1 row in set, 1 warning (1 min 23.71 sec)
1 row in set, 1 warning (1 min 12.94 sec)
1 row in set, 1 warning (1 min 4.36 sec)

Setting open_files_limit to 50000 does not help much:

1 row in set, 1 warning (1 min 17.55 sec)
1 row in set, 1 warning (1 min 5.76 sec)
1 row in set, 1 warning (1 min 0.19 sec)

Next idea was to do the same query on "normal" MEMORY tables:

set session max_heap_table_size = 128*1024*1024;

show table status;
+-------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
| Name              | Engine | Version | Row_format | Rows  | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation       | Checksum | Create_options | Comment |
+-------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
| table_constraints | MEMORY |      10 | Fixed      |    24 |           2504 |      130416 |       134001560 |            0 |         0 |           NULL | NULL        | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
| tables            | MEMORY |      10 | Fixed      | 30052 |           3545 |   107146204 |       134103805 |            0 |         0 |           NULL | NULL        | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
+-------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+

Execution plan looks different and performance is much better:
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                   |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------------------+
|  1 | SIMPLE      | t     | ALL  | NULL          | NULL | NULL    | NULL | 30052 | Using where             |
|  1 | SIMPLE      | c     | ALL  | NULL          | NULL | NULL    | NULL |    24 | Using where; Not exists |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------------------+

1 row in set, 1 warning (0.83 sec)
1 row in set, 1 warning (0.86 sec)                                                                                                                                                                   
1 row in set, 1 warning (0.88 sec)

Adding an index helps again:

alter table table_constraints add index (table_schema, table_name);

+----+-------------+-------+------+---------------+--------------+---------+---------------------------------------+-------+-------------------------+
| id | select_type | table | type | possible_keys | key          | key_len | ref                                   | rows  | Extra                   |
+----+-------------+-------+------+---------------+--------------+---------+---------------------------------------+-------+-------------------------+
|  1 | SIMPLE      | t     | ALL  | TABLE_SCHEMA  | NULL         | NULL    | NULL                                  | 30052 | Using where             |
|  1 | SIMPLE      | c     | ref  | TABLE_SCHEMA  | TABLE_SCHEMA | 388     | test.t.TABLE_SCHEMA,test.t.TABLE_NAME |     2 | Using where; Not exists |
+----+-------------+-------+------+---------------+--------------+---------+---------------------------------------+-------+-------------------------+

1 row in set, 1 warning (0.19 sec)
1 row in set, 1 warning (0.19 sec)
1 row in set, 1 warning (0.19 sec)

Impact: Huge memory consumption.

Skript to create 30k tables within 200 schemata

Attachment: create_30k_tables.php (application/x-httpd-php, text), 975 bytes.

Another problem situation is if multiple clients query I_S in parallel looking for tables that match certain patterns. This can be very expensive on some of our servers which have over 2000 tables split over a large number of databases.

Other RDBMSes handle this much better but that's because they don't have to search for the information by looking around the filesystems and each table's definition each time. They already store and keep this information in a format suitable for fast querying.  If you are used to querying the metadata inside the DB it becomes rather worrying that it is actually quicker to run scripts outside the database.

Straight SQL to build the database and test it.

Attachment: bug43818.sql (application/octet-stream, text), 3.18 KiB.

An analysis by Kevin Lewis of where the bulk of time is being spent.

Attachment: bug43818.analysis.txt (text/plain), 3.27 KiB.

I analyzed where the time was being spent in this query and found that the great bulk of the time was in InnoDB when the information_schema code requests that it recalculate statistics for each file.  The optimizer can do nothing about this.

See the attached file called bug43818.analysis.txt.  It shows that if the files being accessed are created in another engine such as MyISAM, then the slow performance does not occur.  But that is not generally an option.  For a test where 600 files are created in InnoDB, the query on my test machine took about 10 seconds.  Almost all of that time was spent in ha_innobase.cc; ha_innobase::info(uint flag). In particular, whenever the HA_STATUS_TIME flag is used, this function uses a lot of time to update the statistics about the database.  It probably should not be done for every row of output in information_schema.key_column_usage.  I am looking for a way to avoid calling handler::info(HA_STATUS_TIME) too often.

The bulk of extra time taken during these queries is in an InnoDB function named dict_update_statistics(table).  This must be done each time a table is opened for the first time, but it does not seem to be necessary when information_schema is querying TABLE_CONSTRANTS, KEY_COLUMN_USAGE, or REFERENTIAL_CONSTRAINTS.  

In this case, the query into INFORMATION_SCHEMA.KEY_COLUMN_USAGE will open every possible table that the MySQL instance knows about and update the statistics for each one.  Each table in InnoDB will have its statistics updated, which I do not think is necessary. Note that the first time this query is run, it must initially open all the tables, which will initialize these statistics at least the first time.

The proposed change (see attached file Bug43818.no-info.diff) does not fix the slowness of the query the first time it is run on a newly started server, but every subsequent query will rum many times faster, depending on the number of InnoDB tables in the system.

Proposed fix

Attachment: Bug43818.no-info.diff (application/octet-stream, text), 1.76 KiB.

Reviewers;  Please determine if the lack of newly updated statistics should be a problem for these three information schema tables.  Also, please determine if fixing the performance of all but the first query (before opening any InnoDB tables) will be sufficient.

Hello,

I assume ::info() is called for each row from INFORMATION_SCHEMA.KEY_COLUMN_USAGE.

http://dev.mysql.com/doc/refman/5.1/en/key-column-usage-table.html

This means that when a SELECT is made from that table, ::info() is called once for each table for each index in that table, for each column in that index that has a constraint on it.

This looks like an overkill to me.

I mean:

for each table
  for each index in that table
    for each column in that index that has a constraint on it

Sergey Glukhov wrote in an email;

KEY_COLUMN_USAGE, TABLE_CONSTRAINTS & REFERENTIAL_CONTRAINTS retrieve only DDL(or metadata) information, there are no fields which hold statistical
information. We use handler::get_foreign_key_list for retrieving necessary information and ::info() call does not affect get_foreign_key_list() behaviour.
So ::info() call is unnecessary atm.  Why was it added? I suppose It was added for TABLE_CONSTRAINTS by mistake and erroneously propagated to KEY_COLUMN_USAGE & REFERENTIAL_CONTRAINTS tables later.

::info() call could be useful if we have any plans to add statistical fields
into these tables but imho it's not good to have a mix of metadata & statistical info in one table. We have such tables(I_S.TABLES& I_S.STATISTICS) but these tables were implemented such way by historical reasons to be old SHOW commands compatible.

Taking into account that ::info() call does not affect metadata retrieving I think :info() should be removed. 

Vasil Dimov wrote;
[...]   Seconded.

Pushed to mysql-next-mr-bugfixing.

Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101002181053-6iotvl26uurcoryp) (version source revid:alexander.nozdrin@oracle.com-20101002180917-h0n62akupm3z20nt) (pib:21)

Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)

James, very good description.  Thank you.

Setting report to Need Merge pending push into 5.1.x, 5.5.x, if it will go into those trees. If not, just close the report, it does not need a 5.6.x changelog entry.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/127568

3671 kevin.lewis@oracle.com	2010-12-23
      Bug#43818 - Patch for mysql-5.1-innodb
      Avoid handler::info() call for three Information Schema tables;
      TABLE_CONSTRAINTS, KEY_COLUMN_USAGE, & REFERENTIAL_CONTRAINTS

Patch pushed to mysql-5.1-innodb

Noted in 5.1.55, 5.5.9 changelogs.

Queries involving InnoDB tables in the INFORMATON_SCHEMA tables
TABLE_CONSTRANTS, KEY_COLUMN_USAGE, or REFERENTIAL_CONSTRAINTS were
slower than necessary because statistics were rechecked more often
than required, even more so when many foreign keys were present. The
improvement to this may be of particular benefit to users of MySQL
Enterprise Monitor with many monitored servers or tens of thousands
of tables. 

CHANGESET - http://lists.mysql.com/commits/127568