Bug #98238 I_S.KEY_COLUMN_USAGE is very slow
Submitted: 15 Jan 2020 12:26 Modified: 6 May 2020 18:10
Reporter: Manuel Mausz Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Information schema Severity:S5 (Performance)
Version:8 OS:Any
Assigned to: CPU Architecture:Any

[15 Jan 2020 12:26] Manuel Mausz
Description:
After upgrading to MySQL 8 we noticed that certain I_S queries are slowed down significantly. The queries are so slow that upgrades of PHP based websites or even installing new PHP based software like Nextcloud runs into (PHP based) timeouts.

The slow queries all look like this:

  SELECT DISTINCT k.`CONSTRAINT_NAME`, k.`COLUMN_NAME`,
    k.`REFERENCED_TABLE_NAME`, k.`REFERENCED_COLUMN_NAME`
    /*!50116 , c.update_rule, c.delete_rule */
  FROM information_schema.key_column_usage k
  /*!50116 INNER JOIN information_schema.referential_constraints c
    ON c.constraint_name = k.constraint_name AND c.table_name = '<table>' */
  WHERE k.table_name = '<table>'
    AND k.table_schema = '<schema>'
    /*!50116 AND c.constraint_schema = '<schema>' */
    AND k.`REFERENCED_COLUMN_NAME` is not NULL

The query [1] is part of doctrine/dbal, which is a *very* common db abstraction layer used by many PHP based applications like Typo3, Contao, Nextcloud, Owncloud, etc.. [2]

As far as we can tell the query gets executed multiple times for every table in a database. It can be even executed multiple times during a single update.

After looking at the query itself it turns out that I_S.KEY_COLUMN_USAGE is a VIEW using UNION. Since MySQL still lacks support for pushing down conditions with views+unions and our database servers currently have 200.000 tables (and more) the server has to create a massive temp. table while using only a tiny fraction of the rows for the actual result. Optimizing VIEWs + UNIONs are covered in bug #36802 [3]

However as it turns out we can optimize I_S.KEY_COLUMN_USAGE itself by taking advantage of the fact that common database users only have access to a small subset of all tables/databases. We simply have to tell the query optimizer to prioritize looking up the schematas the user has access to first.

The first query of the union looks like this:

  SELECT ...
  FROM mysql.indexes idx
  JOIN mysql.tables tbl on idx.table_id = tbl.id
  JOIN mysql.schemata sch on tbl.schema_id = sch.id
  JOIN mysql.catalogs cat on cat.id = sch.catalog_id
  JOIN mysql.index_column_usage icu on icu.index_id = idx.id
  JOIN mysql.columns col on icu.column_id = col.id and idx.type in ('PRIMARY','UNIQUE')
  WHERE can_access_column(sch.name, tbl.name, col.name)
    and is_visible_dd_object(tbl.hidden, col.hidden <> 'Visible' or idx.hidden or icu.hidden)

To filter on the schematas we can add a `can_access_database(sch.name)` call, however the optimizer will still prioritize mysql.tables first. Using join-order optimizer hints in views is not supported, however we can make us of STRAIGHT_JOIN + a bit of reordering:

  SELECT ...
  FROM mysql.catalogs cat
  STRAIGHT_JOIN mysql.schemata sch on sch.catalog_id = cat.id and can_access_database(sch.name)
  STRAIGHT_JOIN mysql.tables tbl on tbl.schema_id = sch.id
  STRAIGHT_JOIN mysql.indexes idx ON idx.table_id = tbl.id
  JOIN mysql.index_column_usage icu on icu.index_id = idx.id
  JOIN mysql.columns col on icu.column_id = col.id and idx.type in ('PRIMARY','UNIQUE')
  WHERE can_access_column(sch.name, tbl.name, col.name)
    and is_visible_dd_object(tbl.hidden, col.hidden <> 'Visible' or idx.hidden or icu.hidden)

With this optimization the runtime of the I_S query in the common use-case significantly decreased. On one of our production servers it went down from 4.5s to 0.08s.

[1] https://github.com/doctrine/dbal/blob/01c22b7d1f58d5ae40a0e362bbef3fe685087b52/lib/Doctrin...
[2] https://packagist.org/packages/doctrine/dbal/dependents?order_by=downloads
[3] https://bugs.mysql.com/bug.php?id=36802

How to repeat:
I'll attach a bash script which generates SQL to create 100 databases with each containing 600 tables. The I_S query slows down as soon as you have a significant number of tables.

Suggested fix:
Using CAN_ACCESS_DATABASE + STRAIGHT_JOIN to filter on the schemata table early:

--- a/sql/dd/impl/system_views/key_column_usage.cc      2019-12-05 21:21:35.000000000 +0100
+++ b/sql/dd/impl/system_views/key_column_usage.cc      2020-01-14 16:54:28.955615259 +0100
@@ -64,10 +64,10 @@
   first_select.add_field(FIELD_REFERENCED_COLUMN_NAME, "REFERENCED_COLUMN_NAME",
                          "NULL");

-  first_select.add_from("mysql.indexes idx");
-  first_select.add_from("JOIN mysql.tables tbl ON idx.table_id=tbl.id");
-  first_select.add_from("JOIN mysql.schemata sch ON tbl.schema_id=sch.id");
-  first_select.add_from("JOIN mysql.catalogs cat ON cat.id=sch.catalog_id");
+  first_select.add_from("mysql.catalogs cat");
+  first_select.add_from("STRAIGHT_JOIN mysql.schemata sch ON sch.catalog_id=cat.id AND CAN_ACCESS_DATABASE(sch.name)");
+  first_select.add_from("STRAIGHT_JOIN mysql.tables tbl ON tbl.schema_id=sch.id");
+  first_select.add_from("STRAIGHT_JOIN mysql.indexes idx ON idx.table_id=tbl.id");
   first_select.add_from(
       "JOIN mysql.index_column_usage icu"
       " ON icu.index_id=idx.id");

I'm sure this optimization can be applied to more I_S views as well.
[15 Jan 2020 12:27] Manuel Mausz
bash script to generate a lot of tables

Attachment: test_schema_dbs.sh (application/x-shellscript, text), 7.28 KiB.

[15 Jan 2020 14:05] MySQL Verification Team
Hello Mr. Mausz,

Thank you very much for your bug report.

I have tested a modified version of your script and I think that you are right.

However, I am still unclear about what kind of changes are you proposing to our code. Are you proposing that we introduce  support for pushing down conditions with views and / or unions ??? Or are you proposing some other solutions. I hope that you understand that we can't restrict number of tables and / or databases for any user. That is a job of DBA, who can use our privilege system to accomplish it.

Hence, in short, I very much prone to verify your proposition, but it is not yet clear what methods do you propose for the improvement of the speed of querying the above table.

We do require your feedback.
[15 Jan 2020 14:24] Manuel Mausz
Hi Sinisa,

optimizing views + unions would be great however I assume this would be a huge task. So this is not what I'm proposing. 

I'm proposing to alter the view so that the databases/schemas the current user has access to get fetched *early* (before fetching all the tables) and thus the resultset of the join set gets greatly reduced.

The patch to accomplish the change can be found in the "Suggested fix" section. The actual change of the patch can be found above.

Hope this clears things up.
[15 Jan 2020 14:37] MySQL Verification Team
Thank you Mr. Mausz,

I am verifying your performance improvement suggestion.
[15 Jan 2020 14:45] Manuel Mausz
Great. Make sure to re-populate the I_S schema and run the query as a user which only has access to a subset of all databases/tables. Otherwise you won't see any speed improvement (obviously).
[15 Jan 2020 17:47] John LeSueur
Another way to approach a fix would be to define two separate views: INDEX_KEY_COLUMN_USAGE and REFERENTIAL_KEY_COLUMN_USAGE. I'm not sure how desirable that direction would be.
[20 Mar 2020 10:24] Pavel Hladik
Hi, do we have any news on that? We can't move to MySQL 8.x because of this bug. I'm curious if Oracle solve bug or Doctrine refactor query.
[20 Mar 2020 13:16] MySQL Verification Team
Hi All,

When this performance issue is fixed, this page will be updated and all of the subscribers will be notified.

We do not know when will any bug be fixed as this is an internal process in another department.

What you can do is click on the "Affects me" button in the upper right corner. That is all.
[3 Apr 2020 0:45] Øystein Grøvlen
The suggested fix will only work well when each individual user only has access to a limited set of databases/schemas.   I suggest instead to rewrite the view in a way that isolates the union in a lateral table.  That way, it will be possible to merge the view, and the conditions on schema name and table name may be applied before the lateral table is executed.  By using a lateral table only the indexes of the specified table will have to be accessed.  The view would look something like this:

CREATE VIEW my_fk_view AS (
SELECT
cat.name COLLATE utf8_tolower_ci AS CONSTRAINT_CATALOG,
sch.name COLLATE utf8_tolower_ci AS CONSTRAINT_SCHEMA,
constraints.CONSTRAINT_NAME,
cat.name COLLATE utf8_tolower_ci AS TABLE_CATALOG,
sch.name COLLATE utf8_tolower_ci AS TABLE_SCHEMA,
tbl.name COLLATE utf8_tolower_ci AS TABLE_NAME,
col.name COLLATE utf8_tolower_ci AS COLUMN_NAME,
constraints.ORDINAL_POSITION,
constraints.POSITION_IN_UNIQUE_CONSTRAINT,
constraints.REFERENCED_TABLE_SCHEMA,
constraints.REFERENCED_TABLE_NAME,
constraints.REFERENCED_COLUMN_NAME
FROM mysql.tables tbl
JOIN mysql.schemata sch ON tbl.schema_id = sch.id
JOIN mysql.catalogs cat on cat.id = sch.catalog_id,
LATERAL (
SELECT
idx.name AS CONSTRAINT_NAME,
icu.ordinal_position AS ORDINAL_POSITION,
NULL AS POSITION_IN_UNIQUE_CONSTRAINT,
NULL AS REFERENCED_TABLE_SCHEMA,
NULL AS REFERENCED_TABLE_NAME,
NULL AS REFERENCED_COLUMN_NAME,
icu.column_id,
idx.hidden OR icu.hidden AS HIDDEN
FROM mysql.indexes idx
JOIN mysql.index_column_usage icu ON icu.index_id = idx.id
WHERE idx.table_id = tbl.id
  AND idx.type IN ('PRIMARY', 'UNIQUE')
UNION
SELECT
fk.name COLLATE utf8_tolower_ci AS CONSTRAINT_NAME,
fkcu.ordinal_position AS ORDINAL_POSITION,
fkcu.ordinal_position AS POSITION_IN_UNIQUE_CONSTRAINT,
fk.referenced_table_schema AS REFERENCED_TABLE_SCHEMA,
fk.referenced_table_name AS REFERENCED_TABLE_NAME,
fkcu.referenced_column_name AS REFERENCED_COLUMN_NAME,
fkcu.column_id,
FALSE AS HIDDEN
FROM mysql.foreign_keys fk
JOIN mysql.foreign_key_column_usage fkcu ON fkcu.foreign_key_id = fk.id
WHERE fk.table_id = tbl.id
) CONSTRAINTS
JOIN mysql.columns col ON constraints.column_id = col.id
WHERE can_access_column(sch.name, tbl.name, col.name)
  AND is_visible_dd_object(tbl.hidden,
                           col.hidden <> 'Visible' OR constraints.HIDDEN)
);
[3 Apr 2020 0:53] Øystein Grøvlen
Using the above view instead of the original view reduces the execution time for the reported query from around 4 seconds to 0.03 seconds.  (Using the supplied test script)

EXPLAIN shows that the schema and table tables of the view are not merged into the top-level query so that conditions may be applied:

mysql> EXPLAIN SELECT DISTINCT k.`CONSTRAINT_NAME`, k.`COLUMN_NAME`,     k.`REFERENCED_TABLE_NAME`, k.`REFERENCED_COLUMN_NAME`, c.update_rule, c.delete_rule   FROM test.my_fk_view k  INNER JOIN information_schema.referential_constraints c     ON c.constraint_name = k.constraint_name AND c.table_name = 't2'   WHERE k.table_name = 't2'     AND k.table_schema = 'test'  AND c.constraint_schema = 'test'    AND k.`REFERENCED_COLUMN_NAME` is not NULL;
+----+-------------------+------------+------------+--------+------------------------+----------------+---------+-----------------------+-------+----------+--------------------------------------------------------------------------------+
| id | select_type       | table      | partitions | type   | possible_keys          | key            | key_len | ref                   | rows  | filtered | Extra                                                                          |
+----+-------------------+------------+------------+--------+------------------------+----------------+---------+-----------------------+-------+----------+--------------------------------------------------------------------------------+
|  1 | PRIMARY           | cat        | NULL       | index  | PRIMARY                | name           | 194     | NULL                  |     1 |   100.00 | Using index; Using temporary                                                   |
|  1 | PRIMARY           | cat        | NULL       | index  | PRIMARY                | name           | 194     | NULL                  |     1 |   100.00 | Using index; Using join buffer (Block Nested Loop)                             |
|  1 | PRIMARY           | sch        | NULL       | eq_ref | PRIMARY,catalog_id     | catalog_id     | 202     | mysql.cat.id,const    |     1 |   100.00 | Using index                                                                    |
|  1 | PRIMARY           | fk         | NULL       | ref    | schema_id,table_id     | schema_id      | 8       | mysql.sch.id          |    79 |   100.00 | NULL                                                                           |
|  1 | PRIMARY           | tbl        | NULL       | eq_ref | PRIMARY                | PRIMARY        | 8       | mysql.fk.table_id     |     1 |    10.00 | Using where                                                                    |
|  1 | PRIMARY           | tbl        | NULL       | ALL    | schema_id              | NULL           | NULL    | NULL                  | 63046 |   100.00 | Using where; Rematerialize (<derived3>); Using join buffer (Block Nested Loop) |
|  1 | PRIMARY           | sch        | NULL       | eq_ref | PRIMARY,catalog_id     | PRIMARY        | 8       | mysql.tbl.schema_id   |     1 |   100.00 | Using where                                                                    |
|  1 | PRIMARY           | <derived3> | NULL       | ref    | <auto_key1>            | <auto_key1>    | 195     | func                  |     2 |    83.33 | Using where                                                                    |
|  1 | PRIMARY           | col        | NULL       | eq_ref | PRIMARY                | PRIMARY        | 8       | constraints.column_id |     1 |   100.00 | Using where                                                                    |
|  3 | DEPENDENT DERIVED | idx        | NULL       | ref    | PRIMARY,table_id       | table_id       | 8       | mysql.tbl.id          |     2 |    40.00 | Using where                                                                    |
|  3 | DEPENDENT DERIVED | icu        | NULL       | ref    | index_id,index_id_2    | index_id       | 8       | mysql.idx.id          |     4 |   100.00 | NULL                                                                           |
|  4 | DEPENDENT UNION   | fk         | NULL       | ref    | PRIMARY,table_id       | table_id       | 8       | mysql.tbl.id          |     1 |   100.00 | NULL                                                                           |
|  4 | DEPENDENT UNION   | fkcu       | NULL       | ref    | PRIMARY,foreign_key_id | foreign_key_id | 8       | mysql.fk.id           |     1 |   100.00 | Using index                                                                    |
| NULL | UNION RESULT      | <union3,4> | NULL       | ALL    | NULL                   | NULL           | NULL    | NULL                  |  NULL |     NULL | Using temporary                                                                |
+----+-------------------+------------+------------+--------+------------------------+----------------+---------+-----------------------+-------+----------+--------------------------------------------------------------------------------+
14 rows in set, 3 warnings (0.00 sec)
[3 Apr 2020 0:57] Øystein Grøvlen
In order to let my experimental view access the dictionary tables, I turned off the access restriction on dictionary tables with this patch to 8.0.19:

diff --git a/sql/dd/impl/cache/dictionary_client.cc b/sql/dd/impl/cache/dictionary_client.cc
index 786ce196fb2..b0bf204a1e9 100644
--- a/sql/dd/impl/cache/dictionary_client.cc
+++ b/sql/dd/impl/cache/dictionary_client.cc
@@ -2707,7 +2707,7 @@ void Dictionary_client::remove_uncommitted_objects(
     // - The transaction is being committed, not rolled back.
     // - We're not allowing direct access to DD tables.
     if (!m_thd->is_dd_system_thread() && commit_to_shared_cache &&
-        DBUG_EVALUATE_IF("skip_dd_table_access_check", false, true)) {
+        false) { // DBUG_EVALUATE_IF("skip_dd_table_access_check", false, true)) {
       const typename T::Cache_partition *stored_object = nullptr;
       if (!Shared_dictionary_cache::instance()->get_uncached(
               m_thd, id_key, ISO_READ_UNCOMMITTED, &stored_object))
diff --git a/sql/dd/impl/dictionary_impl.cc b/sql/dd/impl/dictionary_impl.cc
index eb53d7944b9..aa6866d7e43 100644
--- a/sql/dd/impl/dictionary_impl.cc
+++ b/sql/dd/impl/dictionary_impl.cc
@@ -346,8 +346,8 @@ bool Dictionary_impl::is_dd_table_access_allowed(bool is_dd_internal_thread,
   */
   if (schema_length != MYSQL_SCHEMA_NAME.length ||
       strncmp(schema_name, MYSQL_SCHEMA_NAME.str, MYSQL_SCHEMA_NAME.length) ||
-      is_dd_internal_thread ||
-      DBUG_EVALUATE_IF("skip_dd_table_access_check", true, false))
+      is_dd_internal_thread ||
+      true) // DBUG_EVALUATE_IF("skip_dd_table_access_check", true, false))
     return true;

   // Now we need to get the table type.
diff --git a/sql/item_create.cc b/sql/item_create.cc
index c19940ee60b..97f982a49f5 100644
--- a/sql/item_create.cc
+++ b/sql/item_create.cc
@@ -1057,7 +1057,7 @@ class Internal_function_factory : public Create_func {
   Item *create_func(THD *thd, LEX_STRING function_name,
                     PT_item_list *item_list) override {
     if (!thd->parsing_system_view && !thd->is_dd_system_thread() &&
-        DBUG_EVALUATE_IF("skip_dd_table_access_check", false, true)) {
+        false) { // DBUG_EVALUATE_IF("skip_dd_table_access_check", false, true)) {
       my_error(ER_NO_ACCESS_TO_NATIVE_FCT, MYF(0), function_name.str);
       return nullptr;
     }
[3 Apr 2020 12:33] MySQL Verification Team
Thank you,  Øystein Grøvlen for your contribution.
[6 May 2020 18:10] Paul DuBois
Posted by developer:
 
Fixed in 8.0.21.

Queries on the INFORMATION_SCHEMA KEY_COLUMN_USAGE and
TABLE_CONSTRAINTS views could be slow due to UNION use in their
definitions. These were rewritten to move the UNION into a LATERAL
table to enable the optimizer to better use indexes.
[7 May 2020 12:32] MySQL Verification Team
Thank you, Paul ......