Bug #108963 Range analysis for IN list has O(n^2) runtime complexity
Submitted: 2 Nov 2022 7:40 Modified: 1 Jun 2023 20:22
Reporter: Manuel Ung Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0.31, 5.7.40 OS:Any
Assigned to: CPU Architecture:Any

[2 Nov 2022 7:40] Manuel Ung
Description:
When performing range analysis for IN lists with row value expressions (as described in https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html#row-constructor-range-opti...), the runtime complexity is O(n^2) where n is the number of elements in the IN list.

This means that some queries with many elements could take a long time to finish optimizing, and during that time the query is unkillable.

How to repeat:

We have to generate a query with 10000 elements in the IN list. One way to do that is to run this in bash. The first command generates a query with row value expressions. The second command generates one with scalar expressions (which does not have this bug).

$ echo "SELECT * FROM t WHERE (a, b) IN ($(for i in $(seq -f "%05g" 0 9999); do echo "('$i', 1),"; done; echo "('10000', 1)"))" > slowquery
$ echo "SELECT * FROM t WHERE a IN ($(for i in $(seq -f "%05g" 0 9999); do echo "'$i',"; done; echo "'10000'"))" > fastquery

On a release build:

mysql> create table t (a varchar(255), b int, primary key(a, b)) charset=latin1;
mysql> set range_optimizer_max_mem_size = 1000000000;

mysql> source ~/fastquery
Empty set (0.02 sec)

mysql> source ~/slowquery
Empty set (23.92 sec)

You can also verify that the slow query is unkillable (ie. stays in Killed state while consuming CPU) by issuing a KILL command from another connection.

Suggested fix:
The issue is in get_func_mm_tree_from_in_predicate, we construct the range tree by iteratively calling tree_or, which merges single element trees taken from the IN list.

tree_or merges two trees by inserting into the 1st tree by iterating through the 2nd tree. The issue is that we pass the single element tree as the 1st tree, and the current range tree as the 2nd tree. This means that we're constantly re-inserting the whole tree into a near empty tree for every element in the IN list.

A simple fix would be to switch the order of arguments to tree_or in get_func_mm_tree_from_in_predicate (similar to the scalar path). Another possible fix is to improve tree_or to always merge into the larger tree, so that the caller does not need to worry about the order of arguments.
[2 Nov 2022 10:22] MySQL Verification Team
Hello Manuel Ung,

Thank you for the report and feedback.

regards,
Umesh
[2 Nov 2022 10:24] MySQL Verification Team
- 8.0.31

rm -rf 108963/
bin/mysqld --no-defaults --initialize-insecure --basedir=$PWD --datadir=$PWD/108963 --log-error-verbosity=3
bin/mysqld_safe --no-defaults --basedir=$PWD --datadir=$PWD/108963 --core-file --socket=/tmp/mysql.sock  --port=13306 --log-error=$PWD/108963/log.err --mysqlx-port=33330 --mysqlx-socket=/tmp/mysql_x_ushastry.sock --log-error-verbosity=3  --secure-file-priv="" --local-infile=1 2>&1 &

 bin/mysql -uroot -S /tmp/mysql.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.31 MySQL Community Server - GPL

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.01 sec)

mysql> use test
Database changed
mysql> create table t (a varchar(255), b int, primary key(a, b)) charset=latin1;
Query OK, 0 rows affected (0.02 sec)

mysql> source fastquery;
Empty set (0.02 sec)

mysql>
mysql> source slowquery;
Empty set (33.66 sec)
[2 Nov 2022 10:26] MySQL Verification Team
- 5.7.40 affected as well

rm -rf 108963/
bin/mysqld --no-defaults --initialize-insecure --basedir=$PWD --datadir=$PWD/108963 --log-error-verbosity=3
bin/mysqld_safe --no-defaults --basedir=$PWD --datadir=$PWD/108963 --core-file --socket=/tmp/mysql.sock  --port=3306 --log-error=$PWD/108963/log.err --log-error-verbosity=3  --secure-file-priv="" --local-infile=1 2>&1 &

 bin/mysql -uroot -S /tmp/mysql.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.40 MySQL Community Server (GPL)

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test
Database changed
mysql>  create table t (a varchar(255), b int, primary key(a, b)) charset=latin1;
Query OK, 0 rows affected (0.02 sec)

mysql>  source fastquery;
Empty set (0.02 sec)

mysql> source slowquery;
Empty set (28.60 sec)

mysql>
[8 Jun 2023 0:05] Jon Stephens
Documented fix as follows in the MySQL 5.7.43, 8.0.34, and 8.1.0 changelogs:

    During optimization, range SEL_TREE creation uses logic which
    differs based on the left-hand side of the IN predicate. For a
    field item, each value on the right-hand side is added to an OR
    tree to create the necessary expression. In the case of a row
    item comparison (example: WHERE (a,b) IN ((n1,m1), (n2, m2),
    ...)), an expression in disjunctive normal form (DNF) is needed.
    A DNF expression is created by adding an AND tree with column
    values to an OR tree for each set of RHS values, the instead OR
    tree was added to the AND tree causing the tree merge to require
    exponential time due to O(n²) runtime complexity.

Closed.
[24 Jul 2024 8:51] MySQL Verification Team
This report is the original bug for the following one:

https://bugs.mysql.com/bug.php?id=100479
[25 Jul 2024 15:02] Jon Stephens
BUG#100479 is a duplicate of this bug.