MySQL Bugs: #92421: Queries with views and operations over local variables don't use indexes

Bug #92421	Queries with views and operations over local variables don't use indexes
Submitted:	14 Sep 2018 9:57	Modified:	26 Sep 2018 11:45
Reporter:	Andrii Ustymenko	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	5.7	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Since all mysql versions below 8 don't have ROW_NUMBER() function the common way to emulate is to increment a local variable. Like described here:

http://www.mysqltutorial.org/mysql-row_number/

This works fine with tables, but not with views. If the same approach is used against a view - mysql performs full table scan of the underlying table. 

Still, this was forking fine in all latest versions of mysql 5.6.

I think this related to big change of how views and derived tables are treated in mysql 5.7. I have found the next article which doesn't explain what is going on, but gives a clue that many things have been reworked.  

https://mysqlserverteam.com/derived-tables-in-mysql-5-7/

I consider this as a bug, since before upgrade to mysql 5.7 such queries over views did use underlying indexes.

How to repeat:
Install the latest mysql 5.7. I did everything on Centos 7 x86-64.

Create the test table, view over it and populate it with some data:

mysql> create table test (
    ->   `id` int(11) not null auto_increment,
    ->   `data` char(40),
    ->   primary key(id)
    -> ) engine=Innodb;
Query OK, 0 rows affected (0.63 sec)

mysql> create ALGORITHM=UNDEFINED view v_test as select * from test;
Query OK, 0 rows affected (0.04 sec)

[root@austymenko austymenko]#cat /dev/urandom | while read line ; do echo $line | tr -dc 'a-zA-Z0-9' | fold -w 20 | xargs -P 8 -I str echo "INSERT into test (\`data\`) VALUES ('str');" | mysql -A test; done

I have generated aroung 2 million records. Just to have representative dataset.

Set local session variable and run the next query against view and table:

mysql> set @r:=0;
Query OK, 0 rows affected (0.00 sec)

mysql> select *,@r:=@r+1 from test where id=1000000;
+---------+-----------------+----------+
| id      | data            | @r:=@r+1 |
+---------+-----------------+----------+
| 1000000 | QPkG2yfwWIJfvEz |        1 |
+---------+-----------------+----------+
1 row in set (0.02 sec)

mysql> select *,@r:=@r+1 from v_test where id=1000000;
+---------+-----------------+----------+
| id      | data            | @r:=@r+1 |
+---------+-----------------+----------+
| 1000000 | QPkG2yfwWIJfvEz |        2 |
+---------+-----------------+----------+
1 row in set (2.72 sec)

Explain shows that the first query uses PRIMARY KEY, while the second doesn't:

mysql> explain select *,@r:=@r+1 from test where id=1000000;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | test  | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.11 sec)

mysql> explain select *,@r:=@r+1 from v_test where id=1000000;
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+---------+----------+-------+
| id | select_type | table      | partitions | type | possible_keys | key         | key_len | ref   | rows    | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+---------+----------+-------+
|  1 | PRIMARY     | <derived2> | NULL       | ref  | <auto_key0>   | <auto_key0> | 4       | const |      10 |   100.00 | NULL  |
|  2 | DERIVED     | test       | NULL       | ALL  | NULL          | NULL        | NULL    | NULL  | 1699704 |   100.00 | NULL  |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+---------+----------+-------+
2 rows in set, 1 warning (0.02 sec)

Changing ALGORITHM to TEMPTABLE or MERGED or playing with different optimizer settings didn't help. 
The same query in mysql 5.6 works fine:

mysql> set @r:=0;
Query OK, 0 rows affected (0.00 sec)

mysql>  select *,@r:=@r+1 from v_test where id=1000000;
+---------+-----------------+----------+
| id      | data            | @r:=@r+1 |
+---------+-----------------+----------+
| 1000000 | QPkG2yfwWIJfvEz |        1 |
+---------+-----------------+----------+
1 row in set (0.00 sec)

mysql> explain select *,@r:=@r+1 from v_test where id=1000000;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | test  | const | PRIMARY       | PRIMARY | 4       | const |    1 | NULL  |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.02 sec)

Please see: https://dev.mysql.com/doc/refman/5.7/en/view-restrictions.html

C.5 Restrictions on Views

View processing is not optimized:

    It is not possible to create an index on a view.

    Indexes can be used for views processed using the merge algorithm. However, a view that is processed with the temptable algorithm.......

Thank you for the link. Of course I have read it. 
If it wasn't possible to utilize the underlying table indexes I would assume nobody would use the views at all )

The next simple example shows that optimizer successfully finds underlying table index when querying the view:

mysql> select * from v_test where id=1000000;
+---------+-----------------+
| id      | data            |
+---------+-----------------+
| 1000000 | QPkG2yfwWIJfvEz |
+---------+-----------------+
1 row in set (0.07 sec)

mysql> explain select * from v_test where id=1000000;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | test  | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.02 sec)

View is same as in my first comment. As I know when it is made with ALGORITHM=UNDEFINED mysql decides on its own whether it should be MERGE or TEMPTABLE.

The problem comes when you mix it with some operations over local variables. And since it was fine in mysql 5.6 (meaning that optimizer could find the underlying table index in such types of queries) I count that as a bug in the current version.

Hi Andrii,

this is due to a deliberate choice that was taken when rewriting derived tables and views in 5.7: When a user variable was assigned a value in a query block, merging of derived tables was disabled. See https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/derived-table-optimization.html

The reason for this is that the semantics of user variable assignment vary greatly when used in a single-table query versus e.g a join, and query merging might convert from one to the other. This could easily cause confusion on when user variables were updated, which we wanted to avoid. In your case, there is no join, so no confusion should be possible, but we still decided it was better to have a simple rule.

In 8.0, you can override this with a merge hint: /*+ merge(v_test) */, but this is unfortunately not implemented in 5.7.

Hi Roy,

Thank you for the confirmation and explanation. The real use case when this behavior causes problems is when you need to emulate for instance ROW_NUMBER() or RANK() function. 

On the given data it could have been something like below:

mysql> SELECT v.*, @r:= @r + 1 AS row_number
    -> FROM
    ->   (SELECT *
    ->    FROM v_test
    ->    WHERE id > 1000000
    ->      AND id < 1000010) v STRAIGHT_JOIN
    ->   (SELECT @r:=0) r;
+---------+----------------------+------------+
| id      | data                 | row_number |
+---------+----------------------+------------+
| 1000001 | dveLgzz              |          1 |
| 1000002 | pXpKybv6uG2          |          2 |
| 1000003 | DW7mtPPIfRqWDAGwYOkY |          3 |
| 1000004 | nWYYbFdpJKmTTcyMlCq0 |          4 |
| 1000005 | df8ts                |          5 |
| 1000006 | hHKWm4beP2c3MoHZAtPK |          6 |
| 1000007 | zQ95c1OTWDl6FOm4E6XI |          7 |
| 1000008 | WtQ2JHCuH8f0DEeHJHWC |          8 |
| 1000009 | ffi5rU01vAgHylwK     |          9 |
+---------+----------------------+------------+
9 rows in set (1.52 sec)

It is still the case for 5.7 since it doesn't have native ROW_NUBMER() function.
So, I assume this is not going to be fixed in any newer 5.7 releases?

Can't answer that, I have no influence over bug fixing priorities for 5.7...

Ok... So, what is next? Could you change the status of it to "Verified" or smth? And we hope that it will be taken to account for the next releases of 5.7.

Thank you for the feedback.

Hi MySQL team, this is bug actively worked on? We are hitting the same after upgrading from MySQL 5.6 to 5.7. I don't see any updates since 2018. 

Please update, Thanks!

Hi Suren,

I would recommend upgrading to MySQL 8.0, which has been available for more than 3 years.