Bug #102584 join on result of function
Submitted: 12 Feb 2021 15:19   Modified: 15 Feb 2021 15:31
Reporter: Dave Pullin (Basic Quality Contributor)
Status: Not a Bug   Impact on me: None
Category: MySQL Server: Optimizer   Severity: S3 (Non-critical)
Version: 5.7.31   OS: Any (CentOS Linux release 7.6.1810 (Core))
Assigned to:   CPU Architecture: Any

[12 Feb 2021 15:19] Dave Pullin
Description:
A derived table that selects a value returned by a function
is joined to a second table using that table's primary key:

select * from (select bug.func(pkey) as item from bug.table1 ) as data  join bug.table2 using(item);

results in either a server loop or a very long query (I didn't wait that long).

EXPLAIN shows 

Id	Select Type	Table	Partitions	Type	Possible Keys	Key	Key Len	Ref	Rows	Filtered	Extra
1	SIMPLE	table2		ALL					1655	100.0	
1	SIMPLE	table1		index		PRIMARY	4		13430	100.0	Using where; Using index; Using join buffer (Block Nested Loop)

whereas the expected strategy is to look up table2 by its primary key.

Reducing the number of rows such that the EXPLAIN does not use 'Using join buffer (Block Nested Loop)' eliminates the bug.

console log:
...
Server version: 5.7.31 MySQL Community Server (GPL)
...

mysql> create database if not exists bug;
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> drop function if exists bug.func;
Query OK, 0 rows affected (0.00 sec)

mysql> create function bug.func(in_date date) returns varchar(254) return "anything";
Query OK, 0 rows affected (0.00 sec)

mysql> drop table if exists bug.table1;
Query OK, 0 rows affected (0.00 sec)

mysql> create table bug.table1 (pkey int auto_increment primary key,dummy int); 
Query OK, 0 rows affected (0.04 sec)

mysql> insert into bug.table1 (dummy) select 1 from mysql.help_topic as a,mysql.help_topic as b,mysql.help_topic as c limit 13430;
Query OK, 13430 rows affected (0.04 sec)
Records: 13430  Duplicates: 0  Warnings: 0

mysql> drop table if exists bug.table2;
Query OK, 0 rows affected (0.00 sec)

mysql> create table bug.table2 (item varchar(254),dummy int); 
Query OK, 0 rows affected (0.00 sec)

mysql> insert into bug.table2 (dummy) select 1 from mysql.help_topic as a,mysql.help_topic as b,mysql.help_topic as c limit 1655;
Query OK, 1655 rows affected (0.01 sec)
Records: 1655  Duplicates: 0  Warnings: 0

mysql> select * from (select bug.func(pkey) as item from bug.table1 ) as data  join bug.table2 using(item);

/* does not return for a very long time or maybe forever */

How to repeat:
create database if not exists bug;

drop function if exists bug.func;
create function bug.func(in_date date) returns varchar(254) return "anything";

drop table if exists bug.table1;
create table bug.table1 (pkey int auto_increment primary key,dummy int); 
insert into bug.table1 (dummy) select 1 from mysql.help_topic as 
a,mysql.help_topic as b,mysql.help_topic as c limit 13430;

drop table if exists bug.table2;
create table bug.table2 (item varchar(254),dummy int); 
insert into bug.table2 (dummy) select 1 from mysql.help_topic as a,mysql.help_topic as b,mysql.help_topic as c limit 1655;

select * from (select bug.func(pkey) as item from bug.table1 ) as data  join bug.table2 using(item);
/* does not return for a very long time or maybe forever */
[12 Feb 2021 16:09] Dave Pullin
The select eventually completed:

Empty set, 65535 warnings (5 min 14.99 sec)

(The warnings were
"Data truncated for column 'in_date' at row 1656",
which is a bug in my test case and irrelevant to the reported bug.)

So it is not a loop, just a terrible strategy.
[15 Feb 2021 14:37] MySQL Verification Team
Hi Mr. Pullin,

Thank you very much for your bug report.

However, this is not a bug, at least not in versions older than 8.0.

The essential problem here is that the join cannot be resolved by using an index on the common column, so the executor actually has to perform 1655 * 13430 = 22,226,650 comparisons, which takes a lot of time.
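The cost of that plan can be sketched with a toy Python model of a nested-loop join (this models the comparison count only, not MySQL's actual executor):

```python
# Toy model of the 5.7 plan: a block nested-loop join with no usable index
# compares every row of table2 against every row of the derived table,
# so the work is len(table2) * len(derived) key comparisons.

def nested_loop_join(outer, inner, key):
    comparisons = 0
    matches = []
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[key] == i[key]:
                matches.append((o, i))
    return matches, comparisons

table2 = [{"item": n} for n in range(1655)]     # 1655 rows
derived = [{"item": 1} for _ in range(13430)]   # func() returns a constant

_, comparisons = nested_loop_join(table2, derived, "item")
print(comparisons)  # prints 22226650
```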

There are other problems with your query ......

- Your function takes a DATE argument, but you feed it integers, so it will throw an error on the first evaluation.
- The column that you want to look up on is not part of an index, so there is no way you can get an index lookup.
- Even if there were an index, selectivity would be 100%, as the column contains the same value (NULL) for all rows.

However, MySQL 8.0 actually executes this pretty efficiently without an index, due to hash join. Hash joins are not implemented in 5.7.
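The asymptotic difference can be illustrated with a toy Python hash join (again a model, not MySQL's implementation): one pass builds a hash table over the smaller input, and one pass probes it, so the work is roughly 1655 + 13430 operations instead of 1655 * 13430.

```python
from collections import defaultdict

# Toy model of 8.0's hash join: build a hash table over one input in a
# single pass, then probe it once per row of the other input.

def hash_join(build_rows, probe_rows, key):
    buckets = defaultdict(list)
    for b in build_rows:                 # build phase: one pass
        buckets[b[key]].append(b)
    return [(b, p)                       # probe phase: one pass
            for p in probe_rows
            for b in buckets.get(p[key], [])]

# Values from the fixed test case below: func() returns 1
# and table2.item is an auto_increment key 1..1655.
table2 = [{"item": n, "dummy": 1} for n in range(1, 1656)]
derived = [{"item": 1} for _ in range(13430)]

matches = hash_join(table2, derived, "item")
print(len(matches))  # prints 13430
```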

Not a bug.
[15 Feb 2021 15:10] Dave Pullin
I appreciate that failure to perform an optimization is not actually a bug, and I'm sorry that I introduced errors while trimming the real-life case down to a small test case, errors which did indeed make the optimization impossible.

Below is the fixed test case. Note that adding 'limit 1000000000' changes the optimization, and the select runs in 0.12 sec; without it, it takes 1 min 44.36 sec. (I don't see why the optimization is possible with the limit but not without it.)

create database if not exists bug;
drop function if exists bug.func;
create function bug.func(x int) returns int return 1;

drop table if exists bug.table1;
create table bug.table1 (pkey int auto_increment primary key,dummy int); 
insert into bug.table1 (dummy) select 1 from mysql.help_topic as a,mysql.help_topic as b,mysql.help_topic as c limit 13430;

drop table if exists bug.table2;
create table bug.table2 (item int auto_increment  primary key,dummy int); 
insert into bug.table2 (dummy) select 1 from mysql.help_topic as a,mysql.help_topic as b,mysql.help_topic as c limit 1655;

select * from (select bug.func(pkey) as item from bug.table1 limit 1000000000) as data  join bug.table2 using(item);

select * from (select bug.func(pkey) as item from bug.table1 ) as data  join bug.table2 using(item);
[15 Feb 2021 15:17] MySQL Verification Team
Hi Mr. Pullin,

That is simply how our 5.7 optimiser works. The vast majority of these shortcomings have been fixed in 8.0 by new features.
[15 Feb 2021 15:31] Dave Pullin
Thanks.
I'm happy to have found a workaround with the LIMIT of a billion rows.

Unfortunately the migration to V8 is a big deal for me due to many incompatibilities.
[16 Feb 2021 12:46] MySQL Verification Team
Hi Mr. Pullin,

You are welcome ......