Bug #82214 | Subquery returns empty list while identical query with JOIN returns result | ||
---|---|---|---|
Submitted: | 13 Jul 2016 9:42 | Modified: | 16 Nov 2016 1:26 |
Reporter: | Sveta Smirnova (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.6.5, 5.7.13 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | regression |
[13 Jul 2016 9:42]
Sveta Smirnova
[13 Jul 2016 11:13]
MySQL Verification Team
Hello Sveta, Thank you for the report and test case. Verified as described with 5.6.31. Regards, Umesh
[19 Sep 2016 16:30]
Marko Mäkelä
Posted by developer: In 8.0.1, the subquery will return the expected result when the semi-join optimization is disabled: set optimizer_switch='semijoin=off'; select si.name from performance_schema.setup_instruments si where 1 in (select si.name like concat('%', lower(engine), '%') from information_schema.engines); select si.name from performance_schema.setup_instruments si join information_schema.engines ON si.name like concat('%',lower(engines.engine),'%'); Curiously, the second query will return 909 rather than 606 records (303 duplicates). Changing the result column to COUNT(DISTINCT si.name) will yield 606 for both queries.
[20 Sep 2016 6:52]
Marko Mäkelä
Posted by developer: I found an explanation why the 'identical query with join' is actually not equivalent to the use of the subquery in 5.7 and later. In 5.6, both queries would return 91 rows if the work-around (set optimizer_switch='semijoin=off') is in place. In 5.7, I found out that the join is duplicating each record for the following subsystems: performance_schema, csv, innodb, myisam, myisammrg, archive, blackhole. Let us look at blackhole: select name from performance_schema.setup_instruments where name like '%blackhole%'; NAME wait/synch/mutex/blackhole/blackhole memory/blackhole/blackhole_share These would match both 'BLACKHOLE' and 'MEMORY' in information_schema.engines.engine. To prevent this, we can adjust the predicate, replacing CONCAT('%', LOWER(engine), '%') with CONCAT('%/', LOWER(engine), '/%') in both queries. With this change and the workaround (set optimizer_switch='semijoin=off'), both queries would return 288 rows instead of 483 and 669.
[20 Sep 2016 8:01]
Marko Mäkelä
Posted by developer: SET optimizer_switch='semijoin=on' was first available in MySQL 5.6.5, which is the oldest MySQL version where this fails. Disabling the semijoin optimization seems to work around this bug in every version that I tested.
[20 Sep 2016 8:29]
Marko Mäkelä
Posted by developer: There is something special about INFORMATION_SCHEMA.ENGINES. If the data is copied to a new table, the result will not be empty: create table t1(c varchar(9)); insert into t1 values ('ThinNoDB'), ('ThickNoDB'); select c from t1 where 1 in (select t1.c like concat('%', engine) from information_schema.engines); create table t2 select * from information_schema.engines; select c from t1 where 1 in (select t1.c like concat('%', engine) from t2); drop table t1,t2; Here, tested with both 5.6.5 and current 5.7, the second query returns the expected result (case-insensitively matching 'ThinNoDB' LIKE '%InnoDB'), while the first query returns the empty set if optimizer_switch='semijoin=on'.
[20 Sep 2016 9:44]
Marko Mäkelä
Posted by developer: This seems to affect some or all programmatically generated I_S tables, both built-in ones and those that are implemented as plugins, using different plugin APIs. On 8.0, where TABLES and CHARACTER_SETS are views on underlying InnoDB tables, the result is correct. Because FILES is still programmatically generated as of 8.0.0, it displays the same behaviour as ENGINES. (Note that I_S.FILES started displaying InnoDB tables in 5.7 with WL#7943.) Running the test with ./mtr --mysqld=--optimizer_switch=semijoin=off will return the correct result for each query. set global innodb_file_per_table=on; create table t1(c varchar(7)) engine=innodb; insert into t1 values ('foot1'), ('toobig5'), ('t1.ibd'), ('4096'); select c from t1 where 1 in (select t1.c like concat('%', table_name) from information_schema.tables); create table t2 select table_name c from information_schema.tables; select c from t1 where 1 in (select t1.c like concat('%', t2.c) from t2); drop table t2; select c from t1 where 1 in (select t1.c like concat('%', character_set_name) from information_schema.character_sets); create table t2 select character_set_name c from information_schema.character_sets; select c from t1 where 1 in (select t1.c like concat('%', t2.c) from t2); drop table t2; select c from t1 where 1 in (select file_name like concat('%', t1.c) from information_schema.files); create table t2 select file_name c from information_schema.files; select c from t1 where 1 in (select t2.c like concat('%', t1.c) from t2); drop table t2; select c from t1 where 1 in (select t1.c = page_size from information_schema.innodb_cmpmem); create table t2 select page_size c from information_schema.innodb_cmpmem; select c from t1 where 1 in (select t1.c = t2.c from t2); drop table t1,t2; The last query reports a bunch of warnings only when it is returning some results. This suggests that somehow the programmatically generated contents of the I_S table is not reaching the query execution: c 4096 Warnings: Warning 1292 Truncated incorrect DOUBLE value: 'foot1' Warning 1292 Truncated incorrect DOUBLE value: 'toobig5' Warning 1292 Truncated incorrect DOUBLE value: 't1.ibd' [repeated 4 more times; there are 5 rows in INNODB_CMPMEM]
[20 Sep 2016 9:55]
Marko Mäkelä
Posted by developer: The following test case proves that the I_S table is not even accessed when the semijoin optimization is enabled: create temporary table t select 4096 c; select c from t where 1 in (select t.c = page_size from information_schema.innodb_cmpmem); set optimizer_switch='semijoin=off'; select c from t where 1 in (select t.c = page_size from information_schema.innodb_cmpmem); A breakpoint on i_s_cmpmem_fill_low() will not be hit until the last query.
[20 Sep 2016 12:30]
Marko Mäkelä
Posted by developer: The problem seems to be that the query execution uses an empty temporary table instead of the I_S table. The situation does not seem to have changed between 5.6 and 8.0. Here is the table creation in 5.6: #0 0x00000000008486fc in create_schema_table (thd=0x1b930b0, table_list=0x7fffd0006698) at /home/marko/mysql/git/mysql/sql/sql_show.cc:7048 #1 0x0000000000849431 in mysql_schema_table (thd=0x1b930b0, lex=0x1b94f40, table_list=0x7fffd0006698) at /home/marko/mysql/git/mysql/sql/sql_show.cc:7277 #2 0x0000000000783d5a in open_and_process_table (thd=0x1b930b0, lex=0x1b94f40, tables=0x7fffd0006698, counter=0x1b94ff0, flags=0, prelocking_strategy=0x7fffeeea75d0, has_prelocking_list=false, ot_ctx=0x7fffeeea74e0) at /home/marko/mysql/git/mysql/sql/sql_base.cc:4622 #3 0x0000000000784e08 in open_tables (thd=0x1b930b0, start=0x7fffeeea7590, counter=0x1b94ff0, flags=0, prelocking_strategy=0x7fffeeea75d0) at /home/marko/mysql/git/mysql/sql/sql_base.cc:5163 #4 0x0000000000785f9e in open_normal_and_derived_tables (thd=0x1b930b0, tables=0x7fffd00052b8, flags=0) at /home/marko/mysql/git/mysql/sql/sql_base.cc:5870 #5 0x00000000007f60a3 in execute_sqlcom_select (thd=0x1b930b0, all_tables=0x7fffd00052b8) at /home/marko/mysql/git/mysql/sql/sql_parse.cc:5107 #6 0x00000000007eef25 in mysql_execute_command (thd=0x1b930b0) at /home/marko/mysql/git/mysql/sql/sql_parse.cc:2656 #7 0x00000000007f8d0c in mysql_parse (thd=0x1b930b0, rawbuf=0x7fffd0005030 "select c from t where 1 in\n(select t.c = page_size from information_schema.innodb_cmpmem)", length=89, parser_state=0x7fffeeea86c0) at /home/marko/mysql/git/mysql/sql/sql_parse.cc:6386 And here is the read of the table, which is returning HA_ERR_END_OF_FILE: #0 ha_heap::rnd_next (this=0x7fffd00129e0, buf=0x7fffd00128f0 "\377") at /home/marko/mysql/git/mysql/storage/heap/ha_heap.cc:372 #1 0x00000000006552f8 in handler::ha_rnd_next (this=0x7fffd00129e0, buf=0x7fffd00128f0 "\377") at /home/marko/mysql/git/mysql/sql/handler.cc:2688 #2 0x000000000099a7a4 in rr_sequential (info=0x7fffd0015128) at /home/marko/mysql/git/mysql/sql/records.cc:480 #3 0x00000000007c26f2 in join_init_read_record (tab=0x7fffd0015098) at /home/marko/mysql/git/mysql/sql/sql_executor.cc:2402 #4 0x00000000007bfee6 in sub_select (join=0x7fffd000fb30, join_tab=0x7fffd0015098, end_of_records=false) at /home/marko/mysql/git/mysql/sql/sql_executor.cc:1259 #5 0x00000000007bf8c6 in do_select (join=0x7fffd000fb30) at /home/marko/mysql/git/mysql/sql/sql_executor.cc:936 #6 0x00000000007bd868 in JOIN::exec (this=0x7fffd000fb30) at /home/marko/mysql/git/mysql/sql/sql_executor.cc:194 #7 0x000000000081f49c in mysql_execute_select (thd=0x1b930b0, select_lex=0x1b95648, free_join=true) at /home/marko/mysql/git/mysql/sql/sql_select.cc:1101 #8 0x000000000081f75a in mysql_select (thd=0x1b930b0, tables=0x7fffd00052b8, wild_num=0, fields=..., conds=0x7fffd0006c08, order=0x1b95810, group=0x1b95748, having=0x0, select_options=2147748608, result=0x7fffd0007008, unit=0x1b95000, select_lex=0x1b95648) at /home/marko/mysql/git/mysql/sql/sql_select.cc:1222 #9 0x000000000081d8e6 in handle_select (thd=0x1b930b0, result=0x7fffd0007008, setup_tables_done_option=0) at /home/marko/mysql/git/mysql/sql/sql_select.cc:101 #10 0x00000000007f6219 in execute_sqlcom_select (thd=0x1b930b0, all_tables=0x7fffd00052b8) at /home/marko/mysql/git/mysql/sql/sql_parse.cc:5134 #11 0x00000000007eef25 in mysql_execute_command (thd=0x1b930b0) at /home/marko/mysql/git/mysql/sql/sql_parse.cc:2656 #12 0x00000000007f8d0c in mysql_parse (thd=0x1b930b0, rawbuf=0x7fffd0005030 "select c from t where 1 in\n(select t.c = page_size from information_schema.innodb_cmpmem)", length=89, parser_state=0x7fffeeea86c0) at /home/marko/mysql/git/mysql/sql/sql_parse.cc:6386
[21 Sep 2016 8:49]
Marko Mäkelä
Posted by developer: The problem is that convert_subquery_to_semijoin() fails to copy the OPTION_SCHEMA_TABLE flag from the options of the subquery, causing JOIN::prepare_result() to skip the call to get_schema_tables_result().
[16 Nov 2016 1:26]
Paul DuBois
Posted by developer: Noted in 8.0.1 changelog. Subqueries that were converted to semi-joins and programmatically generated an INFORMATION_SCHEMA table could incorrectly treat the INFORMATION_SCHEMA table as empty. A workaround for this problem prior to the bug fix: SET optimizer_switch='semijoin=off';