Bug #42259 SELECT WHERE IN performs poorly (slow) with views and functions
Submitted: 22 Jan 7:36 Modified: 30 Sep 22:24
Reporter: Van Stokes
Status: Verified
Category:Server: DML Severity:S5 (Performance)
Version:5.4.2 OS:Microsoft Windows (XP Pro x64)
Assigned to: Bugs System Target Version:
Tags: SELECT, where, IN, slow, subquery benchmark
Triage: Triaged: D3 (Medium)

[22 Jan 7:36] Van Stokes
Description:
Running a SELECT WHERE IN yield poor performance.
Repose of my forum post: http://forums.mysql.com/read.php?10,243691,243691#msg-243691

- Table Information --
InventoryKey (INT) = Primary Key
ContInvKey (INT) = Indexed
OwnerType (TINYINT) + OwnerKey (INT) = Indexed

Trying to do a SELECT WHERE IN and it's taking over 500 seconds to return results! This
statement:

SELECT DISTINCT inventory.ContInvKey FROM inventory WHERE inventory.OwnerType = 1 AND
inventory.OwnerKey = 18 AND inventory.ContInvKey > 0;

..takes ~0.15 seconds to run and fetch 21 rows. However, if I do this statement:

SELECT * FROM inventory WHERE InventoryKey IN ( SELECT DISTINCT inventory.ContInvKey FROM
inventory WHERE inventory.OwnerType = 1 AND inventory.OwnerKey = 18 AND
inventory.ContInvKey > 0 );

.. it takes 500+ seconds to run and return 108 rows. Something is definitely wrong here.
I can repeat it every time. I monitored the server ( I am the only connection ) and I see
the server spike instantly when the query is received and then it flat lines. There
doesn't appear to be any activity for 499.85 seconds. It's idle!

How to repeat:
Create two tables to perform a similar query. Our Inventory table has ~1M rows.
Note that ALL WHERE fields use either PRIMARY KEY or INDEXED columns.

Suggested fix:
Inventory.ContInvKey is not UNIQUE. It should NOT matter. I don't understand why an "IN"
query runs slower than the basic statement. If I actually type the entire basic select
statement out

SELECT * FROM inventory WHERE InventoryKey = AAA OR InventoryKey = BBB OR InventoryKey =
CCC....

...with all 21 results the query runs fast (normal). It would appear to me there is poor
logic in the database server if it doesn't perform the query the same way. IMHO, The
server should be performing the query

IN ( SELECT DISTINCT inventory.ContInvKey FROM inventory WHERE inventory.OwnerType = 1
AND inventory.OwnerKey = 18 AND inventory.ContInvKey > 0 )

...FIRST ( in 0.15 seconds), and then taking that result and perform the OR in:

SELECT * FROM inventory WHERE InventoryKey WHERE InventoryKey = {RESULT1} OR InventoryKey
= {RESULT2} OR ...

"IN" is an "WHERE OR" by definition. MS and Oracle do not perform "IN" this poorly.
[22 Jan 9:25] Sergey Petrunya
Judging from the query text, this case should receive better handling in MySQL 6.0,  as it
is covered by the new 6.0 subquery optimizations (see
http://forge.mysql.com/wiki/Subquery_Works for details). 

Van, we would appreciate getting the dataset so that we can verify that this case is
handled properly in 6.0 (see the link, we're collecting such cases).  If you click in o
the "Files" tab, it will show instructions how to upload big files to our ftp.

Thanks,
[22 Jan 9:53] Valeriy Kravchuk
Thank you for the problem report. Please, send the results of:

EXPLAIN SELECT * FROM inventory WHERE InventoryKey IN ( SELECT DISTINCT
inventory.ContInvKey FROM
inventory WHERE inventory.OwnerType = 1 AND inventory.OwnerKey = 18 AND
inventory.ContInvKey > 0 )\G

show create table inventory\G

show table status like 'inventory'\G
[22 Jan 10:09] Van Stokes
I will submit a mysqldump of the database after the next auto backup completes.
[22 Jan 10:19] Valeriy Kravchuk
Please, upload your dump (compressed) to ftp://ftp.mysql.com/pub/mysql/upload/ and  inform
about the filename (include bug number, 42259 in it) here.

Based on EXPLAIN results, only 6.0.x may help you now, with its new subquery optimization
methods.
[22 Jan 12:43] Miguel Solorzano
Below the output I got on Windows Vista 64-bit and actually there is a great improvement
with 6.0 regarding 5.1:

g:\share\dbs>g:\share\dbs\6.0\bin\mysql -uroot --port=3600 --prompt="mysql 6.0 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 6.0.10-alpha-nt-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql 6.0 > use Origins
Database changed
mysql 6.0 > SELECT DISTINCT inventory.ContInvKey FROM inventory WHERE inventory.OwnerType
= 1 AND
    -> inventory.OwnerKey = 18 AND inventory.ContInvKey > 0\G
*************************** 1. row ***************************
ContInvKey: 741879
*************************** 2. row ***************************

<cut>

ContInvKey: 741916
*************************** 6. row ***************************
ContInvKey: 741915
6 rows in set (0.67 sec)

mysql 6.0 > SELECT * FROM inventory WHERE InventoryKey IN ( SELECT DISTINCT
inventory.ContInvKey FROM
    -> inventory WHERE inventory.OwnerType = 1 AND inventory.OwnerKey = 18 AND
    -> inventory.ContInvKey > 0 )\G
<cut>
   OrientationY: 0.0000
   OrientationZ: 0.0000
DateTimeCreated: 2008-04-23 16:03:17
RecLastModified: 2008-04-23 17:03:17
6 rows in set (0.03 sec)

mysql 6.0 >

g:\share\dbs>g:\share\dbs\5.1\bin\mysql -uroot --port=3510 --prompt="mysql 5.1 >"
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.31-nt-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql 5.1 >use origins
Database changed
mysql 5.1 >SELECT DISTINCT inventory.ContInvKey FROM inventory WHERE inventory.OwnerType
= 1 AND
    -> inventory.OwnerKey = 18 AND inventory.ContInvKey > 0\G
*************************** 1. row ***************************
ContInvKey: 741879

<cut>
*************************** 6. row ***************************
ContInvKey: 741915
6 rows in set (0.30 sec)

mysql 5.1 >SELECT * FROM inventory WHERE InventoryKey IN ( SELECT DISTINCT
inventory.ContInvKey FROM
    -> inventory WHERE inventory.OwnerType = 1 AND inventory.OwnerKey = 18 AND
    -> inventory.ContInvKey > 0 )\G

<cut>

   OrientationZ: 0.0000
DateTimeCreated: 2008-04-23 16:03:19
RecLastModified: 2008-04-23 17:03:19
6 rows in set (2 min 15.11 sec)

mysql 5.1 >
[22 Jan 13:02] Miguel Solorzano
Could you please provide the create view statement, to test your actual query?. Thanks in
advance.
[22 Jan 13:10] Miguel Solorzano
Sorry disregard my last comment the error I got is:

ERROR 1305 (42000): FUNCTION origins.InventoryGetOwnerTypeByName does not exist

to run your actual query. Thanks in advance.
[22 Jan 14:57] Miguel Solorzano
Thank you for the feedback. With the actual query which uses views and function I have not
noticed differences between 5.1 and 6.0.:

-----+--------------------+-----------------------+
21 rows in set (8 min 13.11 sec)

mysql 5.1 >show variables like "%version%";
    -> //
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| protocol_version        | 10                  |
| version                 | 5.1.31-nt-log       |
| version_comment         | Source distribution |
| version_compile_machine | unknown             |
| version_compile_os      | Win64               |
+-------------------------+---------------------+
5 rows in set (0.44 sec)

-------------------------------------------------

21 rows in set (8 min 2.73 sec)

mysql 6.0 > show variables like "%version%";
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| protocol_version        | 10                  |
| version                 | 6.0.10-alpha-nt-log |
| version_comment         | Source distribution |
| version_compile_machine | unknown             |
| version_compile_os      | Win64               |
+-------------------------+---------------------+
5 rows in set (0.04 sec)

mysql 6.0 >

I changed the Synopsis to reflect that.
[23 Jan 17:12] Sergey Petrunya
Confirm, the presence of VIEW disables 6.0 subquery optimizations, that's why there is no
speedup when a VIEW is used.

There is no really valid behind this - the optimizer is perfectly capable of processing
such VIEWs. It seems this case got disabled when we've put a (very rough) fix to disable
subquery processing in presence of LEFT JOINs (there are real issues with subqueries +
outer joins). 

This can and should be fixed before the 6.0 release
[7 Mar 0:41] Van Stokes
We installed 6.0.9-alpha on our beta testing server. WOW! What a performance increase from
over 5.1.x!!! It is a very noticeable improvement! All our users noticed it. The SELECT
WHERE IN (even with views) is instantaneous! Great job!

We ran into another bug - repeated calls to a stored procedure via the same thread cause
the thread to hang. We are opening a new bug for this one.
[10 Apr 16:00] Miguel Solorzano
See bug: http://bugs.mysql.com/bug.php?id=44201.
[5 Jun 16:16] Van Stokes
We have upgraded to MySQL 6.0.11.
There is still a performance problem when using WHERE IN.

This statement:

SELECT inventory_view.* FROM inventory_view WHERE InventoryKey IN ( SELECT DISTINCT
inventory.ContInvKey FROM inventory WHERE inventory.OwnerType = mOwnerType AND
inventory.OwnerKey = mOwnerKey AND ContInvKey > 0 )

Still takes over 162 seconds to run and fetch 10 rows.
If I break the query up into two separate queries like this:

SELECT DISTINCT inventory.ContInvKey FROM inventory WHERE inventory.OwnerType =
mOwnerType AND inventory.OwnerKey = mOwnerKey AND ContInvKey > 0;

...returns 10 rows and takes 0.0011 seconds to run. And if follow-up with:

SELECT inventory_view.* FROM inventory_view WHERE InventoryKey=Result1 OR
InventoryKey=Result2 OR InventoryKey=Result2.....

It returns 10 rows in 0.029 seconds.

Its painfully obvious the query optimizer still isn't working. This is a very *simple*
WHERE IN statement. There is no excuse for such poor performance.

-------------
EXPLAIN SELECT DISTINCT inventory.ContInvKey FROM inventory WHERE inventory.OwnerType =
InventoryGetOwnerTypeByName( "CHARACTER" ) AND inventory.OwnerKey =
CharacterGetKeyByCallsign( "AmenRa" ) AND ContInvKey > 0;

1, 'SIMPLE', 'inventory', 'ref', 'Inventory_Cont_IDX,Inventory_Owner_IDX',
'Inventory_Owner_IDX', '5', 'const,const', 68, 'Using where; Using temporary'

-------------
EXPLAIN SELECT inventory_view.* FROM inventory_view WHERE InventoryKey IN ( SELECT
DISTINCT inventory.ContInvKey FROM inventory WHERE inventory.OwnerType =
InventoryGetOwnerTypeByName( "CHARACTER" ) AND inventory.OwnerKey =
CharacterGetKeyByCallsign( "AmenRa" ) AND ContInvKey > 0 );

1, 'PRIMARY', 'inventory', 'ALL', 'Inventory_Item_IDX', '', '', '', 642970, 'Using
where'
1, 'PRIMARY', 'item', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'origins.inventory.ItemKey',
1, ''
1, 'PRIMARY', 'inventorylocation', 'eq_ref', 'PRIMARY,InventoryLocation_InventoryKey_fk',
'PRIMARY', '4', 'func', 1, 'Using where'
2, 'DEPENDENT SUBQUERY', 'inventory', 'ref', 'Inventory_Cont_IDX,Inventory_Owner_IDX',
'Inventory_Owner_IDX', '5', 'const,const', 68, 'Using where; Using temporary'
[3 Jul 16:40] Van Stokes
We tested MySQL server 5.4-beta and the performance problem still persists.
[27 Sep 13:31] Van Stokes
We have downgraded to 5.4.2 from 6.0.x

The WHERE IN is still a major issue. It's extremely slow.

The statement takes 13 seconds to run:

DELETE FROM Inventory WHERE InventoryKey IN ( SELECT ObjKey FROM MissionObjective WHERE
MissionKey = 5 AND ObjKeyType = 3 );

The INVENTORY table has 1M+ rows. The MISSIONOBJECTIVE table has ONE row!

EXPLAIN SELECT ObjKey FROM MissionObjective WHERE MissionKey = 5 AND ObjKeyType = 3;

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	MissionObjective	ref	PRIMARY,MissionObjective_FK01	PRIMARY	5	const,const	1	Using
index

My original example that started this bug is still an issue too. We use raw table selects
and view selects in our queries.
[30 Sep 22:24] Van Stokes
Windows XP x64 (dedicated)
MySQL x64 5.4.2

SELECT InventoryGetKeyByProperty( 10000, "Thesdia Station", 0, 0, 0 );

Execute Time: 00:00:00:032

SELECT * FROM InventoryLocation WHERE InventoryKey = 1011559;

Execute Time: 00:00:00:047

SELECT * FROM InventoryLocation WHERE InventoryKey = InventoryGetKeyByProperty( 10000,
"Thesdia Station", 0, 0, 0 );

Execute Time: 00:03:37:155

That's THREE MINUTES!

SQL is read RIGHT TO LEFT. It is obvious from this simple statement you are reading LEFT
TO RIGHT! SOLVE for InventoryGetKeyByProperty() *FIRST* THEN you perform the SELECT!

I hate to be so obstinate but this is getting ridiculous.