MySQL Bugs: #21262: Subquery hangs server with mysqld using 100% cpu

Bug #21262	Subquery hangs server with mysqld using 100% cpu - see also bug #18765
Submitted:	24 Jul 2006 23:35	Modified:	1 Sep 2006 18:35
Reporter:	Harry Clauson	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Documentation	Severity:	S2 (Serious)
Version:	5.0.22	OS:	Linux (Linux)
Assigned to:	Paul DuBois	CPU Architecture:	Any
Tags:	subquery slow hang

Description:
This problem is described almost exactly by the existing bug report #18765.  However, my instance is happening on either SCO unix or AIX but does not happen on my Windows server.

Basically I use a select * query to retrive all columns from the same table referenced by the subquery.  When running just the subquery as a main query, it correctly returns the scalar result instantly.  However, when running the same query as a select * subquery, mysqld hangs indefinitely using 100% cpu.  I subsequently determined that this was not an infinite loop, but was related to both the number of current rows and deleted rows in the table.  With approximately 14,000 rows in my production table, this subquery hung mysqld for nearly an hour.  By deleting most of the rows from the table, the subquery will take less than a minute.

How to repeat:
Run this simplified and annotated sql:

/* This demonstrates trouble with a scalar subquery which uses a lot of cpu for
   a very long time.  The same query returns immediately if used as a main
   query.  Also the amount of time (& cpu) taken by the subquery appears to be
   proportional to both the no. of current rows in the table and the no. of
   deleted rows in the table.  For that reason, I delete most of the rows
   loaded into the table so that the subquery will not take many minutes of
   cpu to complete.
*/

/* Step 1:  use the database of your choice */
use test

/* Step 2:  create the empty table */
drop table if exists locx;
create table locx (
	loc int unsigned not null,
	store int unsigned not null,
	primary key (loc),
	unique (store)
);

/* Step 3:  store a procedure which will load the table with many rows */
delimiter //
drop procedure if exists load_locx//
create procedure load_locx(i int)
begin
	while i>0 do
		insert into locx values (i,i);
		set i=i-1;
	end while;
end
//
delimiter ;

/* Step 4:  load a lot of rows into the table */
lock tables locx write;
call load_locx(14000);
unlock tables;

/* Step 5:  delete most of the rows (so that the subquery won't take so long) */
delete from locx where loc!=119 limit 13600;

/* Step 6:  run the scalar subquery as a main query - this alway returns
   instantly with the correct result (119).
*/
(select (
 if(@s:=119,
  (select loc from locx where store=@s),
  null)) as loc);

/* Step 6:  run the very same scalar subquery for its intended purpose of
   selecting all columns from the subject table.  The query will eventually
   return the correct results, but will take a great deal of time and cpu to
   do so.
*/
select * from locx where loc=
(select (
 if(@s:=119,
  (select loc from locx where store=@s),
  null)) as loc);

Suggested fix:
Since the subquery executes correctly as a main query, it appears that it is mishandled as a result of the outer select * query.

Thank you for the report.

Verified on Linux using last BK sources as reporter described.

Hi Harry,

Is there any particular reason you first add a lot of (14000)
rows, and then delete most of them leaving 4000 rows. How is
that different from directly inserting just 4000 rows?

You are correct in that it shouldn't make any difference - but due to this bug it does.  I observe that the time the query "hangs" is longer when there are a lot of deleted rows, and shorter if there are less.  This leads me to believe that the query is doing a physical scan of every row in the table (including deleted ones).  However the query will independently run faster or slower depending on how many non-deleted rows there are, even with the same no. of deleted rows.

If I run the test with half the no. of deleted rows, the query is twice as fast.  But if I run the test with the same no. of deleted rows but half the non-deleted rows, the query also runs twice as fast.  The two combine such that a large no. of non-deleted rows causes the query to run so long that it would appear to be an infinite loop - but it's not.

What this seems to indicate is that the behavior of the query involves not just the no. of rows in total, but the product of the no. of rows and the no. of undeleted rows.  This is as if it is doing something like one full table scan per undeleted row!

I hope this helps you.

Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Additional info:
The query :
select * from locx where loc=
(select (
 if(@s:=119,
  (select loc from locx where store=@s),
  null)) as loc);

is slow because the subquery contained in it :

select (
 if(@s:=119,
  (select loc from locx where store=@s),
  null)) as loc

executes once for each row of the outer context.

This subquery is compiled as UNCACHEABLE SUBQUERY. This is evident from the EXPLAIN :
explain select * from locx lo where loc= (select (if(@s:=119,(select loc from locx where store=@s),null)) as loc);
--------------
explain select * from locx lo where loc= (select (if(@s:=119,(select loc from locx where store=@s),null)) as loc)
--------------

+----+----------------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type          | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+----------------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | PRIMARY              | lo    | ALL  | NULL          | NULL | NULL    | NULL |  400 | Using where | 
|  3 | UNCACHEABLE SUBQUERY | locx  | ALL  | NULL          | NULL | NULL    | NULL |  400 | Using where | 
+----+----------------------+-------+------+---------------+------+---------+------+------+-------------+

"UNCACHEABLE SUBQUERY" here means the subquery is re-evaluated for each row of the outer context.
This differs from how correlated subqueries (see Section 13.2.8.7. of the manual : http://dev.mysql.com/doc/refman/5.0/en/correlated-subqueries.html) are executed e.g:

mysql> explain select * from locx lo where loc= (select (if(@s:=119,(select loc from locx where store=lo.store),null)) as loc);
--------------
explain select * from locx lo where loc= (select (if(@s:=119,(select loc from locx where store=lo.store),null)) as loc)
--------------

+----+--------------------+-------+--------+---------------+---------+---------+---------------+------+-------------+
| id | select_type        | table | type   | possible_keys | key     | key_len | ref           | rows | Extra       |
+----+--------------------+-------+--------+---------------+---------+---------+---------------+------+-------------+
|  1 | PRIMARY            | lo    | eq_ref | PRIMARY       | PRIMARY | 4       | func          |    1 | Using where | 
|  3 | DEPENDENT SUBQUERY | locx  | eq_ref | store         | store   | 4       | test.lo.store |    1 |             | 
+----+--------------------+-------+--------+---------------+---------+---------+---------------+------+-------------+

A "DEPENDENT SUBQUERY" is re-evaluated only once for each set of different values of the variables from its outer context.

Cacheability of subqueries is subject to the restrictions detailed in section 5.14.1. of the manual (http://dev.mysql.com/doc/refman/5.0/en/query-cache-how.html). 
Quoting user variables is one such restriction.

Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

Adding note to EXPLAIN section of manual:

DEPENDENT SUBQUERY evaluation differs from UNCACHEABLE SUBQUERY
evaluation. For DEPENDENT SUBQUERY, the subquery is re-evaluated only
once for each set of different values of the variables from its outer
context. For UNCACHEABLE SUBQUERY, the subquery is re-evaluated for
each row of the outer context. Cacheability of subqueries is subject
to the restrictions detailed in (query-cache-how). For example,
referring to user variables makes a subquery uncacheable.