Bug #63905 handler_read_next is suprisingly high and slows queries
Submitted: 2 Jan 2012 23:47 Modified: 16 Jan 2012 18:34
Reporter: Philip Gladstone Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.1.60 OS:Windows
Assigned to: CPU Architecture:Any

[2 Jan 2012 23:47] Philip Gladstone
Description:
In my production environment, a notionally very simple query takes a long time and does a huge amount of io (according to the slow query log). Investigation reveals that queries of the sort:

select max(f1) from t1 where f2 = 1;  (and there is an index on t1(f2,f1))

execute very quickly and do a single key read. This is good.

However, queries of the form

select max(f1) from t1 where f2 in (1,10);

execute slowly and can do hundreds of thousands of read_next operations. Yes, I can optimize the query into:

select max(f) from (select max(f1) as f from t1 where f2 = 1 UNION select max(f1) as f from t1 where f2 = 10) as b;

and this is then executed efficiently. However, this seem deeply wrong (and it also isn't clear that it will scale as I add more values into the query).

[Of course, my production environment is a lot more complex, but this simple case demonstrates the problem. The file below demonstrates the problem with the mysql command line tool.]

How to repeat:
The following SQL command file demonstrates the problem. The sql_no_cache is there to prevent any query caching from helping. In my production environment, the main table is updated sufficiently often that caching does not fix the problem.  This example also uses a second table as a join, and this also causes some surprising counts as well.

I would expect that no output value of the show status like 'handler_read%' would be more than 10. 

-- MySQL dump 10.13  Distrib 5.1.52, for redhat-linux-gnu (x86_64)
--
-- Host: 127.0.01    Database: ss
-- ------------------------------------------------------
-- Server version       5.1.52

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--
-- Table structure for table `t1`
--

DROP TABLE IF EXISTS `t1`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `t1` (
  `f1` int(11) DEFAULT NULL,
  `f2` int(11) DEFAULT NULL,
  KEY `t1f2f1` (`f2`,`f1`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `t1`
--

LOCK TABLES `t1` WRITE;
/*!40000 ALTER TABLE `t1` DISABLE KEYS */;
INSERT INTO `t1` VALUES (1,1),(2,1),(3,1),(4,2),(2,3),(10,10),(0,1),(1,10);
/*!40000 ALTER TABLE `t1` ENABLE KEYS */;
UNLOCK TABLES;

--
-- Table structure for table `t2`
--

DROP TABLE IF EXISTS `t2`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `t2` (
  `f2` int(11) DEFAULT NULL,
  `f3` varchar(10) DEFAULT NULL,
  KEY `t2f3` (`f3`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `t2`
--

LOCK TABLES `t2` WRITE;
/*!40000 ALTER TABLE `t2` DISABLE KEYS */;
INSERT INTO `t2` VALUES (1,'a'),(2,'b'),(10,'c');
/*!40000 ALTER TABLE `t2` ENABLE KEYS */;
UNLOCK TABLES;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2012-01-02 18:03:58

insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;

flush status;
select sql_no_cache max(f1) as 'Query f2 = 1' from t1 where f2 = 1;
show status like 'handler_read%';

flush status;
select sql_no_cache max(f1) as 'Query f2 = 10' from t1 where f2 = 10;
show status like 'handler_read%';

flush status;
select sql_no_cache max(f1) as 'Query f2 in (1,10)' from t1 where f2 in (1,10);
show status like 'handler_read%';

flush status;
select sql_no_cache max(f) as 'Complex union' from (select sql_no_cache max(f1) as f from t1 where f2 = 1 UNION select max(f1) as f from t1 where f2 = 10) as b;
show status like 'handler_read%';

flush status;
select sql_no_cache max(f1) as 'Query f3 = a (using subquery)' from t1 where f2 = (select t2.f2 from t2 where f3 = 'a');
show status like 'handler_read%';

flush status;
select sql_no_cache max(f1) as 'Query f3 = a (using join)' from t1 join t2 on t1.f2 = t2.f2 where f3 = 'a';
show status like 'handler_read%';

flush status;
select sql_no_cache max(f1) as 'Query f3 = c (using subquery)' from t1 where f2 = (select t2.f2 from t2 where f3 = 'c');
show status like 'handler_read%';

flush status;
select sql_no_cache max(f1) as 'Query f3 = c (using join)' from t1 join t2 on t1.f2 = t2.f2 where f3 = 'c';
show status like 'handler_read%';

flush status;
select sql_no_cache max(f1) as 'Query f2 IN subquery' from t1 where f2 IN (select t2.f2 from t2 where f3 in('a', 'c'));
show status like 'handler_read%';

flush status;
select sql_no_cache max(f) as 'Query using complex join' from (select sql_no_cache max(f1) as f from t1 where f2 = (select t2.f2 from t2 where f3 = 'a')
                                        UNION select max(f1) as f from t1 where f2 = (select t2.f2 from t2 where f3 = 'c')) as b;
show status like 'handler_read%';
[3 Jan 2012 0:07] MySQL Verification Team
Please try version 5.1.60 and comment the results her. Thanks.
[3 Jan 2012 2:32] Philip Gladstone
I just tried with 5.1.60 on Windows and I get the same results.
[3 Jan 2012 2:43] Philip Gladstone
I just tried 5.5.19 on windows (using MyISAM storage engine) and it gives the same large numbers (though there are now 7 status variables that match 'handler_read%')
[3 Jan 2012 2:47] Philip Gladstone
[You have to remove some of the sql_no_cache clauses to make the test case work in 5.5.19. It appears that SQL_NO_CACHE is not permitted in subqueries]
[3 Jan 2012 11:03] MySQL Verification Team
Please print here your results. Thanks.
[3 Jan 2012 13:39] Philip Gladstone
The results from running the command file on 5.1.60 on Windows. I have annotated the output.

Query f2 = 1
3
Variable_name   Value
Handler_read_first      0
Handler_read_key        1   * A single key read is all that is needed
Handler_read_next       0
Handler_read_prev       0
Handler_read_rnd        0
Handler_read_rnd_next   0

Query f2 = 10
10
Variable_name   Value
Handler_read_first      0
Handler_read_key        1     *  Same here
Handler_read_next       0
Handler_read_prev       0
Handler_read_rnd        0
Handler_read_rnd_next   0

Query f2 in (1,10)
10
Variable_name   Value
Handler_read_first      0
Handler_read_key        2      * Two key reads is all that is *needed*
Handler_read_next       1536   * but it appears to be scanning some table
Handler_read_prev       0
Handler_read_rnd        0
Handler_read_rnd_next   0

Complex union
10
Variable_name   Value
Handler_read_first      0
Handler_read_key        2   * We can eliminate the scan with the nasty UNION
Handler_read_next       0
Handler_read_prev       0
Handler_read_rnd        0
Handler_read_rnd_next   6   * I'm not sure about these, but the number is small.

Query f3 = a (using subquery)
3
Variable_name   Value
Handler_read_first      0
Handler_read_key        2   * Should be two -- one for each table
Handler_read_next       1   * Not sure about this, but number is small
Handler_read_prev       0
Handler_read_rnd        0
Handler_read_rnd_next   0

Query f3 = a (using join)
3
Variable_name   Value
Handler_read_first      0
Handler_read_key        2      * one for each table
Handler_read_next       1025   * What is it scanning?
Handler_read_prev       0
Handler_read_rnd        0
Handler_read_rnd_next   0

Query f3 = c (using subquery)
10
Variable_name   Value
Handler_read_first      0
Handler_read_key        2
Handler_read_next       1
Handler_read_prev       0
Handler_read_rnd        0
Handler_read_rnd_next   0
Query f3 = c (using join)
10
Variable_name   Value
Handler_read_first      0
Handler_read_key        2
Handler_read_next       513
Handler_read_prev       0
Handler_read_rnd        0
Handler_read_rnd_next   0

Query f2 IN subquery
10
Variable_name   Value
Handler_read_first      1
Handler_read_key        3072    * Seems to be doing a lot of key operations
Handler_read_next       3584    * and even more scanning!
Handler_read_prev       0
Handler_read_rnd        0
Handler_read_rnd_next   0

Query using complex join
10
Variable_name   Value
Handler_read_first      0
Handler_read_key        4   * We can get the numbers small with the UNION approach
Handler_read_next       2
Handler_read_prev       0
Handler_read_rnd        0
Handler_read_rnd_next   6
[3 Jan 2012 13:40] Philip Gladstone
SQL commands to demonstrate problem. Run against empty database.

Attachment: ss.sql (application/octet-stream, text), 4.32 KiB.

[14 Jan 2012 9:58] Sveta Smirnova
Thank you for the feedback.

Hanler_read_next is increased when engine reads next row from index. So if index is used for f2 in (1,10) it has to read all index entries with f2 which satisfy condition, then find max value. In the test case provided there are 1536 such entries:

select sql_no_cache count(f2) as 'Query f2' from t1 where f2 in (1,10);
Query f2
1536

So having 1536 in Handler_read_next is expected. This is not whole index, neither data file.

Regarding to cases when f2=CONSTANT optimizer uses special optimization:

explain extended select sql_no_cache max(f1) as 'Query f2 = 10' from t1 where f2 = 10;
id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	Select tables optimized away

This is why you don't see such increase in this case.

So this is not a bug.
[15 Jan 2012 16:32] Philip Gladstone
I'd like to reopon this as a performance enhancement request. It seems that the OR handling in the optimizer is rather weak.

Is it always the case that 

SELECT x FROM y WHERE a OR b 

should be rewritten (for performance) as

SELECT x FROM y WHERE a UNION SELECT x FROM y WHERE b

where 'a' and 'b' are mutually exclusive?
[16 Jan 2012 18:34] Sveta Smirnova
Thank you for the feedback.

Verified as performance enhancement request.

> Is it always the case that 
>
> SELECT x FROM y WHERE a OR b 
>
> should be rewritten (for performance) as
>
> SELECT x FROM y WHERE a UNION SELECT x FROM y WHERE b
>
> where 'a' and 'b' are mutually exclusive?

Not, I don't think this would be a case, say, if a or b query unique key. Also I don't think this would be faster for more than 2 conditions even in setup similar to discussed in this report.