Bug #68814 MySQL optimizer consider few rows to examine but it is not really obvious why
Submitted: 29 Mar 2013 15:33 Modified: 23 Apr 2015 6:14
Reporter: Shahriyar Rzayev (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.6.10 OS:Any (Centos 6.3)
Assigned to: CPU Architecture:Any
Tags: count(*) with where, Explain plan

[29 Mar 2013 15:33] Shahriyar Rzayev
Description:
Dear experts, i have encountered an interesting issue and i think the exact explanation will get from you.

So my table structure:

CREATE TABLE `sales` (
  `SALES_ID` int(8) NOT NULL AUTO_INCREMENT,
  `CUSTOMER_ID` decimal(8,0) NOT NULL,
  `PRODUCT_ID` decimal(8,0) NOT NULL,
  `SALE_DATE` datetime NOT NULL,
  `QUANTITY` decimal(8,0) NOT NULL,
  `SALE_VALUE` decimal(8,0) NOT NULL,
  `DEPARTMENT_ID` decimal(8,0) DEFAULT '0',
  `SALES_REP_ID` decimal(8,0) DEFAULT '0',
  `GST_FLAG` decimal(8,0) DEFAULT NULL,
  `sale_status` char(1) DEFAULT NULL,
  `FREE_SHIPPING` char(1) DEFAULT '',
  `DISCOUNT` decimal(8,0) unsigned DEFAULT '0',
  PRIMARY KEY (`SALES_ID`),
  KEY `sales_cust_idx` (`CUSTOMER_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=2500004 DEFAULT CHARSET=latin1

i have tested similar queries in result.
1.
mysql> select count(*) from sales;
+----------+
| count(*) |
+----------+
|  2500003 |
+----------+
1 row in set (0.59 sec)

2.
mysql> select count(*) from sales where sales_id>0;
+----------+
| count(*) |
+----------+
|  2500003 |
+----------+
1 row in set (0.90 sec)

0.90 and 0.59...much more slow.

But QEPs are very interesting:

First Query:

mysql> explain select count(*) from sales\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sales
         type: index
possible_keys: NULL
          key: sales_cust_idx
      key_len: 4
          ref: NULL
         rows: 2489938
        Extra: Using index
1 row in set (0.00 sec)

Second query:

mysql> explain select count(*) from sales where sales_id>0\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sales
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1244969
        Extra: Using where; Using index
1 row in set (0.00 sec)

From this result: Using where+Using Index is slow from Using Index.
But why the second query examines 1244969 rows? is it magical?

Another thing i must say that i found that there is no difference between QEPs of:

mysql> explain select count(*) from sales where sales_id>1800000\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sales
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1244969
        Extra: Using where; Using index
1 row in set (0.00 sec)

and

mysql> explain select count(*) from sales where sales_id>0\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sales
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1244969
        Extra: Using where; Using index
1 row in set (0.00 sec)

What is it? Optimizer bug? feature?

How to repeat:
Just download sample database import it and test on your own:

http://examples.oreilly.com/9780596100896/mysqlspp.zip

explain select count(*) from sales where sales_id>1800000\G

and

explain select count(*) from sales where sales_id>0\G

Suggested fix:
I have no idea.
[11 Apr 2013 8:52] Jørgen Løland
Hi Shahriyar,

The 'rows' column of EXPLAIN does not output the actual number of rows that would be read by the statement but rather an estimate. See:

http://dev.mysql.com/doc/refman/5.6/en/explain-output.html#explain-output-columns

That being said, the estimate is wrong. The underlying problem is within the InnoDB storage engine, which tries to balance the quality the estimates against the cost of calculating the estimate. In cases where much more than half the rows match a range, InnoDB simply assumes that #rows/2 is a good enough estimate. 

The difference in execution time between these queries is due to the following:
 1) A different access method is chosen: index scan vs range access.
 2) In the second query, the predicate "sales_id>0" has to be evaluated 
    for 2.4 million rows.

I'll transfer this bug to the InnoDB team and let them decide on the action.
[11 Apr 2013 20:12] Shahriyar Rzayev
Thanks for reply.
It is still unclear and unlogical that MySQL optimizer does not see difference beetween >0 and >1.800.000 :)
I want and official reply for this issue from InnoDB team as you say.
[22 Apr 2015 15:22] Shane Bester
It seems I've also filed one for this already that I never found before reading
http://mysql.az/playing-with-count-optimizer-work/

https://bugs.mysql.com/bug.php?id=73386
[23 Apr 2015 6:14] Shahriyar Rzayev
Shane Bester,
Yes, it seems to be similar. Also very interesting.
Thank you for reading.