Bug #78192 InnoDB estimation is way too wrong for column with mostly the same values
Submitted: 24 Aug 2015 13:55 Modified: 27 Aug 2015 8:48
Reporter: Andrii Nikitin Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.5.25 OS:Any
Assigned to: CPU Architecture:Any

[24 Aug 2015 13:55] Andrii Nikitin
Description:
InnoDB estimates twice less rows for column which has mostly the same values. This further affects Optimizer as it suggests very weak index for the condition.

How to repeat:
happens without regard of innodb_persistent_stat configuration
Able to repeat in 5.6.25, 5.5.45, 5.5.25
5.1.63 didn't show problem in my tests

drop table b;
create table b(a int(1), index(a));

insert into b select 1 from  mysql.help_topic a, mysql.help_topic b LIMIT 15000;
# Records: 15000  Duplicates: 0  Warnings: 0

update b set a=2 where rand()>0.75 limit 100;
# Rows matched: 100  Changed: 100  Warnings: 0

explain select count(*) from b where a=1;
# | 1 | SIMPLE | b | ref  | a | a | 5 | const | 7500 | Using index |

select count(*) from b where a=1;
# |    14900 |

Suggested fix:
Row estimation should give much less error
[24 Aug 2015 13:57] Andrii Nikitin
This most probably is duplicate of bug #73386  with the only difference that this bug was not repeated on 5.1 so far, while 73386 was.
[27 Aug 2015 8:38] Olav Sandstå
As suggested by Andrii, this is a duplicte of Bug#73386. For range estimates, InnoDB never returns an estimate that is more than half of the number of records in the table.