MySQL Bugs: #78192: InnoDB estimation is way too wrong for column with mostly the same values

Bug #78192	InnoDB estimation is way too wrong for column with mostly the same values
Submitted:	24 Aug 2015 13:55	Modified:	27 Aug 2015 8:48
Reporter:	Andrii Nikitin	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.5.25	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
InnoDB estimates twice less rows for column which has mostly the same values. This further affects Optimizer as it suggests very weak index for the condition.

How to repeat:
happens without regard of innodb_persistent_stat configuration
Able to repeat in 5.6.25, 5.5.45, 5.5.25
5.1.63 didn't show problem in my tests

drop table b;
create table b(a int(1), index(a));

insert into b select 1 from  mysql.help_topic a, mysql.help_topic b LIMIT 15000;
# Records: 15000  Duplicates: 0  Warnings: 0

update b set a=2 where rand()>0.75 limit 100;
# Rows matched: 100  Changed: 100  Warnings: 0

explain select count(*) from b where a=1;
# | 1 | SIMPLE | b | ref  | a | a | 5 | const | 7500 | Using index |

select count(*) from b where a=1;
# |    14900 |

Suggested fix:
Row estimation should give much less error

This most probably is duplicate of bug #73386  with the only difference that this bug was not repeated on 5.1 so far, while 73386 was.

As suggested by Andrii, this is a duplicte of Bug#73386. For range estimates, InnoDB never returns an estimate that is more than half of the number of records in the table.