Bug #78192 InnoDB estimation is way too wrong for column with mostly the same values
Submitted: 24 Aug 2015 13:55 Modified: 27 Aug 2015 8:48
Reporter: Andrii Nikitin Email Updates:
Status: Duplicate Impact on me:
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.5.25 OS:Any
Assigned to: CPU Architecture:Any

[24 Aug 2015 13:55] Andrii Nikitin
InnoDB estimates twice less rows for column which has mostly the same values. This further affects Optimizer as it suggests very weak index for the condition.

How to repeat:
happens without regard of innodb_persistent_stat configuration
Able to repeat in 5.6.25, 5.5.45, 5.5.25
5.1.63 didn't show problem in my tests

drop table b;
create table b(a int(1), index(a));

insert into b select 1 from  mysql.help_topic a, mysql.help_topic b LIMIT 15000;
# Records: 15000  Duplicates: 0  Warnings: 0

update b set a=2 where rand()>0.75 limit 100;
# Rows matched: 100  Changed: 100  Warnings: 0

explain select count(*) from b where a=1;
# | 1 | SIMPLE | b | ref  | a | a | 5 | const | 7500 | Using index |

select count(*) from b where a=1;
# |    14900 |

Suggested fix:
Row estimation should give much less error
[24 Aug 2015 13:57] Andrii Nikitin
This most probably is duplicate of bug #73386  with the only difference that this bug was not repeated on 5.1 so far, while 73386 was.
[27 Aug 2015 8:38] Olav Sandstå
As suggested by Andrii, this is a duplicte of Bug#73386. For range estimates, InnoDB never returns an estimate that is more than half of the number of records in the table.