Bug #85771 The optimizer should ignore distinct for queries on the PK column(s)
Submitted: 3 Apr 2017 21:47 Modified: 4 Apr 2017 15:12
Reporter: monty solomon Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version: OS:Any
Assigned to: CPU Architecture:Any
Tags: 5.7.12

[3 Apr 2017 21:47] monty solomon
Description:
The optimizer does not understand that the PK is unique and therefore the distinct is a no-op and can be ignored.

How to repeat:
CREATE TABLE `domain_crawled_times` (
  `domain` varchar(255) NOT NULL,
  `lastCrawledTimestamp` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`domain`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8

mysql> explain select count(distinct domain) from domain_crawled_times\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: domain_crawled_times
   partitions: NULL
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 767
          ref: NULL
         rows: 24860758
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.01 sec)

mysql> explain select count(domain) from domain_crawled_times\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Select tables optimized away
1 row in set, 1 warning (0.00 sec)

Suggested fix:
Update optimizer code
[4 Apr 2017 15:12] MySQL Verification Team
Hi!

I fully agree with you and find it a very welcome feature request for our optimizer.

Verified.