Bug #99121 Performance schema does a meaningless summary according to digest
Submitted: 31 Mar 2020 8:52 Modified: 31 Mar 2020 11:27
Reporter: Xiong Wang Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Performance Schema Severity:S3 (Non-critical)
Version:8.0, 8.0.19 OS:Any
Assigned to: CPU Architecture:Any

[31 Mar 2020 8:52] Xiong Wang
Description:
Currently, the performance schema uses SQL_DIGEST to do summary. However, the different query might have the same digest. For example,
CREATE TABLE t (a INT, b INT);
SELECT * FROM t ORDER BY 1;SELECT * FROM t ORDER BY 2;

Here "ORDER BY 1" means to sort by the first column. "ORDER BY 2" means to sort by the second column.

The above two query have the same digest now, SELECT * FROM t ORDER BY ?. Such a digest may cause performance schema do a meaningless summary like the following test:

mysql> select * from performance_schema.events_statements_summary_by_digest where digest='78e2366a980be718863a08ce28b72254d9a82c8d10d794da97ff53609a737414'\G;
*************************** 1. row ***************************
                SCHEMA_NAME: test
                     DIGEST: 78e2366a980be718863a08ce28b72254d9a82c8d10d794da97ff53609a737414
                DIGEST_TEXT: SELECT * FROM `t` ORDER BY ?, ...
                 COUNT_STAR: 3
             SUM_TIMER_WAIT: 7987850000
             MIN_TIMER_WAIT: 2267225000
             AVG_TIMER_WAIT: 2662616000
             MAX_TIMER_WAIT: 3063698000
              SUM_LOCK_TIME: 2981000000
                 SUM_ERRORS: 0
               SUM_WARNINGS: 0
          SUM_ROWS_AFFECTED: 0
              SUM_ROWS_SENT: 3
          SUM_ROWS_EXAMINED: 5
SUM_CREATED_TMP_DISK_TABLES: 0
     SUM_CREATED_TMP_TABLES: 0
       SUM_SELECT_FULL_JOIN: 0
 SUM_SELECT_FULL_RANGE_JOIN: 0
           SUM_SELECT_RANGE: 0
     SUM_SELECT_RANGE_CHECK: 0
            SUM_SELECT_SCAN: 3
      SUM_SORT_MERGE_PASSES: 0
             SUM_SORT_RANGE: 0
              SUM_SORT_ROWS: 2
              SUM_SORT_SCAN: 2
          SUM_NO_INDEX_USED: 2
     SUM_NO_GOOD_INDEX_USED: 0
                 FIRST_SEEN: 2020-03-31 15:29:10.469038
                  LAST_SEEN: 2020-03-31 15:37:23.503679
                QUANTILE_95: 3162277660
                QUANTILE_99: 3162277660
               QUANTILE_999: 3162277660
          QUERY_SAMPLE_TEXT: select * from t order by 2, '3'
          QUERY_SAMPLE_SEEN: 2020-03-31 15:37:23.503679
    QUERY_SAMPLE_TIMER_WAIT: 2267225000
1 row in set (0.00 sec)

ERROR:
No query specified

mysql> select * from t order by 1, '3';
+---+------+------+------+
| a | b    | c    | d    |
+---+------+------+------+
| 1 |    2 | NULL | NULL |
+---+------+------+------+
1 row in set (0.00 sec)

mysql> select * from performance_schema.events_statements_summary_by_digest where digest='78e2366a980be718863a08ce28b72254d9a82c8d10d794da97ff53609a737414'\G;
*************************** 1. row ***************************
                SCHEMA_NAME: test
                     DIGEST: 78e2366a980be718863a08ce28b72254d9a82c8d10d794da97ff53609a737414
                DIGEST_TEXT: SELECT * FROM `t` ORDER BY ?, ...
                 COUNT_STAR: 4
             SUM_TIMER_WAIT: 9987118000
             MIN_TIMER_WAIT: 1999268000
             AVG_TIMER_WAIT: 2496779000
             MAX_TIMER_WAIT: 3063698000
              SUM_LOCK_TIME: 3722000000
                 SUM_ERRORS: 0
               SUM_WARNINGS: 0
          SUM_ROWS_AFFECTED: 0
              SUM_ROWS_SENT: 4
          SUM_ROWS_EXAMINED: 6
SUM_CREATED_TMP_DISK_TABLES: 0
     SUM_CREATED_TMP_TABLES: 0
       SUM_SELECT_FULL_JOIN: 0
 SUM_SELECT_FULL_RANGE_JOIN: 0
           SUM_SELECT_RANGE: 0
     SUM_SELECT_RANGE_CHECK: 0
            SUM_SELECT_SCAN: 4
      SUM_SORT_MERGE_PASSES: 0
             SUM_SORT_RANGE: 0
              SUM_SORT_ROWS: 2
              SUM_SORT_SCAN: 2
          SUM_NO_INDEX_USED: 2
     SUM_NO_GOOD_INDEX_USED: 0
                 FIRST_SEEN: 2020-03-31 15:29:10.469038
                  LAST_SEEN: 2020-03-31 16:42:18.154381
                QUANTILE_95: 3162277660
                QUANTILE_99: 3162277660
               QUANTILE_999: 3162277660
          QUERY_SAMPLE_TEXT: select * from t order by 1, '3'
          QUERY_SAMPLE_SEEN: 2020-03-31 16:42:18.154381
    QUERY_SAMPLE_TIMER_WAIT: 1999268000
1 row in set (0.00 sec)

The above two queries have nothing to with each other. But because digest is computed to be the same, they are summarized. IMO, such an aggregation is incorrect. 

How to repeat:
Reference the part of Description.

Suggested fix:
I made a patch which can make digest computation to distinguish ”SELECT * FROM t ORDER BY 1“ with “SELECT * FROM t ORDER BY 2”. If you verify the above problem, I would like to contribute.
[31 Mar 2020 11:27] MySQL Verification Team
Hello Xiong Wang,

Thank you for the report and feedback.

regards,
Umesh
[23 Jun 2020 2:23] Xiong Wang
one patch to make digest distinguish ORDER/GROUP BY const with other const

Attachment: 0001-Issue-28046405-Improve-the-sql-digest-so-that-perfor.patch (application/octet-stream, text), 52.01 KiB.