Description:
Currently, the performance schema uses SQL_DIGEST to do summary. However, the different query might have the same digest. For example,
CREATE TABLE t (a INT, b INT);
SELECT * FROM t ORDER BY 1;SELECT * FROM t ORDER BY 2;
Here "ORDER BY 1" means to sort by the first column. "ORDER BY 2" means to sort by the second column.
The above two query have the same digest now, SELECT * FROM t ORDER BY ?. Such a digest may cause performance schema do a meaningless summary like the following test:
mysql> select * from performance_schema.events_statements_summary_by_digest where digest='78e2366a980be718863a08ce28b72254d9a82c8d10d794da97ff53609a737414'\G;
*************************** 1. row ***************************
SCHEMA_NAME: test
DIGEST: 78e2366a980be718863a08ce28b72254d9a82c8d10d794da97ff53609a737414
DIGEST_TEXT: SELECT * FROM `t` ORDER BY ?, ...
COUNT_STAR: 3
SUM_TIMER_WAIT: 7987850000
MIN_TIMER_WAIT: 2267225000
AVG_TIMER_WAIT: 2662616000
MAX_TIMER_WAIT: 3063698000
SUM_LOCK_TIME: 2981000000
SUM_ERRORS: 0
SUM_WARNINGS: 0
SUM_ROWS_AFFECTED: 0
SUM_ROWS_SENT: 3
SUM_ROWS_EXAMINED: 5
SUM_CREATED_TMP_DISK_TABLES: 0
SUM_CREATED_TMP_TABLES: 0
SUM_SELECT_FULL_JOIN: 0
SUM_SELECT_FULL_RANGE_JOIN: 0
SUM_SELECT_RANGE: 0
SUM_SELECT_RANGE_CHECK: 0
SUM_SELECT_SCAN: 3
SUM_SORT_MERGE_PASSES: 0
SUM_SORT_RANGE: 0
SUM_SORT_ROWS: 2
SUM_SORT_SCAN: 2
SUM_NO_INDEX_USED: 2
SUM_NO_GOOD_INDEX_USED: 0
FIRST_SEEN: 2020-03-31 15:29:10.469038
LAST_SEEN: 2020-03-31 15:37:23.503679
QUANTILE_95: 3162277660
QUANTILE_99: 3162277660
QUANTILE_999: 3162277660
QUERY_SAMPLE_TEXT: select * from t order by 2, '3'
QUERY_SAMPLE_SEEN: 2020-03-31 15:37:23.503679
QUERY_SAMPLE_TIMER_WAIT: 2267225000
1 row in set (0.00 sec)
ERROR:
No query specified
mysql> select * from t order by 1, '3';
+---+------+------+------+
| a | b | c | d |
+---+------+------+------+
| 1 | 2 | NULL | NULL |
+---+------+------+------+
1 row in set (0.00 sec)
mysql> select * from performance_schema.events_statements_summary_by_digest where digest='78e2366a980be718863a08ce28b72254d9a82c8d10d794da97ff53609a737414'\G;
*************************** 1. row ***************************
SCHEMA_NAME: test
DIGEST: 78e2366a980be718863a08ce28b72254d9a82c8d10d794da97ff53609a737414
DIGEST_TEXT: SELECT * FROM `t` ORDER BY ?, ...
COUNT_STAR: 4
SUM_TIMER_WAIT: 9987118000
MIN_TIMER_WAIT: 1999268000
AVG_TIMER_WAIT: 2496779000
MAX_TIMER_WAIT: 3063698000
SUM_LOCK_TIME: 3722000000
SUM_ERRORS: 0
SUM_WARNINGS: 0
SUM_ROWS_AFFECTED: 0
SUM_ROWS_SENT: 4
SUM_ROWS_EXAMINED: 6
SUM_CREATED_TMP_DISK_TABLES: 0
SUM_CREATED_TMP_TABLES: 0
SUM_SELECT_FULL_JOIN: 0
SUM_SELECT_FULL_RANGE_JOIN: 0
SUM_SELECT_RANGE: 0
SUM_SELECT_RANGE_CHECK: 0
SUM_SELECT_SCAN: 4
SUM_SORT_MERGE_PASSES: 0
SUM_SORT_RANGE: 0
SUM_SORT_ROWS: 2
SUM_SORT_SCAN: 2
SUM_NO_INDEX_USED: 2
SUM_NO_GOOD_INDEX_USED: 0
FIRST_SEEN: 2020-03-31 15:29:10.469038
LAST_SEEN: 2020-03-31 16:42:18.154381
QUANTILE_95: 3162277660
QUANTILE_99: 3162277660
QUANTILE_999: 3162277660
QUERY_SAMPLE_TEXT: select * from t order by 1, '3'
QUERY_SAMPLE_SEEN: 2020-03-31 16:42:18.154381
QUERY_SAMPLE_TIMER_WAIT: 1999268000
1 row in set (0.00 sec)
The above two queries have nothing to with each other. But because digest is computed to be the same, they are summarized. IMO, such an aggregation is incorrect.
How to repeat:
Reference the part of Description.
Suggested fix:
I made a patch which can make digest computation to distinguish ”SELECT * FROM t ORDER BY 1“ with “SELECT * FROM t ORDER BY 2”. If you verify the above problem, I would like to contribute.