Bug #89559 P_S recording wrong digest/digest_text for select statements using views
Submitted: 6 Feb 2018 20:45 Modified: 2 Jul 2021 17:49
Reporter: dob bod Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Performance Schema Severity:S3 (Non-critical)
Version:5.7.21 OS:Any
Assigned to: Marc ALFF CPU Architecture:Any
Tags: digest, DIGEST_TEXT, events_statements_summary_by_digest, performance_schema

[6 Feb 2018 20:45] dob bod
Description:
The table performance_schema.events_statements_summary_by_digest is recording the same digest/digest_text for very different select statements that use views. Actually, the digest/digest_text of the view is being recorded. 
The bug is seen in MySQL 8.0 too.
I suspect that MySQL Enterprise Monitor is affected by this bug too, grouping several different statements under the same digest.

How to repeat:
1) Execute any select statement that uses a view. 
2) Find the view's digest/digest_text in the table events_statements_summary_by_digest:
select * from performance_schema.events_statements_summary_by_digest order by LAST_SEEN desc limit 3;
3) Add a where condition or a join to the statement. Execute again.
4) Observe the difference in the COUNT_STAR column:
select * from performance_schema.events_statements_summary_by_digest order by LAST_SEEN desc limit 3;
*) In MySQL 8.0.4, observe the difference in the fields DIGEST_TEXT and QUERY_SAMPLE_TEXT.

Suggested fix:
The digest should be calculated based on the text of the select statement instead the text of the view.
[7 Feb 2018 16:36] MySQL Verification Team
Hi!

Thank you for your bug report. 

What is the algorithm used for the views that you were testing ???

If your views were TEMPTABLE, then your report should be looked at. However, if you use MERGE views, then there is no way of separating queries on those views with queries on the tables used by those vuews.

Thank you .....
[7 Feb 2018 16:37] MySQL Verification Team
If you use TEMPTABLE views, then please provide the exact P_S queries that you have used ....
[7 Feb 2018 19:27] dob bod
Hi!
In my tests, It seems that the results are the same independently of the algorithm.
The script I am using is below:

create database tempor;

use tempor;

create table temp_a (  id integer primary key, nome varchar(50) );

create table temp_b (  id integer primary key, id_a integer, nome varchar(50), foreign key (id_a) references temp_a(id) );

insert into temp_a values ( 1, 'casa');

insert into temp_a values ( 2, 'barco');

insert into temp_a values ( 3, 'predio');

insert into temp_a values ( 4, 'navio');

insert into temp_b values ( 1 , 1 , 'triplex');

insert into temp_b values ( 2 , 2 , 'lancha model B');

insert into temp_b values ( 3 , 1 , 'sobrado');

insert into temp_b values ( 4 , 1 , 'mansao');

#create ALGORITHM = MERGE view ab_view as ( select a.id as id_a, a.nome as nome_a, b.id as id_b, b.nome as nome_b from temp_a a, temp_b b where a.id = b.id_a );
create ALGORITHM = TEMPTABLE view ab_view as ( select a.id as id_a, a.nome as nome_a, b.id as id_b, b.nome as nome_b from temp_a a, temp_b b where a.id = b.id_a );

select sleep(5);

select * from ab_view;

#the previous select statement statistics are recorded under the DIGEST / DIGEST_TEXT of the view
select * from performance_schema.events_statements_summary_by_digest order by LAST_SEEN desc limit 5\G

select sleep(1);

select * from ab_view where nome_b like '%tri%';

#the previous select statement statistics are recorded under the DIGEST / DIGEST_TEXT of the view
select * from performance_schema.events_statements_summary_by_digest order by LAST_SEEN desc limit 5\G

select sleep(1);

select a.* from temp_a a , ab_view b where a.id = b.id_a;

#the previous select statement statistics are recorded under the DIGEST / DIGEST_TEXT of the view
select * from performance_schema.events_statements_summary_by_digest order by LAST_SEEN desc limit 5\G

select sleep(1);

select a.* from temp_a a , ab_view b where a.id = b.id_a and a.nome like '%bar%';

#the previous select statement statistics are recorded under the DIGEST / DIGEST_TEXT of the view
select * from performance_schema.events_statements_summary_by_digest order by LAST_SEEN desc limit 5\G

select sleep(1);

select a.* from temp_a a , ab_view b where a.id = b.id_a and a.nome like '%bar%' and a.id = 2;

#the previous select statement statistics are recorded under the DIGEST / DIGEST_TEXT of the view
select * from performance_schema.events_statements_summary_by_digest order by LAST_SEEN desc limit 5\G

select sleep(1);

select * from ab_view where id_a = 1;

#the previous select statement statistics are recorded under the DIGEST / DIGEST_TEXT of the view
select * from performance_schema.events_statements_summary_by_digest order by LAST_SEEN desc limit 5\G

drop database tempor;
[7 Feb 2018 20:02] dob bod
In other words, when I execute "select * from ab_view where nome_b like '%tri%'", I expect to see something like "select * from ab_view where nome_b like '?'" as a DIGEST_TEXT, and when I execute "select a.* from temp_a a , ab_view b where a.id = b.id_a and a.nome like '%bar%'", I expect to see "select a.* from temp_a a , ab_view b where a.id = b.id_a and a.nome like '%?%'". However, in both cases, I am seeing the view's digest:
"( SELECT `a` . `id` AS `id_a` , `a` . `nome` AS `nome_a` , `b` . `id` AS `id_b` , `b` . `nome` AS `nome_b` FROM `tempor` . `temp_a` `a` JOIN `tempor` . `temp_b` `b` WHERE ( `a` . `id` = `b` . `id_a` ) )"
[8 Feb 2018 13:24] MySQL Verification Team
Hi!

Specifying algorithm in CREATE VIEW does not need necessarily that MySQL will use that algorithm. Sometimes it can't use it and will choose, by itself, the other one. So, the only thing missing after CREATE VIEW , in both cases, is that you send us view attributes, so that we can see what was the actual algorithm used.

Thank you in advance.
[8 Feb 2018 19:32] dob bod
script output with view merge

Attachment: MERGE.txt (text/plain), 38.57 KiB.

[8 Feb 2018 19:33] dob bod
script output with view temptable

Attachment: TEMPTABLE.txt (text/plain), 37.66 KiB.

[8 Feb 2018 19:35] dob bod
Hello!

What command do I need to execute to get the view attributes?
I suppose MySQL is choosing the specified algorithm. When using "ALGORITHM = TEMPTABLE", SUM_CREATED_TMP_TABLES is incremented. When using "ALGORITHM = MERGE", it's not.

Thank you.
[9 Feb 2018 13:03] MySQL Verification Team
Hi!

You have not provided any info that we asked from you in your two latest attachments.

Also, our Reference Manual has all the answers to the questions that you asked. Read entire 23.5 chapter and mostly sections 23.5.2 and 23.5.5.
[9 Feb 2018 15:14] dob bod
Hi! 

Sorry. I think I know what you want now:

show create table ab_view
--------------
*************************** 1. row ***************************
                View: ab_view
         Create View: CREATE ALGORITHM=TEMPTABLE DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `ab_view` AS (select `a`.`id` AS `id_a`,`a`.`nome` AS `nome_a`,`b`.`id` AS `id_b`,`b`.`nome` AS `nome_b` from (`temp_a` `a` join `temp_b` `b`) where (`a`.`id` = `b`.`id_a`))
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.00 sec)

select * from views where table_name like 'ab_view'
--------------
*************************** 1. row ***************************
       TABLE_CATALOG: def
        TABLE_SCHEMA: tempor
          TABLE_NAME: ab_view
     VIEW_DEFINITION: (select `a`.`id` AS `id_a`,`a`.`nome` AS `nome_a`,`b`.`id` AS `id_b`,`b`.`nome` AS `nome_b` from `tempor`.`temp_a` `a` join `tempor`.`temp_b` `b` where (`a`.`id` = `b`.`id_a`))
        CHECK_OPTION: NONE
        IS_UPDATABLE: NO
             DEFINER: root@localhost
       SECURITY_TYPE: DEFINER
CHARACTER_SET_CLIENT: utf8
COLLATION_CONNECTION: utf8_general_ci
1 row in set (0.00 sec)

Thank you.
[9 Feb 2018 16:19] MySQL Verification Team
Hi!

How have you setup statement digest instrumentation ???

Can you also let us know the setup of the P_S consumers and P_S events ???
[9 Feb 2018 16:59] dob bod
performance_schema setup

Attachment: SHOW_SETUP.txt (text/plain), 127.86 KiB.

[9 Feb 2018 16:59] dob bod
Hello!

The case is just an example. It was ran in a fresh install with default configuration. No setup changes were done.
That being said, the configurations I have here are attached above.

Thanks
[12 Feb 2018 17:15] MySQL Verification Team
The output from the test case.

Attachment: output.txt (text/plain), 43.04 KiB.

[12 Feb 2018 17:16] MySQL Verification Team
Hi!

I have got a result that is uploaded in the file named output.txt.

Is it the same result as yours ???

If it is the same result, what is the problem ??? That query's text is changed and does not include view name ???
[12 Feb 2018 22:16] dob bod
Hi!!
The last select is not the same, but I can explain the problem anyway.
The problem is that every query should have their own unique DIGEST and DIGEST_TEXT, since they are different. For instance, 
"select * from ab_view where nome_b like '%tri%'", 
should have a DIGEST_TEXT like 
"select * from ab_view where nome_b like '?'", 
and 
"select a.* from temp_a a , ab_view b where a.id = b.id_a and a.nome like '%bar%' and a.id = 2"
should have a DIGEST_TEXT like
"select a.* from temp_a a , ab_view b where a.id = b.id_a and a.nome like '?' and a.id = ?".
However, every execution that has a view involved is recorded under the DIGEST 89c4e1026cee35d60dd22568af23905f. All the statistics for different queries are grouped. For instance, SUM_SELECT_FULL_JOIN has a value of 2 in the end. How could I know which of the queries is doing SUM_SELECT_FULL_JOIN?

Thank you.
[13 Feb 2018 16:24] dob bod
The last select is not the same, but I can explain the problem anyway.
The problem is that every query should have their own unique DIGEST and DIGEST_TEXT, since they are different. For instance, 
"select * from ab_view where nome_b like '%tri%'", 
should have a DIGEST_TEXT like 
"select * from ab_view where nome_b like '?'", 
and 
"select a.* from temp_a a , ab_view b where a.id = b.id_a and a.nome like '%bar%' and a.id = 2"
should have a DIGEST_TEXT like
"select a.* from temp_a a , ab_view b where a.id = b.id_a and a.nome like '?' and a.id = ?".
However, every execution that has a view involved is recorded under the DIGEST 89c4e1026cee35d60dd22568af23905f. All the statistics for different queries are grouped. For instance, SUM_SELECT_FULL_JOIN has a value of 2 in the end. How could I know which of the queries is doing SELECT_FULL_JOIN?

Thank you.
[13 Feb 2018 16:37] dob bod
Sorry for duplicating the answer, I was trying to correct the text.
[13 Feb 2018 17:01] MySQL Verification Team
Hi!

Turns out this is intended behaviour. However, I do agree with you so I am verifying this report as the feature request.
[14 Feb 2018 16:14] dob bod
Hi!!

It would be great if that feature could be implemented. 
Tools based on the performance_schema (like MySQL Enterprise Monitor, Percona PMM, etc ) can be almost useless if a lot of view are used (is my case).

Thank you very much!
[23 May 2019 7:09] Uday Sitaram
Hello Sinisa/Team,

May I know how come this is a feature request rather a bug. I could reproduce the case locally with the help of the sample queries provided. If the same behavior is present even with ALGORITHM=TEMPTABLE, I am not sure why this is not a bug. It would help if any references to code that supports this feature request thing are shared in this. 

Also, please let us know if there a way, we can ensure this will be a bug rather a feature request.
[23 May 2019 13:07] MySQL Verification Team
Last question was already answered by me, if you took time to read all comments.

Current behaviour is the one by design. I decided to make it a feature request, that would be available as an option.
[30 Jun 2021 8:33] Marc ALFF
All,

Upon revisiting and re evaluating this report, some clarifications are needed here.

When a query like "SELECT * FROM v1" is executed, the query digest is supposed to be computed from the query text, and from the query text only.

In this case, the tokens seen are 'SELECT', '*', 'FROM', 'v1', which represents the digest text, and a hash of it is computed for the digest itself.

This query digest should not change whether 'v1' is a table or a view, and even less change depending on the view definition.

As a result, the original report filed was indeed correct, and the behavior observed with views is indeed considered a bug.

Re classifying as a bug report.
[30 Jun 2021 11:35] MySQL Verification Team
Thank you, Marc.
[2 Jul 2021 17:49] Paul DuBois
Posted by developer:
 
Fixed in 5.7.36, 8.0.27.

For all SELECT statements on a view, the query digest was based on
the view definition. As a result, different queries had the same
digest and aggregated together in the Performance Schema
events_statements_summary_by_digest table, so statistics in that
table were not usable for distinguishing distinct SELECT statements.

The query digest for each SELECT statement on a view now is based on
the SELECT, not the view definition. This enables distinguishing
distinct SELECT statements in the events_statements_summary_by_digest
table. However, tools that use query digests may need some adjustment
to account for this change. For example, MySQL Enterprise Firewall
and query rewrite plugins rely on query digests and existing rules
for them that are associated with views may need to be updated.
[5 Jul 2021 11:34] MySQL Verification Team
Thank you, Paul.