Bug #82963 Rows appear in P_S when using DISTINCT, GROUP_BY, or UNION
Submitted: 13 Sep 2016 2:53 Modified: 15 Sep 2016 9:53
Reporter: monty solomon Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Performance Schema Severity:S3 (Non-critical)
Version:5.7.15 OS:Any
Assigned to: Marc ALFF CPU Architecture:Any

[13 Sep 2016 2:53] monty solomon
Description:
Running a query against the P_ S or sys tables returns fewer rows than when running the query with DISTINCT, GROUP BY, or UNION.

How to repeat:
Execute the following statement and observe the number of rows returned.

SELECT `performance_schema`.`memory_summary_global_by_event_name`.`EVENT_NAME` AS `event_name` FROM `performance_schema`.`memory_summary_global_by_event_name` WHERE (`performance_schema`.`memory_summary_global_by_event_name`.`CURRENT_NUMBER_OF_BYTES_USED` > 0);

Then execute the following statement and observe the number of rows returned is higher.

SELECT DISTINCT`performance_schema`.`memory_summary_global_by_event_name`.`EVENT_NAME` AS `event_name` FROM `performance_schema`.`memory_summary_global_by_event_name` WHERE (`performance_schema`.`memory_summary_global_by_event_name`.`CURRENT_NUMBER_OF_BYTES_USED` > 0);

The following rows appear when using DISTINCT.

memory/memory/HP_SHARE
memory/memory/HP_INFO
memory/memory/HP_PTRS

There are similar discrepancies when executing the following two statements.

SELECT event_name, current_alloc FROM sys.x$memory_global_by_current_bytes;

SELECT event_name, current_alloc FROM sys.x$memory_global_by_current_bytes UNION SELECT event_name, current_alloc FROM sys.x$memory_global_by_current_bytes;

The following statement using GROUP BY returns rows containing "memory/memory" that do not appear when not using the GROUP BY.

SELECT SUBSTRING_INDEX(event_name,'/',2) AS code_area FROM sys.x$memory_global_by_current_bytes GROUP BY SUBSTRING_INDEX(event_name,'/',2) ORDER BY code_area;

SELECT SUBSTRING_INDEX(event_name,'/',2) AS code_area FROM sys.x$memory_global_by_current_bytes ORDER BY code_area;
[13 Sep 2016 2:57] monty solomon
Using GROUP BY with the alias returns a different value for the row that appears as described in the previous comment.

mysql> SELECT SUBSTRING_INDEX(event_name,'/',2) AS code_area, SUM(current_alloc) AS current_alloc FROM sys.x$memory_global_by_current_bytes GROUP BY SUBSTRING_INDEX(event_name,'/',2) ORDER BY SUM(current_alloc) DESC;
+---------------------------+---------------+
| code_area                 | current_alloc |
+---------------------------+---------------+
| memory/performance_schema |     158562408 |
| memory/sql                |       1890283 |
| memory/memory             |       1243240 |
| memory/innodb             |        335480 |
| memory/myisam             |        116067 |
| memory/mysys              |           770 |
+---------------------------+---------------+
6 rows in set (0.00 sec)

mysql> SELECT SUBSTRING_INDEX(event_name,'/',2) AS code_area, SUM(current_alloc) AS current_alloc FROM sys.x$memory_global_by_current_bytes GROUP BY code_area ORDER BY SUM(current_alloc) DESC;
+---------------------------+---------------+
| code_area                 | current_alloc |
+---------------------------+---------------+
| memory/performance_schema |     158562408 |
| memory/sql                |       1889003 |
| memory/memory             |       1420688 |
| memory/innodb             |        335480 |
| memory/myisam             |        116067 |
| memory/mysys              |           770 |
+---------------------------+---------------+
6 rows in set (0.00 sec)
[14 Sep 2016 6:34] MySQL Verification Team
Hello monty,

Thank you for the report.

Thanks,
Umesh
[15 Sep 2016 9:51] Marc ALFF
Table performance_schema.memory_summary_global_by_event_name
reports the memory used -- currently -- in the entire server,
broken down by memory instruments.

This includes the memory used when executing the query itself.

By changing the query structure, the execution changed to perform additional operations, in particular using a temporary HEAP table.

So, with a group by clause, a heap table is used, and this allocates more memory ... reported by the performance schema memory statistics table.

Rows are "added" in the query result because the WHERE clause uses:
  CURRENT_NUMBER_OF_BYTES_USED > 0
but the table itself did not change: the list of rows is the same,
it corresponds to all the memory instruments present in the server code.

Note that executing concurrently another query that allocates memory from a different area in the server would cause additional rows to be reported too,
and this is actually the expected result for the query written.

Closing as not a bug.