Bug #79285 Undocumented negative low estimates in p_s.memory_summary_global_by_event_name
Submitted: 15 Nov 2015 21:38
Modified: 20 Nov 2015 9:37
Reporter: Szymon Komendera
Status: Verified
Impact on me: None
Category: MySQL Server: Performance Schema
Severity: S4 (Feature request)
Version: 5.7, 5.7.9
OS: Any
Assigned to:
CPU Architecture: Any

[15 Nov 2015 21:38] Szymon Komendera
Description:
## BACKGROUND:

The Performance Schema documentation contains the following statement regarding low estimates in the memory summary tables:

"For lower estimates in summary tables other than memory_summary_global_by_event_name, it is possible for values to go negative if memory ownership is transferred between threads."

Reference: https://dev.mysql.com/doc/refman/5.7/en/memory-summary-tables.html

The above implies that memory_summary_global_by_event_name should not contain negative values for low estimates; at the very least, it does not explain why such values may appear in that table.
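By contrast, the per-thread summary table is exactly where such negatives are documented. A minimal query sketch (using the standard 5.7 table and column names) to surface them for comparison:

select thread_id, event_name, low_count_used
from performance_schema.memory_summary_by_thread_by_event_name
where low_count_used < 0;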

## ISSUE:

Table "memory_summary_global_by_event_name" reports negative low estimates for internal memory structures. See below for the observed behavior of these counters.

How to repeat:
## GENERAL DESCRIPTION:

1. Launch MySQL 5.7.9 with performance_schema enabled.
2. Enable memory instrumentation and confirm there are no negative values in the table (yet).
3. Generate some traffic against internal/user tables to cause memory allocation/deallocation in internal MySQL structures (a setup sketch follows this list).
4. While generating traffic, continuously check the counters and observe negative values appearing in the table.
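
The demo below drives this traffic against a schema of empty tables. One possible way to build such a schema (a sketch only; the column names a and b are placeholders, since the demo merely needs two INT columns per table):

create database if not exists repro;
drop procedure if exists repro.make_tables;
delimiter //
create procedure repro.make_tables(in n int)
begin
  declare i int default 1;
  while i <= n do
    -- build and execute the DDL for one empty two-column InnoDB table
    set @ddl = concat('create table if not exists repro.table_', i,
                      ' (a int, b int) engine=InnoDB');
    prepare stmt from @ddl;
    execute stmt;
    deallocate prepare stmt;
    set i = i + 1;
  end while;
end//
delimiter ;
call repro.make_tables(20000);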

## DEMO:

*** SESSION 1 ***

mysql> status
--------------
mysql  Ver 14.14 Distrib 5.7.9, for Linux (x86_64) using  EditLine wrapper

(cut)
Server version:		5.7.9 Source distribution
Protocol version:	10
Connection:		127.0.0.1 via TCP/IP
(cut)
Uptime:			10 sec

Threads: 1  Questions: 96  Slow queries: 0  Opens: 192  Flush tables: 1  Open tables: 185  Queries per second avg: 9.600
--------------

mysql> update performance_schema.setup_instruments set enabled = 'yes' where name like 'memory/%';
Query OK, 305 rows affected (0.00 sec)
Rows matched: 375  Changed: 305  Warnings: 0

mysql> select * from performance_schema.memory_summary_global_by_event_name where low_count_used < 0\G
Empty set (0.00 sec)
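
At this point one can also sanity-check that the UPDATE took effect before generating any traffic (a sketch; its output is not part of the session above):

select count(*) from performance_schema.setup_instruments
where name like 'memory/%' and enabled = 'YES';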

*** SESSION 2 ***

mysql@sandbox:~$ mysql -h127.0.0.1 -P5709 -uroot -ppass
(cut)

mysql> select count(*) from information_schema.tables where table_schema = 'repro';
+----------+
| count(*) |
+----------+
|    20000 |
+----------+
1 row in set (0.08 sec)

Tables in schema "repro" are empty InnoDB tables with two INT columns, named table_1 through table_20000.

mysql> quit

(start running scripts to generate dummy traffic)

mysql@sandbox:~$ for i in `seq 1 10000`; do mysql -h127.0.0.1 -P5709 -uroot -ppass -e "select count(*) from repro.table_$i;" > /dev/null 2>&1; done;
(waited for script completion)
mysql@sandbox:~$ for i in `seq 1 10000`; do mysql -h127.0.0.1 -P5709 -uroot -ppass -e "select count(*) from repro.table_$i;" > /dev/null 2>&1; done;
(waited for script completion)
mysql@sandbox:~$ for i in `seq 1 10000`; do mysql -h127.0.0.1 -P5709 -uroot -ppass -e "select count(*) from repro.table_$i;" > /dev/null 2>&1; done;
(terminated during execution)

*** SESSION 1 ***

While running the scripts, pull data from performance_schema.memory_summary_global_by_event_name. In this case I'm pulling selected values for memory/innodb/mem0mem so that the result is more readable; similar behavior can be seen for other memory/innodb/% events.

mysql@sandbox:~$ while true; do mysql -B --skip-column-names -h127.0.0.1 -P5709 -uroot -ppass -e "select low_count_used, current_count_used, high_count_used from performance_schema.memory_summary_global_by_event_name where event_name = 'memory/innodb/mem0mem';" 2>/dev/null && sleep 1; done;
0	0	0
0	0	0
0	0	0
0	1050	1055 <== this is where the 1st bash loop started
0	4935	4940
0	8790	8795
0	12645	12650
0	16515	16520
0	20385	20390
0	24365	24365
0	28306	28310
0	32053	32058
0	35734	35740
0	39401	39408
0	43083	43089
0	46779	46786
0	50475	50482
0	54171	54178
0	57853	57860
0	61521	61528
0	65217	65223
0	68915	68920
0	72597	72603
0	76280	76286
0	79990	79997
0	83742	83748
0	87424	87430
0	91119	91126
0	94800	94807
0	98498	98502
0	102166	102170
0	105846	105852
0	109515	109521
0	113197	113204
0	116851	116858
0	120518	120525
0	124186	124193
0	127858	127860
0	131591	131597
0	135302	135308
0	139026	139033
0	141993	142000
0	141992	142000
0	141991	142000
0	141991	142000
-112126	29864	142000 <== this is where the 1st bash loop completed
-112126	29864	142000
-112140	29849	142000
-112140	29848	142000
-112140	33362	145508
-112140	37086	149233
-112140	40795	152942
-112140	44520	156666
-112140	48229	160376
-132384	31612	164002
-132384	35337	167727
-132384	39080	171470
-132384	42832	175223
-132384	46528	178919
-132384	50238	182628
-132384	53948	186339
-132384	57671	190062
-132384	61423	193814
-132384	65161	197552
-132384	68870	201261
-132384	72594	204985
-132384	76389	208779
-132384	80141	212532
-132384	83893	216284
-132384	87646	220036
-132384	91439	223830
-132384	95206	227596
-132384	98961	231348
-132384	102681	235071
-132384	106419	238810
-132384	110171	242561
-132384	113966	246356
-132384	117717	250108
-132384	121455	253846
-132384	125207	257598
-132384	128946	261336
-132384	132684	265075
-132384	136408	268798
-132384	140187	272578
-132384	141994	274384
-132384	141996	274385
-132384	141993	274385
-132384	141992	274385
-132384	141992	274385
-244496	29880	274385 <== this is where the 2nd bash loop completed
-244496	29879	274385
-244496	29878	274385
-244496	29878	274385
-244524	30801	275332
-244524	34553	279084
-244524	38291	282821
-244524	42057	286587
-244524	45852	290382
-244524	49618	294148
-244524	53314	297845
-269696	31810	301512
-269696	35565	305268
-269696	35984	305687 <== this is where the 3rd bash loop was interrupted
^C

## OBSERVATIONS

1. We have negative values in the table, which does not appear to be documented behavior.
2. Value "high_count_used" (3rd column) only ever grows, and once "low_count_used" turns negative it tracks "current_count_used" minus "low_count_used" (e.g., right after the 1st loop completed: 29864 - (-112126) = 141990 ≈ 142000). This should not happen: per the documentation, "high_count_used" should correspond to the maximum value reached by "current_count_used", which the above output disproves, as "current_count_used" drops while "high_count_used" keeps increasing. A query to check this relationship follows below.
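
A sketch of such a check across all memory events (the column alias "residual" is introduced here purely for illustration; it should stay near zero if "high_count_used" is indeed computed as "current_count_used" minus "low_count_used"):

select event_name, low_count_used, current_count_used, high_count_used,
       high_count_used - (current_count_used - low_count_used) as residual
from performance_schema.memory_summary_global_by_event_name
where low_count_used < 0
order by residual desc;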

Suggested fix:
Document the behavior and meaning of the low/current/high counters in the aforementioned performance_schema table.