Bug #89379 summary_by_user tables get extremely slow with many users.
Submitted: 24 Jan 2018 11:41 Modified: 24 Jan 2018 12:26
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Performance Schema Severity:S5 (Performance)
Version:5.7.20, 5.7.21 OS:Any
Assigned to: Marc ALFF CPU Architecture:Any

[24 Jan 2018 11:41] Daniël van Eeden
Description:
Some P_S queries stared to take multiple minutes on a system with many users.

Setting performance_schema_accounts_size doesn't work for this as that would also influence the accounts table 

mysql> select count(*) from performance_schema.events_statements_summary_by_user_by_event_name;
+----------+
| count(*) |
+----------+
|   297730 |
+----------+
1 row in set (4 min 37.37 sec)

mysql> select count(*) from performance_schema.events_statements_summary_by_program;
+----------+
| count(*) |
+----------+
|       17 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from performance_schema.events_statements_summary_by_thread_by_event_name;
+----------+
| count(*) |
+----------+
|   769120 |
+----------+
1 row in set (0.41 sec)

mysql> select count(*) from performance_schema.accounts;
+----------+
| count(*) |
+----------+
|     5572 |
+----------+
1 row in set (1.83 sec)

Might be related to:
Bug #88834 	Uneven slowdown on systems with many users

'perf top' shows these:
PFS_connection_iterator::visit_user
sanitize_account 

How to repeat:
On a system with many users (e.g. 5k entries in mysql.user) query the P_S tables which summarize by account
[24 Jan 2018 11:53] Daniël van Eeden
Set to "S5 (Performance)"
[24 Jan 2018 12:26] MySQL Verification Team
Hello Daniël,

Thank you for the report and feedback.

Thanks,
Umesh
[18 Jun 2018 8:35] MySQL Verification Team
Saw this on 5.7.21 today while trying to dump the P_S tables...

mysql> show processlist;
+-----+------+-----------+--------------------+---------+------+--------------+-------------------------------------------------------------------------------+
| Id  | User | Host      | db                 | Command | Time | State        | Info                                                                          |
+-----+------+-----------+--------------------+---------+------+--------------+-------------------------------------------------------------------------------+
| 151 | root | ::1:61922 | performance_schema | Query   |    0 | starting     | show processlist                                                              |
| 293 | root | ::1:62142 | performance_schema | Query   | 1748 | Sending data | SELECT /*!40001 SQL_NO_CACHE */ * FROM `memory_summary_by_user_by_event_name` |
+-----+------+-----------+--------------------+---------+------+--------------+-------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> select count(*) from mysql.user;
+----------+
| count(*) |
+----------+
|    31169 |
+----------+
1 row in set (0.00 sec) 

Stacks taken a few times look like:
Call Site
mysqld!PFS_connection_iterator::visit_user
mysqld!table_mems_by_user_by_event_name::make_row
mysqld!table_mems_by_user_by_event_name::rnd_next
mysqld!ha_perfschema::rnd_next
mysqld!handler::ha_rnd_next
mysqld!rr_sequential
mysqld!sub_select
mysqld!do_select
mysqld!JOIN::exec
...
[13 Oct 2020 0:44] Pei Chih Wen
This also happens in MySQL 8.0.21.

mysql> select count(*) from performance_schema.events_statements_summary_by_user_by_event_name;
+----------+
| count(*) |
+----------+
|    16458 |
+----------+
1 row in set (4.49 sec)

mysql> select count(*) from performance_schema.events_statements_summary_by_program;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from performance_schema.events_statements_summary_by_thread_by_event_name;
+----------+
| count(*) |
+----------+
|   183148 |
+----------+
1 row in set (0.07 sec)

mysql> select count(*) from performance_schema.accounts;
+----------+
| count(*) |
+----------+
|    11000 |
+----------+
1 row in set (0.14 sec)

And this blocks some of our metric collection because group by user queries take a few seconds to return. Also a weird thing is that processlist shows there are only around 800 connecting users but somehow performance_schema.accounts has much more entries.

mysql> select count(*) from information_schema.processlist;
+----------+
| count(*) |
+----------+
|      790 |
+----------+
1 row in set (0.01 sec)
[30 Aug 2021 11:19] Simon Mudd
On an 8.0.25 host I see:

root@host-8-0-25 [(none)]> select count(*), count(distinct event_name), count(distinct user) from performance_schema.events_statements_summary_by_user_by_event_name;
+----------+----------------------------+----------------------+
| count(*) | count(distinct event_name) | count(distinct user) |
+----------+----------------------------+----------------------+
|    24476 |                        211 |                  115 |
+----------+----------------------------+----------------------+
1 row in set (26.36 sec)

root@host-8-0-25 [(none)]>
[30 Mar 13:31] Marc ALFF
ANALYSIS
========

This is due to how summary tables are implemented,
by materializing rows per event_name one at a time.

Assuming:

N_THREAD = select count(*) from performance_schema.threads;
N_ACCOUNT = select count(*) from performance_schema.accounts;
N_USER = select count(*) from performance_schema.users;
N_HOST = select count(*) from performance_schema.hosts;

The time complexity of a SELECT statement on each table is as follows.

- table events_xxx_summary_by_thread_by_yyy:
  O(N_THREAD), which is O(N)

- table events_xxx_summary_by_account_by_yyy:
  O(N_ACCOUNT * N_THREAD), which is O(N*N)

- table events_xxx_summary_by_user_by_yyy:
  O(N_USER * N_THREAD) + O(N_ACCOUNT * N_THREAD), which is O(N*N)

- table events_xxx_summary_by_host_by_yyy:
  O(N_HOST * N_THREAD) + O(N_ACCOUNT * N_THREAD), which is O(N*N)

- table events_xxx_summary_global_by_yyy:
  O(N_THREAD) + O(N_ACCOUNT) + O(N_HOST), which is O(N)

This affects waits, stages, statements, transactions and error summaries.
[30 Mar 13:34] Marc ALFF
In particular,
- table events_waits_summary_by_account_by_event_name
- table events_waits_summary_by_user_by_event_name
- table events_waits_summary_by_host_by_event_name

- table events_stages_summary_by_account_by_event_name
- table events_stages_summary_by_user_by_event_name
- table events_stages_summary_by_host_by_event_name

- table events_statements_summary_by_account_by_event_name
- table events_statements_summary_by_user_by_event_name
- table events_statements_summary_by_host_by_event_name

- table events_errors_summary_by_account_by_error
- table events_errors_summary_by_user_by_error
- table events_errors_summary_by_host_by_error

are affected, because of the O(N*N) loop
[30 Mar 13:37] Marc ALFF
See related:

Bug#106839 provide counters for error 013730 (wait_timeout period exceeded)
[31 Mar 8:21] Simon Mudd
So conclusion for this is that some P_S tables may not be usable "as is" (for busy systems) and we should look at the underlying p_s tables and possibly do the work on the client side, by pulling both tables, applying filtering if needed and using the results as needed?

If that's the case it might be worth documenting this as it may be more practical. 
 Offloading the N*N join to the client may make more sense and allows the server to do other more important work.  Clearly for those unable to do this the results provided may be good enough even if there is potentially a long wait.