Bug #69382: Performance schema, degradation under stress when internal buffers are full
Submitted: 2 Jun 2013 19:42    Modified: 9 Jul 2013 14:16
Reporter: Takashi Ichii
Status: Closed
Category: MySQL Server: Performance Schema    Severity: S3 (Non-critical)
Version: 5.6.11    OS: Any
Assigned to: Marc ALFF    CPU Architecture: Any

[2 Jun 2013 19:42] Takashi Ichii
Description:
The temporary file for LOAD DATA INFILE passes through find_or_create_file() in ./storage/perfschema/pfs_instr.cc. However, in most cases we cannot see it
in P_S.file_instances because it is temporary.

If performance_schema_max_file_instances is big, likely because there are many partitioned
tables and so open_files_limit is big, find_or_create_file() seems to take a painfully long time.

I am wondering whether the issue is that find_or_create_file() itself is slow, or whether it is specific to temporary files.

How to repeat:
I examined this on Solaris 11.1 with DTrace, but I believe the issue does not depend on the platform.

1. restart mysqld

2. trace with this script:
--
#!/usr/sbin/dtrace -s

/* On entry to find_or_create_file(), record the start timestamp
   and copy the string argument for later reporting. */
pid$target::*find_or_create_file*:entry
/self->start == 0/
{
    self->start = vtimestamp;
    self->klass = copyinstr(arg2);
}

/* On return, print the saved string and the elapsed on-CPU time (ns). */
pid$target::*find_or_create_file*:return
/self->start != 0/
{
    elapsed = vtimestamp - self->start;
    printf("%s\t%d", self->klass, elapsed);
    self->start = 0;
}
--
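
For reference, assuming the script above is saved as trace.d and made executable, it can be attached to the running mysqld like this (the pgrep usage is an assumption; substitute the actual pid if needed):
--
# ./trace.d -p $(pgrep -x mysqld)
--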

3. If there is a LOAD DATA event in the relay log, you can see it in the DTrace output.

4. I set open_files_limit to 65,536, which makes performance_schema_max_file_instances 100,825.
In this situation, I create many tables like this, to simulate many temporary tables:
--
$ seq 100000 | xargs -i -P12 mysql -Dtest -e'create table foo{} (id int) engine=myisam; drop table foo{};'
--

5. The elapsed times in the DTrace output tend to be large.

Suggested fix:
- modify mysql_file_open() in sql/sql_load.cc to skip PSI_FILE_CALL
- modify the search algorithm in find_or_create_file() in storage/perfschema/pfs_instr.cc
  so that it stays fast even when file_max is a large number
[11 Jun 2013 20:35] Marc ALFF
One possible root cause of overhead with the performance schema file io instrumentation is not having enough file instances allocated for the load.

To check this, run:
SHOW GLOBAL STATUS LIKE "performance_schema%";

In general, every lost counter should stay at zero.

A value in Performance_schema_file_instances_lost indicates how many times the internal performance schema table became full, which causes find_or_create_file() to spin.

A value that continues to climb is a definitive sign of a tuning problem; in that case the server is operating in very degraded conditions.

The solution is to increase the value of performance_schema_max_file_instances.
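
For example, in the my.cnf file (the value below is only illustrative; size it against your own file count):

performance_schema_max_file_instances = 200000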

Now, assuming the sizing of performance_schema_max_file_instances is not the issue or has been fixed, a related question is how to disable this instrumentation.

There is no need to modify code to disable a particular instrumentation point, like file io performed in sql_load in this case.

The first parameter given to mysql_file_open() in sql/sql_load.cc is key_file_load, which indicates how the file io is instrumented.

In this case (see mysqld.cc, all_server_files[], entry key_file_load),
the instrument is exposed in table performance_schema.setup_instruments as "wait/io/file/sql/load".

To disable file io for this specific place,
either do it dynamically:

UPDATE performance_schema.setup_instruments
  SET ENABLED='NO' WHERE NAME="wait/io/file/sql/load";

or even statically in the my.cnf file:

performance-schema-instrument='wait/io/file/sql/load=OFF'

This will cause the code, when invoking the instrumentation point mysql_file_open(key_file_load, ...), to not collect performance data and to not go into find_or_create_file().

To fully investigate this issue,
please indicate the result of:

SHOW ENGINE PERFORMANCE_SCHEMA STATUS;
SHOW GLOBAL VARIABLES LIKE "performance_schema%";
SHOW GLOBAL STATUS LIKE "performance_schema%";
select count(*) from performance_schema.file_instances;
select event_name, count(event_name) from performance_schema.file_instances group by event_name;
[12 Jun 2013 11:12] Marc ALFF
Hi Takashi,

I got the data, thanks for the feedback.
Since the data was private, I am not mentioning numbers here.

The following parameters are too low:
- performance_schema_max_file_instances
- performance_schema_max_table_instances

causing instrumentation failures as reported in:
- Performance_schema_file_instances_lost
- Performance_schema_table_instances_lost

To resolve the problem,

1) Increase performance_schema_max_table_instances

The performance schema can not keep up with the table instrumentation,
because there are more tables in the database than performance_schema_max_table_instances.

Heuristic to size this parameter:
- count or estimate how many tables are in the database
- account for growth if necessary
- multiply this number by 1/0.80, so that the internal buffer is at worst 80% full at peak volume.
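
As a rough sketch of this heuristic in SQL (the 1.25 factor is simply 1/0.80; growth still has to be added by hand):

SELECT CEIL(COUNT(*) * 1.25) AS suggested_max_table_instances
  FROM information_schema.tables;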

2) Increase performance_schema_max_file_instances

This parameter is severely undersized, causing very high growth in Performance_schema_file_instances_lost; this is the root cause of the overhead here.

Heuristic to size this parameter:
- count or estimate how many files are in the file system used by mysql (count how many files are in the data directory)
- account for miscellaneous (logs, replication, temporary) files 
- account for growth if necessary
- multiply this number by 1/0.80, so that the internal buffer is at worst 80% full at peak volume.

Depending on the storage engine used, account for at least 2 (frm, ibd) or 3 (frm, MYD, MYI) files per table, so if there are a lot of tables, a lot of file instances will be needed too.
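
For example, a per-engine estimate can be sketched directly in SQL (this only covers table files; miscellaneous and temporary files still need to be added on top):

SELECT engine,
       COUNT(*) AS tables,
       SUM(CASE WHEN engine = 'MyISAM' THEN 3 ELSE 2 END) AS estimated_files
  FROM information_schema.tables
  WHERE table_schema NOT IN ('information_schema', 'performance_schema')
  GROUP BY engine;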

3) Restart the server.

If restarting the server is not practical, then until it can be restarted, dynamically disable the entire file io instrumentation and the table instrumentation, to avoid degraded execution.

UPDATE performance_schema.setup_instruments
  SET ENABLED='NO'
  WHERE NAME like "wait/io/file/%";

UPDATE performance_schema.setup_instruments
  SET ENABLED='NO'
  WHERE NAME like "wait/io/table/%"

UPDATE performance_schema.setup_instruments
  SET ENABLED='NO'
  WHERE NAME like "wait/lock/table/%"

Regards,
-- Marc
[12 Jun 2013 11:17] Takashi Ichii
Hi Marc,

Thank you for your comments. I now understand the point of my issue.

I checked the values of file_instances_lost and max_file_instances on both the master and the slave.
Please note that this server is different from the one in my first comment, but the same issue is there.

At master:
> | Performance_schema_file_instances_lost        | 5950079  |
> | performance_schema_max_file_instances                  | 25207 |

I must fix the value of performance_schema_max_file_instances as you said.
(I understand that just increasing the value may not be enough, because the lost value is so large.)

At slave:
> | Performance_schema_file_instances_lost        | 388121684 |
> | performance_schema_max_file_instances                  | 25207 |

The lost value is extremely large compared to that of the master.

Then I found out it is because of key_file_log_event_data in Create_file_log_event::do_apply_event
or Append_block_log_event in sql/log_event.cc.

So I think disabling wait/io/file/sql/log_event_data would resolve my issue.
Please tell me if my understanding is wrong.
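
If that is right, the statement would presumably be (following your earlier example):

UPDATE performance_schema.setup_instruments
  SET ENABLED='NO' WHERE NAME="wait/io/file/sql/log_event_data";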
[12 Jun 2013 11:20] Marc ALFF
When the performance schema internal buffers are undersized, the performance schema can not instrument all objects, and some events are reported as "lost".

This is expected given the sizing available.

Now, in this particular mode of execution (internal buffers 100% full),
the performance schema enters a spin loop that causes severe performance degradation.

This (the degradation) is a verified bug.
The performance schema should still report lost events, but it should at least not consume so much time once a table has been detected as full.

Changing the title accordingly.
[12 Jun 2013 11:21] Takashi Ichii
Sorry, I was not able to post my comment right after uploading the file.
I'll check your newest comments.
[12 Jun 2013 11:29] Marc ALFF
Hi Takashi,

The problem is that there is not enough room to keep track of all files.

This is not specific to a given file (LOAD DATA INFILE, replication logs, ...), so the same issue will appear again for a different file io.

The best approach, then, is to disable ALL the file io instrumentation (wait/io/file/%).

Also, note that the file io statistics reported by the performance schema are partial (underestimated), given that some files were not instrumented.
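
The captured portion can still be inspected, keeping in mind the counts are lower bounds; for example:

SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT
  FROM performance_schema.file_summary_by_event_name
  ORDER BY COUNT_STAR DESC
  LIMIT 10;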

-- Marc
[12 Jun 2013 12:16] Takashi Ichii
Hi Marc,

> 2) Increase performance_schema_max_file_instances
...
> - account for miscellaneous (logs, replication, temporary) files 

Yes, the point for me is that I should take into account that replicating LOAD DATA queries
also creates many temporary files.

> This is not specific to a given file (LOAD DATA INFILE, replication logs, ...), so the same issue will appear again for a different file io.

It is not very easy to estimate how many temporary files would be created
when replicating LOAD DATA queries, compared to estimating the number of files for tables.

> The performance schema should still report lost events, but at least not consume so much time once a table has been detected as full.

Anyway, this is perfect for me.

> The best is to disable ALL the file io then (wait/io/file/%).

I hope this is just a temporary workaround, because P_S is useful...

Thank you,
Takashi
[9 Jul 2013 14:16] Paul DuBois
Noted in 5.6.13, 5.7.2 changelogs.

When an internal buffer was too small for the workload, the
Performance Schema could spend a lot of time in an internal spin loop
attempting to allocate a memory buffer, and fail.