MySQL Bugs: #88719: binlog purge opens and searches all binlogs for GTIDs, even if GTIDs not in use

Bug #88719	binlog purge opens and searches all binlogs for GTIDs, even if GTIDs not in use
Submitted:	30 Nov 2017 14:26	Modified:	9 Jan 2018 10:38
Reporter:	Riccardo Pizzi	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.7.19	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
We have noticed that every time purge_logs_before_date() is invoked (currently, every time a binlog is rotated), the server opens each and every binlog available to look for global transaction IDs, even if these are not in use.

When GTIDs not in use, purging binlogs should not require to open and inspect each file. We noticed this on a server with over 3,500 binlog files, and these are all examined every minute or so... we consider this a serious regression.

Some debugging indicates that the reason is the variable lost_gtids not being set to NULL when binlog_cc:init_gtid_sets() is executed (check is around line 4767). 

This is not happening in latest 5.6.

How to repeat:
Use a debugger and set a breakpoint in the above function, when GTIDs not in use the mentioned variable isn't NULL and therefore binlog inspection loop is executed

Hi Riccardo,

Thanks for the report, will check it out, but why in the name of.. you have 2000+ binlog files there?

thanks
Bogdan

busy server, and 4 days retention

Hi,

Verified as described. 

thanks for submission
all best
Bogdan

Note that there is a workaround that works. Set the binlog_gtid_simple_recovery to ON.

https://dev.mysql.com/doc/refman/5.7/en/replication-options-gtids.html#sysvar_binlog_gtid_... 

kind regards
Bogdan

Sorry Bogdan, but it is already set to ON so it doesn't seem to help, at least in my case.

Hi,

That's interesting, neither me nor the dev from replication team looking at this can't reproduce the problem with binlog_gtid_simple_recovery set to ON.

Can you please triple check :D

thanks
Bogdan

It is on by default in 5.7.19, you know that. Not using GTIDs, I don't see why I should bother to change any GTIDs related default anyways...

Thanks
Rick

Hi,
Yes, it is on since 5.7.7, but I tried it with ON originally and didn't reproduced the problem so I assumed you have it OFF as with OFF I can reproduce the problem... guy's from dev team returned it back to me with "known workaround" but I do still believe it's a bug even when it goes needlessly trough all files with OFF too.

Anyhow I'll retest again with different settings but would appreciate if you can give me your exact my.cnf just to not waste too much time chasing different settings.

thanks
Bogdan

pid-file                       =/var/run/mysqld/mysqld.pid
port                           = 3306

user                           = mysql
socket                         = /db/data/mysql.sock
pid_file                       = /db/data/mysql.pid

max_allowed_packet             = 128M
max_connect_errors             = 65535
wait_timeout                   = 3600
sysdate_is_now                 = 1
default_storage_engine         = InnoDB
skip-name-resolve              = 1

datadir                        = /db/data
tmpdir                         = /db/tmp

secure_file_priv = ''
sql_mode = 'NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
log_timestamps = 'SYSTEM'

event_scheduler                = OFF

server-id                      = 7771167
log_bin                        = /storage/binlog/dbmd02
expire_logs_days               = 7
sync_binlog                    = 1
binlog_format                  = MIXED
log_slave_updates              = 1
relay-log                      = relay-bin
read_only                      = 1
super_read_only                = 0
binlog_cache_size              = 10M
slave_compressed_protocol      = 1

tmp_table_size                 = 128M
max_heap_table_size            = 128M
sort_buffer_size = 8M
join_buffer_size = 8M
query_cache_type               = 0
query_cache_size               = 0
max_connections                = 4500
thread_cache_size              = 100
open_files_limit               = 65535
table_definition_cache         = 4096
table_open_cache               = 4096

innodb_flush_method = O_DIRECT
innodb_log_files_in_group = 2
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 1
innodb_file_per_table = 1
innodb_buffer_pool_size = 80G

innodb_file_format=barracuda
innodb_log_buffer_size = 64M
innodb_lock_wait_timeout = 120
innodb_write_io_threads=8
innodb_read_io_threads=8

key_buffer_size = 16M

log-error=/db/log/mysql5_errors.log
log_queries_not_using_indexes  = 0
slow_query_log = 1
slow_query_log_file=/db/log/mysql5_slow_query.log
long_query_time = 1

innodb_io_capacity = 600
back_log=1024
auto_increment_offset = 1
auto_increment_increment = 2
innodb_buffer_pool_instances = 8
innodb_adaptive_hash_index=0
innodb_track_changed_pages=ON
innodb_buffer_pool_load_at_startup = ON
innodb_buffer_pool_dump_at_shutdown = ON
log_warnings=2
max_binlog_size=104857600

Whoops, forget to tell this is Percona variation of MySQL.... but just disregard Percona specific variables. Thank you!

Hi Bogdan,

I think  the reason why binlog_gtid_simple_recovery workaround failed is Rick compressed binlog in place on local disk, so all the cold(old) binlogs are all invalid, and a scan is needed to find a valid one.

I agree that gtid_purged set should not be updated while GTID id OFF.

I have verified what actually happens in my case with the variable set to ON: MySQL iterates all compressed binlogs until it finds the first one which is not, and can thus be opened, then it only reads that one (a single binlog).

The manual page for the variable is incorrect, as it  says this search only happens at start or restart and that's not true - it happens at every binlog closure, and also it says it iterates two binlogs while it only iterates one.

Still I don't think it is a good idea to search for the older binlog files every time purge_logs_before_date() is called (at every binlog closure)...

In fact the manual page contradicts itself... it first says "This variable controls how binary log files are iterated during the search for GTIDs when MySQL starts or restarts"  then when talking of the case where it is ON it says "this ensures only two binary log files are iterated during server restart or when binary logs are being purged".

So why is MySQL opening only a single binlog in my case? It is looking for the oldest and the newest says the manual but it doesn't open either of the two, it only opens the first one which can be successfully read then it stops.

Hi,
With regards to documentation, it can always be better but this is really not a typical case.

With regards to reading, you have bunch of invalid binlogs from what I read so behavior is not ideal. How are you compressing old binlogs? After the fact with some logrotate or you use patch from Choury or ?

The bug is still verified as neither the behavior nor the documentation is ideal in this case but with regards to compression it's common to move the compressed binlogs to the second disk (or even second server) so they don't stay there and the whole issue does not exist :D

all best
Bogdan

Hi,

I use my own script as documented in this blog post: http://mysqlnoob.blogspot.it/2016/12/in-place-transparent-compression-of-mysql-binary.logs...

It is a corner case as the servers storage can't be expanded so my solution actually is the only possible solution.

But manual page contradictions aside, binlog scanning:

- should be completely avoided if GTIDs not in use
- regardless of GTIDs use, should not run every time a binlog is closed (what's the purpose of this?)

Also still the behaviour I see is not what the manual describes...

Hi,
What you are doing is wrong on so many levels ... counting on mysql ignoring missformatted files is big time wrong, that can change without notice... mysql can crash or not start or delete them or ... 

- move them out of mysql dir
- edit index file and remove them from there

anyhow, I can't reproduce the problem with "normal" binlog files and with oracle binary. I agree documentation is not ideal but you are really pushing it attm. That is not really a good solution to disk space issues. Also if you are keeping all those files there just for the sake of backup, keeping them with datadir on same disk on same server is not really safe

all best
Bogdan

Sorry to disagree with you but I'm not doing anything wrong.

In my opinion, what is wrong is for mysql to search for GTIDs when they are not in use.

I know you are the king here, so I'll stop here.

Rick

Hi mate, thanks for crowning me but I'm no king, neither here nor anywhere else.. I'm just giving you my opinion.

> what is wrong is for mysql to search for GTIDs when they are not in use.

That *is* wrong and that's why bug is in *verified* state. That's as far as I can take it, now is up to someone else, my work ends there. If the dev team decide to fix it now, later or never I can't say.

What I believe is very wrong, not that it makes this bug any less a bug, is that you are (and according to that blog post you are not alone) giving MySQL archive files masquerading as binlogs and I fear that can bite you when you least expect. I would not be surprised if MySQL in one of next releases just crash when it start reading corrupt binlogs or worse, start filling logs with "ah binlogs are messed up" eating even more disk space :(

IMO the only valid solutions are either get them out of index file or use the archive patch that will compress the binlog files automatically as discussed in bug #48396, #46435 and in associated (deleted) github repo. You might want to chime in the bug #48396 and make the feature request "more prominent".

all best
Bogdan

Hi Riccardo, 

If you need to keep your binlogs safe (for point-in-time recovery purpose for example), the best advice is to save them somewhere else than in the current datadir as Bogdan already advised it.

Tamas made a nice blog about this sometime ago (https://www.percona.com/blog/2012/01/18/backing-up-binary-log-files-with-mysqlbinlog/) as you can stream the binlogs directly to somewhere else.

But of course you are right, if GTID are not used, then the server should not try to find them but we should let the replication developer decide/comment on this (maybe there is some hidden reason?)

Regards,

Hey lefred, nice to read you here :-)

As I have specified, compressing them in place was the only option I had to keep them locally, and I find this better than streaming them over the net to some other place, as if I need them, I need them locally not elsewhere.

There's nothing wrong in my opinion to compress binlogs that MySQL is not supposed to use anymore (well, this holds true for 5.6 but not for 5.7 as I have found).

Actually, I also find it weird the need to read them all at startup.
On another cluster, which has binlog folder on a SAN storage, the server startup takes long time (minutes) because it needs to open and read 4000+ files from the SAN, generating a lot of I/O in the process. Why?
I don't think there is a need to read ALL the old binlogs when MySQL starts, if it has been shut down normally and you do not use GTIDs. What do you think?

Ciao Riccardo,

After some discussion with the development team, we wanted to provide you a better and definite answer.

So, indeed, we recognize that it might be necessary to store compressed binlogs on the same partition or the same physical media as the MySQL data. We also recognize that it may be appealing to store those compressed binlog files with the same name and path as the uncompressed ones (even if I think that in that case pple are really looking for troubles isn't it ?), so the server will purge them automatically. However, this is not something we activerly support. It's impossible for the server to distinguish these files from corruptes ones. While we do handle corner cases even when binlogs files are truly corrupted, we cannot guarantee that there are no cases where performance degrades. So we ALWAYS recommend to the user to move the files to a different location or to use a different filename (e.g. append a .gz suffix), and use some other mechanism to purge those files (e.g. a cronjob).

Now back to your concern about reading the binlogs. In fact, even when gtid_mode=off, the 5.7 server does maintain correct values for gtid_executed and gtid_purged. This is needed because we allow gtid_mode to be switched back and forth dynamically while preserving GTID state. Therefore, when purging a binary log, the server needs to read gtid_purged from the Previous_gtids_log_event of the oldest binary log, even when gtid_mode=OFF. In this procedure, the server has to handle a number of corner cases. One such corner case is that the server at some previous point in time crashed while writing a Previous_gtids_log_event, resulting in an invalid/corrupted binary log. To produce correct values for gtid_purged when that has happened, the procedure for reading Previous_gtids_log_event has move to the next binary log and try again, in case it reads a corrupted binary log. It is this mechanism that causes the server to open all the compressed files in your case.

We could maybe implement very specific workarounds that won't work or won't have the required performance in all the cases, and certainly not the most common. Also as you may already be aware, we have now enable binlogs by default and as soon as we fix some other issues, we would like to have GTID enabled by default too. We really think that the future for most environments will be the use of GTID.

In conclusion, Riccardo, I think that for this problem a minor change in your procedure (like a rename of the compressed binlogs and a cronjob to purge them) would be the recommended option.

Thank you for using MySQL and don't hesitate if you have any other questions.

Felice anno nuovo.

lefred.

Thanks for the thorough explanation.
I still believe that the server should handle GTID ON and GTID OFF situations separately, and save I/O when possible. But of course it's your call.

To me it seems that MySQL ecosystem is actually moving toward (pseudo) sync replication (Galera, group replication) so maybe binary logs will not be needed at all in the future, and all these I/O penalties tied to them will  be avoided altogether.

Thanks again,
Rick

1. I think the official way to backup binlogs is to use mysqlbinlog to read them from the server (not from disk). However that takes some manual work to make mysqlbinlog into a reliable service.

2. A native compressed binlog would be very welcome. That could still allow the server to read them back if needed.
Bug #48396 	Compressed Binlog

3. Note that PostgreSQL allows one to set an archive_command and a restore_command to allow archival and restore. This might be remote, in the cloud or somewhere else. This also allow one to encrypt/restore.

4. I think the server should read the first bytes of the file and check if the match the binlog magic. If it does it should read the first few events to get the Previous_gtids_log_event. This should probably be less than 1M per file.
So the amount of data read should be minimal. I'm not sure this is the case.

5. Shouldn't it use the mysql.gtid_executed table to allow one to use GTID without binlogs? ( https://dev.mysql.com/doc/refman/5.7/en/replication-gtids-concepts.html#replication-gtids-... ).