Bug #64503 mysql frequently ignores --relay-log-space-limit
Submitted: 1 Mar 2012 5:27 Modified: 14 Mar 2012 8:44
Reporter: Mark Callaghan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.1 OS:Any
Assigned to: CPU Architecture:Any

[1 Mar 2012 5:27] Mark Callaghan
Description:
There is a race that allows MySQL to frequently ignore --relay-log-space-limit. To reproduce this:
1) let the SQL slave thread catch up to the IO thread so that it sets rli->ignore_log_space_limit to 1 in next_event()
2) give the SQL thread a workload that it is slow to execute so that it doesn't set ignore_log_space_limit to zero. That is done when the SQL thread exits and when the SQL thread reaches EOF of a log file that is not hot (in use by the IO thread). When it reaches EOF in that case it would call purge_first_log which sets ignore_log_space_limit to 0.

This reproduces the problem because the IO thread ignores --relay-log-space-limit when ignore_log_space_limit is set

How to repeat:
I will attach an MTR case. It is more likely to reproduce the problem when this is added to the start of next_event()

if (rli->ignore_log_space_limit == 1) {
   fprintf(stderr, "next_event sleep total %d, ignore %d\n", 
              (int) rli->log_space_total, (int) rli->ignore_log_space_limit); 
    usleep(200000); 
}

Then tail var/log/mysqld.2.err when running the attached mtr case. I reproduced this for unmodified 5.1.60 and with --relay-log-space-limit=1000000 there was more than 4MB of relay logs.

Suggested fix:
Don't know yet but the protocol for using/reading ignore_log_space_limit doesn't work.
[1 Mar 2012 5:29] Mark Callaghan
repro case

Attachment: rpl_trx_limit.test (application/octet-stream, text), 2.04 KiB.

[1 Mar 2012 5:29] Mark Callaghan
repro case

Attachment: rpl_trx_limit-master.opt (application/octet-stream, text), 26 bytes.

[1 Mar 2012 5:29] Mark Callaghan
repro case

Attachment: rpl_trx_limit-master.opt (application/octet-stream, text), 26 bytes.

[1 Mar 2012 5:29] Mark Callaghan
repro case

Attachment: rpl_trx_limit-slave.opt (application/octet-stream, text), 104 bytes.

[1 Mar 2012 5:31] Mark Callaghan
An example of the output in var/log/mysqld.2.err from the repro case. "total" should not exceed 1M

next_event sleep total 4038688, ignore 1
next_event sleep total 4044657, ignore 1
next_event sleep total 4056897, ignore 1
next_event sleep total 4060569, ignore 1
next_event sleep total 4064649, ignore 1
next_event sleep total 4069545, ignore 1
next_event sleep total 4073625, ignore 1
next_event sleep total 4078113, ignore 1
next_event sleep total 4081785, ignore 1
next_event sleep total 4085865, ignore 1
[2 Mar 2012 2:09] Mark Callaghan
Wow, who would have thought that a variable named "ignore_log_space_limit" causes the problem?
I tested a simple hack:
* change appendv() to return a variable indicating whether a new log file is opened
* change queue_event() to set rli->ignore_log_space_limit=0 when appendv() created a new file

With this change relay-log-space-limit is respected.
[6 Mar 2012 11:34] MySQL Verification Team
http://mysqlha.blogspot.com/2012/03/mysql-can-ignore-relay-log-space-limit.html
[14 Mar 2012 8:44] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html
[14 Mar 2012 8:45] Jon Stephens
Fixed in 5.1+. Documented in the 5.1.63, 5.523, and 5.6.6 changelogs as follows:

        The --relay-log-space-limit option was sometimes ignored.

        More specifically, when the SQL thread went to sleep, it allowed
        the I/O thread to queue additional events in such a way that the
        relay log space limit was bypassed, and the number of events in
        the queue could grow well past the point where the relay logs
        needed to be rotated. Now in such cases, the SQL thread checks to 
        see whether the I/O thread should rotate and provide the SQL 
        thread a chance to purge the logs (thus freeing space).

        Note that, when the SQL thread is in the middle of a
        transaction, it cannot purge the logs; it can only ask for more
        events until the transaction is complete. Once the transaction
        is finished, the SQL thread can immediately instruct the I/O
        thread to rotate.

Closed.