MySQL Bugs: #7998: Replication should be more clever about when to replicate RELEASE

Bug #7998	Replication should be more clever about when to replicate RELEASE_LOCK()
Submitted:	19 Jan 2005 0:20	Modified:	15 Mar 2005 9:40
Reporter:	Hartmut Holzgraefe	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	4.1.8	OS:
Assigned to:	Guilhem Bichot	CPU Architecture:	Any

Description:
The Master logs all RELEASE_LOCK() calls in the binary log although only those related to GET_LOCK() calls used in INSERT() or UPDATE() statements would need to be locked.

How to repeat:
Set up replication, do a SELECT GET_LOCK('foo'); DO RELEASE_LOCK('foo');
have a look at the binlog contents

The idea to fix this ( (C) Sergei G): in GET_LOCK(), do nothing if this is slave thread. So, no need for RELEASE_LOCK(). Work to do:
1) verify that this is really ok in all cases
2) fix our testsuite which uses the fact that slave can be blocked with GET_LOCK() (as it's not going to be the case anymore).

Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

fixed in 5.0.3
ChangeSet@1.1778, 2005-03-02 17:52:38+01:00, guilhem@mysql.com
  Last part of fix for BUG#7998 "Replication should be more clever about when to replicate RELEASE_LOCK()" + fixes after merge

Not sure where to add this, I hope it gets noticed on a closed bug.

In the changelog it is stated:
 When a client releases a user-level lock, DO RELEASE_LOCK() will not be written to the binary log anymore

The explanation is that the slave does no more actually take the lock. (if part of an insert or update), the differentiation between master and slave on lock functions (like is_free_lock) will now be even bigger. (anyway documented in 6.7 http://dev.mysql.com/doc/mysql/en/replication-features.html)

I would like to add, that this change might add other problems too. The binary lock, can also be used to restore your db from a snapshot. (recreation from backup, with bin log). If no release_log is locked anymore, this is more likely to fail. (it would anyway if locks had been optained via selects, but it would have worked, if locks had been obtained via update)

If there are any future plans, to make user lock function reproducable (for replication or for restoration), then this is a step backwards.

Also if the realease lock is not written, this can affect setups, where the slave is pre 4.11 and the master 4.11 or higher.

If an application executes many get_lock("A", 3600); do release_lock("A");
the slave will quickly be hours behind.

Also people could ignore the warning in the doc (above) and replicate (currently succesfull), by obtaining all there locks in updates/inserts.
If the lock is not taken anymore, for those people replication will break on is_free_lock

Hi Martin,
First, you are right, making such a change in 4.1.11, stable version, would be criminal; fortunately it's not the case, I make the change only in 5.0.3; the changelog is wrong! I'm fixing it now. Please accept our apologies, and our thanks for pointing this out before 4.1.11 goes out.

This said, the change will break replication when master is 5.0.3 and slave is 5.0.2, but this does not matter as these are all alpha versions.
Replication anyway does not work when master is 5.0 and slave is 3.23 or 4.x, so no issue here.

Whether replication should replicate the locking situation from master to slave, which is how it was, and what you are hinting at (I think), is an open question. One could also say that replication's only responsibility is to replicate data changes; not to block connections on slave because they are blocked on master (like pre-5.0.3 replication of GET_LOCK() used to). I say this is an open question, that is I'm not giving an answer to this question; but I think that other RDBMSs propagate data changes, not locking situations.

When you write:
> Also people could ignore the warning in the doc (above) and replicate
> (currently succesfull), by obtaining all there locks in
> updates/inserts.

Obtaining all their locks in updates/inserts does not guarantee that the GET_LOCK() will be successful. GET_LOCK() can fail because of timeout. Imagine this:
connection1: INSERT INTO t VALUES(GET_LOCK("a",2));
connection1 will get the lock immediately, and thus insert 1 into the table.
connection2: INSERT INTO t VALUES(GET_LOCK("a",2));
connection2 will timeout after 2 seconds, and thus insert 0 into the table. GET_LOCK() does not succeed here and the application should not assume it will always return 1.

Using the values returned by GET_LOCK(), IS_FREE_LOCK(), IS_USED_LOCK() to insert into tables does not work with replication. The change I made may make the problems a bit more frequent; but for someone using pre-5.0.3, the problems were just waiting to happen from time to time, when there was an unusual concurrency... So I don't think it's an issue to make the problem possibly more frequent.

> If the lock is not taken anymore, for those people replication will
> break on is_free_lock

Replication of IS_FREE_LOCK() was already broken. I explained it in the manual:
The functions handling user-level locks: @code{GET_LOCK()},
@code{RELEASE_LOCK()}, @code{IS_FREE_LOCK()}, @code{IS_USED_LOCK()} are
replicated without the slave knowing the concurrency context on master; so
these functions should not be used to insert into a master's table as the
content on slave would differ (i.e. do not do
@code{INSERT INTO mytable VALUES(GET_LOCK(...))}.

There is no easy way we can make them replicate properly. They will be replicated properly (i.e. the above INSERT will insert the same data on master and slave; I'm not talking about propagating the lock itself) when we feature row-level binary logging (i.e. instead of writing the queries to the binlog, we'll write the row changes - being coded now for 5.1).

The reason I had to do the change about RELEASE_LOCK() was that we had a customer who was doing SELECT GET_LOCK(); some read-only stuff; SELECT RELEASE_LOCK();
and so got a binlog containing *almost only* millions of useless DO RELEASE_LOCK(). Eating disk space and network traffic for nothing.

I understand some of the explanations below may not sound satisfying to you. I hope you can work around any difficulties added for you by this change.

Thanks for the explanation, absolutly fine by this.

I agree that the actual "lock" (as in block others) is not necessarly to be replicated (and would not guarantee equal results on the slave/restoration run). I will add a feature request for the replication of  the results of get_lock, is_free_lock, and release_lock, in a timestamp like manner (include the results/state in the log entry, if needed by query)

Thanks

Hi Martin,
I think there is one case where timestamp-like replication of GET_LOCK() and others would not work:
imagine a table "t" contains a column "a" with these values (N rows):
"lock1"
"lock2"
...
"lockN"
and one does:
INSERT INTO u SELECT GET_LOCK(a,2) FROM t;
Then there is no way, with a SET VALUE_FOR_GET_LOCK prepended to the query in the binlog, to make this replicate properly, because the SET can only set one value, not N.
This case (when the argument of the function is multiple rows of a table) is the reason why we currently don't replicate LOAD_FILE() and UID() with the timestamp-like way.
I think the only perfect fix will be row-level binary logging.