Bug #7998 Replication should be more clever about when to replicate RELEASE_LOCK()
Submitted: 19 Jan 2005 1:20 Modified: 15 Mar 2005 10:40
Reporter: Hartmut Holzgraefe
Status: Closed
Category:Server: Replication Severity:S3 (Non-critical)
Version:4.1.8 OS:
Assigned to: Guilhem Bichot Target Version:

[19 Jan 2005 1:20] Hartmut Holzgraefe
Description:
The Master logs all RELEASE_LOCK() calls in the binary log although only those related to
GET_LOCK() calls used in INSERT() or UPDATE() statements would need to be locked.

How to repeat:
Set up replication, do a SELECT GET_LOCK('foo'); DO RELEASE_LOCK('foo');
have a look at the binlog contents
[30 Jan 2005 23:29] Guilhem Bichot
The idea to fix this ( (C) Sergei G): in GET_LOCK(), do nothing if this is slave thread.
So, no need for RELEASE_LOCK(). Work to do:
1) verify that this is really ok in all cases
2) fix our testsuite which uses the fact that slave can be blocked with GET_LOCK() (as
it's not going to be the case anymore).
[2 Mar 2005 17:59] Guilhem Bichot
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

fixed in 5.0.3
ChangeSet@1.1778, 2005-03-02 17:52:38+01:00, guilhem@mysql.com
  Last part of fix for BUG#7998 "Replication should be more clever about when to
replicate RELEASE_LOCK()" + fixes after merge
[14 Mar 2005 19:03] Martin Friebe
Not sure where to add this, I hope it gets noticed on a closed bug.

In the changelog it is stated:
 When a client releases a user-level lock, DO RELEASE_LOCK() will not be written to the
binary log anymore

The explanation is that the slave does no more actually take the lock. (if part of an
insert or update), the differentiation between master and slave on lock functions (like
is_free_lock) will now be even bigger. (anyway documented in 6.7
http://dev.mysql.com/doc/mysql/en/replication-features.html)

I would like to add, that this change might add other problems too. The binary lock, can
also be used to restore your db from a snapshot. (recreation from backup, with bin log).
If no release_log is locked anymore, this is more likely to fail. (it would anyway if
locks had been optained via selects, but it would have worked, if locks had been obtained
via update)

If there are any future plans, to make user lock function reproducable (for replication
or for restoration), then this is a step backwards.
[14 Mar 2005 19:22] Martin Friebe
Also if the realease lock is not written, this can affect setups, where the slave is pre
4.11 and the master 4.11 or higher.

If an application executes many get_lock("A", 3600); do release_lock("A");
the slave will quickly be hours behind.

Also people could ignore the warning in the doc (above) and replicate (currently
succesfull), by obtaining all there locks in updates/inserts.
If the lock is not taken anymore, for those people replication will break on is_free_lock
[14 Mar 2005 22:38] Guilhem Bichot
Hi Martin,
First, you are right, making such a change in 4.1.11, stable version, would be criminal;
fortunately it's not the case, I make the change only in 5.0.3; the changelog is wrong!
I'm fixing it now. Please accept our apologies, and our thanks for pointing this out
before 4.1.11 goes out.

This said, the change will break replication when master is 5.0.3 and slave is 5.0.2, but
this does not matter as these are all alpha versions.
Replication anyway does not work when master is 5.0 and slave is 3.23 or 4.x, so no issue
here.

Whether replication should replicate the locking situation from master to slave, which is
how it was, and what you are hinting at (I think), is an open question. One could also say
that replication's only responsibility is to replicate data changes; not to block
connections on slave because they are blocked on master (like pre-5.0.3 replication of
GET_LOCK() used to). I say this is an open question, that is I'm not giving an answer to
this question; but I think that other RDBMSs propagate data changes, not locking
situations.

When you write:
> Also people could ignore the warning in the doc (above) and replicate
> (currently succesfull), by obtaining all there locks in
> updates/inserts.

Obtaining all their locks in updates/inserts does not guarantee that the GET_LOCK() will
be successful. GET_LOCK() can fail because of timeout. Imagine this:
connection1: INSERT INTO t VALUES(GET_LOCK("a",2));
connection1 will get the lock immediately, and thus insert 1 into the table.
connection2: INSERT INTO t VALUES(GET_LOCK("a",2));
connection2 will timeout after 2 seconds, and thus insert 0 into the table. GET_LOCK()
does not succeed here and the application should not assume it will always return 1.

Using the values returned by GET_LOCK(), IS_FREE_LOCK(), IS_USED_LOCK() to insert into
tables does not work with replication. The change I made may make the problems a bit more
frequent; but for someone using pre-5.0.3, the problems were just waiting to happen from
time to time, when there was an unusual concurrency... So I don't think it's an issue to
make the problem possibly more frequent.

> If the lock is not taken anymore, for those people replication will
> break on is_free_lock

Replication of IS_FREE_LOCK() was already broken. I explained it in the manual:
The functions handling user-level locks: @code{GET_LOCK()},
@code{RELEASE_LOCK()}, @code{IS_FREE_LOCK()}, @code{IS_USED_LOCK()} are
replicated without the slave knowing the concurrency context on master; so
these functions should not be used to insert into a master's table as the
content on slave would differ (i.e. do not do
@code{INSERT INTO mytable VALUES(GET_LOCK(...))}.

There is no easy way we can make them replicate properly. They will be replicated
properly (i.e. the above INSERT will insert the same data on master and slave; I'm not
talking about propagating the lock itself) when we feature row-level binary logging (i.e.
instead of writing the queries to the binlog, we'll write the row changes - being coded
now for 5.1).

The reason I had to do the change about RELEASE_LOCK() was that we had a customer who was
doing SELECT GET_LOCK(); some read-only stuff; SELECT RELEASE_LOCK();
and so got a binlog containing *almost only* millions of useless DO RELEASE_LOCK().
Eating disk space and network traffic for nothing.

I understand some of the explanations below may not sound satisfying to you. I hope you
can work around any difficulties added for you by this change.
[15 Mar 2005 10:12] Martin Friebe
Thanks for the explanation, absolutly fine by this.

I agree that the actual "lock" (as in block others) is not necessarly to be replicated
(and would not guarantee equal results on the slave/restoration run). I will add a
feature request for the replication of  the results of get_lock, is_free_lock, and
release_lock, in a timestamp like manner (include the results/state in the log entry, if
needed by query)

Thanks
[15 Mar 2005 10:40] Guilhem Bichot
Hi Martin,
I think there is one case where timestamp-like replication of GET_LOCK() and others would
not work:
imagine a table "t" contains a column "a" with these values (N rows):
"lock1"
"lock2"
...
"lockN"
and one does:
INSERT INTO u SELECT GET_LOCK(a,2) FROM t;
Then there is no way, with a SET VALUE_FOR_GET_LOCK prepended to the query in the binlog,
to make this replicate properly, because the SET can only set one value, not N.
This case (when the argument of the function is multiple rows of a table) is the reason
why we currently don't replicate LOAD_FILE() and UID() with the timestamp-like way.
I think the only perfect fix will be row-level binary logging.