Bug #26622 MASTER_POS_WAIT does not work as documented
Submitted: 25 Feb 2007 20:05 Modified: 26 Mar 2008 20:16
Reporter: Baron Schwartz (Basic Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.0.24a-Debian_9-log, 4.1 BK, 5.1 BK, 5.2-falcon OS:Linux (Linux, Ubuntu)
Assigned to: Andrei Elkin CPU Architecture:Any
Tags: qc

[25 Feb 2007 20:05] Baron Schwartz
Description:
MASTER_POS_WAIT return values are different than expected when the server is not a slave.  It returns -1 instead of NULL.

How to repeat:
mysql> help master_pos_wait
Name: 'MASTER_POS_WAIT'
Description:
Syntax:
MASTER_POS_WAIT(log_name,log_pos[,timeout])

This function is useful for control of master/slave synchronization. It
blocks until the slave has read and applied all updates up to the
specified position in the master log. The return value is the number of
log events the slave had to wait for to advance to the specified
position. The function returns NULL if the slave SQL thread is not
started, the slave's master information is not initialized, the
arguments are incorrect, or an error occurs. It returns -1 if the
timeout has been exceeded. If the slave SQL thread stops while
MASTER_POS_WAIT() is waiting, the function returns NULL. If the slave
is past the specified position, the function returns immediately.

mysql> show slave status;
Empty set (0.00 sec)

mysql> select master_pos_wait('foo', 98);
+----------------------------+
| master_pos_wait('foo', 98) |
+----------------------------+
|                         -1 | 
+----------------------------+
1 row in set (0.00 sec)
[26 Feb 2007 10:33] Sveta Smirnova
test case

Attachment: rpl_bug26622.test (application/octet-stream, text), 171 bytes.

[26 Feb 2007 10:34] Sveta Smirnova
Thank you for the report.

Verified as described. All versions are affected.
[4 Sep 2007 16:15] Andrei Elkin
As the help claims that "the function returns NULL if the slave SQL thread is not started" the result must be NULL not -1.
[4 Sep 2007 16:51] Guilhem Bichot
For the bug fixer: this patch should fix the problem:
===== sql/rpl_rli.cc 1.12 vs edited =====
*** /tmp/bk_rpl_rli.cc-1.12_F40QVS      2007-06-09 08:29:43 +02:00
--- edited/sql/rpl_rli.cc       2007-09-04 18:50:01 +02:00
***************
*** 611,617 ****
    DBUG_ENTER("st_relay_log_info::wait_for_pos");

    if (!inited)
!     DBUG_RETURN(-1);

    DBUG_PRINT("enter",("log_name: '%s'  log_pos: %lu  timeout: %lu",
                        log_name->c_ptr(), (ulong) log_pos, (ulong) timeout));
--- 611,617 ----
    DBUG_ENTER("st_relay_log_info::wait_for_pos");

    if (!inited)
!     DBUG_RETURN(-2);

    DBUG_PRINT("enter",("log_name: '%s'  log_pos: %lu  timeout: %lu",
                        log_name->c_ptr(), (ulong) log_pos, (ulong) timeout));
and something similar for 5.0.
[13 Dec 2007 16:08] Daniel Saavedra
Any hint on when this bug will be fixed in 5.0 branch? We were planning to use 5.0.45 for a production environment with our own way to manage replication and we depend on MASTER_POS_WAIT to work as documented, otherwise we will have to work out a dirty workaround for this.
[14 Dec 2007 14:15] Daniel Saavedra
We have implemented a workaround for this that stays in a while loop if master_pos_wait returns -1 but exits after n retries.

Still after master_pos_wait returns -1 incorrectly and we decide to break the loop and go on using that DB, all subsequent writes are rejected. We are using apache commons pool with JDBC driver and we get this error:

Can't execute the query because you have a conflicting read lock

Could this be a side effect of the master_pos_wait erroneous return value? I know our setup is specific (master-master replication and using two connection pools to access the DBs), but don't know how to reproduce it using your test case.... 

We have even tried to execute an unlock tables after aborting the while loop but with no effect. Is something we might be doing wrong? The user has enough permission. Doing unlock tables with root user through mysql has no effect.
[6 Mar 2008 12:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43517

ChangeSet@1.2595, 2008-03-06 14:49:21+02:00, aelkin@mysql1000.(none) +3 -0
  Bug#26622  MASTER_POS_WAIT does not work as documented
  
  MASTER_POS_WAIT return values are different than expected when the server is not a slave. 
  It returns -1 instead of NULL.
  
  Fixed with correcting  st_relay_log_info::wait_for_pos() to return the proper
  value in the case of rli info is not inited.
[7 Mar 2008 19:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43617

ChangeSet@1.2596, 2008-03-07 21:14:28+02:00, aelkin@mysql1000.(none) +6 -0
  Bug #26622 MASTER_POS_WAIT does not work as documented
  
  Affected tests fixing. After the fix for st_relay_log_info::wait_for_pos() that
  handles widely used select('master-bin.xxxx',pos) invoked by mysqltest
  there appeared to be four tests that either tried synchronizing when
  the slave was stopped or used incorrect synchronization method like
  to call `sync_with_master' from the current connection being to the
  master itself.
  
  Fixed with correcting the current connection or/and using the correct
  synchronization macro when possible.
[25 Mar 2008 11:21] Bugs System
Pushed into 5.0.60
[25 Mar 2008 11:22] Bugs System
Pushed into 5.1.24-rc
[26 Mar 2008 19:00] Bugs System
Pushed into 6.0.5-alpha
[26 Mar 2008 20:16] Jon Stephens
Documented in the 5.0.60, 5.1.24, and 6.0.5 changelogs as follows:

        MASTER_POS_WAIT() did not return NULL when the server was not a slave.
[30 Mar 2008 0:36] Jon Stephens
Also noted in the 5.1.23-ndb-6.3.11 changelog.
[30 Mar 2008 0:36] Jon Stephens
Also documented in the 5.1.23-ndb-6.3.11 changelog.