MySQL Bugs: #34058: InnoDB recovery prints wrong master binlog position

Bug #34058	InnoDB recovery prints wrong master binlog position
Submitted:	25 Jan 2008 15:21	Modified:	27 Jan 2009 3:30
Reporter:	Baron Schwartz (Basic Quality Contributor)	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.0.40, 5.0.60sp1	OS:	Linux (Gentoo)
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	qc

Description:
I deliberately "crashed" MySQL by taking an LVM snapshot from a slave while the slave was running, without any FLUSH TABLES or other interruption.  I wanted to experiment with cloning a slave this way.

The server uses only InnoDB tables, except for system tables of course.  When I took the LVM snapshot, nothing was happening except a once-per-second "heartbeat" update.

When I copied the snapshot to another machine and started MySQL, InnoDB reported an invalid master binary log position after it performed recovery:

080125  8:04:22  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
080125  8:04:23  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 3239 4262709555.
InnoDB: Doing recovery: scanned up to log sequence number 3239 4267952128
InnoDB: Doing recovery: scanned up to log sequence number 3239 4273195008

[ snip ]

080125  8:07:27  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 5
3 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
InnoDB: Apply batch completed
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 115, file name portland-bin.001717
InnoDB: Last MySQL binlog file position 0 589600615, file name ./galax-bin.001376
080125  8:11:25  InnoDB: Started; log sequence number 3240 887995861
080125  8:11:25 [Note] Recovering after a crash using mysql-bin
080125  8:12:01 [Note] Starting crash recovery...
080125  8:12:01 [Note] Crash recovery finished.
080125  8:12:01 [ERROR] Failed to open the relay log './mysql-relay-bin.003154' (relay_log_pos 643703516)
080125  8:12:01 [ERROR] Could not find target log during relay log initialization
080125  8:12:01 [ERROR] Failed to initialize the master info structure
080125  8:12:01 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.0.40-log'  socket: '/tmp/recover.sock'  port: 0  Gentoo Linux mysql-5.0.40

The part that looks wrong to me is this:

InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 115, file name portland-bin.001717

The master is portland, and on portland the file portland-bin.001717 at position 115 is invalid.  (mysqlbinlog cannot read the event starting there).

The REAL position, as I found by examining the binary log and master.info after recovery, and looking at the log events, is portland-bin.001717, position 643703376.  So InnoDB is reporting a number that is very far from the truth.  If I had started my slave at 115 (even if it were a valid offset) it would replay a lot of events that had already been applied.

A possibly important detail: I did not realize the binary log and InnoDB log contained log file names, and when I set up this slave I wanted to change the binary and relay log file names.  They were galax-bin and galax-relay-bin, and I changed them to mysql-bin and mysql-relay-bin as follows:

* In /etc/mysql/my.cnf
* I renamed all the files on the filesystem
* I did a search/replace in the .index files, relay-log.info files etc

I did this BEFORE I started mysqld, which is probably why I got the "Could not find target log" error at the end of the log message above :-)  Does this have anything to do with the InnoDB master-log offsets?

I have looked in trx_sys_print_mysql_master_log_pos() and trx_sys_print_mysql_binlog_offset() in the InnoDB innobase/trx/trx0sys.c file, but don't understand the code well enough to know what is really happening.

How to repeat:
Repeating will take about 3 hours to take another snapshot and try again :-)  I can do this if you want me to.

Baron,

I am afraid MySQL AB developers removed the code that stores the MASTER binlog position of a SLAVE in InnoDB's data files. And we have forgotten to remove the printf() from the InnoDB crash recovery code.

I hope this one is still correct:

InnoDB: Last MySQL binlog file position 0 589600615, file name ./galax-bin.001376

?

I hope not too many users will suffer from the removal of this feature.

Vasil,

please check if the MASTER binlog position storage of a SLAVE has been removed in 5.0, and remove the corresponding InnoDB printf() if that is the case. Or was it removed only in 5.1?

Regards,

Heikki

That is very disappointing.  This is an extremely useful feature if it works :-(

The binlog position is still correct, though.

Heikki,

Can you tell what version this was ? Was it actually intensional or accidental ? 

There are VERY many cases when it is extremely helpful.

For example it is about only way to deal with Slave which has power failure (so its master.info file is out of sync and relay logs probably trashed)

I can't see when this change was committed.  The most relevant message on the commits list I see is this:

http://lists.mysql.com/commits/11693

But that one just removes code that was already commented out.

We put that functionality back in the Google patch. It is enabled when rpl_transaction_enabled=1 is set.

Either 4.0 or 4.1 did have this code.

The removal may be associated with the XA changes to the code.

Vasil is studying this now.

The code that existed in 4.0 used innobase_commit_low to set the master log offset and filename in the InnoDB tx log. That doesn't catch all operations that commit to InnoDB because innobase_commit_low is a no-op for some calls from DDL code in ha_innodb.cc. The problem is that this code causes an early return from innobase_commit_low:

        if (trx->conc_state == TRX_NOT_STARTED) {

                return;
        }

Callers where it does nothing are (from the 5.0.37 code I use now):
ha_innobase::delete_table
innobase_drop_database
ha_innobase::rename_table

Mark,

That code still exists in 5.0, but (as reported here by Baron) something is broken...

This email thread on internals is about this bug:

http://lists.mysql.com/internals/35290

I set up a replication with latest 5.0 from BK (5.0.56) and deployed a call to trx_sys_print_mysql_master_log_pos() in the beginning of innobase_commit_low() on the slave server for diagnostic purposes.

Then (autocommit is on by default):

on the master:
insert into t values (7);

on the slave observed:
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 4352, file name mysql-bin.000003

on the master:
./bin/mysqlbinlog --start-position=4352 var/mysql-bin.000003  
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4352
#080130 16:04:40 server id 1  end_log_pos 4379  Xid = 61
COMMIT/*!*/;
# at 4379
#080130 16:13:32 server id 1  end_log_pos 4466  Query   thread_id=3     exec_time=0     error_code=0
use test/*!*/;
SET TIMESTAMP=1201702412/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=1, @@session.unique_checks=1/*!*/;
SET @@session.sql_mode=0/*!*/;
/*!\C cp1251 *//*!*/;
SET @@session.character_set_client=51,@@session.collation_connection=51,@@session.collation_server=8/*!*/;
insert into t values (7)/*!*/;
# at 4466
#080130 16:13:32 server id 1  end_log_pos 4493  Xid = 65
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

I.e. it looks correct. It is also correct if I kill -9 the mysql slave server and restart it.

As Mark said, that value is not updated on the slave when (for example) DROP TABLE is executed, in that case the last printed value is printed again.

I have a few questoins:

* Baron, do you have any idea why there is so huge difference between the printed and the actual value in your setup? You did not run thousands of DROP TABLE, did you? :)

* Mark, what do you mean by "We put that functionality back in the Google patch."? Can you point which code was present, was removed and you did put back?

Thanks!

The Google patch for MySQL 4.0 and 5.0 is at http://code.google.com/p/google-mysql-tools

This does not yet include fixes to also update the replication offset during DDL (drop database, drop table, ...).

We use this for transactional replication on the slave. During crash recovery it updates relay-log.info to match the offset stored in the innodb tx log. This does a bit more than the original use, so many more files have changed.

Vasil, I don't think the DROP TABLE etc should be the problem :-)  I don't know why the difference was so large.  I assumed the log byte offset was just from a variable that was never initialized, and was never correct.  (It looked to me like the code that set it was commented out completely).

Vasil,

In my experience it is sometimes correct though in most cases it does not.
This actually makes it only worse :)

Interesting: I just initialized a new server, this time from one of the masters, which used to be a slave itself in the distant past.  This time, it says:

InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 214, file name usa-bin.000698
InnoDB: Last MySQL binlog file position 0 1060018864, file name ./mysql-bin.000060

But usa hasn't been this server's master since March of last year :-)

Hmm... how can we preserve this functionality in 5.1, where all communication between MySQL and the engine must go through the plugin interface?

Vasil, is the DROP TABLE bug in 5.0 easy to fix?

This is happening to me in 5.1.24-community, and it's a very serious issue.  In my case, the message is similar to Baron's -- the file is correct but the file position is an absurdly low number -- 541, which is in the middle the 2nd item in the log.  

We use Innodb Hot Backup to take backups of a slave, then use that backup to recover the slave, either the one backed up, or a new one on another server.  I know of no other way to set the restored slave on the right position, other than using this line in the error log on startup.

InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 541, file name mysql-bin.000521

Is there a workaround? E.g. some output text by innodb hot backup??  If not, then I suggest that this is a S1 critical bug, since it prevents us from database recovery!  Thanks!

A possibly relevant section of code, in sql/ha_innodb.cc in the 5.0.45 source:

1718 /* TODO: put the
1719 MySQL-4.1 functionality back to 5.0. This is needed to get InnoDB Hot Backup
1720 to work. */
1721 
1722 /*********************************************************************
1723 This is called when MySQL writes the binlog entry for the current
1724 transaction. Writes to the InnoDB tablespace info which tells where the
1725 MySQL binlog entry for the current transaction ended. Also commits the
1726 transaction inside InnoDB but does NOT flush InnoDB log files to disk.
1727 To flush you have to call innobase_commit_complete(). We have separated
1728 flushing to eliminate the bottleneck of LOCK_log in log.cc which disabled
1729 InnoDB's group commit capability. */
1730 
1731 int
1732 innobase_report_binlog_offset_and_commit(
1733 /*=====================================*/
1734                                 /* out: 0 */
1735         THD*    thd,            /* in: user thread */
1736         void*   trx_handle,     /* in: InnoDB trx handle */
1737         char*   log_file_name,  /* in: latest binlog file name */
1738         my_off_t end_offset)    /* in: the offset in the binlog file
1739                                    up to which we wrote */
1740 {
1741    trx_t*   trx;
1742 
1743    trx = (trx_t*)trx_handle;
1744 
1745    ut_a(trx != NULL);
1746 
1747    trx->mysql_log_file_name = log_file_name;
1748    trx->mysql_log_offset = (ib_longlong)end_offset;
1749 
1750    trx->flush_log_later = TRUE;
1751 
1752    innobase_commit(thd, TRUE);
1753 
1754    trx->flush_log_later = FALSE;
1755 
1756    return(0);
1757 }

Gordon, Baron,

I think InnoDB-5.0 and 5.1 print the binlog position right (except for some error if a DROP TABLE happens at the right place). At least, in Vasil's recent test they worked ok.

But they do NOT print right the master binlog position of a replication slave. That functionality worked in 4.x.

Regards,

Heikki

Yes I believe Heikki is correct, and it's good to be clear about this.  There are 3 relevant lines in the error log when you first start the db on recovery:

InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 541, file name mysql-bin.000521
InnoDB: Last MySQL binlog file position 0 975, file name ./mysql-bin.000008

When recovering a master backup, there is no problem because line #3 is indeed correct -- it refers to its own binlog position.  The problem is only on replication slaves, since the position, and maybe file, in line #2 are wrong.

It would be great if Innodb Hot Backup could deduce this information and output it! (hint, hint, hint :-), then I wouldn't have to rely on this.

Please correct me if I'm wrong. This means our strategy of using replication slaves for backups is compromised, as recovery is impossible.  We must instead use only hot backups made from the master.

Just to clarify, as Peter noted, knowing the master binlog position that the slave's data corresponds to is very, very useful.  It's a real shame this doesn't work anymore, because it eliminates all sorts of neat things you used to be able to do.

Can someone give a short summary how to repeat this?

If I run a master/slave, have some activity on the master and kill -9 the slave, then stop the activity on the master, then start the slave I get this:

--- cut ---
$ ./libexec/mysqld --socket /tmp/bug34058/mysql-5.0-s/sock --port 3307 --server-id 2 --relay-log=hag-relay-bin
InnoDB: Log scan progressed past the checkpoint lsn 0 54773
080717 13:45:26  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 0 55065
080717 13:45:26  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
InnoDB: Apply batch completed
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 18994, file name m-bin.000002
080717 13:45:26  InnoDB: Started; log sequence number 0 55065
080717 13:45:26 [Note] ./libexec/mysqld: ready for connections.
Version: '5.0.68'  socket: '/tmp/bug34058/mysql-5.0-s/sock'  port: 3307  Source distribution
080717 13:45:26 [Note] Slave SQL thread initialized, starting replication in log 'm-bin.000002' at position 19021, relay log './hag-relay-bin.000017' position: 3489
080717 13:45:26 [Note] Slave I/O thread: connected to master 'root@localhost:3306',  replication started in log 'm-bin.000002' at position 19021
--- cut ---

If "InnoDB: position 0 18994, file name m-bin.000002" is wrong, then why don't you ignore it and use "starting replication in log 'm-bin.000002' at position 19021", i.e. use 19021 instead of 18994?

The value printed by InnoDB during recovery (18994) is read from the InnoDB files (sys_header + TRX_SYS_MYSQL_MASTER_LOG_INFO).

The value printed by MySQL code (19021) is read from var/master.info.

If the slave crashes these can end up to be different.

The value, printed by InnoDB can also be different from the value printed by MySQL if you are changing MyISAM tables.

The value printed by InnoDB is updated on the disk only at commit (in trx_commit_off_kernel()).

I am unable to reproduce a large difference in the numbers printed by InnoDB and MySQL like the one reported by Baron: 115 vs. 643703376. I can get small differences when the slave crashes. The difference being that InnoDB-printed log position corresponds to the last COMMIT statement seen by InnoDB while MySQL-printed log position may correspond to the next statement or a few statements after it.

Baron, I assume you are not running one HUGE transaction that creates 600k binlog entries? You also explicitly mentioned that you are using only InnoDB tables, so the reason is not that you have a lot of activity on e.g. MyISAM tables.

I used 5.0.68, can someone please give me some hint on how to reproduce this problem?

A note to my previous comment:

Baron did not say that 643703376 is printed by MySQL, he said:

"The REAL position, as I found by examining the binary log and master.info after recovery, and looking at the log events, is portland-bin.001717, position 643703376."

Anyway, if 643703376 is in master.info then MySQL would printing during startup.

From my perspective, the problem occurs when doing a restore of an innodb hot backup, where there is no master.info file.  In prior versions, I could rely on the startup text "In a MySQL replication slave, the last master binlog file position 0 999999..." to correctly position the recovered slave via change master.

Gordon, I can do similar thing with the latest MySQL 5.0.68:

* I setup a replication and made some activity on the master
* "kill -9"d the slave (instead of taking backup with hotbackup)
* removed var/master.info on the slave server
* started the slave and read "InnoDB: position 0 20392, file name m-bin.000001" during recovery
* issued CHANGE MASTER ... MASTER_LOG_POS=20392; START SLAVE;

it went without errors or any subsequent problems.

Any hints on how to repeat the problem are welcome.

Could this be some Hot Backup-only related issue?

Setting to "Can't repeat" as I think it is more appropriate than "Need feedback".

One possible explanation: replication is not running at all.
Then InnoDB prints some old position.

OR

The position was indeed 115 when the backup was taken. Then it advanced to 643703376. When Baron tries to restore the backup, then InnoDB sees 115?

I'm running on 5.1.24.  I'll leave it to Baron to respond from his perspective.  I only chimed in because it seemed like the same issue.

Hmm, I never tried 5.1.x as the reported issue is against 5.0.40.

Things may be different in 5.1, I will probably try that one too to see how it behaves...

Thanks!

Vasil,

Had you tried to repeat with recent 5.1.x? 

We also have some reports claiming that 5.0.44 is also affected...

cmds to reproduce

Attachment: 34058_reproduce.txt (text/plain), 3.23 KiB.

This is reproducable in 5.0.44 and 5.0.60sp1, see the previous attachment for full process I used in my vm. Basically;

## Have Master setup,
## Run infinite (loopy insert of data perl script) for a while
## Take LVM Snapshot, mount and copy datafiles to create a slave.
## Get the slave started, and run infinite script some more on the master.
## Then kill -9 mysql on slave
## Then restart mysql on slave with skip-slave-start and check the output from mysql error
log.

Checking SHOW MASTER STATUS and SHOW SLAVE STATUS just before kill-9 of slave, then cat the slave.err;

[root@localhost mysql]# mysql -u root -e "SHOW MASTER STATUS;";
+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000001 |  4160333 |              |                  | 
+-------------------+----------+--------------+------------------+
[root@localhost mysql]# mysql -u root -S /tmp/mysql4406.sock -e 'SHOW SLAVE STATUS\G';
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 127.0.0.1
                Master_User: root
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: master-bin.000001
        Read_Master_Log_Pos: 4160333
             Relay_Log_File: localhost-relay-bin.000002
              Relay_Log_Pos: 2079085
      Relay_Master_Log_File: master-bin.000001
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB: 
        Replicate_Ignore_DB: 
         Replicate_Do_Table: 
     Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
                 Last_Errno: 0
                 Last_Error: 
               Skip_Counter: 0
        Exec_Master_Log_Pos: 4160333
            Relay_Log_Space: 2079085
            Until_Condition: None
             Until_Log_File: 
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File: 
         Master_SSL_CA_Path: 
            Master_SSL_Cert: 
          Master_SSL_Cipher: 
             Master_SSL_Key: 
      Seconds_Behind_Master: 0
[root@localhost mysql]# kill -9 `cat /usr/local/mysqlslave/localhost.localdomain.pid`
[root@localhost mysql]# bin/mysqld_safe: line 388:  4345 Killed                  nohup /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysqlslave/ --user=mysql --pid-file=/usr/local/mysqlslave//localhost.localdomain.pid --skip-external-locking --port=4406 --socket=/tmp/mysql4406.sock --server-id\=2 --skip-slave-start --default-storage-engine\=innodb >> /usr/local/mysqlslave/slave.err 2>&1

Number of processes running now: 0
081014 23:07:55  mysqld restarted

[root@localhost mysql]# kill `pidof perl`;
[2]-  Terminated              perl /root/34058issue/infinite_insert.pl
[root@localhost mysql]# 
[root@localhost mysql]# 
[root@localhost mysql]# 
[root@localhost mysql]# cat /usr/local/mysqlslave/slave.err
081014 23:05:38  mysqld started
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
081014 23:05:38  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Last MySQL binlog file position 0 2081484, file name ./master-bin.000001
081014 23:05:38  InnoDB: Started; log sequence number 0 6599325
081014 23:05:38 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.0.60sp1-enterprise-gpl'  socket: '/tmp/mysql4406.sock'  port: 4406  MySQL Enterprise Server (GPL)
081014 23:06:23 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=localhost-relay-bin' to avoid this problem.
081014 23:06:43 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:3306',  replication started in log 'master-bin.000001' at position 2081484
081014 23:06:43 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 2081484, relay log './localhost-relay-bin.000001' position: 4

Number of processes running now: 0
081014 23:07:55  mysqld restarted
InnoDB: Log scan progressed past the checkpoint lsn 0 11935295
081014 23:07:55  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 0 12840023
081014 23:07:55  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
InnoDB: Apply batch completed
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 4194385, file name master-bin.000001
InnoDB: Last MySQL binlog file position 0 2081484, file name ./master-bin.000001
081014 23:07:56  InnoDB: Started; log sequence number 0 12840023
081014 23:07:56 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=localhost-relay-bin' to avoid this problem.
081014 23:07:56 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.0.60sp1-enterprise-gpl'  socket: '/tmp/mysql4406.sock'  port: 4406  MySQL Enterprise Server (GPL)

If the above is not clear or quick, basically; just before the slave is kill -9;

Master reports:
| master-bin.000001 |  4160333 |

Slave reports:
      Master_Log_File: master-bin.000001
        Read_Master_Log_Pos: 4160333

kill -9 then innodb reports in err log;
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 4194385, file name master-bin.000001
InnoDB: Last MySQL binlog file position 0 2081484, file name ./master-bin.000001

Adam,

I tried to reproduce what you observed.

First there is one problem with your scenario, which is probably not relevant but anyway:

1. Run infinite perl script against server1
2. Take snapshot of server1's filesystem
(the perl script is still running on server1 after the snapshot has been taken)
3. Create server2 from the snapshot files
4. Execute "CHANGE MASTER ... MASTER_LOG_POS=NNN;" on server2, where NNN is obtained by executing "SHOW MASTER STATUS" on server1 _after_ the snapshot has been taken (2.).

This means that the position on server1 (and thus NNN) is greater than what server2 has in its files from the snapshot. You should obtain master position (SHOW MASTER STATUS) _before_ taking the snapshot. This is described in http://dev.mysql.com/doc/refman/5.0/en/replication-howto-existingdata.html in step 2: "If your server is already correctly configured, obtain the master status (see Section 15.1.1.4, Obtaining the Master Replication Information) AND THEN use mysqldump to take a snapshot".

Now to the actual problem:

* infinite_insert.pl is running against the master
* get master and slave status
* kill -9 the slave

the result is this:

from master:

+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000003 | 2558594 | | |
+-------------------+----------+--------------+------------------+

from slave:

Master_Log_File: master-bin.000003
Read_Master_Log_Pos: 2558594

when slave is restarted:

InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 2558567, file name master-bin.000003

The position InnoDB prints is the position of the last committed transaction in InnoDB, so it is normal that it is smaller than the position reported by SHOW SLAVE STATUS just before the crash (kill -9) - InnoDB did not commit some transactions.

When I look at your output I see that the position reported by InnoDB (4194385) is greater than the position reported by SHOW SLAVE STATUS before the crash (4160333). This can happen if some activity occurred after SHOW SLAVE STATUS and before kill -9. Provided that infinite_insert.pl is running at that moment, this is quite likely to happen.

This is not a bug.

Notice that the difference, reported originally by Baron, is huge: 115 vs 643703376.

Adam,

I agree with Vasil that the printout looks sensible after the kill -9.

What is the precise problem in it?

Regards,

Heikki

Hi all,
  I did a quick investigation on this bug.

  * I performed some testing (with version 5.0.40 and 5.0.68):

    1. Start master and create table: "CREATE TABLE t1 (a INT) engine=innodb"
    2. Populate the table with 100 transactions (each with one insert).
    3. Started the slave;
    4. Connected the slave and let it apply master's binlog.
    5. kill -9 the slave;
    6. Restarted the slave;

    ** Here is what happened in version 5.0.40:
      
      - Some events written to the binlog showed small values for the end_log_pos (the transaction inserts);

# at 18614
#081113 23:12:15 server id 1  end_log_pos 18682 	Query	thread_id=3	exec_time=0	error_code=0
SET TIMESTAMP=1226617935/*!*/;
BEGIN
/*!*/;
# at 18682
#081113 23:12:15 server id 1  end_log_pos 89 	Query	thread_id=3	exec_time=0	error_code=0
SET TIMESTAMP=1226617935/*!*/;
INSERT INTO t1 VALUES (99)
/*!*/;
# at 18771
#081113 23:12:15 server id 1  end_log_pos 18798 	Xid = 208
COMMIT/*!*/;

      - Insert events were inside a BEGIN and COMMIT, and seemed that the end_log_pos was set as relative to the BEGIN event offset (always showed *89* as the end_log_pos);
      - The transaction BEGIN/COMMIT showed an absolute offset.
      - In the slave, the relay log presented the same offsets as the master binlog;
      - Once I killed and restarted the slave innodb would print (notice the *89* there):

InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 89, file name gaia-bin.000001

    ** Here is what happened in version 5.0.68:

      - All events written to master binary log had absolute position;

# at 18614
#081113 23:57:55 server id 1  end_log_pos 18682 	Query	thread_id=3	exec_time=0	error_code=0
SET TIMESTAMP=1226620675/*!*/;
BEGIN
/*!*/;
# at 18682
#081113 23:57:55 server id 1  end_log_pos 18771 	Query	thread_id=3	exec_time=0	error_code=0
SET TIMESTAMP=1226620675/*!*/;
INSERT INTO t1 VALUES (99)
/*!*/;
# at 18771
#081113 23:57:55 server id 1  end_log_pos 18798 	Xid = 208
COMMIT/*!*/;

      - The slave relay log presented the same type of offsets
      - Once I killed and restarted the slave the innodb would print (note that there is *no 89*):

InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 18771, file name gaia-bin.000001

Just so you know, I went a little into the source code and found that:

  * The value (master_log_pos) that is injected into innodb comes from:

    rli->future_group_master_log_pos= log_pos;

  * This value is set in three different places in the log_event.cc:

     - Query_log_event::exec_event
     - Load_log_event::exec_event
     - Execute_load_log_event::exec_event

  * It is handed to innodb in the *ha_innodb.cc* in function *innobase_commit_low*:

1531 
1532         if (thd && thd->slave_thread) {
1533                 /* Update the replication position info inside InnoDB */
1534 
1535                 trx->mysql_master_log_file_name
1536                                         = active_mi->rli.group_master_log_name;
1537                 trx->mysql_master_log_pos = ((ib_longlong)
1538                                 active_mi->rli.future_group_master_log_pos);
1539         }

  * Innodb seems to pick this at trx0trx.c in function trx_commit_off_kernel:

 805 
 806                 if (trx->mysql_master_log_file_name[0] != '\0') {
 807                         /* This database server is a MySQL replication slave */
 808                         trx_sys_update_mysql_binlog_offset(
 809                                 trx->mysql_master_log_file_name,
 810                                 trx->mysql_master_log_pos,
 811                                 TRX_SYS_MYSQL_MASTER_LOG_INFO, &mtr);
 812                 }

Finally, a few thoughts:

 1. This might explain why Baron was getting the strange (small) offsets!
 2. This could also explain why Vasil was not able to reproduce it (I think Vasil was using 5.0.68).
 3. The strange (small) offsets issue seems fixed in the latest version of 5.0 .

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

This bug is a duplicate of BUG#22540, which was fixed in 5.0.46.

Are you sure?  I guess the code is pretty intricate here, but it doesn't really look like a duplicate to me.  Can you explain how the code fixed in that other bug could cause the *massively* out of whack binlog positions I was seeing?

I suppose it could be true, since the servers I originally noticed this bug on did a lot of really huge LOAD DATA INFILE.

Hi Baron!

Yes, as far as I can tell, it is a duplicate.

There are two ways that an event can make it to the binary log: either directly or through the transaction cache. For non-transactional tables, the practice has been that they go directly to the binary log or through the transaction cache (but see BUG#40116).

Prior to the patch for BUG#22540, the binlog positions for an event were given relative the start of the IO_CACHE it was written to, meaning that when it was written to the transaction cache, the binlog position would be relative the start of the transaction (that is, usually quite small) and not relative the start of the file. The BUG#22540 fixes this by patching the binlog position of the events when writing the transaction cache to the binary log.

However, this alone does not explain why the slave does not roll back the *transaction* when it is being restored from the snapshot, but note that AUTOCOMMIT=0 was used when writing the transaction on the master. This means that the transaction was written to the binary log without a BEGIN, just a COMMIT. Since the slave thread runs with AUTOCOMMIT=1 in 5.0 (a bug that is fixed in 5.1, where we replicate the AUTOCOMMIT flag), this means that each statement would be executed as a separate transaction, so when the slave "crashed" (this is how it appears when the snapshot is restored), it would roll back to a position that was not a start of a transaction. This is fixed by the patch for BUG#26395.

By fixing BUG#22540, the positions will be correct, and the new slave will be able to recover from the correct position, so in that sense, it fixes the problem described in this bug report.

Note, however, that it will not start from the beginning of the transaction when recovering. Assuming that there are no crashes and that the slave is used in a pure scale-out scenario, the end result will be the same (the binary log is a serial history of the statements executed on the master, so it is already serialized correctly). However, intermediate states are still visible to other transactions and it is possible that a reader sees an inconsistent state. This is what BUG#26395 fixes.