Bug #27726: Exec_Master_Log_Pos is showing strange things
Submitted: 10 Apr 2007 4:06    Modified: 9 Sep 2008 16:52
Reporter: Thomas Boehme
Status: No Feedback
Category: MySQL Server: Replication    Severity: S3 (Non-critical)
Version: 5.0.36    OS: Linux (Debian Sarge AMD64)
Assigned to: Assigned Account    CPU Architecture: Any

[10 Apr 2007 4:06] Thomas Boehme
Description:
Hello,

I have a master-master replication and a Heartbeat setup with a virtual IP address, and all day long I import very many records into tables on the HA-master of those systems using LOAD DATA LOCAL INFILE. When running my check script for watching replication state and drift, I noticed that since I upgraded from 4.1.10a to 5.0.36 (compiled from source) there is much more drift than before. This also happens with version 5.0.22 (the binary version downloaded from mysql.com).

Looking deeper for the cause, I noticed that Exec_Master_Log_Pos sometimes shows unexpected values. On the HA-slave I run SHOW SLAVE STATUS and on the HA-master SHOW MASTER STATUS, and compare the results to find problems and replication drift. Here are the important variables from three such query pairs:

#### 1 ####
Master:
'File'     => 'mysql-bin.000011'
'Position' => '378228803'
Slave:
'Exec_Master_Log_Pos' => '6458522'
'Read_Master_Log_Pos' => '378228803'
'Master_Log_File' => 'mysql-bin.000011'
'Seconds_Behind_Master' => '4'

#### 2 ####
Master:
'File'     => 'mysql-bin.000011'
'Position' => '384626508'
Slave:
'Exec_Master_Log_Pos' => '384626508'
'Read_Master_Log_Pos' => '384626508'
'Master_Log_File' => 'mysql-bin.000011'
'Seconds_Behind_Master' => '0'

#### 3 ####
Master:
'File'     => 'mysql-bin.000011'
'Position' => '391015161'
Slave:
'Exec_Master_Log_Pos' => '6388035'
'Read_Master_Log_Pos' => '391015161'
'Master_Log_File' => 'mysql-bin.000011'
'Seconds_Behind_Master' => '10'

I ran these queries within one minute.

You can see that the master log file does not change, but the slave reports that it has executed first up to position 6458522, then 384626508, and then 6388035 in the master's binary log file. Also note that the second result shows the correct value of 384626508 for Exec_Master_Log_Pos and a Seconds_Behind_Master of 0.

Hopefully this does not mean that replication is actually applied in random order, which could lead to inconsistency between master and slave.

My script calculates the difference between MASTER.Position and SLAVE.Exec_Master_Log_Pos to show whether there are replication problems. Please tell me if this is the wrong way of doing it and whether the behavior I described is correct.
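
In essence, the check boils down to these two statements, run on the HA-master and the HA-slave respectively (a sketch of the comparison, not the literal script code):

-- on the HA-master:
SHOW MASTER STATUS;

-- on the HA-slave:
SHOW SLAVE STATUS;

-- while the slave's Master_Log_File equals the master's File,
-- drift in bytes = master.Position - slave.Exec_Master_Log_Pos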

How to repeat:
This happens all the time here. The more data you upload to one system, the more the other one needs to replicate, and the more often this behavior shows up. I uploaded a 300 MB CSV file into a table with the following structure:

CREATE TABLE `t_data_mos` (
  `id_stat` int(11) NOT NULL default '0',
  `dtg` datetime NOT NULL default '0000-00-00 00:00:00',
  `id_parameter` int(11) NOT NULL default '0',
  `id_source` int(11) NOT NULL default '0',
  `value` double NOT NULL default '0',
  `id_unit` int(11) NOT NULL default '0',
  `id_precision` int(11) NOT NULL default '0',
  `insertdtg` datetime NOT NULL default '0000-00-00 00:00:00',
  PRIMARY KEY  (`id_stat`,`dtg`,`id_parameter`,`id_source`),
  KEY `dtg` (`dtg`),
  KEY `id_parameter` (`id_parameter`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

When the upload was finished, I looked at the SHOW SLAVE STATUS output on the slave.
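
For reference, the import itself is a plain LOAD DATA LOCAL INFILE statement of roughly this shape (the file path and the field/line terminators here are illustrative; they are not quoted from the actual job):

LOAD DATA LOCAL INFILE '/path/to/data.csv'
INTO TABLE t_data_mos
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';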

Suggested fix:
As I understand Exec_Master_Log_Pos, its value should not decrease unless a new binary log file has been created.
[10 Apr 2007 5:35] Valeriy Kravchuk
Thank you for the problem report. Is it possible that the server crashed (or restarted for any reason) during the period when you got your results? Then it could be something similar to http://bugs.mysql.com/bug.php?id=26540. Please check.
[11 Apr 2007 9:05] Thomas Boehme
I checked it. The slave does not crash. Here are some entries from the MySQL error logs, first from the slave, then from the master server.

###SLAVE###
070402 17:50:18  mysqld started
070402 17:50:23  InnoDB: Started; log sequence number 3 1006470230
070402 17:50:30 [Note] /mysql/server/mysql/bin/mysqld: ready for connections.
Version: '5.0.36-standard-log'  socket: '/var/tmp/mysql/mysql.sock'  port: 3306  MySQL Community Edition - MC-Wetter (GPL)
070402 17:50:30 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000006' at position 12139540, relay log './mysql-relay-bin.000018'
 position: 12139677
070402 17:50:30 [Note] Slave I/O thread: connected to master 'repl@172.18.2.31:3306',  replication started in log 'mysql-bin.000006' at position 12139540

###MASTER###
070402 15:39:39  mysqld started
070402 15:39:43  InnoDB: Started; log sequence number 0 4060760534
070402 15:39:50 [Note] /mysql/server/mysql/bin/mysqld: ready for connections.
Version: '5.0.36-standard-log'  socket: '/var/tmp/mysql/mysql.sock'  port: 3306  MySQL Community Edition - MC-Wetter (GPL)
070402 15:39:50 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000001' at position 1211, relay log './mysql-relay-bin.000005' position: 977
070402 15:39:50 [Note] Slave I/O thread: connected to master 'repl@172.18.2.30:3306',  replication started in log 'mysql-bin.000001' at position 1211
070402 15:42:22 [Note] Slave: received end packet from server, apparent master shutdown:
070402 15:42:22 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000001' position 2324
070402 15:42:22 [ERROR] Slave I/O thread: error reconnecting to master 'repl@172.18.2.30:3306': Error: 'Lost connection to MySQL server at 'reading initial communication packet', system error: 111'  errno: 2013  retry-time: 60  retries: 86400
070402 15:51:54 [Note] Slave: connected to master 'repl@172.18.2.30:3306',replication resumed in log 'mysql-bin.000001' at position 2324
070402 17:49:00 [Note] Slave: received end packet from server, apparent master shutdown:
070402 17:49:00 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000002' position 43134
070402 17:49:00 [ERROR] Slave I/O thread: error reconnecting to master 'repl@172.18.2.30:3306': Error: 'Lost connection to MySQL server at 'reading initial communication packet', system error: 111'  errno: 2013  retry-time: 60  retries: 86400
070402 17:51:00 [Note] Slave: connected to master 'repl@172.18.2.30:3306',replication resumed in log 'mysql-bin.000002' at position 43134

The problems I described in the bug report occurred again several times tonight, but there is nothing in the logs.
[12 Apr 2007 7:51] Sveta Smirnova
Please also provide your configuration files, for both master and slave.

We have Bug #12691, which may be related too.
[12 Apr 2007 8:54] Thomas Boehme
For your information: my setup is a master-master replication. Heartbeat provides a virtual IP on one system. All connections use this IP address, and the HA-slave never receives or serves data unless there is a failover.

Here is my configuration, followed by a diff showing the configuration differences against the second system.

[mysqld_safe]
open-files-limit = 8192

[mysqld]
basedir         = /mysql/server/mysql
datadir         = /mysql/data
old-passwords
port            = 3306
socket          = /var/tmp/mysql/mysql.sock
skip-name-resolve
interactive_timeout = 3600
wait_timeout = 3600
back_log = 50
max_connections = 2000
max_connect_errors = 100000
table_cache = 2048
max_allowed_packet = 16M
binlog_cache_size = 16M
max_heap_table_size = 64M
sort_buffer_size = 16M
join_buffer_size = 16M
thread_cache = 8
thread_concurrency = 8
query_cache_size = 256M
query_cache_limit = 2M
ft_min_word_len = 4
memlock
default_table_type = INNODB
thread_stack = 192K
transaction_isolation = READ-COMMITTED
tmp_table_size = 64M
log_bin=mysql-bin
relay_log=mysql-relay-bin
log_warnings
log_slow_queries
long_query_time = 2
log_long_format
tmpdir = /var/tmp/mysql
server-id = 1
master-host = 172.18.2.31
master-user = repl
master-password = ###PASSWORD###
auto_increment_increment = 2
auto_increment_offset = 1
key_buffer_size = 64M
read_buffer_size = 4M
read_rnd_buffer_size = 16M
bulk_insert_buffer_size = 0
myisam_sort_buffer_size = 8M
myisam_max_sort_file_size = 10G
myisam_max_extra_sort_file_size = 10G
myisam_repair_threads = 1
myisam_recover
skip-bdb
innodb_additional_mem_pool_size = 16M
innodb_buffer_pool_size = 12G
innodb_data_file_path = ibdata1:25000M;ibdata2:25000M;ibdata3:25000M;ibdata4:25000M;ibdata5:25000M;ibdata6:25000M:autoextend
innodb_data_home_dir = /mysql/innodb
innodb_file_io_threads = 4
innodb_thread_concurrency = 16
innodb_flush_log_at_trx_commit = 0
innodb_fast_shutdown
innodb_log_buffer_size = 16M
innodb_log_file_size = 1024M
innodb_log_files_in_group = 2
innodb_log_group_home_dir=/mysql/innodb_logs
innodb_max_dirty_pages_pct = 90
innodb_lock_wait_timeout = 120

Here is the diff:

oxford: diff /tmp/oxford.cnf /tmp/cambridge.cnf
217c217
< server-id = 1
---
> server-id = 2
257c257
< master-host = 172.18.2.31
---
> master-host = 172.18.2.30
288c288
< auto_increment_offset = 1
---
> auto_increment_offset = 2
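
The auto_increment_increment = 2 / auto_increment_offset = 1|2 pair above is what keeps the two masters from generating colliding auto-increment keys: oxford produces 1, 3, 5, ... and cambridge 2, 4, 6, ... A minimal illustration (t_demo is a throwaway table name, not from my schema):

CREATE TABLE t_demo (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY);
INSERT INTO t_demo VALUES (NULL), (NULL), (NULL);
-- on oxford (offset 1) this yields ids 1, 3, 5
-- on cambridge (offset 2) it yields ids 2, 4, 6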

I also checked bug #12691, but I'm not sure whether it matches my report. I don't use anything like "replicate-wild-ignore-table=mysql.%", and the slave doesn't stop replication due to errors; there aren't any. Both servers run fine.

Maybe the jumps backwards in Exec_Master_Log_Pos are related to transaction BEGIN events, but we modify data only in autocommit mode or via LOAD DATA INFILE.

The jumps backwards also cover a very large distance. The binlog file in my original bug report was about 384 MB, but Exec_Master_Log_Pos jumped from byte position 6,458,522 to 384,626,508 and then back to near the beginning of the file at byte position 6,388,035, which is even before the first one.
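
If it helps the analysis: the event sitting at one of the jump-back positions can be inspected directly on the master, for example (assuming 6388035 is a valid event start, which a value reported in Exec_Master_Log_Pos should be):

SHOW BINLOG EVENTS IN 'mysql-bin.000011' FROM 6388035 LIMIT 5;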
[9 Aug 2008 16:52] Sveta Smirnova
Thank you for the report.

You may be experiencing the same issue as bug #12691 or bug #31190, which were fixed in version 5.0.54. Please try the current version, 5.0.67, and inform us whether the bug still exists.
[9 Sep 2008 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".