MySQL Bugs: #65814: Replication silently stoped

Bug #65814	Replication silently stoped
Submitted:	5 Jul 2012 5:30	Modified:	19 Dec 2012 20:19
Reporter:	Ilya Antipov	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.5.X	OS:	Linux
Assigned to:		CPU Architecture:	Any

Description:
Master:
Linux version 3.0.0-12-server (buildd@crested) (gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) ) #20-Ubuntu SMP Fri Oct 7 16:36:30 UTC 2011
mysqld  Ver 5.5.21-log for linux2.6 on x86_64 (MySQL Community Server (GPL))

/etc/mysql/my.cnf
[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock

# Here is entries for some specific programs
# The following values assume you have at least 32M ram

# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0

[mysqld]
#
#performance_schema=ON
user            = mysql
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr/local/mysql
datadir         = /usr/local/mysql/data
tmpdir          = /tmp
event_scheduler = 1
key_buffer_size         = 1400M
max_allowed_packet      = 16M
thread_stack            = 192K
thread_cache_size       = 64
tmp_table_size          = 128M
max_heap_table_size     = 128M

# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
myisam-recover         = BACKUP
max_connections        = 230
table_cache            = 1500
table_definition_cache = 1500
thread_concurrency     = 16

#
# * Query Cache Configuration
#
query_cache_limit       = 1M
query_cache_size        = 64M

#low-priority-updates
#
# * Logging and Replication
#
#log                    = /tmp/mysql
log_error                = /var/log/mysql/error.log
#slow_query_log
#slow_query_log_file    = /var/log/mysql/mysql-slow.log
#long_query_time = 1
#log-queries-not-using-indexes
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
#       other settings you may need to change.

server-id               = 10
log_bin                 = mysql-bin
log_bin_index           = mysql-bin.index
relay_log               = mysql-relay-bin
relay_log_index         = mysql-relay-bin.index
binlog-format           = MIXED
replicate_wild_ignore_table=openx.%
expire_logs_days        = 10
max_binlog_size         = 1000M

#
# * InnoDB
#
innodb_lock_wait_timeout =10
innodb_buffer_pool_size = 3072M
innodb_flush_log_at_trx_commit=0
innodb_file_per_table
innodb_log_buffer_size = 32M
innodb_log_file_size = 128M
innodb_flush_method=O_DIRECT

Slave:
Linux version 2.6.32-27-generic (buildd@crested) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #49-Ubuntu SMP Thu Dec 2 00:51:09 UTC 2010
mysqld  Ver 5.5.25-log for linux2.6 on x86_64 (MySQL Community Server (GPL))
/etc/mysql/my.cnf
[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock

# Here is entries for some specific programs
# The following values assume you have at least 32M ram

# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0

[mysqld]
#
#performance_schema=ON
user            = mysql
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr/local/mysql
datadir         = /usr/local/mysql/data
tmpdir          = /tmp
event_scheduler = 1
#
# * Fine Tuning
#
key_buffer_size         = 512M
max_allowed_packet      = 16M
thread_stack            = 192K
thread_cache_size       = 64
tmp_table_size          = 128M
max_heap_table_size     = 128M

# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
myisam-recover         = BACKUP
max_connections        = 230
table_cache            = 1500
table_definition_cache = 1500
thread_concurrency     = 16
#
# * Query Cache Configuration
#
query_cache_limit       = 1M
query_cache_size        = 64M

#               REPLICATION CONFIG
#
server-id               = 2
log_bin                 = mysql-bin
log_bin_index           = mysql-bin.index
relay_log               = mysql-relay-bin
relay_log_index         = mysql-relay-bin.index

replicate_wild_ignore_table=openx.%
replicate_wild_ignore_table=%.phpbb_users_birthday

expire_logs_days        = 10
max_binlog_size         = 1000M
innodb_flush_log_at_trx_commit=1

#
#                       SLAVE
#
#log_slave_updates      = 1
#skip_slave_start
#read_only              = 1

#
# * InnoDB
#
innodb_lock_wait_timeout =10
innodb_buffer_pool_size = 1500M
innodb_file_per_table
innodb_log_buffer_size = 32M
innodb_log_file_size = 128M
#innodb_flush_method=O_DIRECT

After some a week of successfull replication it suddenly stoppped. 

The following messges appear in the master log:

120704 13:27:20 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000029, 558968285)
120704 13:29:54 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000029, 558968285)
120704 14:29:53 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000029, 558968285)
120704 15:30:53 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000029, 558968285)
120704 16:25:18 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000029, 558968538)
120704 17:25:18 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000029, 558968538)
120704 18:26:18 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000029, 558968538)
120704 19:22:15 [Note] Start binlog_dump to slave_server(2), pos(mysql-bin.000029, 558968538)

Master state is:
mysql> show master status\G
*************************** 1. row ***************************
            File: mysql-bin.000033
        Position: 881019068
    Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)

Slave log contsains only the following messages:
120705  9:24:39 [Note] Slave I/O thread: connected to master 'repl@XXXX',replication started in log 'mysql-bin.000029' at position 558968538

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: XXXXXX
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000029
          Read_Master_Log_Pos: 558968538
               Relay_Log_File: mysql-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: mysql-bin.000029
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table: openx.%,%.phpbb_users_birthday
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 558968538
              Relay_Log_Space: 107
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 10
1 row in set (0.00 sec)

After stop slave and start slave master has a proccess
| 31046946 | repl            | XXXXX:55408       | NULL  | Binlog Dump |   14 | Writing to net              | NULL 

But after some short time it disappears;

Ifconfig shows no errors, binlog is readable on master. I tried to restart slave with changing POS to next position in binlog, but that didn't help. I believe that server should at least say something why replication cannot proceed.

I can provide any information on reqest. I plan to restart master in 20 hours to  see if the problem can be fixed by master restart.

Thanks.

How to repeat:
I don't know

Hello Ilya,

Please don't use 5.5.25 on production slave.
Consider the three bugs:
1.  http://bugs.mysql.com/bug.php?id=65740 (slave only)
2.  http://bugs.mysql.com/bug.php?id=64624 (slave only)
3.  http://bugs.mysql.com/bug.php?id=65745 (master and slave)

It's best you disable query cache on the slave and downgrade to 5.5.24.
But please send us full error log from slave for inspection.

Thank you, Shane
But it's not my case. I allready fixed problems described in that report by changing the definer of events. The attached log also contains data about that. I described my crashes http://forums.mysql.com/read.php?132,558984,558984#msg-558984 here. The full copy of crashing mysql binary and data server still exists if needed.

Your server crashed 74 times due to this bug:
http://bugs.mysql.com/bug.php?id=65740

But now the problem seems to be this:
"Slave I/O thread killed while reading event"

Are there network problems, too small timeouts, too small max_allowed_packet, or somebody killing long-running connections on the master?

Yes. Server crashed because of that. But I found out the solution of problem mysqelf. I changed the owner of events on master, reinstalled mysql, using precomiled version (previousely I compiled it myself) and recreated my slave. And everything went ok.

"Slave I/O thread killed while reading event"

This message appeares when I stop slave using "slave stop". I read that in fact it first stops IO_THREAD and then stops SQL_THREAD. So the error appears. I think it's a different bug(?).

I have noticed one strange thing in master beahiour. Look at the Spin Rounds of InnoDB. Nothing was done to cause it.

Attachment: mysql_innodb_semaphores-week.zip (application/octet-stream, text), 206.52 KiB.

So, you have leap second issues on master.  Have you checked:
http://blog.mozilla.org/it/2012/06/30/mysql-and-the-leap-second-high-cpu-and-the-fix/

Thanks Shane!

This solved my problem! But isn't it a bug, that it was comletely with no errors?

Thank you for the feedback.

> "Slave I/O thread killed while reading event"

This is normal behavior, just a message describing what happened.

> But isn't it a bug, that it was comletely with no errors?

We don't know what caused original issue (except those bugs which Shane mentioned), so it is hard to say if this is issue with your compilation options or network. If network or too small timeouts - you should have option log_warnings=2 to get messages about communication problems.