Bug #45292 orphan binary log created when starting server twice
Submitted: 3 Jun 11:58 Modified: 26 Nov 12:50
Reporter: Joffrey MICHAIE
Status: In progress
Category:Server: Replication Severity:S2 (Serious)
Version:mysql-5.1-telco-7.0 OS:Any
Assigned to: Alfranio Correia Target Version:
Tags: binary log, replication, orphan, phantom, 5.1.30-ndb-6.3.20, 5.1.34-ndb-7.0
Triage: Triaged: D3 (Medium) / R2 (Low) / E3 (Medium)

[3 Jun 11:58] Joffrey MICHAIE
Description:
I have a MySQL server running with NDB and binary logging :

mysql> SHOW MASTER LOGS;
+-----------------+-----------+
| Log_name        | File_size |
+-----------------+-----------+
| sql1-bin.000007 |       529 | 
+-----------------+-----------+
1 row in set (0.00 sec)

mysql> SHOW MASTER STATUS
+-----------------+----------+--------------+------------------+
| File            | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-----------------+----------+--------------+------------------+
| sql1-bin.000007 |      529 |              |                  | 
+-----------------+----------+--------------+------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM `mysql`.`ndb_binlog_index` ORDER BY `gci`;
+----------+---------------------------------------+----------------+---------+---------+---------+-----------+----------------+------------+-------+
| Position | File                                  | epoch          | inserts | updates |
deletes | schemaops | orig_server_id | orig_epoch | gci   |
+----------+---------------------------------------+----------------+---------+---------+---------+-----------+----------------+------------+-------+
|      204 | /usr/local/mysql/data/sql1-bin.000007 | 55319178772496 |       1 |       0 |
      0 |         0 |              0 |          0 | 12880 | 
+----------+---------------------------------------+----------------+---------+---------+---------+-----------+----------------+------------+-------+
1 row in set (0.01 sec)

# ls -lt *bin*
-rw-rw---- 1 mysql mysql  38 2009-06-03 13:11 sql1-bin.index
-rw-rw---- 1 mysql mysql 529 2009-06-03 13:11 sql1-bin.000007

Then I try to start a new mysqld on the same ndb-nodeid and mysql port:

# su - mysql -c mysqld

I get the normal error messages :

Cluster Error :

090603 13:12:50 [ERROR] NDB: error (1) Configuration error: Error : Could not alloc node
id at mgm-srv1 port 1186: Id 50 already allocated by another node.
090603 13:12:51 [ERROR] Plugin 'ndbcluster' init function returned error.
090603 13:12:51 [ERROR] Plugin 'ndbcluster' registration as a STORAGE ENGINE failed.

and Local port in use Error : 

090603 13:12:51 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use
090603 13:12:51 [ERROR] Do you already have another mysqld server running on port: 3310
?
090603 13:12:51 [ERROR] Aborting
090603 13:12:51 [Note] mysqld: Shutdown complete

But, A new binary log file was created :

# ls -lt *bin*
-rw-rw---- 1 mysql mysql 126 2009-06-03 13:12 sql1-bin.000008
-rw-rw---- 1 mysql mysql  76 2009-06-03 13:12 sql1-bin.index
-rw-rw---- 1 mysql mysql 529 2009-06-03 13:11 sql1-bin.000007

Is on the index file (But not always, sometime have to do a purge master logs to xxx to
make it appear)

# cat sql1-bin.index
/usr/local/mysql/data/sql1-bin.000007
/usr/local/mysql/data/sql1-bin.000008

And in the master logs :

mysql> SHOW MASTER LOGS;
+-----------------+-----------+
| Log_name        | File_size |
+-----------------+-----------+
| sql1-bin.000007 |       529 | 
| sql1-bin.000008 |       126 | 
+-----------------+-----------+
2 rows in set (0.00 sec)

The writing on the running server is still on the actual log :

mysql> SHOW MASTER STATUS;
+-----------------+----------+--------------+------------------+
| File            | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-----------------+----------+--------------+------------------+
| sql1-bin.000007 |      529 |              |                  | 
+-----------------+----------+--------------+------------------+
1 row in set (0.00 sec)

mysql> INSERT INTO `test`.`t1` VALUES (2);
Query OK, 1 row affected (0.03 sec)

mysql> SHOW MASTER STATUS;
+-----------------+----------+--------------+------------------+
| File            | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-----------------+----------+--------------+------------------+
| sql1-bin.000007 |      854 |              |                  | 
+-----------------+----------+--------------+------------------+
1 row in set (0.00 sec)

I can now purge the binary logs until the last :
mysql> PURGE MASTER LOGS TO 'sql1-bin.000008';
Query OK, 0 rows affected (0.03 sec)

mysql> SHOW MASTER LOGS;
+-----------------+-----------+
| Log_name        | File_size |
+-----------------+-----------+
| sql1-bin.000008 |       126 | 
+-----------------+-----------+
1 row in set (0.00 sec)

But we still write somewhere else :

mysql> SHOW MASTER STATUS;
+-----------------+----------+--------------+------------------+
| File            | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-----------------+----------+--------------+------------------+
| sql1-bin.000007 |      854 |              |                  | 
+-----------------+----------+--------------+------------------+
1 row in set (0.00 sec)

mysql> INSERT INTO `test`.`t1` VALUES (3);
Query OK, 1 row affected (0.03 sec)

mysql> SHOW MASTER STATUS;
+-----------------+----------+--------------+------------------+
| File            | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-----------------+----------+--------------+------------------+
| sql1-bin.000007 |     1179 |              |                  | 
+-----------------+----------+--------------+------------------+
1 row in set (0.00 sec)

Also, the ndb_binlog_index table is not anymore updated :

mysql> SELECT * FROM `mysql`.`ndb_binlog_index` ORDER BY `gci`;
+----------+---------------------------------------+----------------+---------+---------+---------+-----------+----------------+------------+-------+
| Position | File                                  | epoch          | inserts | updates |
deletes | schemaops | orig_server_id | orig_epoch | gci   |
+----------+---------------------------------------+----------------+---------+---------+---------+-----------+----------------+------------+-------+
|      854 | /usr/local/mysql/data/sql1-bin.000007 | 56483114909709 |       1 |       0 |
      0 |         0 |              0 |          0 | 13151 | 
+----------+---------------------------------------+----------------+---------+---------+---------+-----------+----------------+------------+-------+
1 row in set (0.00 sec)

The file does not exist anymore :

mysql> SHOW BINLOG EVENTS IN 'sql1-bin.000007';
ERROR 1220 (HY000): Error when executing command SHOW BINLOG EVENTS: Could not find
target log

# ls -lt *bin*
-rw-rw---- 1 mysql mysql  38 2009-06-03 13:20 sql1-bin.index
-rw-rw---- 1 mysql mysql 126 2009-06-03 13:12 sql1-bin.000008

After MySQL shutdown :
# /etc/init.d/mysqld stop
* MySQL manager or server PID file could not be found!
(Another bug report?)

# mysqladmin shutdown

# ls -lt *bin*
-rw-rw---- 1 mysql mysql  38 2009-06-03 13:20 sql1-bin.index
-rw-rw---- 1 mysql mysql 126 2009-06-03 13:12 sql1-bin.000008

Since there are a lot of automatic restart scripts to run mysql, this may happen on many
environments, and is not very good, since this breaks quite a lot of things;

How to repeat:
Run MySQL installed with default config + :

log-slave-updates=1
server-id               = 50
log_bin                 = /usr/local/mysql/data/sql1-bin
sync_binlog             = 1
expire_logs_days        = 10
max_binlog_size         = 100M

[MYSQL_CLUSTER]
ndbcluster
ndb-nodeid=50
ndb_connectstring=mgm-srv1

----
Create ndb table :
mysql > CREATE DATABASE IF NOT EXISTS `test`;
mysql > CREATE TABLE `test`.`t1` (`a` int PRIMARY KEY) engine=ndb;
Insert Data :
mysql > INSERT INTO `test`.`t1` VALUES (1);
Check master status, master logs, and mysql.ndb_binlog_index table.
----
Start MySQL server without stopping :
- sudo -u mysql mysqld
or
- su - mysql -c mysqld
----
Check binary logs files, index file, master status, master logs
----
Purge binary logs to last one
mysql> PURGE MASTER LOGS to 'sql1-bin.000008';
----
Check binary logs files, index file, master status, master logs
----
Insert data
mysql> INSERT INTO `test`.`t1` VALUES (2);
----
Check binary logs files, index file, master status, master logs, binary events ...

...
[6 Jun 23:55] Sveta Smirnova
Thank you for the report.

Verified as described:

1. Start mysqld with option indicated (could not repeat without --log-slave-updates=1
--sync-binlog=1 --expire-logs-days=10)
2. Create table, insert
3. Start another mysqld
4. Wait when it dies
5. ls datadir - see new log
6. Issue status queries - no new log
7. Purge master logs to lates log from status command
8. Issue status queries - see new log

Sometimes, but not always, I could not restart mysqld after tests due to "Failed to open
log (file '', errno 2)" error
[8 Jul 18:38] Alfranio Correia
There are three different issues described in this bug report:

1 - The original client's issue what is not reproduced here but may be explained as
follows:

  Due to the failure in the Cluster the server is being aborted
  while purging the logs:

  (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs)

  1 - copy the content of the log-bin.index to a temporary buffer.
  2 - update the log-bin.index.
  3 - flush the log-bin.index.
  4 - erase the files whose names where copied to the temporary buffer
  in step 1.

  Most likely, the server is being aborted while executing step 4 thus
  generating an orphan file. This is a known issue with a simple fix.
  In other words, we just need to augment the initialization routine to
  automatically erase any orphan file.

  2 - Two servers being started on the same data directory.

  The second server is able to create a binary log and update
  the index file although it fails to start up.

  -----------------------------------------------------------
  InnoDB: Error in opening ./ibdata1
  090708 15:42:38  InnoDB: Operating system error number 11 in a file operation.
  InnoDB: Error number 11 means 'Resource temporarily unavailable'.
  InnoDB: Some operating system error numbers are described at
  InnoDB: http://dev.mysql.com/doc/refman/5.1/en/operating-system-error-codes.html
  InnoDB: Could not open or create data files.
  InnoDB: If you tried to add new data files, and it failed here,
  InnoDB: you should now edit innodb_data_file_path in my.cnf back
  InnoDB: to what it was, and remove the new ibdata files InnoDB created
  InnoDB: in this failed attempt. InnoDB only wrote those files full of
  InnoDB: zeros, but did not yet use them in any way. But be careful: do not
  InnoDB: remove old data files which contain your precious data!
  090708 15:42:38 [ERROR] Plugin 'InnoDB' init function returned error.
  090708 15:42:38 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
  090708 15:42:38 [ERROR] Can't start server: Bind on TCP/IP port: Address already in
use
  090708 15:42:38 [ERROR] Do you already have another mysqld server running on port: 3306
?
  090708 15:42:38 [ERROR] Aborting
  -----------------------------------------------------------

  The server should never be able to create a new binary log and modify the
  binary log index file in such situation.

  3 - The purge operation is allowing to remove an used file:
  the descriptor of the current binary log is kept although the physical file
  is removed from the file system.

  There is a problem with the sql_repl.cc:bool log_in_use(const char* log_name),
  which is not correctly checking if a file is in use thus causing purge to
  happen in MYSQL_BIN_LOG::purge_logs.
[8 Jul 18:41] Alfranio Correia
In this bug report, we are going to address item 1 and 3.

You are going to discuss with the replication team how you should address item 2 and file
a new bug report in due time.

Cheers.
[13 Jul 1:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78483

3013 Alfranio Correia	2009-07-13
      BUG#45292 orphan binary log created when starting server twice
      
      This patch fixes two bugs as follows. First, aborting the server while purging
      binary logs might generate orphan files due to how the purge operation was
      implemented:
      
            (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs)
      
        1 - register the files to be removed in a temporary buffer.
        2 - update the log-bin.index.
        3 - flush the log-bin.index.
        4 - erase the files whose names where register in the temporary buffer
        in step 1.
      
      Thus aborting a server while  executing step 4 would generate an orphan file.
      To fix this issue, we record the files to be purged before removing them from
      the index file and from the file system. So if a failure happens such registers
      can be used to automatically remove dangling files. The new steps might be
      outlined as follows:
      
            (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs)
      
        1 - register the files to be removed in a the log-bin.purge
        place in the data directory.
        2 - update the log-bin.index.
        3 - flush the log-bin.index.
        4 - erase the files whose names where register in the log-bin.purge
        in step 1.
      
            (purge routine - sql/log.cc - MYSQL_BIN_LOG::open_index_file)
      
        1 - open the log-bin.index.
        2 - open the log-bin.purge.
        3 - for each file in log-bin.purge
          3.1 Check if file is in log-bin.index and if so ignored it.
          3.2 Otherwise, delete it.
      
      Regarding the second issue. The purge operation was allowing to remove a file
      in use thus leading to the loss of data and possible inconsistencies between
      the master and slave. Roughly, there was a problem with the sql_repl.cc::bool
      log_in_use(), which was not correctly checking if a file was in use. To avoid
      this issue, we replaced the call to such function by a call to the method
      MYSQL_BIN_LOG::is_active() and remove the previous function as it was obsolete.
[19 Jul 16:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78980

3013 Alfranio Correia	2009-07-19
      BUG#45292 orphan binary log created when starting server twice
            
      This patch fixes three bugs as follows. First, aborting the server while purging
      binary logs might generate orphan files due to how the purge operation was
      implemented:
            
                (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs)
           
            1 - register the files to be removed in a temporary buffer.
            2 - update the log-bin.index.
            3 - flush the log-bin.index.
            4 - erase the files whose names where register in the temporary buffer
            in step 1.
      
      Thus a failure while  executing step 4 would generate an orphan file. Second,
      a similar issue might happen while creating a new binary as follows:
      
                (open routine - sql/log.cc - MYSQL_BIN_LOG::open)
           
            1 - open the new log-bin.
            2 - update the log-bin.index.
            
      Thus a failure while executing step 1 would generate an orphan file.
      
      To fix these issues, we record the files to be purged or created before really
      removing or adding them. So if a failure happens such registers can be used to
      automatically remove dangling files. The new steps might be outlined as follows:
            
                (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs)
            
            1 - register the files to be removed in the log-bin.log placed in 
            the data directory.
            2 - update the log-bin.index.
            3 - flush the log-bin.index.
      
                (open routine - sql/log.cc - MYSQL_BIN_LOG::open)
      
            1 - register the file to be created in the log-bin.log
            placed in the data directory.
            2 - open the new log-bin.
            3 - update the log-bin.index.
      
                (purge routine - sql/log.cc - MYSQL_BIN_LOG::open_index_file)
            
            1 - open the log-bin.index.
            2 - open the log-bin.log.
            3 - for each file in log-bin.log
              3.1 Check if the file is in the log-bin.index and if so ignore it.
              3.2 Otherwise, delete it.
            
      The third issue can be described as follows. The purge operation was allowing
      to remove a file in use thus leading to the loss of data and possible 
      inconsistencies between the master and slave. Roughly, the routine was only 
      taking into account the dump threads and so if a slave was not connect the
      file might be delete even though it was in use.
[24 Jul 12:44] Lars Thalmann
It is assumed that this *only* happens when either:

1. the slave has crashed at some point, or 
2. the slave is started with the same datadir as some other server.

It does *not* happen during normal shutdown/start.  (If this is not
correct, and someone has discovered this problem in other
circumstances, please let us know.)
[10 Aug 20:07] Alfranio Correia
Update on comment "[8 Jul 18:38] Alfranio Correia"

There are three different issues described in this bug report:

1 - The original client's issue what is not reproduced here but may be explained as
follows:

  First, aborting the server while purging binary logs might generate
  orphan files due to how the purge operation was implemented:

                (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs)

            1 - register the files to be removed in a temporary buffer.
            2 - update the log-bin.index.
            3 - flush the log-bin.index.
            4 - erase the files whose names where register in the temporary
            buffer in step 1.

  Thus a failure while  executing step 4 would generate an orphan file. Second,
  a similar issue might happen while creating a new binary as follows:

                (open routine - sql/log.cc - MYSQL_BIN_LOG::open)

            1 - open the new log-bin.
            2 - update the log-bin.index.

  Thus a failure while executing step 1 would generate an orphan file.

2 - Two servers being started on the same data directory.

  The second server is able to create a binary log and update
  the index file although it fails to start up.

  -----------------------------------------------------------
  InnoDB: Error in opening ./ibdata1
  090708 15:42:38  InnoDB: Operating system error number 11 in a file operation.
  InnoDB: Error number 11 means 'Resource temporarily unavailable'.
  InnoDB: Some operating system error numbers are described at
  InnoDB: http://dev.mysql.com/doc/refman/5.1/en/operating-system-error-codes.html
  InnoDB: Could not open or create data files.
  InnoDB: If you tried to add new data files, and it failed here,
  InnoDB: you should now edit innodb_data_file_path in my.cnf back
  InnoDB: to what it was, and remove the new ibdata files InnoDB created
  InnoDB: in this failed attempt. InnoDB only wrote those files full of
  InnoDB: zeros, but did not yet use them in any way. But be careful: do not
  InnoDB: remove old data files which contain your precious data!
  090708 15:42:38 [ERROR] Plugin 'InnoDB' init function returned error.
  090708 15:42:38 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
  090708 15:42:38 [ERROR] Can't start server: Bind on TCP/IP port: Address already in
use
  090708 15:42:38 [ERROR] Do you already have another mysqld server running on port:
3306
?
  090708 15:42:38 [ERROR] Aborting
  -----------------------------------------------------------

  The server should never be able to create a new binary log and modify the
  binary log index file in such situation. And apparently the cluster is not
  gracefully finishing the server and it should.

3 - The purge operation is allowing to remove a file that is in use:
the descriptor of the current binary log is kept although the physical file
is removed from the file system.

  The sql_repl.cc:bool log_in_use(const char* log_name) only checks if a 
  binary log is in use by a slave. It does not take into account that there 
  may not be slaves and that the file may be in use by the master.
[7 Sep 1:43] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82536

3102 Alfranio Correia	2009-09-07
      BUG#45292 orphan binary log created when starting server twice
      
      This patch fixes three bugs as follows. First, aborting the server while purging
      binary logs might generate orphan files due to how the purge operation was
      implemented:
      
                (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs)
      
            1 - register the files to be removed in a temporary buffer.
            2 - update the log-bin.index.
            3 - flush the log-bin.index.
            4 - erase the files whose names where register in the temporary buffer
            in step 1.
      
      Thus a failure while  executing step 4 would generate an orphan file. Second,
      a similar issue might happen while creating a new binary as follows:
      
                (open routine - sql/log.cc - MYSQL_BIN_LOG::open)
      
            1 - open the new log-bin.
            2 - update the log-bin.index.
            3 - flush the log-bin.index.
      
      Thus a failure while executing step 1 would generate an orphan file.
      
      To fix these issues, we use the index file as a log and introduce a recovery
      procedure as follows:
      
                (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs)
      
            1 - erase the files.
            2 - update the log-bin.index.
            3 - flush the log-bin.index.
      
                (open routine - sql/log.cc - MYSQL_BIN_LOG::open)
      
            1 - update the log-bin.index if new log-bin entry is not an entry.
            2 - flush the log-bin.index.
            3 - open the new log-bin.
      
                (purge routine - sql/log.cc - MYSQL_BIN_LOG::open_index_file)
      
            1 - open the log-bin.index.
            2 - for each file in log-bin.index
              2.1 check if the file exists.
              2.2 if it does not exist and is not a new log-bin, remove the entry.
            3 - for a new log-bin:
              3.1 - if the file is corrupted, remove it.
              3.2 - otherwise, do nothing.
      
      The third issue can be described as follows. The purge operation was allowing
      to remove a file in use thus leading to the loss of data and possible
      inconsistencies between the master and slave. Roughly, the routine was only
      taking into account the dump threads and so if a slave was not connect the
      file might be delete even though it was in use.
[23 Nov 4:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/91246

3215 Alfranio Correia	2009-11-23
      BUG#45292 orphan binary log created when starting server twice
      
      This patch fixes three bugs as follows. First, aborting the server while purging
      binary logs might generate orphan files due to how the purge operation was
      implemented:
      
                (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs)
      
            1 - register the files to be removed in a temporary buffer.
            2 - update the log-bin.index.
            3 - flush the log-bin.index.
            4 - erase the files whose names where register in the temporary buffer
            in step 1.
      
      Thus a failure while  executing step 4 would generate an orphan file. Second,
      a similar issue might happen while creating a new binary as follows:
      
                (open routine - sql/log.cc - MYSQL_BIN_LOG::open)
      
            1 - open the new log-bin.
            2 - update the log-bin.index.
      
      Thus a failure while executing step 1 would generate an orphan file.
      
      To fix these issues, we record the files to be purged or created before really
      removing or adding them. So if a failure happens such registers can be used to
      automatically remove dangling files. The new steps might be outlined as follows:
      
                (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs)
      
            1 - register the files to be removed in the log-bin.log placed in
            the data directory.
            2 - update the log-bin.index.
            3 - flush the log-bin.index.
      
                (open routine - sql/log.cc - MYSQL_BIN_LOG::open)
      
            1 - register the file to be created in the log-bin.log
            placed in the data directory.
            2 - open the new log-bin.
            3 - update the log-bin.index.
      
                (purge routine - sql/log.cc - MYSQL_BIN_LOG::open_index_file)
      
            1 - open the log-bin.index.
            2 - open the log-bin.log.
            3 - for each file in log-bin.log
              3.1 Check if the file is in the log-bin.index and if so ignore it.
              3.2 Otherwise, delete it.
      
      The third issue can be described as follows. The purge operation was allowing
      to remove a file in use thus leading to the loss of data and possible
      inconsistencies between the master and slave. Roughly, the routine was only
      taking into account the dump threads and so if a slave was not connect the
      file might be delete even though it was in use.