| Bug #45292 | orphan binary log created when starting server twice | ||
|---|---|---|---|
| Submitted: | 3 Jun 11:58 | Modified: | 26 Nov 12:50 |
| Reporter: | Joffrey MICHAIE | ||
| Status: | In progress | ||
| Category: | Server: Replication | Severity: | S2 (Serious) |
| Version: | mysql-5.1-telco-7.0 | OS: | Any |
| Assigned to: | Alfranio Correia | Target Version: | |
| Tags: | binary log, replication, orphan, phantom, 5.1.30-ndb-6.3.20, 5.1.34-ndb-7.0 | ||
| Triage: | Triaged: D3 (Medium) / R2 (Low) / E3 (Medium) | ||
[6 Jun 23:55]
Sveta Smirnova
Thank you for the report. Verified as described: 1. Start mysqld with option indicated (could not repeat without --log-slave-updates=1 --sync-binlog=1 --expire-logs-days=10) 2. Create table, insert 3. Start another mysqld 4. Wait when it dies 5. ls datadir - see new log 6. Issue status queries - no new log 7. Purge master logs to lates log from status command 8. Issue status queries - see new log Sometimes, but not always, I could not restart mysqld after tests due to "Failed to open log (file '', errno 2)" error
[8 Jul 18:38]
Alfranio Correia
There are three different issues described in this bug report: 1 - The original client's issue what is not reproduced here but may be explained as follows: Due to the failure in the Cluster the server is being aborted while purging the logs: (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs) 1 - copy the content of the log-bin.index to a temporary buffer. 2 - update the log-bin.index. 3 - flush the log-bin.index. 4 - erase the files whose names where copied to the temporary buffer in step 1. Most likely, the server is being aborted while executing step 4 thus generating an orphan file. This is a known issue with a simple fix. In other words, we just need to augment the initialization routine to automatically erase any orphan file. 2 - Two servers being started on the same data directory. The second server is able to create a binary log and update the index file although it fails to start up. ----------------------------------------------------------- InnoDB: Error in opening ./ibdata1 090708 15:42:38 InnoDB: Operating system error number 11 in a file operation. InnoDB: Error number 11 means 'Resource temporarily unavailable'. InnoDB: Some operating system error numbers are described at InnoDB: http://dev.mysql.com/doc/refman/5.1/en/operating-system-error-codes.html InnoDB: Could not open or create data files. InnoDB: If you tried to add new data files, and it failed here, InnoDB: you should now edit innodb_data_file_path in my.cnf back InnoDB: to what it was, and remove the new ibdata files InnoDB created InnoDB: in this failed attempt. InnoDB only wrote those files full of InnoDB: zeros, but did not yet use them in any way. But be careful: do not InnoDB: remove old data files which contain your precious data! 090708 15:42:38 [ERROR] Plugin 'InnoDB' init function returned error. 090708 15:42:38 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 090708 15:42:38 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use 090708 15:42:38 [ERROR] Do you already have another mysqld server running on port: 3306 ? 090708 15:42:38 [ERROR] Aborting ----------------------------------------------------------- The server should never be able to create a new binary log and modify the binary log index file in such situation. 3 - The purge operation is allowing to remove an used file: the descriptor of the current binary log is kept although the physical file is removed from the file system. There is a problem with the sql_repl.cc:bool log_in_use(const char* log_name), which is not correctly checking if a file is in use thus causing purge to happen in MYSQL_BIN_LOG::purge_logs.
[8 Jul 18:41]
Alfranio Correia
In this bug report, we are going to address item 1 and 3. You are going to discuss with the replication team how you should address item 2 and file a new bug report in due time. Cheers.
[13 Jul 1:25]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/78483 3013 Alfranio Correia 2009-07-13 BUG#45292 orphan binary log created when starting server twice This patch fixes two bugs as follows. First, aborting the server while purging binary logs might generate orphan files due to how the purge operation was implemented: (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs) 1 - register the files to be removed in a temporary buffer. 2 - update the log-bin.index. 3 - flush the log-bin.index. 4 - erase the files whose names where register in the temporary buffer in step 1. Thus aborting a server while executing step 4 would generate an orphan file. To fix this issue, we record the files to be purged before removing them from the index file and from the file system. So if a failure happens such registers can be used to automatically remove dangling files. The new steps might be outlined as follows: (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs) 1 - register the files to be removed in a the log-bin.purge place in the data directory. 2 - update the log-bin.index. 3 - flush the log-bin.index. 4 - erase the files whose names where register in the log-bin.purge in step 1. (purge routine - sql/log.cc - MYSQL_BIN_LOG::open_index_file) 1 - open the log-bin.index. 2 - open the log-bin.purge. 3 - for each file in log-bin.purge 3.1 Check if file is in log-bin.index and if so ignored it. 3.2 Otherwise, delete it. Regarding the second issue. The purge operation was allowing to remove a file in use thus leading to the loss of data and possible inconsistencies between the master and slave. Roughly, there was a problem with the sql_repl.cc::bool log_in_use(), which was not correctly checking if a file was in use. To avoid this issue, we replaced the call to such function by a call to the method MYSQL_BIN_LOG::is_active() and remove the previous function as it was obsolete.
[19 Jul 16:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/78980 3013 Alfranio Correia 2009-07-19 BUG#45292 orphan binary log created when starting server twice This patch fixes three bugs as follows. First, aborting the server while purging binary logs might generate orphan files due to how the purge operation was implemented: (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs) 1 - register the files to be removed in a temporary buffer. 2 - update the log-bin.index. 3 - flush the log-bin.index. 4 - erase the files whose names where register in the temporary buffer in step 1. Thus a failure while executing step 4 would generate an orphan file. Second, a similar issue might happen while creating a new binary as follows: (open routine - sql/log.cc - MYSQL_BIN_LOG::open) 1 - open the new log-bin. 2 - update the log-bin.index. Thus a failure while executing step 1 would generate an orphan file. To fix these issues, we record the files to be purged or created before really removing or adding them. So if a failure happens such registers can be used to automatically remove dangling files. The new steps might be outlined as follows: (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs) 1 - register the files to be removed in the log-bin.log placed in the data directory. 2 - update the log-bin.index. 3 - flush the log-bin.index. (open routine - sql/log.cc - MYSQL_BIN_LOG::open) 1 - register the file to be created in the log-bin.log placed in the data directory. 2 - open the new log-bin. 3 - update the log-bin.index. (purge routine - sql/log.cc - MYSQL_BIN_LOG::open_index_file) 1 - open the log-bin.index. 2 - open the log-bin.log. 3 - for each file in log-bin.log 3.1 Check if the file is in the log-bin.index and if so ignore it. 3.2 Otherwise, delete it. The third issue can be described as follows. The purge operation was allowing to remove a file in use thus leading to the loss of data and possible inconsistencies between the master and slave. Roughly, the routine was only taking into account the dump threads and so if a slave was not connect the file might be delete even though it was in use.
[24 Jul 12:44]
Lars Thalmann
It is assumed that this *only* happens when either: 1. the slave has crashed at some point, or 2. the slave is started with the same datadir as some other server. It does *not* happen during normal shutdown/start. (If this is not correct, and someone has discovered this problem in other circumstances, please let us know.)
[10 Aug 20:07]
Alfranio Correia
Update on comment "[8 Jul 18:38] Alfranio Correia"
There are three different issues described in this bug report:
1 - The original client's issue what is not reproduced here but may be explained as
follows:
First, aborting the server while purging binary logs might generate
orphan files due to how the purge operation was implemented:
(purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs)
1 - register the files to be removed in a temporary buffer.
2 - update the log-bin.index.
3 - flush the log-bin.index.
4 - erase the files whose names where register in the temporary
buffer in step 1.
Thus a failure while executing step 4 would generate an orphan file. Second,
a similar issue might happen while creating a new binary as follows:
(open routine - sql/log.cc - MYSQL_BIN_LOG::open)
1 - open the new log-bin.
2 - update the log-bin.index.
Thus a failure while executing step 1 would generate an orphan file.
2 - Two servers being started on the same data directory.
The second server is able to create a binary log and update
the index file although it fails to start up.
-----------------------------------------------------------
InnoDB: Error in opening ./ibdata1
090708 15:42:38 InnoDB: Operating system error number 11 in a file operation.
InnoDB: Error number 11 means 'Resource temporarily unavailable'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/operating-system-error-codes.html
InnoDB: Could not open or create data files.
InnoDB: If you tried to add new data files, and it failed here,
InnoDB: you should now edit innodb_data_file_path in my.cnf back
InnoDB: to what it was, and remove the new ibdata files InnoDB created
InnoDB: in this failed attempt. InnoDB only wrote those files full of
InnoDB: zeros, but did not yet use them in any way. But be careful: do not
InnoDB: remove old data files which contain your precious data!
090708 15:42:38 [ERROR] Plugin 'InnoDB' init function returned error.
090708 15:42:38 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
090708 15:42:38 [ERROR] Can't start server: Bind on TCP/IP port: Address already in
use
090708 15:42:38 [ERROR] Do you already have another mysqld server running on port:
3306
?
090708 15:42:38 [ERROR] Aborting
-----------------------------------------------------------
The server should never be able to create a new binary log and modify the
binary log index file in such situation. And apparently the cluster is not
gracefully finishing the server and it should.
3 - The purge operation is allowing to remove a file that is in use:
the descriptor of the current binary log is kept although the physical file
is removed from the file system.
The sql_repl.cc:bool log_in_use(const char* log_name) only checks if a
binary log is in use by a slave. It does not take into account that there
may not be slaves and that the file may be in use by the master.
[7 Sep 1:43]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/82536 3102 Alfranio Correia 2009-09-07 BUG#45292 orphan binary log created when starting server twice This patch fixes three bugs as follows. First, aborting the server while purging binary logs might generate orphan files due to how the purge operation was implemented: (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs) 1 - register the files to be removed in a temporary buffer. 2 - update the log-bin.index. 3 - flush the log-bin.index. 4 - erase the files whose names where register in the temporary buffer in step 1. Thus a failure while executing step 4 would generate an orphan file. Second, a similar issue might happen while creating a new binary as follows: (open routine - sql/log.cc - MYSQL_BIN_LOG::open) 1 - open the new log-bin. 2 - update the log-bin.index. 3 - flush the log-bin.index. Thus a failure while executing step 1 would generate an orphan file. To fix these issues, we use the index file as a log and introduce a recovery procedure as follows: (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs) 1 - erase the files. 2 - update the log-bin.index. 3 - flush the log-bin.index. (open routine - sql/log.cc - MYSQL_BIN_LOG::open) 1 - update the log-bin.index if new log-bin entry is not an entry. 2 - flush the log-bin.index. 3 - open the new log-bin. (purge routine - sql/log.cc - MYSQL_BIN_LOG::open_index_file) 1 - open the log-bin.index. 2 - for each file in log-bin.index 2.1 check if the file exists. 2.2 if it does not exist and is not a new log-bin, remove the entry. 3 - for a new log-bin: 3.1 - if the file is corrupted, remove it. 3.2 - otherwise, do nothing. The third issue can be described as follows. The purge operation was allowing to remove a file in use thus leading to the loss of data and possible inconsistencies between the master and slave. Roughly, the routine was only taking into account the dump threads and so if a slave was not connect the file might be delete even though it was in use.
[23 Nov 4:10]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/91246 3215 Alfranio Correia 2009-11-23 BUG#45292 orphan binary log created when starting server twice This patch fixes three bugs as follows. First, aborting the server while purging binary logs might generate orphan files due to how the purge operation was implemented: (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs) 1 - register the files to be removed in a temporary buffer. 2 - update the log-bin.index. 3 - flush the log-bin.index. 4 - erase the files whose names where register in the temporary buffer in step 1. Thus a failure while executing step 4 would generate an orphan file. Second, a similar issue might happen while creating a new binary as follows: (open routine - sql/log.cc - MYSQL_BIN_LOG::open) 1 - open the new log-bin. 2 - update the log-bin.index. Thus a failure while executing step 1 would generate an orphan file. To fix these issues, we record the files to be purged or created before really removing or adding them. So if a failure happens such registers can be used to automatically remove dangling files. The new steps might be outlined as follows: (purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs) 1 - register the files to be removed in the log-bin.log placed in the data directory. 2 - update the log-bin.index. 3 - flush the log-bin.index. (open routine - sql/log.cc - MYSQL_BIN_LOG::open) 1 - register the file to be created in the log-bin.log placed in the data directory. 2 - open the new log-bin. 3 - update the log-bin.index. (purge routine - sql/log.cc - MYSQL_BIN_LOG::open_index_file) 1 - open the log-bin.index. 2 - open the log-bin.log. 3 - for each file in log-bin.log 3.1 Check if the file is in the log-bin.index and if so ignore it. 3.2 Otherwise, delete it. The third issue can be described as follows. The purge operation was allowing to remove a file in use thus leading to the loss of data and possible inconsistencies between the master and slave. Roughly, the routine was only taking into account the dump threads and so if a slave was not connect the file might be delete even though it was in use.

Description: I have a MySQL server running with NDB and binary logging : mysql> SHOW MASTER LOGS; +-----------------+-----------+ | Log_name | File_size | +-----------------+-----------+ | sql1-bin.000007 | 529 | +-----------------+-----------+ 1 row in set (0.00 sec) mysql> SHOW MASTER STATUS +-----------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +-----------------+----------+--------------+------------------+ | sql1-bin.000007 | 529 | | | +-----------------+----------+--------------+------------------+ 1 row in set (0.00 sec) mysql> SELECT * FROM `mysql`.`ndb_binlog_index` ORDER BY `gci`; +----------+---------------------------------------+----------------+---------+---------+---------+-----------+----------------+------------+-------+ | Position | File | epoch | inserts | updates | deletes | schemaops | orig_server_id | orig_epoch | gci | +----------+---------------------------------------+----------------+---------+---------+---------+-----------+----------------+------------+-------+ | 204 | /usr/local/mysql/data/sql1-bin.000007 | 55319178772496 | 1 | 0 | 0 | 0 | 0 | 0 | 12880 | +----------+---------------------------------------+----------------+---------+---------+---------+-----------+----------------+------------+-------+ 1 row in set (0.01 sec) # ls -lt *bin* -rw-rw---- 1 mysql mysql 38 2009-06-03 13:11 sql1-bin.index -rw-rw---- 1 mysql mysql 529 2009-06-03 13:11 sql1-bin.000007 Then I try to start a new mysqld on the same ndb-nodeid and mysql port: # su - mysql -c mysqld I get the normal error messages : Cluster Error : 090603 13:12:50 [ERROR] NDB: error (1) Configuration error: Error : Could not alloc node id at mgm-srv1 port 1186: Id 50 already allocated by another node. 090603 13:12:51 [ERROR] Plugin 'ndbcluster' init function returned error. 090603 13:12:51 [ERROR] Plugin 'ndbcluster' registration as a STORAGE ENGINE failed. and Local port in use Error : 090603 13:12:51 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use 090603 13:12:51 [ERROR] Do you already have another mysqld server running on port: 3310 ? 090603 13:12:51 [ERROR] Aborting 090603 13:12:51 [Note] mysqld: Shutdown complete But, A new binary log file was created : # ls -lt *bin* -rw-rw---- 1 mysql mysql 126 2009-06-03 13:12 sql1-bin.000008 -rw-rw---- 1 mysql mysql 76 2009-06-03 13:12 sql1-bin.index -rw-rw---- 1 mysql mysql 529 2009-06-03 13:11 sql1-bin.000007 Is on the index file (But not always, sometime have to do a purge master logs to xxx to make it appear) # cat sql1-bin.index /usr/local/mysql/data/sql1-bin.000007 /usr/local/mysql/data/sql1-bin.000008 And in the master logs : mysql> SHOW MASTER LOGS; +-----------------+-----------+ | Log_name | File_size | +-----------------+-----------+ | sql1-bin.000007 | 529 | | sql1-bin.000008 | 126 | +-----------------+-----------+ 2 rows in set (0.00 sec) The writing on the running server is still on the actual log : mysql> SHOW MASTER STATUS; +-----------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +-----------------+----------+--------------+------------------+ | sql1-bin.000007 | 529 | | | +-----------------+----------+--------------+------------------+ 1 row in set (0.00 sec) mysql> INSERT INTO `test`.`t1` VALUES (2); Query OK, 1 row affected (0.03 sec) mysql> SHOW MASTER STATUS; +-----------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +-----------------+----------+--------------+------------------+ | sql1-bin.000007 | 854 | | | +-----------------+----------+--------------+------------------+ 1 row in set (0.00 sec) I can now purge the binary logs until the last : mysql> PURGE MASTER LOGS TO 'sql1-bin.000008'; Query OK, 0 rows affected (0.03 sec) mysql> SHOW MASTER LOGS; +-----------------+-----------+ | Log_name | File_size | +-----------------+-----------+ | sql1-bin.000008 | 126 | +-----------------+-----------+ 1 row in set (0.00 sec) But we still write somewhere else : mysql> SHOW MASTER STATUS; +-----------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +-----------------+----------+--------------+------------------+ | sql1-bin.000007 | 854 | | | +-----------------+----------+--------------+------------------+ 1 row in set (0.00 sec) mysql> INSERT INTO `test`.`t1` VALUES (3); Query OK, 1 row affected (0.03 sec) mysql> SHOW MASTER STATUS; +-----------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +-----------------+----------+--------------+------------------+ | sql1-bin.000007 | 1179 | | | +-----------------+----------+--------------+------------------+ 1 row in set (0.00 sec) Also, the ndb_binlog_index table is not anymore updated : mysql> SELECT * FROM `mysql`.`ndb_binlog_index` ORDER BY `gci`; +----------+---------------------------------------+----------------+---------+---------+---------+-----------+----------------+------------+-------+ | Position | File | epoch | inserts | updates | deletes | schemaops | orig_server_id | orig_epoch | gci | +----------+---------------------------------------+----------------+---------+---------+---------+-----------+----------------+------------+-------+ | 854 | /usr/local/mysql/data/sql1-bin.000007 | 56483114909709 | 1 | 0 | 0 | 0 | 0 | 0 | 13151 | +----------+---------------------------------------+----------------+---------+---------+---------+-----------+----------------+------------+-------+ 1 row in set (0.00 sec) The file does not exist anymore : mysql> SHOW BINLOG EVENTS IN 'sql1-bin.000007'; ERROR 1220 (HY000): Error when executing command SHOW BINLOG EVENTS: Could not find target log # ls -lt *bin* -rw-rw---- 1 mysql mysql 38 2009-06-03 13:20 sql1-bin.index -rw-rw---- 1 mysql mysql 126 2009-06-03 13:12 sql1-bin.000008 After MySQL shutdown : # /etc/init.d/mysqld stop * MySQL manager or server PID file could not be found! (Another bug report?) # mysqladmin shutdown # ls -lt *bin* -rw-rw---- 1 mysql mysql 38 2009-06-03 13:20 sql1-bin.index -rw-rw---- 1 mysql mysql 126 2009-06-03 13:12 sql1-bin.000008 Since there are a lot of automatic restart scripts to run mysql, this may happen on many environments, and is not very good, since this breaks quite a lot of things; How to repeat: Run MySQL installed with default config + : log-slave-updates=1 server-id = 50 log_bin = /usr/local/mysql/data/sql1-bin sync_binlog = 1 expire_logs_days = 10 max_binlog_size = 100M [MYSQL_CLUSTER] ndbcluster ndb-nodeid=50 ndb_connectstring=mgm-srv1 ---- Create ndb table : mysql > CREATE DATABASE IF NOT EXISTS `test`; mysql > CREATE TABLE `test`.`t1` (`a` int PRIMARY KEY) engine=ndb; Insert Data : mysql > INSERT INTO `test`.`t1` VALUES (1); Check master status, master logs, and mysql.ndb_binlog_index table. ---- Start MySQL server without stopping : - sudo -u mysql mysqld or - su - mysql -c mysqld ---- Check binary logs files, index file, master status, master logs ---- Purge binary logs to last one mysql> PURGE MASTER LOGS to 'sql1-bin.000008'; ---- Check binary logs files, index file, master status, master logs ---- Insert data mysql> INSERT INTO `test`.`t1` VALUES (2); ---- Check binary logs files, index file, master status, master logs, binary events ... ...