Description:
Our data center had a power failure today. When the servers came back up, MySQL failed to start properly. It recovered from the binary logs without issue, but then croaked on the RSA keys used for sha256 passwords / pluggable authentication. Here are the log contents from the restart:
130821 12:29:48 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
2013-08-21 12:29:49 1849 [Note] Plugin 'FEDERATED' is disabled.
2013-08-21 12:29:49 1849 [Note] InnoDB: The InnoDB memory heap is disabled
2013-08-21 12:29:49 1849 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2013-08-21 12:29:49 1849 [Note] InnoDB: Compressed tables use zlib 1.2.8
2013-08-21 12:29:49 1849 [Note] InnoDB: Using Linux native AIO
2013-08-21 12:29:49 1849 [Note] InnoDB: Not using CPU crc32 instructions
2013-08-21 12:29:49 1849 [Note] InnoDB: Initializing buffer pool, size = 600.0M
2013-08-21 12:29:49 1849 [Note] InnoDB: Completed initialization of buffer pool
2013-08-21 12:29:49 1849 [Note] InnoDB: Highest supported file format is Barracuda.
2013-08-21 12:29:49 1849 [Note] InnoDB: The log sequence numbers 27259967 and 27259967 in ibdata files do not match the log sequence number 27533679 in the ib_logfiles!
2013-08-21 12:29:49 1849 [Note] InnoDB: Database was not shutdown normally!
2013-08-21 12:29:49 1849 [Note] InnoDB: Starting crash recovery.
2013-08-21 12:29:49 1849 [Note] InnoDB: Reading tablespace information from the .ibd files...
2013-08-21 12:29:49 1849 [Note] InnoDB: Restoring possible half-written data pages
2013-08-21 12:29:49 1849 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Last MySQL binlog file position 0 3159073, file name lynyrd-bin.000027
2013-08-21 12:29:49 1849 [Note] InnoDB: 128 rollback segment(s) are active.
2013-08-21 12:29:49 1849 [Note] InnoDB: Waiting for purge to start
2013-08-21 12:29:49 1849 [Note] InnoDB: 5.6.12 started; log sequence number 27533679
2013-08-21 12:29:49 1849 [Note] Recovering after a crash using /var/lib/mysql/lynyrd-bin
2013-08-21 12:29:49 1849 [Note] Starting crash recovery...
2013-08-21 12:29:49 1849 [Note] Crash recovery finished.
2013-08-21 12:29:49 1849 [ERROR] Failure to parse RSA public key (file exists): /var/lib/mysql//rsa-mysql-server.pub: error:0906D06C:PEM routines:PEM_read_bio:no start line
2013-08-21 12:29:49 1849 [ERROR] Aborting
2013-08-21 12:29:49 1849 [Note] Binlog end
2013-08-21 12:29:49 1849 [Note] Shutting down plugin 'partition'
2013-08-21 12:29:49 1849 [Note] Shutting down plugin 'ARCHIVE'
2013-08-21 12:29:49 1849 [Note] Shutting down plugin 'PERFORMANCE_SCHEMA'
2013-08-21 12:29:49 1849 [Note] Shutting down plugin 'BLACKHOLE'
2013-08-21 12:29:49 1849 [Note] Shutting down plugin 'INNODB_SYS_DATAFILES'
...
2013-08-21 12:29:49 1849 [Note] Shutting down plugin 'INNODB_TRX'
2013-08-21 12:29:49 1849 [Note] Shutting down plugin 'InnoDB'
2013-08-21 12:29:49 1849 [Note] InnoDB: FTS optimize thread exiting.
2013-08-21 12:29:49 1849 [Note] InnoDB: Starting shutdown...
2013-08-21 12:29:51 1849 [Note] InnoDB: Shutdown completed; log sequence number 27538069
2013-08-21 12:29:51 1849 [Note] Shutting down plugin 'CSV'
...
2013-08-21 12:29:51 1849 [Note] Shutting down plugin 'binlog'
2013-08-21 12:29:51 1849 [Note] /usr/sbin/mysqld: Shutdown complete
Notice the failure to read the RSA public key. Thinking "that's weird", I went to inspect the private and public keys. They were working yesterday...
$ openssl rsa -in rsa-mysql-server.pem
writing RSA key
-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----
$ openssl rsa -in rsa-mysql-server.pub
writing RSA key
-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----
Okay so "corrupt" really isn't the best word here. MySQL /replaces/ the public key file with the private key on a crash. That's crazy, and not good. Thankfully the public key was still recoverable:
$ openssl rsa -in rsa-mysql-server.pem -pubout
writing RSA key
-----BEGIN PUBLIC KEY-----
...
-----END PUBLIC KEY-----
So I just re-exported it back to the public key file:
$ openssl rsa -in rsa-mysql-server.pem -pubout > rsa-mysql-server.pub
writing RSA key
And with that MySQL happily started back up. Kind-of. I had to restart my slaves, because for some reason they were now trying to connect without passwords?
2013-08-21 21:57:36 4023 [Warning] Access denied for user 'user'@'slave-1-hostname' (using password: NO)
2013-08-21 21:57:36 4023 [Warning] Access denied for user 'user'@'slave-2-hostname' (using password: NO)
Restarting the slaves cleared that up, too:
2013-08-21 22:09:30 4023 [Note] Start binlog_dump to master_thread_id(223) slave_server(97), pos(lynyrd-bin.000027, 3179475) <-- slave 1 restarted
2013-08-21 22:09:46 4023 [Warning] Access denied for user 'user'@'slave-1-hostname' (using password: NO)
2013-08-21 22:10:47 4023 [Warning] Access denied for user 'user'@'slave-2-hostname' (using password: NO)
2013-08-21 22:11:11 4023 [Note] Start binlog_dump to master_thread_id(244) slave_server(98), pos(lynyrd-bin.000027, 3179475) <-- slave 2 restarted
Here's some output from one of the slave logs:
2013-08-21 21:55:35 1634 [ERROR] Slave I/O: error reconnecting to master 'nwtsdns@master-ip:3307' - retry-time: 60 retries: 380, Error_code: 2003
2013-08-21 21:56:35 1634 [ERROR] Slave I/O: error reconnecting to master 'nwtsdns@master-ip:3307' - retry-time: 60 retries: 381, Error_code: 2003
2013-08-21 21:57:37 1634 [ERROR] Slave I/O: error reconnecting to master 'nwtsdns@master-ip:3307' - retry-time: 60 retries: 382, Error_code: 1045 <-- master finally got started with correct RSA keys
2013-08-21 21:58:37 1634 [ERROR] Slave I/O: error reconnecting to master 'nwtsdns@master-ip:3307' - retry-time: 60 retries: 383, Error_code: 1045
This is just bizarre behavior from MySQL during a crash recovery. Notice that it is somewhat related to bug 70060, because the error messages for that last problem are no help at all. Obviously the slaves weren't failing to provide a password.
How to repeat:
- Have RSA keys for sha256 authentication
- Crash