MySQL Bugs: #58324: Slave I/O thread still retries to connect master after verifying user ID fails.

Bug #58324	Slave I/O thread still retries to connect master after verifying user ID fails.
Submitted:	19 Nov 2010 14:49	Modified:	20 Mar 2013 5:40
Reporter:	Libing Song
Status:	Won't fix
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.1+, 5.6.1	OS:	Any
Assigned to:	Luis Soares	CPU Architecture:	Any

Description:
The slave I/O thread keeps retrying the connection to the master after user authentication fails.
A failed user authentication most likely means a wrong user name or password.
That is a fatal error, and any subsequent retry is bound to fail.
So the I/O thread need not retry and should stop immediately if the replication user is invalid.

Here are the entries from the error log file:
-------------------------------
101119 17:29:11 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 109, relay log './slave-relay-bin.000001' position: 4
101119 17:29:11 [ERROR] Slave I/O: error connecting to master 'fake@127.0.0.1:13000' - retry-time: 1  retries: 1, Error_code: 1045
101119 17:29:12 [ERROR] Slave I/O: error connecting to master 'fake@127.0.0.1:13000' - retry-time: 1  retries: 2, Error_code: 1045
101119 17:29:13 [ERROR] Slave I/O: error connecting to master 'fake@127.0.0.1:13000' - retry-time: 1  retries: 3, Error_code: 1045
101119 17:29:14 [ERROR] Slave I/O: error connecting to master 'fake@127.0.0.1:13000' - retry-time: 1  retries: 4, Error_code: 1045
101119 17:29:15 [ERROR] Slave I/O: error connecting to master 'fake@127.0.0.1:13000' - retry-time: 1  retries: 5, Error_code: 1045
101119 17:29:16 [ERROR] Slave I/O: error connecting to master 'fake@127.0.0.1:13000' - retry-time: 1  retries: 6, Error_code: 1045
101119 17:29:17 [ERROR] Slave I/O: error connecting to master 'fake@127.0.0.1:13000' - retry-time: 1  retries: 7, Error_code: 1045
101119 17:29:18 [ERROR] Slave I/O: error connecting to master 'fake@127.0.0.1:13000' - retry-time: 1  retries: 8, Error_code: 1045
101119 17:29:19 [ERROR] Slave I/O: error connecting to master 'fake@127.0.0.1:13000' - retry-time: 1  retries: 9, Error_code: 1045
101119 17:29:20 [ERROR] Slave I/O: error connecting to master 'fake@127.0.0.1:13000' - retry-time: 1  retries: 10, Error_code: 1045
101119 17:29:20 [Note] Slave I/O thread killed while connecting to master
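
To make the pattern in the excerpt above easier to see, here is a minimal Python sketch that counts the connection failures and collects the error codes. The regular expression is only derived from the line format shown in this report, and the function name `summarize_connect_errors` is made up for illustration; it is not part of any MySQL tool.

```python
import re

# Matches the "error connecting to master" lines as they appear in this report.
# The pattern is an assumption based on the excerpt, not an official format spec.
LINE_RE = re.compile(
    r"\[ERROR\] Slave I/O: error connecting to master "
    r"'(?P<user>[^@']+)@(?P<host>[^:']+):(?P<port>\d+)'"
    r" - retry-time: (?P<retry_time>\d+)\s+retries: (?P<retries>\d+),"
    r" Error_code: (?P<code>\d+)"
)


def summarize_connect_errors(log_text):
    """Return (failed_attempts, highest_retry_number, set_of_error_codes)."""
    attempts = 0
    max_retries = 0
    codes = set()
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a connection-failure line
        attempts += 1
        max_retries = max(max_retries, int(m.group("retries")))
        codes.add(int(m.group("code")))
    return attempts, max_retries, codes


# Three lines copied from the log above.
sample = """\
101119 17:29:11 [ERROR] Slave I/O: error connecting to master 'fake@127.0.0.1:13000' - retry-time: 1  retries: 1, Error_code: 1045
101119 17:29:12 [ERROR] Slave I/O: error connecting to master 'fake@127.0.0.1:13000' - retry-time: 1  retries: 2, Error_code: 1045
101119 17:29:20 [Note] Slave I/O thread killed while connecting to master
"""

print(summarize_connect_errors(sample))
```

If every failure carries the same code 1045 (access denied), as it does here, the retries cannot have been caused by a network glitch.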

How to repeat:
source include/master-slave.inc;

connection slave;
source include/stop-slave.inc;
CHANGE MASTER TO master_user='fake';
START SLAVE;
source include/wait_for_slave_io_to_stop.inc;

What exact MySQL server version and operating system are we talking about?

On 5.1 and 5.5, there is only one error message in the error log file:
"Slave I/O: error connecting to master 'fake@127.0.0.1:13000' - retry-time: 1  retries: 10, Error_code: 1045". The message still means that the I/O thread has tried to connect 10 times.
For a fatal error, the error message looks like this:
"[ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file', Error_code: 1236"

Hi Valeriy Kravchuk:
I checked on an Ubuntu system, against mysql-5.1-bugteam, mysql-5.5-bugteam and mysql-trunk-bugfixing.

Thank you for the feedback.

Verified as described in mysql-trunk. 5.1-main is not affected.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/125938

3512 Luis Soares	2010-12-03
      BUG#58324
      
      WORK-IN-PROGRESS
      
      Extended test case for 5.1 .

In all three versions (5.1, 5.5 and trunk) the slave will retry until the
number of retries reaches master_retry_count. You can check that as follows:

  1. apply this patch to 5.1:

     - http://lists.mysql.com/commits/125938

  2. run the test case

     (...)/mysql-test> perl mtr rpl_bug58324.test

  3. check the slave's error log 

     (...)/mysql-test> cat var/log/mysqld.2.err

  4. notice several reconnect entries, one for each failure.

The same happens in 5.5. What seems to differ in trunk is that we
already report one entry per failed reconnection in that codebase.
This behavior was introduced by the fix for BUG#56416, which was pushed
to trunk recently.

Therefore, I guess the actual bug here is that the IO thread should not
try to reconnect when a FATAL error is found. As for the extra entries
in the error log, I see nothing wrong with them, so I agree with the fix
for BUG#56416.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/125953

3410 Luis Soares	2010-12-03
      BUG#58324: Slave I/O thread still retries to connect master after 
                 verifying user ID fails.
      
      The slave would try to reconnect to the master, even when 
      facing FATAL connection errors.
      
      The fix is to make the IO thread only to attempt reconnections
      on potentially transient network errors.
     @ sql/rpl_slave.cc
        Apart from the fix, added error injection hooks to be used 
        from within the test case.
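
The idea described in the commit message above can be sketched as follows. This is a simplified Python model, not the code in sql/rpl_slave.cc: the function `connect_with_retries`, the `max_attempts` parameter, and the choice of which error codes count as transient are all assumptions made for illustration. The error code values themselves (1045, 2003, 2013) are the standard MySQL server and client error numbers.

```python
# Standard MySQL error numbers.
ER_ACCESS_DENIED_ERROR = 1045  # wrong user name or password
CR_CONN_HOST_ERROR = 2003      # cannot connect to the server host
CR_SERVER_LOST = 2013          # connection to the server was lost

# Assumed classification for this sketch; the real patch may use a different list.
TRANSIENT_ERRORS = {CR_CONN_HOST_ERROR, CR_SERVER_LOST}


def connect_with_retries(try_connect, max_attempts):
    """Call try_connect() until it succeeds, a fatal error occurs,
    or max_attempts is exhausted.

    try_connect() is a hypothetical callback returning 0 on success
    or an error code on failure. Returns (connected, attempts_made).
    """
    for attempt in range(1, max_attempts + 1):
        err = try_connect()
        if err == 0:
            return True, attempt
        if err not in TRANSIENT_ERRORS:
            # Fatal error such as access denied: retrying cannot help.
            return False, attempt
    return False, max_attempts


# Wrong credentials: give up after the first attempt.
print(connect_with_retries(lambda: ER_ACCESS_DENIED_ERROR, 10))

# Master temporarily unreachable, then back: keep retrying until it works.
results = iter([CR_CONN_HOST_ERROR, CR_CONN_HOST_ERROR, 0])
print(connect_with_retries(lambda: next(results), 10))
```

With this policy, the scenario from "How to repeat" would produce a single failed attempt instead of ten.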

It is a matter of perception. I'd always take retries over a fatal error in this case: the slave is configured to connect to the master, so it should keep doing that until it succeeds.

It is up to the DBA to resolve this situation, not the MySQL server.