Description:
We set up 2 MySQL servers in a MySQL replication group. Yesterday, one of the servers ran into an error, and every connection attempt to it failed with a timeout exception.
We thought the MySQL replication group would handle this scenario and mark that server as "Unavailable". In practice, the MySQL client library kept sending new connections to the bad server.
The call stack is shown below:
Unhandled exception System.Data.Entity.Core.EntityException: The underlying provider failed on Open. ---> System.TimeoutException: Timeout in IO operation
at MySql.Data.MySqlClient.TimedStream.StopTimer()
at MySql.Data.MySqlClient.TimedStream.Read(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.BufferedStream.Read(Byte[] array, Int32 offset, Int32 count)
at MySql.Data.MySqlClient.MySqlStream.ReadFully(Stream stream, Byte[] buffer, Int32 offset, Int32 count)
at MySql.Data.MySqlClient.MySqlStream.LoadPacket()
at MySql.Data.MySqlClient.MySqlStream.ReadPacket()
at MySql.Data.MySqlClient.Authentication.MySqlAuthenticationPlugin.ReadPacket()
at MySql.Data.MySqlClient.Authentication.MySqlAuthenticationPlugin.Authenticate(Boolean reset)
at MySql.Data.MySqlClient.NativeDriver.Open()
at MySql.Data.MySqlClient.Driver.Open()
at MySql.Data.MySqlClient.Driver.Create(MySqlConnectionStringBuilder settings)
at MySql.Data.MySqlClient.Replication.ReplicationManager.GetNewConnection(String groupName, Boolean master, MySqlConnection connection)
at MySql.Data.MySqlClient.MySqlConnection.Open()
at System.Data.Entity.Infrastructure.Interception.InternalDispatcher`1.Dispatch[TTarget,TInterceptionContext](TTarget target, Action`2 operation, TInterceptionContext interceptionContext, Action`3 executing, Action`3 executed)
at System.Data.Entity.Infrastructure.Interception.DbConnectionDispatcher.Open(DbConnection connection, DbInterceptionContext interceptionContext)
at System.Data.Entity.Core.EntityClient.EntityConnection.Open()
Code analysis:
===
We checked the code of ReplicationManager.GetNewConnection() and found that it does not handle the timeout exception and simply lets it propagate. I believe this is a bad idea: if opening a connection times out, that clearly means the server is broken.
How to repeat:
It is not easy to reproduce a server error. However, the problem is quite obvious from reading the code.
Please check the try...catch statement in ReplicationManager.GetNewConnection(string groupName, bool master, MySqlConnection connection).
Suggested fix:
Please treat a timeout exception raised while opening a connection as "server unavailable".
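The behavior we are asking for could look roughly like the sketch below. This is not the actual Connector/NET code. ServerEntry, FailoverSketch, OpenOnFirstHealthy and the open delegate are hypothetical names used only for illustration, and the real ReplicationManager selects servers differently. The only point is that a TimeoutException during open marks the server as unavailable and the next server is tried, instead of the exception escaping to the caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one server entry in a replication group.
public sealed class ServerEntry
{
    public string Name { get; }
    public bool IsAvailable { get; set; } = true;

    public ServerEntry(string name) { Name = name; }
}

public static class FailoverSketch
{
    // Tries each available server in order and returns the first one that opens.
    // 'open' is a hypothetical delegate standing in for the driver's
    // connect-and-authenticate step (Driver.Create in the call stack above).
    public static ServerEntry OpenOnFirstHealthy(
        IList<ServerEntry> group, Action<ServerEntry> open)
    {
        foreach (ServerEntry server in group)
        {
            if (!server.IsAvailable)
            {
                continue; // skip servers already marked as bad
            }

            try
            {
                open(server);
                return server;
            }
            catch (TimeoutException)
            {
                // Suggested behavior: a timeout while opening counts as
                // "server unavailable", so mark it and try the next server.
                server.IsAvailable = false;
            }
        }

        throw new InvalidOperationException("No available server in the group.");
    }
}
```

With a group of two entries where the delegate throws TimeoutException for the first one, the call returns the second entry and leaves the first marked as unavailable, so later connection attempts skip it.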