Bug #54135 setQueryTimeout unsafe across VIP
Submitted: 1 Jun 2010 11:16 Modified: 15 Apr 2011 15:29
Reporter: Martin Waite
Status: Closed
Category: Connector / J    Severity: S3 (Non-critical)
Version: 5.1.x    OS: Any
Assigned to:    CPU Architecture: Any

[1 Jun 2010 11:16] Martin Waite
Description:
Hi,

When Connector/J connects to a database through a VIP, there is no guarantee that the timer thread, which opens a new connection in order to kill the original query, reaches the same server that the original query ran on.

In some situations the "KILL ..." command issued by the timer thread could therefore reach the wrong server and kill whatever query happens to be running there under the same connection id. Granted, this is an unlikely situation.
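A toy model may make the failure mode concrete. This sketch assumes, as the report describes, that a MySQL connection id is only unique within a single server and that the timer thread opens a fresh connection through the VIP to send the kill. The classes `Backend`, `Vip` and `VipKillDemo` are hypothetical and talk to no real server; they only model the routing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of one MySQL backend: connection ids are unique per server only.
final class Backend {
    private long nextConnectionId = 1;
    // connection id -> statement currently running on that connection
    private final Map<Long, String> running = new HashMap<>();

    long connect() {
        return nextConnectionId++;
    }

    void run(long connectionId, String sql) {
        running.put(connectionId, sql);
    }

    // Models KILL QUERY <id>: aborts whatever this server runs under that id.
    String killQuery(long connectionId) {
        return running.remove(connectionId);
    }
}

// Hypothetical VIP: each new connection goes to whichever backend is active right now.
final class Vip {
    Backend active;

    Vip(Backend initial) {
        active = initial;
    }

    Backend route() {
        return active;
    }
}

final class VipKillDemo {
    // Returns the statement that the timer thread's KILL QUERY actually aborted.
    static String simulate() {
        Backend a = new Backend();
        Backend b = new Backend();
        Vip vip = new Vip(a);

        // The application's connection lands on A and gets connection id 1 there.
        Backend appServer = vip.route();
        long appId = appServer.connect();
        appServer.run(appId, "SELECT SLEEP(20)");

        // An unrelated client on B also holds connection id 1.
        long otherId = b.connect();
        b.run(otherId, "SELECT * FROM unrelated_table");

        // The VIP is repointed to B before the timeout fires.
        vip.active = b;

        // The timer thread opens a fresh connection through the VIP and kills "id 1".
        Backend killServer = vip.route();
        killServer.connect(); // the killer's own connection (id 2 on B)
        return killServer.killQuery(appId);
    }
}
```

In this model `VipKillDemo.simulate()` aborts the unrelated statement on B, while the sleep on A keeps running.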

How to repeat:
Set up a VIP in front of two servers. Connect through it to one server and run a query that outlasts the timeout, for example "select sleep(30)" with a setQueryTimeout() setting of 20. Once the select has started, reconfigure the VIP to point at the second server.

Watch the query logs of both servers: the KILL command goes to the wrong server.
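A minimal JDBC sketch of the repro, assuming a Connector/J 5.1 driver on the classpath and a placeholder VIP URL. The timeout has to be shorter than the sleep, otherwise the query completes before the timer fires and no KILL is sent at all, so the sketch sleeps 30 seconds with a 20-second timeout. `QueryTimeoutRepro` and its methods are illustrative names, not part of the driver.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

final class QueryTimeoutRepro {
    // The timeout must be shorter than the sleep, or the timer never fires.
    static final int SLEEP_SECONDS = 30;
    static final int TIMEOUT_SECONDS = 20;

    static String sleepSql(int seconds) {
        return "SELECT SLEEP(" + seconds + ")";
    }

    // vipUrl is assumed to point at the VIP, not at either backend directly.
    static void run(String vipUrl, String user, String password) throws SQLException {
        try (Connection conn = DriverManager.getConnection(vipUrl, user, password);
             Statement stmt = conn.createStatement()) {
            stmt.setQueryTimeout(TIMEOUT_SECONDS);
            // While this runs, repoint the VIP at the second server and
            // watch both servers' query logs for the KILL.
            stmt.execute(sleepSql(SLEEP_SECONDS));
        }
    }
}
```

For example `QueryTimeoutRepro.run("jdbc:mysql://vip.example.com:3306/test", "user", "secret")`, where the host, schema and credentials are placeholders. The timeout is expected to surface as an `SQLException` from `execute`.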

Suggested fix:
I do not know a robust way to prevent this. One option would be for the timer thread to record the server-id of the connection used for the original query and compare it with the server-id of the connection used for the KILL. If the server-ids differ, it should not apply the kill.
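The suggested server-id check could look roughly like this. It is a sketch, not the Connector/J implementation: `GuardedKill` and its methods are hypothetical, it assumes every backend behind the VIP has a distinct `server_id` (two servers left on the same default value would defeat the check), and it assumes the caller captured `@@server_id` and `CONNECTION_ID()` on the original connection before running the query.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class GuardedKill {
    // Runs a single-value query such as "SELECT @@server_id" or "SELECT CONNECTION_ID()".
    static long selectLong(Connection conn, String sql) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            if (!rs.next()) {
                throw new SQLException("no row returned for: " + sql);
            }
            return rs.getLong(1);
        }
    }

    // Pure decision: only kill when both connections reached the same server.
    static boolean safeToKill(long originalServerId, long killerServerId) {
        return originalServerId == killerServerId;
    }

    // killer is the fresh connection opened by the timer thread.
    // Returns true if KILL QUERY was sent, false if the servers differ.
    static boolean killIfSameServer(Connection killer, long originalServerId,
                                    long originalConnectionId) throws SQLException {
        long killerServerId = selectLong(killer, "SELECT @@server_id");
        if (!safeToKill(originalServerId, killerServerId)) {
            return false; // VIP sent us to another server: leave its threads alone
        }
        try (Statement st = killer.createStatement()) {
            st.execute("KILL QUERY " + originalConnectionId);
        }
        return true;
    }
}
```

The check and the `KILL QUERY` run on the same killer connection, and an established connection stays on its backend, so the kill can only reach the server whose id was just verified.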
[2 Jun 2010 6:20] Tonci Grgin
Hi Martin, and thanks for your report.

I do see the problem, but there is simply no way to pick the right server behind a VIP. I will consult with others, but I am not sure what will come of it.
[9 Jun 2010 8:16] Tonci Grgin
I need Mark's insight here. IMO it's not just VIPs; this can also happen with round-robin and similar setups.
Maybe the right approach, at least for MySQL servers that support it, would be to embed a UID in a comment in the query and look for it in the process list before killing anything.
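A sketch of the comment-tagging idea, assuming the server keeps the leading comment in the `Info` column of `SHOW FULL PROCESSLIST` or `INFORMATION_SCHEMA.PROCESSLIST`. `TaggedQuery`, `ProcessRow` and the `cj-tag:` prefix are hypothetical; fetching the process-list rows over JDBC is left out so that only the tagging and matching logic is shown.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.UUID;

final class TaggedQuery {
    // Hypothetical marker format embedded in a SQL comment.
    static String marker(String tag) {
        return "/* cj-tag:" + tag + " */";
    }

    // Prefixes the statement with a comment carrying a per-execution tag.
    static String tag(String sql, String tag) {
        return marker(tag) + " " + sql;
    }

    static String newTag() {
        return UUID.randomUUID().toString();
    }

    // One process-list row: thread Id and the Info (statement text) column.
    static final class ProcessRow {
        final long id;
        final String info; // null for idle threads

        ProcessRow(long id, String info) {
            this.id = id;
            this.info = info;
        }
    }

    // Returns the thread id whose running statement carries the tag, if any.
    static OptionalLong findTagged(List<ProcessRow> rows, String tag) {
        String m = marker(tag);
        for (ProcessRow r : rows) {
            if (r.info != null && r.info.contains(m)) {
                return OptionalLong.of(r.id);
            }
        }
        return OptionalLong.empty();
    }
}
```

The tag goes at the front of the statement so that it survives the 100-character truncation of `Info` in plain (non-FULL) `SHOW PROCESSLIST`. Matching is done client-side rather than with `LIKE`, so the lookup query itself, which would otherwise contain the tag, can never match.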
[10 Jun 2010 7:19] Tonci Grgin
Martin, we discussed this and there is just no way to fix this elegantly for now. Sorry.
[19 Jan 2011 9:51] Tonci Grgin
Since the connector has no way to tell where the connection went in this scenario, the workaround is to use the queryTimeoutKillsConnection option, which simply kills the offending connection, or to use an external query killer.
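The `queryTimeoutKillsConnection` workaround is just a connection property. A minimal sketch of setting it through `java.util.Properties`; the class and method names are hypothetical, and the comment paraphrases the Connector/J description of the option (abort the connection rather than attempting to abort the query). The same property can also be appended to the JDBC URL, e.g. `?queryTimeoutKillsConnection=true`.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class KillConnectionConfig {
    static Properties props(String user, String password) {
        Properties p = new Properties();
        p.setProperty("user", user);
        p.setProperty("password", password);
        // When a setQueryTimeout() timeout expires, abort the whole connection
        // instead of attempting to abort just the query.
        p.setProperty("queryTimeoutKillsConnection", "true");
        return p;
    }

    // url is a placeholder, e.g. "jdbc:mysql://vip.example.com:3306/test".
    static Connection open(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, props(user, password));
    }
}
```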

Since injecting some sort of identifier would force the connector, upon timeout, to log in to *all* hosts in search of the proper UUID, which would be costly, I'm exploring other possibilities. One of them is to file a feature request for the server to support timeouts in COM_QUERY.

For now, I think I'll start by implementing a mechanism that prevents c/J from killing the right thread ID on the *wrong* server. Then we'll see.
[15 Apr 2011 15:29] Tonci Grgin
Partial fix pushed to revision 1058. The fix prevents c/J from killing the right ConnectionID on the wrong server.