Bug #34963 RoundRobin load balancing not working correctly with the ReplicationConnection
Submitted: 29 Feb 2008 18:03
Modified: 14 Aug 2009 21:24
Reporter: Jennifer Lee
Status: Closed
Category: Connector / J
Severity: S2 (Serious)
Version: 5.0.8
OS: Linux (RHEL 5)
Assigned to: Mark Matthews
CPU Architecture: Any

[29 Feb 2008 18:03] Jennifer Lee
Description:
RoundRobin load balancing does not work correctly with the ReplicationConnection because the slave Connection object sets its failedOver status to true if its hostIndex > 0 (i.e., it is connected to a slave other than the first slave in the URL list). As a result, the slave Connection attempts to fall back to the "master" (in this case the first slave in the list) as soon as the shouldFallBack() method returns true. Under heavy load, the first slave in the list receives most of the traffic unless you set queriesBeforeRetryMaster and secondsBeforeRetryMaster to very high numbers.

How to repeat:
I am using the ReplicationDriver with a datasource URL that has 1 master and 2 slaves: jdbc:mysql://master,slave1,slave2/db_name?autoReconnect=true&roundRobinLoadBalance=true

Using top to view CPU utilization on both of the slave servers (which have identical hardware), I would expect to see both servers being utilized approximately equally. However, slave1 displays >100% CPU utilization while slave2 exhibits <5% CPU utilization for my test program which issues thousands of read requests using multiple threads. 

Suggested fix:
Looking at the Connection object's createNewIO method, after a new MysqlIO object is created, there is a check to see if (hostIndex != 0). If so, setFailedOverState() is called. This should not be called if the connection is a slave connection. I added a check to see if the Connection is a slave connection by checking the "com.mysql.jdbc.ReplicationConnection.isSlave" property that is set in the NonRegisteringDriver.connectReplicationConnection() method.

If the Connection is a slave connection, skip the setFailedOverState() call.

After making this change, the CPU utilizations are close to equal on the 2 slave servers which indicates true round robin load balancing is occurring.
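
To illustrate why the failed-over flag skews the load, here is a minimal, self-contained model of the behavior described above. It is a sketch under stated assumptions, not Connector/J source: the class FailoverSkewSketch, the simulate method, and the skipFailedOverForSlaves flag are hypothetical names, host selection is simplified to strict alternation between two slaves, and only the query-count fallback condition is modeled (secondsBeforeRetryMaster is ignored).

```java
import java.util.Arrays;

public class FailoverSkewSketch {

    /**
     * Hypothetical model: each connection picks a slave, then issues queries.
     * load[0] counts queries served by slave1, load[1] by slave2.
     */
    static long[] simulate(int connections, int queriesPerConnection,
                           int queriesBeforeRetryMaster,
                           boolean skipFailedOverForSlaves) {
        long[] load = new long[2];
        for (int c = 0; c < connections; c++) {
            // Stand-in for the round-robin choice: alternate between the two slaves.
            int hostIndex = c % 2;
            // Reported behavior: any hostIndex != 0 is treated as "failed over".
            // Suggested fix: skip that step when the connection is a slave connection.
            boolean failedOver = hostIndex != 0 && !skipFailedOverForSlaves;
            int queriesSinceFailover = 0;
            for (int q = 0; q < queriesPerConnection; q++) {
                // Once the threshold is reached, fall back to the first host in the list.
                if (failedOver && queriesSinceFailover >= queriesBeforeRetryMaster) {
                    hostIndex = 0;
                    failedOver = false;
                }
                load[hostIndex]++;
                queriesSinceFailover++;
            }
        }
        return load;
    }

    public static void main(String[] args) {
        // 10 connections, 1000 queries each, fallback threshold of 50 queries.
        System.out.println(Arrays.toString(simulate(10, 1000, 50, false))); // [9750, 250]
        System.out.println(Arrays.toString(simulate(10, 1000, 50, true)));  // [5000, 5000]
    }
}
```

In this model, raising the threshold (the fourth argument) only delays the fallback, which matches the observation that very high values of queriesBeforeRetryMaster merely mask the problem.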
[3 Mar 2008 19:26] Tonci Grgin
Hi Jennifer and thanks for your report. I believe it's connected to Bug#34937 but I have to confirm the suspicion. What's your opinion?
[4 Mar 2008 14:05] Jennifer Lee
Hi Tonci, 

Bug#34937 looks like a casting issue, so I don't think the 2 are related. Also, I'm using DBCP connection pooling, not the MysqlConnectionPoolDataSource. The main issue for this bug is that the slave Connection object inside of the ReplicationConnection object is being set to failed over (in the createNewIO method) when it shouldn't be. Once it's set to failed over, it keeps trying to fall back to the master which results in uneven load balancing.

Thanks!
Jennifer
[5 Mar 2008 14:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43465
[5 Mar 2008 15:44] Jennifer Lee
Will this patch also be applied to the 5.0.x code stream?

Thanks!
Jennifer
[5 Mar 2008 16:06] Mark Matthews
No, this patch will not be applied retroactively. Connector/J 5.1 is the current GA release; 5.0 is "legacy" and will only have critical (i.e. security, corruption-related) fixes applied.
[6 Mar 2008 16:44] Jennifer Lee
Hi Mark, 

Thanks for the response. Since we're using MySQL 5.0, I was thinking we also needed to use Connector/J 5.0.x, but I now see that 5.1.5 does support MySQL 5.0, so we are okay.

Thanks!
Jennifer
[25 Apr 2008 16:22] Luke Bredeson
I think I have found a problem with this patch (I am using mysql-connector-java-5.1.6).  It seems that the last slave in the connection string will never be used (this is very obvious when you only have 2 slaves, as I do).  The offending code is in ConnectionImpl.java:

int indexRange = hostList.size() - 1;
int index = (int)(Math.random() * indexRange);
return index;

It works correctly if you don't subtract 1 from hostList.size(). Math.random() returns a value in [0, 1), so casting the product to int rounds down; with the subtraction, the result can never reach the index of the last slave. This should work:

int index = (int)(Math.random() * hostList.size());
return index;

As a temporary workaround, I have just added duplicate slave entries to my connection string, but this is obviously not ideal.
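
The off-by-one can be checked in isolation. The following is a small standalone sketch, not the ConnectionImpl code itself: the class RandomIndexSketch and its methods are hypothetical, and the host list is reduced to a host count. It draws many random indexes with both formulas and counts how often each host is chosen.

```java
import java.util.Random;

public class RandomIndexSketch {

    // Formula from the report: the range excludes the last host.
    static int buggyIndex(double r, int hostCount) {
        int indexRange = hostCount - 1;
        return (int) (r * indexRange);
    }

    // Proposed formula: r is in [0, 1), so the result is in [0, hostCount - 1].
    static int fixedIndex(double r, int hostCount) {
        return (int) (r * hostCount);
    }

    // Counts how often each host index is selected over a number of draws.
    static int[] histogram(boolean buggy, int hostCount, int draws, long seed) {
        Random rng = new Random(seed);
        int[] counts = new int[hostCount];
        for (int i = 0; i < draws; i++) {
            double r = rng.nextDouble(); // same [0, 1) range as Math.random()
            int index = buggy ? buggyIndex(r, hostCount) : fixedIndex(r, hostCount);
            counts[index]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] buggy = histogram(true, 2, 10_000, 42L);
        int[] fixed = histogram(false, 2, 10_000, 42L);
        System.out.println("buggy: slave1=" + buggy[0] + " slave2=" + buggy[1]);
        System.out.println("fixed: slave1=" + fixed[0] + " slave2=" + fixed[1]);
    }
}
```

With two hosts, the first formula multiplies by 1 and truncates, so it always yields index 0. An alternative that avoids the cast arithmetic altogether is java.util.Random.nextInt(hostCount), which returns a value from 0 (inclusive) to hostCount (exclusive).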
[7 Jul 2008 14:24] Tony Bedford
Changing to "open" as patch needs to be re-verified.
[30 Jul 2008 14:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50724
[18 Nov 2008 0:06] Todd Farmer
The problem of the last slave never being called is addressed in BUG#39611, and I believe this bug report should now be closed.