Bug #63178 connection is refused while executing "flush privileges"
Submitted: 10 Nov 2011 6:19 Modified: 29 Jul 2012 23:23
Reporter: liu hickey (OCA)
Status: Closed
Category:MySQL Server: Locking Severity:S3 (Non-critical)
Version:MySQL-5.1.48 OS:Linux
Assigned to: CPU Architecture:Any
Tags: flush privileges; connection refused; race risk

[10 Nov 2011 6:19] liu hickey
Description:
We hit an infinite loop of 'flush privileges' events between a master and a slave (the slave is also a master, but read-only), which caused application connections to fail periodically with the error "host is not allowed to connect to this MySQL server". The scenario was:
the 'slave' was restarted with a changed server-id, and in the replication gap between the master and the 'slave' there was a 'flush privileges' event in the relay log still waiting to be executed.

The root cause of the connection issue is a logic defect, explained below:

For 'flush privileges', acl_reload() is called, which then calls acl_load(). The global variable allow_all_hosts is set to 0 under the lock, and acl_check_hosts is modified under the lock as well.

But when a client connects to the server, acl_check_host() is called. Its logic is listed below:

bool acl_check_host(const char *host, const char *ip)
1496 {
1497   if (allow_all_hosts)
1498     return 0;
1499   VOID(pthread_mutex_lock(&acl_cache->lock));
1500
1501   if ((host && hash_search(&acl_check_hosts,(uchar*) host,strlen(host))) ||
1502       (ip && hash_search(&acl_check_hosts,(uchar*) ip, strlen(ip))))
1503   {
	
At line 1497, allow_all_hosts is read without holding any lock, so a connecting thread can see a value of allow_all_hosts that is inconsistent with the contents of acl_check_hosts.

So there is a race here, and in special cases like the one we hit, it becomes a real problem.
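The interleaving described above can be replayed step by step in a small single-threaded model. This is only a sketch: the struct and function names are hypothetical stand-ins for the server globals, and it assumes that a reload which ends with all hosts allowed leaves the host table empty, which this report does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the server's ACL globals. */
struct acl_model {
    bool allow_all_hosts;   /* models allow_all_hosts */
    const char *hosts[4];   /* models the acl_check_hosts hash */
    size_t n_hosts;
};

/* Models the lookup in acl_check_hosts: true if the host is listed. */
static bool host_listed(const struct acl_model *m, const char *host)
{
    for (size_t i = 0; i < m->n_hosts; i++)
        if (strcmp(m->hosts[i], host) == 0)
            return true;
    return false;
}

/* Start of a reload: the flag is reset to 0 and the table is emptied. */
static void reload_begin(struct acl_model *m)
{
    m->allow_all_hosts = false;
    m->n_hosts = 0;
}

/* End of a reload that allows any host. Assumption: the flag is set
   again and the table is left empty. */
static void reload_finish_allow_all(struct acl_model *m)
{
    m->allow_all_hosts = true;
}

/* Unlocked flag read in the middle of a reload.
   Returns true if the host ends up refused. */
bool replay_unlocked_check(const char *host)
{
    struct acl_model m = { .allow_all_hosts = true, .n_hosts = 0 };
    assert(host != NULL);

    reload_begin(&m);                    /* reloader, holding the lock   */
    bool flag_seen = m.allow_all_hosts;  /* connection: unlocked read, 0 */
    reload_finish_allow_all(&m);         /* reloader finishes, unlocks   */

    if (flag_seen)
        return false;
    /* The connection now takes the lock and searches the empty table. */
    return !host_listed(&m, host);
}

/* Same reload, but the flag is read only after the reload has finished,
   as it would be if the read waited for the lock. */
bool replay_locked_check(const char *host)
{
    struct acl_model m = { .allow_all_hosts = true, .n_hosts = 0 };
    assert(host != NULL);

    reload_begin(&m);
    reload_finish_allow_all(&m);

    if (m.allow_all_hosts)
        return false;
    return !host_listed(&m, host);
}
```

In this model, replay_unlocked_check("app1") reports a refusal because it acts on the stale flag value, while replay_locked_check("app1") does not.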

How to repeat:
None.

Suggested fix:
Check allow_all_hosts under the lock, like this:

VOID(pthread_mutex_lock(&acl_cache->lock));
if (allow_all_hosts){
   VOID(pthread_mutex_unlock(&acl_cache->lock));
   return 0;
}
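
For illustration, here is a minimal standalone sketch of the same idea using plain pthreads. It is not the server code: acl_lock, allow_all, allowed, check_host_locked() and count_refusals() are hypothetical names, the host table is a small array instead of a hash, and the reload is reduced to resetting and restoring the flag. A return value of 0 means the host is allowed, following the early return in the fragment above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for acl_cache->lock, allow_all_hosts and
   acl_check_hosts. */
static pthread_mutex_t acl_lock = PTHREAD_MUTEX_INITIALIZER;
static bool allow_all = true;
static const char *allowed[4];
static size_t n_allowed = 0;

/* Returns 0 if the host may connect, 1 otherwise. The flag and the
   table are both read while holding the lock. */
int check_host_locked(const char *host)
{
    int refused = 1;
    assert(host != NULL);

    pthread_mutex_lock(&acl_lock);
    if (allow_all) {
        refused = 0;
    } else {
        for (size_t i = 0; i < n_allowed; i++) {
            if (strcmp(allowed[i], host) == 0) {
                refused = 0;
                break;
            }
        }
    }
    pthread_mutex_unlock(&acl_lock);
    return refused;
}

/* Models repeated reloads: the flag is cleared, the table emptied and
   the flag restored, all inside one critical section. */
static void *reload_loop(void *arg)
{
    int rounds = *(int *)arg;
    for (int i = 0; i < rounds; i++) {
        pthread_mutex_lock(&acl_lock);
        allow_all = false;
        n_allowed = 0;
        allow_all = true;
        pthread_mutex_unlock(&acl_lock);
    }
    return NULL;
}

/* Runs host checks concurrently with reloads and returns the number of
   refusals seen, or -1 if the reload thread could not be started. */
int count_refusals(int rounds)
{
    pthread_t reloader;
    int refusals = 0;

    if (pthread_create(&reloader, NULL, reload_loop, &rounds) != 0)
        return -1;
    for (int i = 0; i < rounds; i++)
        refusals += check_host_locked("app1");
    pthread_join(reloader, NULL);
    return refusals;
}
```

Because the check can only observe the state between complete reloads, count_refusals() should report no refusals in this model, however the two threads interleave.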
[10 Nov 2011 8:40] Valeriy Kravchuk
Thank you for the problem report.
[29 Jul 2012 23:23] Paul DuBois
Noted in 5.7.0 changelog.

The server refused client connections while executing FLUSH
PRIVILEGES.