MySQL Bugs: #7209: Client error with "Access Denied" on updates when high concurrency

Bug #7209	Client error with "Access Denied" on updates when high concurrency
Submitted:	12 Dec 2004 21:22	Modified:	5 Jan 2006 18:19
Reporter:	Steven Hartland	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server	Severity:	S1 (Critical)
Version:	<=4.1.16, <=5.0.18	OS:	Linux (Linux, FreeBSD)
Assigned to:	Konstantin Osipov	CPU Architecture:	Any
Tags:	bfsm_2006_11_02

Description:
I can confirm this is still happening on 4.1.7 under FreeBSD.
Our senario is that we have ~ 100 machines connected to our central DB each of
which logs to 2 seperate tables once per minute. 1 row into out machine log
table and 0..* rows to our server log table.
The behaviour we are seeing is that random machines will report the error:
Failed to insert server log row ( insert command denied to user
'username'@'hostname' for table 'gf_server_log' )
or
Failed to insert machine log row ( insert command denied to user
'username'@'hostname' for table 'gf_machine_log' )
There doesnt seem to be any pattern to the error i.e. it may happen after just a
few mins of running or it may happen after serveral hours. But in the end all
our loggers stop processing.

This behaviour has only started since our upgrade yesterday to 4.1.7 from
4.0.x.
This may be a seperate issue but the details are so close I doubt it.
Server is running on FreeBSD 4.8-RELEASE-p16 compiled with:
make WITH_LINUXTHREADS=yes DB_DIR=/usr/local/mysql
also tried:
make WITH_LINUXTHREADS=yes BUILD_OPTIMIZED=yes BUILD_STATIC=yes
DB_DIR=/usr/local/mysql
But this was totally unstable sig 11's and out of memory errors.
Clients vary most are FreeBSD 5.2.1 using 4.0 client libs via perl DBI/DBD.

After the initial build I tried a compile without linuxthreads:
Linux threads version was: 2.2.3_13
New build with just:
make DB_DIR=/usr/local/mysql
Although a huge amount better we just had our first failure 5 hours later its seems to be related to concurrency.

Rolled back to 4.0.22 and confirmed that it doesnt suffer from this issue.
Compiled 4.0.22 with following and have had not issues at all unlike with 4.1.7:
make WITH_LINUXTHREADS=yes BUILD_OPTIMIZED=yes BUILD_STATIC=yes
DB_DIR=/usr/local/mysql

I originally reported this under:
http://bugs.mysql.com/bug.php?id=5569
as it looked very similar but reading closer it may not be hence re-reported here as critical as it means 4.1 is currently unusable in this senario.

How to repeat:
Connect a large number of clients using the same user/pass from different hosts matching different host access grants.

We've experienced this bug also under Linux with MySQL versions 4.1.7 and 4.1.9.  The symptoms are exactly as are described above...

Configuation:

MySQL official binaries, version 4.1.7 and 4.1.9
RedHat Linux ES 3 (Taroon Update 2) w/ Kernel 2.4.21-15.0.2.ELsmp
Intel(R) Xeon(TM) CPU 3.06GHz

Anyone else seeing this?

Seeing something similar here on Linux with 4.1.10a... we go along fine for some time, then things freeze up for a while, denying new incoming connections. When this is going on, I usually see something like this (here's hoping for a fixed-width font):

mysql> show full processlist;
+---------+----------------------+----------------------+-------+---------+-------+-------+-----------------------+
| Id      | User                 | Host                 | db    | Command | Time  | State | Info                  |
+---------+----------------------+----------------------+-------+---------+-------+-------+-----------------------+
|    4425 | root                 | www.xxx.xxx:36179    | db1   | Sleep   |     5 |       | NULL                  |
|  516916 | root                 | www.xxx.xxx:37466    | db2   | Sleep   |   152 |       | NULL                  |
| 1015771 | root                 | www.xxx.xxx:54811    | db2   | Sleep   | 65773 |       | NULL                  |
| 1063203 | root                 | localhost            | db2   | Query   |     0 | NULL  | show full processlist |
| 1772016 | root                 | xxx.xxx.112.41:60366 | db2   | Sleep   |    43 |       | NULL                  |
| 1772027 | unauthenticated user | xxx.xxx.112.41:60377 | NULL  | Connect |  NULL | login | NULL                  |
| 1772028 | unauthenticated user | xxx.xxx.112.41:60378 | NULL  | Connect |  NULL | login | NULL                  |
| 1772029 | unauthenticated user | xxx.xxx.112.41:60379 | NULL  | Connect |  NULL | login | NULL                  |
| 1772030 | unauthenticated user | xxx.xxx.112.41:60380 | NULL  | Connect |  NULL | login | NULL                  |
| 1772031 | unauthenticated user | xxx.xxx.112.41:60381 | NULL  | Connect |  NULL | login | NULL                  |
| 1772032 | unauthenticated user | xxx.xxx.112.41:60382 | NULL  | Connect |  NULL | login | NULL                  |
| 1772033 | unauthenticated user | xxx.xxx.112.41:60383 | NULL  | Connect |  NULL | login | NULL                  |
| 1772034 | unauthenticated user | xxx.xxx.112.41:60384 | NULL  | Connect |  NULL | login | NULL                  |
| 1772035 | unauthenticated user | xxx.xxx.112.41:60385 | NULL  | Connect |  NULL | login | NULL                  |
| 1772036 | unauthenticated user | xxx.xxx.112.41:60386 | NULL  | Connect |  NULL | login | NULL                  |
| 1772037 | unauthenticated user | xxx.xxx.112.41:60387 | NULL  | Connect |  NULL | login | NULL                  |
| 1772038 | unauthenticated user | xxx.xxx.112.41:60388 | NULL  | Connect |  NULL | login | NULL                  |
| 1772039 | unauthenticated user | xxx.xxx.112.41:60389 | NULL  | Connect |  NULL | login | NULL                  |
| 1772040 | unauthenticated user | xxx.xxx.112.41:60390 | NULL  | Connect |  NULL | login | NULL                  |
| 1772041 | unauthenticated user | xxx.xxx.112.41:60391 | NULL  | Connect |  NULL | login | NULL                  |
| 1772042 | unauthenticated user | xxx.xxx.112.41:60392 | NULL  | Connect |  NULL | login | NULL                  |
| 1772043 | unauthenticated user | xxx.xxx.112.41:60393 | NULL  | Connect |  NULL | login | NULL                  |
| 1772044 | unauthenticated user | xxx.xxx.112.41:60394 | NULL  | Connect |  NULL | login | NULL                  |
| 1772045 | unauthenticated user | xxx.xxx.112.41:60395 | NULL  | Connect |  NULL | login | NULL                  |
| 1772046 | unauthenticated user | xxx.xxx.112.41:60396 | NULL  | Connect |  NULL | login | NULL                  |
| 1772047 | unauthenticated user | xxx.xxx.112.41:60398 | NULL  | Connect |  NULL | login | NULL                  |
| 1772048 | unauthenticated user | xxx.xxx.112.41:60399 | NULL  | Connect |  NULL | login | NULL                  |
| 1772049 | unauthenticated user | xxx.xxx.112.41:60400 | NULL  | Connect |  NULL | login | NULL                  |
| 1772050 | unauthenticated user | xxx.xxx.112.41:60401 | NULL  | Connect |  NULL | login | NULL                  |
| 1772051 | unauthenticated user | xxx.xxx.112.41:60403 | NULL  | Connect |  NULL | login | NULL                  |

I should mentioned we've got one hard-hit table approaching a gig in size (almost 3 million records). We recently upgraded from 4.0 where we didn't see this problem.

Any further information we can provide on this? It was enough of a problem to force us back onto 4.0. Thanks!

Same here we cant use 4.1 due to this show stopper. No workaround was found.

We haven't been able to reliably reproduce this problem but it *might* be
an incompatibility between the 4.1 server and the 4.0 client libraries.

Steven, Sergio and Adrew: can you please check whether you can confirm
that you are connecting to the 4.1 server with a 4.0 client library?

Yep, we've been running with a 4.0.15 client here. I'll see if we can get 4.1 running there and if that happens to fix things.

Yep we would have been running 4.0 client libs as to upgrade the client would have ment a update no well over 100 machines.

The following should be taken with some major grains of salt, but hopefully it'll be useful.

The other night, I moved everything to 4.1.12 - client webserver, master database server, slave database server. Everything was working fine and I thought I'd watch for a few days before reporting back here.

Then today, our colocation provider had a power outage they were unable to handle (another story). After getting things back up, I found the same hanging-on-authentication problem as before, in fact worse in that it wasn't letting up (this happened during a heavier part of day for us, which might explain that, plus maybe I felt more of a rush to get things working). I could manually connect with the mysql binary, but the flood of web requests (through a PHP loadable module in Apache) would just falter.

Anyway, I dropped the master server back to 4.0 and things started working. Right now, as it's set up, we have one process querying the slave (PHP cli version), and if I watch it, it seems to take a while to authenticate, but usually it will succeed. I'm seeing sporadic login failures from web to master too (4.1 client, 4.0 server) - enough to be a nuisance if it stays this way.

So, it seems like we had the same setup at two different points, the first bug-free, the second not so much. Maybe somehow the webserver was running a different version of the client code than I thought, but it doesn't make a lot of sense to me. Checking now, php reports the mysql module as expected, whereas maybe I only looked at the compile info before to make sure it was the php I just compiled. But even if the 4.0 client code somehow stayed in memory before (which I can't imagine how right now because Apache was completely stopped and started), but if that were so, the symptoms should have been the complete opposite. For the record, I was starting 4.1 with --old-passwords, and all the querying processes are using the same mysql login.

Is there any way to see the version of a client through a variant of "show full processlist" or the like? Everything I check on the php side agrees I'm on the 4.1 client, but it would be nice to have independent confirmation...

I've moved both mysqld servers to 4.0 and still occasionally see a short, sporadic authentication problem, so I'll probably move the client back shortly as well.

In short, we seem to have a 4.1->4.1 instance of this problem, but there's enough oddness in the air, it's not information to rely on. What I hope is that something in the retelling will spark an idea that leads to the answer. I'll update if I find out more.

Okay, the sporadic failures have been eliminated by disabling traffic shaping software that was not supposed to come up on reboot (it wasn't running before the reboot so couldn't have had anything to do with the original issue). I will try again with the 4.1 version running everywhere when I have a chance and see if yesterday's authentication issues go away (it would be strange, but no more stranger than what I saw)

Is there anything new to this issue ? 
I'm also suffering from this problem:
I've upgraded my mysql-server to 4.1.12. Clients are using 4.1, too. Everything worked stable with 4.0. But now I get sporadic Access denied on INSERT, UPDATEs and even SELECTs when connecting from my perl scripts.

I use Linux debian sarge with 2.6 kernel.

Is there any progress on this bug? We've been enjoying stability on our system on 4.0 for a while now, but it would be nice to have the new features in 4.1 available. Is there any kind of information that would help in getting this one figured out?

This is still happening as of 4.1.13 on Linux kernel 2.6 (Fedora Core 4).  See bug #11958

http://bugs.mysql.com/bug.php?id=11958 marked as a duplicate of this one.

Needed to pu this here too:

======================================
 
I really don't have a simpler test case, but we are having this same problem
on:
=========================================================
| version                         | 4.1.15-standard-debug-log                   
    |
| version_comment                 | MySQL Community Edition - Standard (GPL)    
    |
| version_compile_machine         | i686                                        
    |
| version_compile_os              | pc-linux-gnu                               

=========================================================

We have a problem that random SELECT queries will be rejected, when they should
be accepted.  Then, during the same connection, the query may then start to be
accepted again.  So, the problem is random, and happens about 1% of the time.

A perhaps unrelated problem is that it seems that the DNS cache will sometimes
get the wrong FQDN somehow for a user.  I think this points to some problem with
the host cache's hash function access or something.

http://bugs.mysql.com/bug.php?id=10931

For example, we have one person who will log in from x.x.x.x and then get the
access denied error saying she is not allowed from y.y.y.y, even thought that is
NOT HER IP!  When we flush the hosts, then it will think she really has logged
in from x.x.x.x.   

 I have found some stuff in the error logs after turning tracing on.  Don't know
if this is relevant....?

?func: info: New connection received on TCP/IP (144)
?func: info: Host: xxx.xxx.xxx.xxx  ip: 111.111.111.111
?func: info: vio_read returned -1,  errno: 11

Note that this is her IP (111.111.111.111), but that is not the right DNS name (xxx.xxx.xxx.xxx)!  Maybe the host cache's hashtable's keys are wrong?  I don't know what errno 11 is...

Ok, I isolated the general area the error appears in the log file.  I got an
error on the Picture query below (about 10 lines down).  You can tell it is
our error because we have

T@19311536: /usr/sbin/mysqld: net_printf: enter: message: 1142

a couple of lines later.  1142 is the Accessdenied Mysql vendor error.

So this means that maybe we can search for stuff around net_printf and 1142 to
find it.

Anyone see anything strange here?  Want more output?
==============================================================================================

  GNU nano 1.2.1
File: /home/jake/ERROROUT.txt

T@19311536: /usr/sbin/mysqld: vio_read: exit: 4
T@19311536: /usr/sbin/mysqld: do_command: packet_header: Memory: 1260d868
Bytes: (4)
T@19311536: /usr/sbin/mysqld: vio_read: enter: sd=174, buf=0x1260d868, size=70
T@19311536: /usr/sbin/mysqld: vio_read: exit: 70
T@19311536: /usr/sbin/mysqld: vio_blocking: enter: set_blocking_mode: 0
old_mode: 1
T@19311536: /usr/sbin/mysqld: vio_blocking: exit: 0
T@19311536: /usr/sbin/mysqld: do_command: info: Command on TCP/IP (174) = 3
(Query)
T@19311536: /usr/sbin/mysqld: my_write: my: Fd: 6  Buffer: 0x879d150  Count:
92  MyFlags: 20
T@19311536: /usr/sbin/mysqld: dispatch_command: query: SELECT
object,autonumber, timestamp FROM Picture WHERE autonumber=361
T@19311536: /usr/sbin/mysqld: free_root: enter: root: 0x1221b7b8  flags: 0
T@19311536: /usr/sbin/mysqld: check_access: enter: db: 'JExam'  want_access: 1
master_access: 0
T@19311536: /usr/sbin/mysqld: check_access: info: db_access: 0
T@19311536: /usr/sbin/mysqld: hash_search: exit: found key at 15124
T@19311536: /usr/sbin/mysqld: wild_case_compare: enter: str:
'66-168-208-231.dhcp.athn.ga.charter.com'  wildstr: '66.188.79.85'
T@19311536: /usr/sbin/mysqld: net_printf: enter: message: 1142
T@19311536: /usr/sbin/mysqld: vio_is_blocking: exit: 0
T@19311536: /usr/sbin/mysqld: vio_write: enter: sd=174, buf=0x1260d868,
size=118
T@19311536: /usr/sbin/mysqld: vio_write: exit: 118
T@19311536: /usr/sbin/mysqld: my_malloc: my: size: 2020  my_flags: 16
T@19311536: /usr/sbin/mysqld: my_malloc: exit: ptr: 0x12396838
T@19311536: /usr/sbin/mysqld: my_free: my: ptr: 0x0
T@19311536: /usr/sbin/mysqld: my_free: my: ptr: 0x0
T@19311536: /usr/sbin/mysqld: Item::cleanup: info: Type: 1
T@19311536: /usr/sbin/mysqld: Item_ident::cleanup: enter: b:(null)((null)), t:
(null)((null)), f:autonumber(autonumber)
T@19311536: /usr/sbin/mysqld: Item::cleanup: info: Type: 0
T@19311536: /usr/sbin/mysqld: Item_ident::cleanup: enter: b:(null)((null)), t:
(null)((null)), f:timestamp(timestamp)
T@19311536: /usr/sbin/mysqld: Item::cleanup: info: Type: 0
T@19311536: /usr/sbin/mysqld: Item_ident::cleanup: enter: b:(null)((null)), t:
(null)((null)), f:autonumber(autonumber)
T@19311536: /usr/sbin/mysqld: Item::cleanup: info: Type: 0
T@19311536: /usr/sbin/mysqld: Item_ident::cleanup: enter: b:(null)((null)), t:
(null)((null)), f:object(object)
T@19311536: /usr/sbin/mysqld: Item::cleanup: info: Type: 0
T@19311536: /usr/sbin/mysqld: dispatch_command: info: query ready
T@19311536: /usr/sbin/mysqld: free_root: enter: root: 0x1221b00c  flags: 1
T@19311536: /usr/sbin/mysqld: vio_is_blocking: exit: 0
T@19311536: /usr/sbin/mysqld: vio_read: enter: sd=174, buf=0x1260d868, size=4
T@19311536: /usr/sbin/mysqld: vio_read: vio_error: Got error 11 during read
T@19311536: /usr/sbin/mysqld: vio_read: exit: -1
T@19311536: /usr/sbin/mysqld: do_command: info: vio_read returned -1,  errno:
11
T@19311536: /usr/sbin/mysqld: thr_alarm: enter: thread: T@19311536  sec: 28800

FYI, the

T@10128888: /usr/sbin/mysqld: Item::cleanup: info: Type: 1
T@10128888: /usr/sbin/mysqld: Item_ident::cleanup: enter: b:(null)((null)), t:(null)((null)), f:autonumber(autonumber)
T@10128888: /usr/sbin/mysqld: Item::cleanup: info: Type: 0

lines appear to be different from successful selects....maybe the null's signal the problem?

Looks like 4.0.18 doesn't have this problem...still need to try 4.0.18 on the same machine as I had 4.1.15 running on though...

Just to chime in on Jake's previous posts.  The build is from MySQL's 4.1.15 source rpm on RHEL3 with kernel 2.4.21-37.ELsmp (the U6 kernel).  The machine running this build is a HP DL560 with 4 2.2Ghz p4's, with HyperThreading enabled, for a total of 8 Virtual Processors.

my.cnf looks like:
----
[client]
port            = 3306
socket          = /var/lib/mysql/mysql.sock

[mysqld]
port                    = 3306
socket                  = /var/lib/mysql/mysql.sock
key_buffer              = 256M
max_allowed_packet      = 10M
table_cache             = 3000
thread_cache            = 8
sort_buffer_size        = 128K
read_buffer_size        = 512K
read_rnd_buffer_size    = 512K
net_buffer_length       = 16K
thread_stack            = 192K
datadir                 = /path/mysql/datadir
log                     = /path/mysql/logs/query.log
log-error               = /path/mysql/logs/error.log
log-bin                 = /path/mysql/logs/binary.log
log-bin-index           = /path/mysql/logs/binary.index
log-warnings            = 2
max_connections         = 3000
thread_concurrency      = 16
open_files_limit        = 16384
skip-locking

skip-bdb
skip-innodb

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash

[isamchk]
key_buffer = 8M
sort_buffer_size = 8M

[myisamchk]
key_buffer = 8M
sort_buffer_size = 8M

[mysqlhotcopy]
interactive-timeout
-----

Part of the reason the buffer sizes are tweaked down is that when larger and talking to 3k concurrent clients we tend to satrt swapping

One thing that is odd is that there are many vio_* calls, even though we've built --without-vio as per the rpm.  

Thoughts?  This one is oddball to me.

Now we are getting somewhere...

I just installed 4.0.25 on my home system, and it worked fine.  Then I installed 4.1.14 on my home system and I get the errors again.  So it is 4.1+ specific.  Haven't tried 5.0 yet.
==============================
Here's some of my home system specs:
==============================
Linux optimator 2.6.12-gentoo-r6 #1 SMP Sun Aug 21 20:00:25 EDT 2005 x86_64 Dual Core AMD Opteron(tm) Processor 275 AuthenticAMD GNU/Linux

So, it looks like writing a program to do many selects over and over again should allow you to reproduce, IF you have non-worldwide permissions on the user you are connecting as.  Also, I think there need to be some specific tables_priv permissions for the user and IP you are testing from to see the bug.

Also, connect with many threads all doing the selects at once to see the problem escalate.  I can reproduce all off of one host (localhost) on 4.1.14.   I'm not sure if it is concurrency or the multiple connections from the same hosts causing the behavior, but it rarely (never?) happens with only one connection.

I'll bet my butt that this has to do with host hash keys clashing somewhere because looked to be changed in 4.1 (see http://bugs.mysql.com/bug.php?id=10931)

Well, we just disabled the host cache and that didn't seem to help.

Ok, I'm going to get out of this chair before my "computer neck" becomes permanently cricked. 

:)

I think I've provided enough info here, but please let me know if I can help further.

Have you tried to disable the new authentication?
Is there a set of scripts I could use to repeat the bug? (to prepare databases and privileges, fill tables, run the stress test)?

I don't know how to disable the new authentication, will you please advise and I will try it?

There are no scripts for a test case yet.  We are in production, so our main priority is to downgrade now.  The scripts should be fairly easy, but I'm a java programmer.  Do you want it in Java?

The original problem was not the new auth as this was not available in our tests as we where using 4.0 clients.

Same problem here.
C client built againt MySQL-devel-4.0.21-0.rpm to 4.1.14 server (mysql-standard-4.1.14-pc-linux-gnu-i686-icc-glibc23.tar.gz from mysql.com) under linux Redhat 9.0.

Here is my.cnf :
[client]
port            = 3306
socket          = /var/lib/mysql/mysql.sock

[mysqld]
port            = 3306
socket          = /var/lib/mysql/mysql.sock
skip-name-resolve
skip-host-cache
skip-locking
old-passwords
key_buffer=256M
query_cache_size = 0
record_buffer=2M
sort_buffer=2M
max_allowed_packet=1M
max_connections=800
max_connect_errors=100
table_cache=1800
net_read_timeout=180
net_write_timeout=180
wait_timeout=360
thread_concurrency=4

log-bin
server-id       = 1

innodb_additional_mem_pool_size = 16M
innodb_buffer_pool_size = 1G
innodb_data_file_path = ibdata1:1000M:autoextend
innodb_file_io_threads = 4
innodb_thread_concurrency = 16
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 8M
innodb_log_file_size = 256M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
innodb_lock_wait_timeout = 120

Here's a small test script test_bug_mysql.c :
----cut here-----
#include <stdio.h>
#include </usr/include/mysql/mysql.h>

#define host "192.168.0.19"
#define username "Myname"
#define password "Mypassword"
static MYSQL *mysql;

int main(void)
{
  MYSQL_RES      *result;
  char select[220];
  int i=0;

  sprintf(select,"select email from myusertable where name='david'");

  printf("start testing MySQL server\n");
  mysql = mysql_init(NULL);
  if(mysql == NULL)
  {
    printf("Failed to init to database: Error: %s\n\n",mysql_error(mysql));
    return 1;
  }
  if (mysql_real_connect(mysql,host3,username,password,"Jvo_Users",0,NULL,0) == NULL)
  {  
    printf("Failed to connect to database: Error: %s\n\n fe4f46Y \n\n",mysql_error(mysql));
    return 1;
  }
  else
  {
     for(i=0;i< 50000;i++)
     {
	if(mysql_query(mysql,select) != 0)
	{
          printf("%d | Failed to execute querie : %s  Error: %s\n\n",i,select,mysql_error(mysql));
          mysql_close(mysql);
          return 1;
	}
	result = mysql_store_result(mysql);
	mysql_free_result(result);
     }
  }
  mysql_close(mysql);
  return 1;
}
----cut here-----

This always procedure error, counter i is always different.

One last thing, i can reproduce this bug with 6 Web servers with this simple script.
But my last server, can't reproduce this behaviour and i can't see why.
All servers are stricly identical (build over a same RAID 1 array : have an arry, took one disc, build second server, took a disk, build third server ...)

Problem continues in 4.1.15.  What's the status?

Here's a case:

Do an update on the
same table and run the script simultaneously about three times in the
background.
mysql  Ver 14.7 Distrib 4.1.15, for pc-linux-gnu (i686) using readline 4.3
mysql> select version();
+---------------------+
| version()           |
+---------------------+
| 4.1.10a-pro-gpl-log |
+---------------------+
1 row in set (0.00 sec)

script:
#!/usr/bin/perl
use strict;
use DBI;

my $dbh = do_connect("---", "----", "----", "-----", 3306);
my $sql = qq { update table set x = 'somevalue'  where pk  = ? and pk2= ? };
my $i = 1;
while($i < 1000000) {
       my $sh = $dbh->prepare($sql);
       my $rc = $sh->execute( 888222, 'an@email.com');
       if($rc) {
               print "$i: worked\n";
       } else {
               print "$i: ERROR " . $dbh->errstr;
               enditall();
       }
       $sh->finish;
       $i++;
}

sub enditall{
$dbh->disconnect;
exit;
}

sub do_connect {
  my $db     = shift or return undef;
  my $user   = shift or return undef;
  my $pwd    = shift or return undef;
  my $host   = shift or return undef;
  my $port   = shift or return undef;

  my $dbstr = "DBI:mysql:database=$db";
  $dbstr = $host ? "$dbstr;host=$host" : $dbstr;
  $dbstr = $port ? "$dbstr;port=$port" : $dbstr;
  return DBI->connect( $dbstr, $user, $pwd, {} );
}

Is there any progress on fixing this bug for 4.1.16 or 4.1.17?  Still seeing it on 4.1.15-max regularly.

We have now managed to reproduce this properly on an internal server. Now that we have a machine that we can perform some proper debugging on hopefully we will have more information on this bug soon. 

Best regards

Mark

So far we are only able to reproduce this when connecting with a user remotely - scripts running from localhost are all successful. Is this the case for all other users that have encountered this bug?

Best regards

Mark

This was certainly the case when I originally opened the bug. We have over 200 machines all connecting to a central DB.

I can reproduce solely on localhost.  I have java code for it.

I could do this with 4.1.14

In our case, the vast majority of connections were over the network, and primarily from the same remote host.

Can't reproduce the bug locally yet, 4.1-bk

A file with source programs used to reproduce this bug

Attachment: 7209.tgz (application/x-gtar, text), 1.48 KiB.

I was able to reproduce the problem, now trying to get a backtrace.

Backtrace of the error printout:

(gdb) bt
#0  net_printf(THD*, unsigned, ...) (thd=0x8ce3098, errcode=1142)
    at protocol.cc:180
#1  0x081e514c in check_grant(THD*, unsigned long, st_table_list*, unsigned, unsigned, bool) (thd=0x8ce3098, want_access=4, tables=0x428004e8, show_table=0, 
    number=0, no_errors=false) at sql_acl.cc:2916
#2  0x08183bce in check_one_table_access(THD*, unsigned long, st_table_list*) (
    thd=0x8ce3098, privilege=4, tables=0x428004e8) at sql_parse.cc:3796
#3  0x08187801 in update_precheck(THD*, st_table_list*) (thd=0x8ce3098, 
    tables=0x428004e8) at sql_parse.cc:5609
#4  0x08181220 in mysql_execute_command(THD*) (thd=0x8ce3098)
    at sql_parse.cc:2825
#5  0x08184da2 in mysql_parse(THD*, char*, unsigned) (thd=0x8ce3098, 
    inBuf=0x42800db0 "update mail_log set deliv_status_cd = 'D' where email='mail@sqa.org.uk' and email_content_id=535822", length=147730644)
    at sql_parse.cc:4343
#6  0x0817dc35 in dispatch_command(enum_server_command, THD*, char*, unsigned)
    (command=COM_QUERY, thd=0x8ce3098, 
    packet=0x8cd7b91 "update mail_log set deliv_status_cd = 'D' where email='mail@sqa.org.uk' and email_content_id=535822", packet_length=100)
    at sql_parse.cc:1502
#7  0x0817d560 in do_command(THD*) (thd=0x8ce3098) at sql_parse.cc:1315
#8  0x0817c9a8 in handle_one_connection (arg=0x429924dc) at sql_parse.cc:1047
#9  0x4016ff60 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x401700fe in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#11 0x40301327 in clone () from /lib/i686/libc.so.6

The bug is in a race condition in function table_hash_search: column_priv_hash->current_record is updated concurrently by multiple threads. When search results depend on it, no record is found.
This patch fixes the problem:

kostja@oak:~/work/mysql-4.1-root/sql> bk diffs -u
===== mysqld.cc 1.608 vs edited =====
--- 1.608/sql/mysqld.cc	2005-12-23 14:38:14 +03:00
+++ edited/mysqld.cc	2006-01-01 22:09:39 +03:00
@@ -420,7 +420,7 @@
 		LOCK_delayed_insert, LOCK_delayed_status, LOCK_delayed_create,
 		LOCK_crypt, LOCK_bytes_sent, LOCK_bytes_received,
 	        LOCK_global_system_variables,
-		LOCK_user_conn, LOCK_slave_list, LOCK_active_mi;
+		LOCK_user_conn, LOCK_slave_list, LOCK_active_mi, LOCK_tnh;
 #ifdef HAVE_OPENSSL
 pthread_mutex_t LOCK_des_key_file;
 #endif
@@ -2630,6 +2630,7 @@
   (void) pthread_mutex_init(&LOCK_delayed_create,MY_MUTEX_INIT_SLOW);
   (void) pthread_mutex_init(&LOCK_manager,MY_MUTEX_INIT_FAST);
   (void) pthread_mutex_init(&LOCK_crypt,MY_MUTEX_INIT_FAST);
+  (void) pthread_mutex_init(&LOCK_tnh,MY_MUTEX_INIT_FAST);
   (void) pthread_mutex_init(&LOCK_bytes_sent,MY_MUTEX_INIT_FAST);
   (void) pthread_mutex_init(&LOCK_bytes_received,MY_MUTEX_INIT_FAST);
   (void) pthread_mutex_init(&LOCK_user_conn, MY_MUTEX_INIT_FAST);
===== sql_acl.cc 1.171 vs edited =====
--- 1.171/sql/sql_acl.cc	2005-12-28 11:23:21 +03:00
+++ edited/sql_acl.cc	2006-01-01 22:58:37 +03:00
@@ -1980,6 +1980,8 @@
 
 /* Search after a matching grant. Prefer exact grants before not exact ones */
 
+extern pthread_mutex_t LOCK_tnh;
+
 static GRANT_TABLE *table_hash_search(const char *host,const char* ip,
 				      const char *db,
 				      const char *user, const char *tname,
@@ -1989,6 +1991,7 @@
   uint len;
   GRANT_TABLE *grant_table,*found=0;
 
+  pthread_mutex_lock(&LOCK_tnh);
   len  = (uint) (strmov(strmov(strmov(helping,user)+1,db)+1,tname)-helping)+ 1;
   for (grant_table=(GRANT_TABLE*) hash_search(&column_priv_hash,
 					      (byte*) helping,
@@ -2000,7 +2003,10 @@
     if (exact)
     {
       if (compare_hostname(&grant_table->host, host, ip))
-	return grant_table;
+      {
+        found= grant_table;
+        break;
+      }
     }
     else
     {
@@ -2009,6 +2015,7 @@
 	found=grant_table;					// Host ok
     }
   }
+  pthread_mutex_unlock(&LOCK_tnh);
   return found;
 }
 

Looking for a non-locking solution.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/509

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/510

Fixed in 4.1 tree, currently tagged 4.1.17. 
Documentation note: the bug is also possible (although has never been observed so far) in 4.0 and 3.23. The nature of the bug is that under certain configuration of grants, an "Access denied" error could be erroneously returned under high load. The bug does not depend on client or protocol version, only on the configuration of grants.

Entry added to 4.1.17 changelog:

      <listitem>
        <para>
          <literal>Access Denied</literal> error could be erroneously
          returned with specific grant combinations under high load.
          (Bug #7209)
        </para>
      </listitem>

It may be interesting that my particular grant combinations do not exhibit problems with the 4.0 branch (specifically 4.0.26), but did with 4.1+

Same here 4.0.24 no problems 4.1.7 and 4.1.13 both crashed out in minutes of startup.

Is there any planned release date for 4.1.17 ? I'm waiting to finally upgrade my main server. I don't want to build from sources, since would break my package manager's depencendies.

Hi,

4.1.17 is expected to be released within January, not taking account for any unforseen circumstances - so there are still no guarantees on when it will actually be released. 

I would expect to see it by the end of January however.

Best regards

Mark

We appear to have run into this bug when upgrading from 4.0.18 to 5.0.18.  We've upgraded three or four 4.0 installations to 5.0 without problems; the first indication that there was a problem was the following error message:

 execute failed: SELECT command denied to user 'foo'@'bar.wetafx.co.nz' for table 'Baz'

This error was happening very intermittently -- the cron job generating the message runs every five minutes, but we were getting the message at random intervals from a half-hour to an hour apart, and the table that it was denied access to varied from error message to error message. (The tables are all in the same database, and the grants for the 'foo' user have a variety of table and column permissions.)

Soon after, we received a report that one of the user tools that uses these tables was failing intermittently; the user once saw it fail several times in succession, though it eventually succeeded.

On finding this bug report, we tried simplifying the grants for the 'foo' user -- instead of differing column and table permissions, we granted the appropriate permissions to the database as a whole.  This seems to have caused the symptoms to disappear, since we've seen no errors in the last day.  I've tried to use the example code given here, but not had much luck, possibly because there's no detail about the kind of database that they should be run against.

Since the fix to this bug for 4.1.17 is dated after the release date of 5.0.18, I assume that the bug may still be in this release. Is the patch that was developed applicable to 5.0?

Hi,

We've pushed the patch for this bug in to the 5.0 tree for 5.0.19. This should be released within the next couple of weeks. 

Best regards

Mark

Is this bug fixed in 5.0.x? It is not clear from the information in this bug report.

Bug #8002 is set as duplicate of this bug here.