Bug #13659 | Slave DNS name reported incorrectly in master's show processlist | ||
---|---|---|---|
Submitted: | 30 Sep 2005 14:36 | Modified: | 24 Jan 2006 9:01 |
Reporter: | Michael DePhilliips | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server | Severity: | S1 (Critical) |
Version: | 4.1.14 | OS: | Linux (RHEL WS r3 2.4.21-20) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[30 Sep 2005 14:36]
Michael DePhilliips
[30 Sep 2005 14:48]
Valeriy Kravchuk
Thank you for the bug report. Please send the contents of my.cnf and the SHOW VARIABLES output from your master server. Please also look at bug reports http://bugs.mysql.com/bug.php?id=13477 and http://bugs.mysql.com/bug.php?id=11958. Do you have anything similar in your configuration (such as --skip-name-resolve on the server)? Do you see similar problems when connecting from your slave hosts with the mysql client?
[30 Sep 2005 15:20]
Michael DePhilliips
>>Do you have similar problems trying to connect from your slave hosts with mysql client? No
[4 Oct 2005 10:28]
Borja García
We are experiencing the same problem under RH 9 2.4.20-8smp #1 SMP using version MySQL-server-4.1.14-0.glibc23.i386.rpm

mysql> show processlist\G
*************************** 1. row ***************************
     Id: 184
   User: dbrep
   Host: dbrepdb3:37061
     db: NULL
Command: Binlog Dump
   Time: 14364
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 2. row ***************************
     Id: 185
   User: dbrep
   Host: dbrepdb9:47113
     db: NULL
Command: Binlog Dump
   Time: 14364
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 3. row ***************************
     Id: 186
   User: dbrep
   Host: dbrepdb6:36330
     db: NULL
Command: Binlog Dump
   Time: 14364
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 4. row ***************************
     Id: 187
   User: dbrep
   Host: dbrepdb3:32803
     db: NULL
Command: Binlog Dump
   Time: 14364
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 5. row ***************************
     Id: 188
   User: dbrep
   Host: dbrepdb5:35351
     db: NULL
Command: Binlog Dump
   Time: 14364
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 6. row ***************************
     Id: 189
   User: dbrep
   Host: dbrepdb0:33472
     db: NULL
Command: Binlog Dump
   Time: 14364
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 7. row ***************************
     Id: 190
   User: dbrep
   Host: dbrepdb13:32775
     db: NULL
Command: Binlog Dump
   Time: 14364
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 8. row ***************************
     Id: 191
   User: dbrep
   Host: dbrepdb11:36777
     db: NULL
Command: Binlog Dump
   Time: 14364
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 9. row ***************************
     Id: 192
   User: dbrep
   Host: dbrepdb4:37242
     db: NULL
Command: Binlog Dump
   Time: 14364
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 10. row ***************************
     Id: 193
   User: dbrep
   Host: dbrepdb12:37276
     db: NULL
Command: Binlog Dump
   Time: 14364
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 11. row ***************************
     Id: 220
   User: dbrep
   Host: dbrepdb10:36790
     db: NULL
Command: Binlog Dump
   Time: 14362
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 12. row ***************************
     Id: 221
   User: dbrep
   Host: dbrepdb8:45216
     db: NULL
Command: Binlog Dump
   Time: 14362
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 13. row ***************************
     Id: 222
   User: dbrep
   Host: dbrepdb3:37013
     db: NULL
Command: Binlog Dump
   Time: 14362
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 14. row ***************************
     Id: 223
   User: dbrep
   Host: dbrepdb7:50876
     db: NULL
Command: Binlog Dump
   Time: 14362
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
*************************** 15. row ***************************
     Id: 203427
   User: root
   Host: localhost
     db: NULL
Command: Query
   Time: 0
  State: NULL
   Info: show processlist
17 rows in set (0.00 sec)

dbrepdb3 appears 3 times, but netstat says:

netstat |grep ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb5:35358 ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb9:47131 ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb12:37282 ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb10:36796 ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb2:37015 ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb6:36337 ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb3:37067 ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb1:32805 ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb13:32781 ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb7:50894 ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb8:45234 ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb4:37248 ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb11:36798 ESTABLISHED
tcp 0 0 dbw:mysql dbrepdb0:33474 ESTABLISHED

cat /etc/hosts
172.20.254.10 dbrepdb0
172.20.254.234 dbrepdb1
172.20.254.235 dbrepdb2
172.20.254.233 dbrepdb3
172.20.254.20 dbrepdb4
172.20.254.23 dbrepdb5
172.20.254.237 dbrepdb6
172.20.254.134 dbrepdb7
172.20.254.242 dbrepdb8
172.20.254.137 dbrepdb9
172.20.254.242 dbrepdb8
172.20.254.247 dbrepdb10
172.20.254.180 dbrepdb11
172.20.254.21 dbrepdb12
172.20.254.22 dbrepdb13

The clue is the same: consecutive IPs. But there are other consecutive IPs that work fine. Although we have restarted the server, the problem seems to stay with the same machines. Regards.
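The mismatch described above can be checked mechanically. The following is only a minimal sketch: the two host lists are copied by hand from the SHOW PROCESSLIST and netstat output in this comment (ports dropped, since the two snapshots were taken at different times), and the variable names are illustrative, not part of any MySQL tool.

```python
from collections import Counter

# Hosts of the 14 "Binlog Dump" threads, in the order SHOW PROCESSLIST listed them.
processlist_hosts = [
    "dbrepdb3", "dbrepdb9", "dbrepdb6", "dbrepdb3", "dbrepdb5",
    "dbrepdb0", "dbrepdb13", "dbrepdb11", "dbrepdb4", "dbrepdb12",
    "dbrepdb10", "dbrepdb8", "dbrepdb3", "dbrepdb7",
]

# Peer hosts of the ESTABLISHED connections, in the order netstat listed them.
netstat_hosts = [
    "dbrepdb5", "dbrepdb9", "dbrepdb12", "dbrepdb10", "dbrepdb2",
    "dbrepdb6", "dbrepdb3", "dbrepdb1", "dbrepdb13", "dbrepdb7",
    "dbrepdb8", "dbrepdb4", "dbrepdb11", "dbrepdb0",
]

reported = Counter(processlist_hosts)
actual = Counter(netstat_hosts)

# Counter subtraction keeps only positive differences.
over_reported = dict(reported - actual)   # names the server shows too often
missing = sorted(actual - reported)       # names the server never shows

print("over-reported:", over_reported)
print("missing:", missing)
```

Comparing multisets rather than sets matters here, because the symptom is a name that shows up too many times while other names disappear.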
[7 Oct 2005 14:22]
Michael DePhilliips
Agreed: a server restart temporarily fixes the problem, but it then returns on the same machine. Thanks
[20 Oct 2005 8:30]
Andrew Stribblehill
I'm worried that this may be a bug in MySQL's authentication routines. It's more than just the process list: the master grants privileges on its erroneous view of the slave hostname, at least in my case. Client and software here are mysql-{client,server}-4.1.14-1.FC4.1, on AMD64.

[root@celeste ~]# mysql -h babar -u repl -p<mumble>
ERROR 1045 (28000): Access denied for user 'repl'@'bianca.dur.ac.uk' (using password: YES)

[root@babar etc]# host bianca.dur.ac.uk
bianca.dur.ac.uk has address 129.234.4.218
[root@babar etc]# host 129.234.4.218
218.4.234.129.in-addr.arpa domain name pointer bianca.dur.ac.uk.
[root@babar etc]# host celeste
celeste.dur.ac.uk has address 129.234.4.250
[root@babar etc]# host 129.234.4.250
250.4.234.129.in-addr.arpa domain name pointer celeste.dur.ac.uk.

[root@celeste log]# ifconfig -a
eth0      Link encap:Ethernet HWaddr 00:00:1A:1A:3E:3C
          inet addr:129.234.4.250 Bcast:129.234.255.255 Mask:255.255.0.0
          inet6 addr: fe80::200:1aff:fe1a:3e3c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:3648047 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1965491 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:793207614 (756.4 MiB) TX bytes:150580945 (143.6 MiB)
          Interrupt:169
eth0:0    Link encap:Ethernet HWaddr 00:00:1A:1A:3E:3C
          inet addr:129.234.4.181 Bcast:129.234.255.255 Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          Interrupt:169
eth1      Link encap:Ethernet HWaddr 00:00:1A:1A:3E:3B
          inet addr:10.0.0.2 Bcast:10.0.0.255 Mask:255.255.255.0
          inet6 addr: fe80::200:1aff:fe1a:3e3b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:46948 errors:0 dropped:0 overruns:0 frame:0
          TX packets:47096 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9151453 (8.7 MiB) TX bytes:9319626 (8.8 MiB)
          Interrupt:177
[1 Nov 2005 12:01]
Andrew Stribblehill
Keyword: SECURITY Maybe you should escalate this bug report. Having a host misidentified as a different one could easily have serious security concerns, especially if a systematic way to exploit it were found.
[1 Nov 2005 13:17]
Valeriy Kravchuk
I have increased the severity of this report. Please tell us which versions of glibc you use. Do you have nscd (Name Service Cache Daemon) installed? If so, which version?
[1 Nov 2005 15:07]
Andrew Stribblehill
# uname -a
Linux babar 2.6.13-1.1532_FC4smp #1 SMP Thu Oct 20 01:42:06 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux

# rpm -q glibc
glibc-2.3.5-10.3
glibc-2.3.5-10.3
(2*glibc -- one for 64-bit and the other for 32-bit legacy stuff)

# nscd -V
nscd (GNU libc) 2.3.5
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Thorsten Kukuk and Ulrich Drepper.

I turned off nscd for a while but the problem still exhibited itself. Thanks.
[1 Nov 2005 15:36]
Michael DePhilliips
Thanks for upgrading this. nscd (GNU libc) 2.3.2 glibc-2.3.2-95.37
[4 Nov 2005 9:25]
Valeriy Kravchuk
All reporters: please try starting your servers with --skip-host-cache and run them for a while. Post a note here immediately if you see the same problem after that.
[4 Nov 2005 20:43]
Michael DePhilliips
Hi, I restarted with --skip-host-cache; after about 4 hours the problem seems to be resolved. A plain restart has fixed the problem temporarily before, but it usually shows up again in less time than this. I will add a post tomorrow if it shows up again overnight. Thanks, Michael
[5 Nov 2005 0:04]
Andrew Stribblehill
Restarted with --skip-host-cache; problem went away. Then restarted without --skip-host-cache; problem returned immediately. Then restarted with --skip-host-cache: no noticed problem for 2 hours.
[5 Nov 2005 9:05]
Valeriy Kravchuk
Please monitor your server for a longer time. Reopen this report as soon as you get this problem again. If this workaround helps you, then it is not a MySQL bug. I'll try to provide you with an explanation why...
[9 Nov 2005 17:02]
Andrew Stribblehill
Workaround still seems to hold. Can you explain why you think it's not a bug in MySQL? To me, it would seem that if it works okay when we turn off the host cache, the host cache is most likely to blame.
[9 Nov 2005 17:53]
Valeriy Kravchuk
Sorry for misinforming you. It really looks like a bug in MySQL itself. So, while using this workaround, please describe your DNS setup and send the contents of /etc/host.conf, /etc/hosts, nscd.conf, nsswitch.conf and resolv.conf (if any). As the problem occurred at various sites, I ask you all for this information. It can help me to create a repeatable test case.
[17 Nov 2005 18:21]
Michael DePhilliips
Sorry for the delayed response.

+++++++++++++++++++++++++++++++++++++++++++++
/etc/host.conf

order hosts,bind

+++++++++++++++++++++++++++++++++++++++
/etc/hosts

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
130.199.88.103 robinson.star.bnl.gov

++++++++++++++++++++++++++++++++++++++++++++
/etc/nscd.conf

#
# An example Name Service Cache config file. This file is needed by nscd.
#
# Legal entries are:
#
# logfile <file>
# debug-level <level>
# threads <#threads to use>
# server-user <user to run server as instead of root>
# server-user is ignored if nscd is started with -S parameters
# stat-user <user who is allowed to request statistics>
#
# enable-cache <service> <yes|no>
# positive-time-to-live <service> <time in seconds>
# negative-time-to-live <service> <time in seconds>
# suggested-size <service> <prime number>
# check-files <service> <yes|no>
#
# Currently supported cache names (services): passwd, group, hosts
#
# logfile /var/log/nscd.log
# threads 6
server-user nscd
# stat-user nocpulse
debug-level 0
enable-cache passwd yes
positive-time-to-live passwd 600
negative-time-to-live passwd 20
suggested-size passwd 211
check-files passwd yes
enable-cache group yes
positive-time-to-live group 3600
negative-time-to-live group 60
suggested-size group 211
check-files group yes
enable-cache hosts yes
positive-time-to-live hosts 3600
negative-time-to-live hosts 20
suggested-size hosts 211
check-files hosts yes

+++++++++++++++++++++++++++++++++++++++++++++++++
# /etc/nsswitch.conf
#
# An example Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# The entry '[NOTFOUND=return]' means that the search for an
# entry should stop if the search in the previous entry turned
# up nothing. Note that if the search failed due to some other reason
# (like no NIS server responding) then the search continues with the
# next entry.
#
# Legal entries are:
#
# nisplus or nis+ Use NIS+ (NIS version 3)
# nis or yp Use NIS (NIS version 2), also called YP
# dns Use DNS (Domain Name Service)
# files Use the local files
# db Use the local database (.db) files
# compat Use NIS on compat mode
# hesiod Use Hesiod for user lookups
# [NOTFOUND=return] Stop searching if not found so far
#
# To use db, put the "db" in front of "files" for entries you want to be
# looked up first in the databases
#
# Example:
#passwd: db files nisplus nis
#shadow: db files nisplus nis
#group: db files nisplus nis
passwd: files
shadow: files
group: files
#hosts: db files nisplus nis dns
hosts: files dns
# Example - obey only what nisplus tells us...
#services: nisplus [NOTFOUND=return] files
#networks: nisplus [NOTFOUND=return] files
#protocols: nisplus [NOTFOUND=return] files
#rpc: nisplus [NOTFOUND=return] files
#ethers: nisplus [NOTFOUND=return] files
#netmasks: nisplus [NOTFOUND=return] files
bootparams: nisplus [NOTFOUND=return] files
ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files
netgroup: files
publickey: nisplus
automount: files
aliases: files nisplus

+++++++++++++++++++++++++
/etc/resolv.conf

search star.bnl.gov
nameserver 130.199.1.1
nameserver 130.199.128.31
[2 Dec 2005 19:50]
Valeriy Kravchuk
So, because --skip-host-cache is the workaround, the problem is really in the following code (sql/hostname.cc, line 150 in the latest 4.1.17-BK sources):

    /* Check first if we have name in cache */
    if (!(specialflag & SPECIAL_NO_HOST_CACHE))
    {
      VOID(pthread_mutex_lock(&hostname_cache->lock));
      if ((entry=(host_entry*) hostname_cache->search((gptr) &in->s_addr,0)))
      {
        char *name;
        if (!entry->hostname)
          name=0; // Don't allow connection
        else
          name=my_strdup(entry->hostname,MYF(0));
        *errors= entry->errors;
        VOID(pthread_mutex_unlock(&hostname_cache->lock));
        DBUG_RETURN(name);
      }
      VOID(pthread_mutex_unlock(&hostname_cache->lock));
    }

It was stated in the report that the bug never occurred before. Please look at the (related) bug #10931. It looks like the hostname cache had not been used for a long time, until that bug was fixed in 4.1.13. That part of the code is executed each and every time MySQL resolves IP addresses to hostnames, if --skip-host-cache is not used. So people insisting that the problem goes far beyond the SHOW PROCESSLIST results for slave servers may be quite right.

It looks like there may be some kind of race condition that is simply more visible when replication is used. So I came back to my initial idea that the problem may be shown without replication at all, with several clients sending queries at a high rate and SHOW PROCESSLIST being executed repeatedly. I am trying to create a test case based on these ideas. What do you think about all this?

One more thing I want to know: did any of the reporters get this bug on a single-CPU machine? I saw smp in the uname -a results, but just to be sure.
[6 Dec 2005 17:11]
Andrew Stribblehill
It certainly seems that the most likely broken part is hostname.cc, somewhere near the ip_to_hostname function. By the way, I'm not convinced about the '// Don't allow connection' comment.
[17 Dec 2005 18:14]
Valeriy Kravchuk
Dear bug reporters (there are many)! Thank you for your patience. I was finally able to pinpoint and verify this weird bug. See bug #15756 for details. I'd mark this one as a duplicate, if you agree. The bug has nothing to do with replication. It is related to hostname caching only, and it is easy to demonstrate as soon as you are "unlucky" enough to get IP addresses from certain ranges. As an additional temporary workaround, please avoid IP addresses with 206, 207, 232, 233, 234, 235 and some other numbers in dot notation. The bug did not affect versions before 4.1.14 because the hostname cache was effectively bypassed in them by another bug, which was later fixed.
[24 Jan 2006 9:01]
Ramil Kalimullin
See #15756: incorrect ip address matching in ACL due to use of latin1 collation
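To illustrate how a collation-based comparison can confuse IP addresses, here is a minimal sketch. It is not MySQL's code: `fold_latin1` is a hypothetical stand-in for an accent- and case-insensitive latin1 collation (built from Python's Unicode decomposition, not from MySQL's real sort table), and `HostCache` is a toy cache keyed by the packed 4-byte address. The sample addresses and hostnames are the ones posted earlier in this report.

```python
import socket
import unicodedata


def fold_latin1(raw: bytes) -> str:
    """Hypothetical stand-in for an accent/case-insensitive latin1 collation.

    Assumption: decode the bytes as latin1, strip combining accents via
    Unicode decomposition, then upper-case. MySQL's real sort table differs.
    """
    text = unicodedata.normalize("NFD", raw.decode("latin-1"))
    return "".join(c for c in text if not unicodedata.combining(c)).upper()


class HostCache:
    """Toy hostname cache keyed by the packed IPv4 address."""

    def __init__(self, key_func):
        self._key_func = key_func   # how raw address bytes become a cache key
        self._names = {}

    def lookup_or_add(self, ip: str, resolved_name: str) -> str:
        # Return the cached name for this key, or cache resolved_name.
        key = self._key_func(socket.inet_aton(ip))
        return self._names.setdefault(key, resolved_name)


collating = HostCache(fold_latin1)    # keys compared as latin1 text
binary = HostCache(lambda raw: raw)   # keys compared byte for byte

for ip, name in [("172.20.254.233", "dbrepdb3"),
                 ("172.20.254.234", "dbrepdb1"),
                 ("129.234.4.218", "bianca.dur.ac.uk"),
                 ("129.234.4.250", "celeste.dur.ac.uk")]:
    print(ip, collating.lookup_or_add(ip, name), binary.lookup_or_add(ip, name))
```

Under this assumed folding, octets 232 to 235 are the latin1 bytes for accented forms of "e", and 218 and 250 are the upper- and lower-case forms of the same accented "u", so those addresses share a cache key. That is consistent with the dbrepdb1/dbrepdb3 and celeste/bianca mix-ups reported above, while the byte-for-byte cache keeps every address distinct.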