Bug #28359 Intermitted lost connection at 'reading authorization packet' errors
Submitted: 10 May 2007 16:25 Modified: 25 Oct 2007 19:24
Reporter: Jon Ribbens
Status: Closed
Category:MySQL Server: Errors Severity:S2 (Serious)
Version:5.0.41 OS:Linux (RedHat Enterprise Linux ES release 4)
Assigned to: Magnus Blåudd
Tags: 64-bit, authorization, linux, Lost connection

[10 May 2007 16:25] Jon Ribbens
Description:
We are using MySQL 5.0.41, connecting via TCP/IP to the database server. Intermittently we are getting:

  OperationalError: (2013, "Lost connection to MySQL server at 'reading authorization packet', system error: 0")

The servers are 64-bit, using the MySQL-server-community-5.0.41-0.rhel4.x86_64.rpm binary install package provided by yourselves.

The clients are 32-bit, using the MySQL-client-community-5.0.41-0.rhel4.i386.rpm binary install package provided by yourselves. We are using Python and MySQL_python-1.2.2.

I do not believe this is likely to be a networking issue - by the time the connection setup is at the 'reading authorization packet' stage, the TCP/IP connection is already established and has been successfully used for two-way communication. I also don't see how it can be a configuration problem - the server works fine 99% of the time, and, if a particular connection initialises successfully it will then work 100% reliably for many weeks without problem.

How to repeat:
Use our web site a lot.

Suggested fix:
Presumably some sort of bug or race-condition in the authentication code on the server?
[10 May 2007 16:30] Jon Ribbens
Oh, I forgot to mention, if you google for "reading authorization packet" you'll see thousands of sites where this intermittent error has leaked into the site content (i.e. usually a PHP site), which provides further evidence it's a general problem/bug and not just something weird about our setup.
[10 May 2007 19:28] Jon Ribbens
I did some debugging with --debug, below is a partial log.
It shows a single thread from connection to where it dies.
As you can see it is dying because of EINTR on read().
What I don't get is (a) why this doesn't happen at any time other than during connection setup (connections once established are rock-solid), and (b) why MySQLd isn't coping with this, since it's completely standard and expected to get EINTR from read().

T@10874248: ?func: info: New connection received on TCP/IP (70)
T@10874248: vio_peer_addr: enter: sd: 70
T@10874248: vio_peer_addr: exit: addr: 192.168.1.74
T@10874248: my_malloc: my: size: 13  my_flags: 0
T@10874248: my_malloc: exit: ptr: 0xf5d5a0
T@10874248: ?func: info: Host: unknown host  ip: 192.168.1.74
T@10874248: vio_keepalive: enter: sd: 70  set_keep_alive: 1
T@10874248: net_write_command: enter: length: 67
T@10874248: vio_is_blocking: exit: 0
T@10874248: vio_write: enter: sd: 70  buf: 0x10016e0  size: 72
T@10874248: vio_write: exit: 72
T@10874248: vio_is_blocking: exit: 0
T@10874248: vio_read: enter: sd: 70  buf: 0x10016e0  size: 4
T@10874248: vio_read: vio_error: Got error 11 during read
T@10874248: vio_read: exit: -1
T@10874248: ?func: info: vio_read returned -1,  errno: 11
T@10874248: thr_alarm: enter: thread: T@10874248  sec: 5
T@10874248: vio_blocking: enter: set_blocking_mode: 1  old_mode: 0
T@10874248: vio_blocking: exit: 0
T@10874248: vio_read: enter: sd: 70  buf: 0x10016e0  size: 4
T@10874248: vio_read: vio_error: Got error 4 during read
T@10874248: vio_read: exit: -1
T@10874248: ?func: info: vio_read returned -1,  errno: 4
T@10874248: ?func: error: Couldn't read packet: remain: 4  errno: 4  length: -1
T@10874248: vio_blocking: enter: set_blocking_mode: 0  old_mode: 1
T@10874248: vio_blocking: exit: 0
T@10874248: net_printf_error: enter: message: 1043
T@10874248: vio_is_blocking: exit: 0
T@10874248: push_warning: enter: code: 1043, msg: Bad handshake
T@10874248: close_connection: enter: fd: TCP/IP (70)  error: ''
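
For readers decoding the trace above, the two errno values can be mapped to their symbolic names. This is a minimal sketch that assumes the Linux numbering (11 = EAGAIN, 4 = EINTR; other platforms number them differently) and the "Got error N during read" line format seen in this log. The helper name read_errnos is made up for illustration.

```python
import re

# Linux errno numbering (an assumption: other platforms use different numbers).
LINUX_ERRNO = {4: "EINTR", 11: "EAGAIN"}

# Lines copied from the debug log above.
TRACE = """\
T@10874248: vio_read: vio_error: Got error 11 during read
T@10874248: ?func: info: vio_read returned -1,  errno: 11
T@10874248: thr_alarm: enter: thread: T@10874248  sec: 5
T@10874248: vio_read: vio_error: Got error 4 during read
T@10874248: ?func: info: vio_read returned -1,  errno: 4
"""

def read_errnos(trace):
    """Return a symbolic name for each 'Got error N during read' line, in order."""
    numbers = re.findall(r"Got error (\d+) during read", trace)
    return [LINUX_ERRNO.get(int(n), "errno %s" % n) for n in numbers]

print(read_errnos(TRACE))  # prints ['EAGAIN', 'EINTR']
```

The first read fails with EAGAIN (nothing to read yet on a non-blocking socket); the second, blocking read is interrupted (EINTR) after the thr_alarm of 5 seconds was armed.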
[13 May 2007 14:00] Jeff C
I get this often as well.

I'm on 32bit.  I noticed it happening when using 5.1 clients going to 5.0.  I haven't seen 5.0->5.0 clients producing this error as of yet... 

Jon Ribbens, are your clients 5.1 going to 5.0 as well?  

I hope they fix this bug as fast as possible as this bug hurts mysql's image.  All the errors being generated thru php (search google and stuff)... most people will get a bad view of Mysql when it is great!
[13 May 2007 14:37] Jon Ribbens
Our clients were 5.0.27 talking to 5.0.27 servers. I upgraded to see if this would fix the problem (which it didn't) - they are now 5.0.41 clients talking to 5.0.41 servers.
[13 May 2007 17:39] Jon Ribbens
I have discovered what's going on, sort of. The login is sometimes taking over 5 seconds, and therefore it is timing out. Although the MySQL documentation says that this should produce a "Bad handshake" error on the client, it doesn't - the server just drops the connection.

I have worked-around the problem by increasing the connect_timeout global variable to 30 seconds.

What I do not yet know is why the login is taking over 5 seconds. Neither our client machine or server machine are overloaded, the logins should be completing practically instantaneously.
[13 May 2007 18:59] Jeff C
Well at least you found a semi-solution that works.  I hope they figure it out soon.  I've added 30 seconds into connect_timeout...

Thanks for the tip Jon.
[6 Jun 2007 18:31] Gabriel Barazer
I got this error many times on a moderately loaded server as well.
The error string and code (0) aren't clear at all: reading the source related to these errors shows some tests during the authentication process to the mysql server. See sql-common/client.c:2313 to 2321: the error "reading authorization packet" is returned when mysql->net.last_errno == CR_SERVER_LOST, which, going by the name of the constant CR_SERVER_LOST, obviously means that the server has dropped the connection. Reasons for this can be various, but I think it has something to do with timeouts during authentication on the server side. As Jon wrote, the problem might be a bug in the network error handling (which could explain the randomness of this bug).

Temporary workaround is indeed setting a long timeout, to avoid the server dropping the connection. I will check if this is enough to make this bug disappear. But there is clearly something to do on the server code to avoid that. (somewhere next to sql/sql_parse.cc:1143 , version 5.0.41)
[9 Aug 2007 21:52] Sveta Smirnova
Thank you for the report.

Error 2013, "Lost connection to MySQL server" is client error, not server. Even if server sent Bad Handshake this means client didn't receive anything. Also there is "T@10874248: push_warning: enter: code: 1043, msg: Bad handshake" in the server stack trace. Connection outage seems to be somewhere in your environment.

So I don't see bug here. Please explain what is wrong?
[28 Aug 2007 9:56] Jon Ribbens
Sveta, firstly the documentation says you will get the 'Bad handshake' error, and this is false. You will never see this error. So that's either a bug in the code or a mistake in the documentation. The only way I could find to diagnose this problem was to either 'gdb' the mysql server binary, or to run the 'debug' mysql server and try and use the completely-undocumented debug macros. (The "T@10874248" stuff you mention is not in the 'server stack trace', it is in the debug macro output, and only if you know how to use that, which I suspect essentially no non-mysql-employees do.)

Secondly, I explained in my original report why the problem is unlikely to be caused by the network. Please explain why you say "Connection
outage seems to be somewhere in your environment" when all the evidence seems to point to the contrary.
[28 Aug 2007 13:41] Sveta Smirnova
Thank you for the feedback.

The manual says "The number of seconds that the mysqld  server waits for a connect packet before responding with Bad handshake" at http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#option_mysqld_connect_... which means server will send this error and nothing about client will receive this error. If you mind other quote from manual please provide exact quote and URL.

Also how will client receive any error from server if it lost connection?

> Secondly, I explained in my original report why the problem is unlikely
> to be caused by the network. Please explain why you say "Connection
> outage seems to be somewhere in your environment" when all the evidence
> seems to point to the contrary.

You said "Temporary workaround is indeed setting a long timeout, to avoid the
server dropping the connection." Additionally you get "lost connection" error. This shows something in your environment make network connection between MySQL client and server slow and buggy sometimes.

So I don't see MySQL bug here. If you will be able to provide test case which shows problem is MySQL code and not environment which makes network connection slow and/or buggy feel free to provide the test case and reopen the report.
[28 Aug 2007 16:21] Jon Ribbens
> The manual says "The number of seconds that the mysqld  server waits
> for a connect packet before responding with Bad handshake" 

Indeed. What do you think the word 'respond' means?

> Also how will client receive any error from server if it lost connection?

It didn't lose the connection, the server deliberately dropped it. The server could quite easily send an error before dropping the connection.

> Additionally you get "lost connection" error. This shows something in your
> environment make network connection between MySQL client and server slow
> and buggy sometimes.

"Lost connection" shows nothing except that the server has deliberately dropped the connection. I have already explained why the problem does not appear to be the network.
[28 Aug 2007 16:23] Jon Ribbens
I am setting the bug back to 'open' because, at the very least, the "Bad handshake" thing is a documentation error.
[31 Aug 2007 9:21] Magnus Blåudd
Hi,

will try to explain how a connection to the MySQL Server works. You have provided a very nice tracefile that shows what happens in the server when the client gets the error "2013 Lost connection to MySQL server at 'reading authorization packet', system error: 0" and I'll comment on that one directly.

It actually starts before the beginning of the trace file where the server accepts the connection and starts a new thread to handle this incoming connection. It will set the network read and write timeouts to the value of "connect_timeout", which is 5 seconds by default in order to quickly be able to disconnect clients that don't respond(try telnet to the mysqld port...). Then call 'check_connection' in sql_parse.cc

> T@10874248: ?func: info: New connection received on TCP/IP (70)
> T@10874248: vio_peer_addr: enter: sd: 70
> T@10874248: vio_peer_addr: exit: addr: 192.168.1.74

The connection is from 192.168.1.74, we use 'vio_peer_addr' to ask the network about that.

> T@10874248: my_malloc: my: size: 13  my_flags: 0
> T@10874248: my_malloc: exit: ptr: 0xf5d5a0
> T@10874248: ?func: info: Host: unknown host  ip: 192.168.1.74
> T@10874248: vio_keepalive: enter: sd: 70  set_keep_alive: 1

Call to 'vio_keepalive', this shows that the ip was allowed to connect due to ACL lists. At sql_parse.cc:880

> T@10874248: net_write_command: enter: length: 67
> T@10874248: vio_is_blocking: exit: 0
> T@10874248: vio_write: enter: sd: 70  buf: 0x10016e0  size: 72
> T@10874248: vio_write: exit: 72

The server builds the "initial greeting message" and sends it to the client.

> T@10874248: vio_is_blocking: exit: 0
> T@10874248: vio_read: enter: sd: 70  buf: 0x10016e0  size: 4
> T@10874248: vio_read: vio_error: Got error 11 during read
> T@10874248: vio_read: exit: -1
> T@10874248: ?func: info: vio_read returned -1,  errno: 11

The server starts to read from the client in non blocking mode just to see if there is anything there. errno 11 shows no data was available at this time.

> T@10874248: thr_alarm: enter: thread: T@10874248  sec: 5

Setup alarm that will break the below read in blocking mode after 5 seconds.

> T@10874248: vio_blocking: enter: set_blocking_mode: 1  old_mode: 0
> T@10874248: vio_blocking: exit: 0

Turn on blocking mode.

> T@10874248: vio_read: enter: sd: 70  buf: 0x10016e0  size: 4
> T@10874248: vio_read: vio_error: Got error 4 during read
> T@10874248: vio_read: exit: -1
> T@10874248: ?func: info: vio_read returned -1,  errno: 4
> T@10874248: ?func: error: Couldn't read packet: remain: 4  errno: 4  length: -1

Read from the client. Fails after 5 seconds because the client hasn't said anything(or at least it hasn't yet reached the server). This is "such a serious error" that the server decides not to write anything more to the socket, to enforce this it set the flag "net->error= 2"(this can be seen in function 'my_real_read' in net_serv.cc). This flag has the effect that even if the server calls a function that writes to the net it will be thrown away.

Even if the server would write to the client(which might not exist anymore), the client would most likely be out of sync. Remember that packets can be "in the air" as if throwing a ball back and forth.

> T@10874248: vio_blocking: enter: set_blocking_mode: 0  old_mode: 1
> T@10874248: vio_blocking: exit: 0
> T@10874248: net_printf_error: enter: message: 1043

Write the "Bad handshake" error message to the client. But since "net->error" is 2 it will be thrown away and never sent. Sorry!

> T@10874248: vio_is_blocking: exit: 0
> T@10874248: push_warning: enter: code: 1043, msg: Bad handshake
> T@10874248: close_connection: enter: fd: TCP/IP (70)  error: 

Finally close the connection to the client. If the client is still there as in this case, it will be in the state of reading from the network/server expecting an answer to its "Login request" that it has sent (although the server didn't get it in time). So instead of that reply it'll get an error since the server closed the connection and thus will display 2013 indicating "The client didn't get an error when writing to the server, but it didn't get a full answer (or any answer) to the question."
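
The sequence described above can be reproduced in miniature without MySQL. The following is a sketch only: it uses a plain socket timeout as a stand-in for the server's thr_alarm, made-up payloads (b"greeting", b"login", b"OK") rather than the real MySQL protocol, and hypothetical helper names serve_once and try_login. It illustrates that a client whose answer arrives after the server's timeout sees only a closed connection, never an error packet.

```python
import socket
import threading
import time

def serve_once(listener, connect_timeout):
    """Accept one client, send a greeting, wait for a reply, then close."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(b"greeting")
        conn.settimeout(connect_timeout)   # stand-in for thr_alarm
        try:
            if conn.recv(64):
                conn.sendall(b"OK")
        except socket.timeout:
            pass  # mimics net->error = 2: write nothing more, just close

def try_login(connect_timeout, client_delay):
    """Return the client's view of the login: 'OK' or 'lost connection'."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))        # any free local port
    listener.listen(1)
    server = threading.Thread(target=serve_once,
                              args=(listener, connect_timeout))
    server.start()
    client = socket.create_connection(listener.getsockname())
    with client:
        client.recv(64)                    # read the server greeting
        time.sleep(client_delay)           # simulated slow login request
        try:
            client.sendall(b"login")
            reply = client.recv(64)        # empty if the server already closed
        except OSError:
            reply = b""                    # connection reset by the server
    server.join()
    listener.close()
    return "OK" if reply == b"OK" else "lost connection"
```

Under these assumptions, try_login(2.0, 0.0) returns "OK", while try_login(0.2, 0.6) returns "lost connection", which corresponds to error 2013 on the real client.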

How can we then determine this is not a server bug, after all the "thr_alarm" function could be buggy in some way and thus breaking the read from the client before the 5 seconds has passed. Or something...

1. Use Wireshark/Ethereal to log the connection between the client and the server. Since it will include the time of each packet it receives, that will effectively show how long the server has waited before sending out the socket reset (a RST packet). Wireshark is very nice since it also knows how to interpret the MySQL protocol. If you are running on the default port 3306 it will interpret automatically, otherwise use "Decode as.." and set the port being used to MySQL.

Example trace(reduced):
Time        From/To                          What
0.000251    127.0.0.1:3306->127.0.0.1:10508  MySQL Server Greeting proto=10 version=5.0.50-debug-log
0.004448    127.0.0.1:10508->127.0.0.1:3306  MySQL Login Request user=root db=test
0.004586    127.0.0.1:3306->127.0.0.1:10508  MySQL Response OK
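
The timestamps in such a trace give the delay between the server greeting and the client's login request, which is the interval that has to stay below "connect_timeout". A small sketch of that arithmetic, using the three rows of the reduced trace above; the function name login_delay and the row labels are illustrative choices, not Wireshark output fields.

```python
from decimal import Decimal

# (timestamp in seconds, packet description) copied from the reduced trace.
TRACE = [
    ("0.000251", "MySQL Server Greeting"),
    ("0.004448", "MySQL Login Request"),
    ("0.004586", "MySQL Response OK"),
]

CONNECT_TIMEOUT = Decimal(5)  # default in 5.0.41, as discussed in this report

def login_delay(rows):
    """Seconds between the server greeting and the client's login request."""
    times = {what: Decimal(ts) for ts, what in rows}
    return times["MySQL Login Request"] - times["MySQL Server Greeting"]

delay = login_delay(TRACE)
print(delay, "timed out" if delay > CONNECT_TIMEOUT else "in time")
# prints: 0.004197 in time
```

In a capture of a failing connect, the same subtraction would be expected to come out above the configured timeout if the timeout is indeed what closes the connection.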

2. Insert sleep(10) in the appropriate place in server or client code.
 - When doing this in the client code, I get exactly the error described here.
 - When doing it in the server code, it's not possible to get any error at all
   since the client is reading with almost infinite timeout.

3. Check the value of "aborted_connections" by using "SHOW STATUS like 'aborted_connections'"  - it should have increased with one after the client got this error.

Can the server send ER_BAD_HANDSHAKE as it says in the manual? Yes, it will do
that in case of other errors during this handshake phase(for example if the
length of the clients Login request is wrong). But most likely _never_ in the case
where "connect_timeout" has expired - since it's "such a serious error" ;)

Why don't we have a higher "connect_timeout" value by default? 5 seconds is fairly low in
order to be "secure by default" - the low value allows spurious connects to that port to be
discarded quickly and free up that thread to do some useful work. But maybe 5 seconds is too
low?
[31 Aug 2007 9:22] Magnus Blåudd
I'll recommend the manual be updated. We really don't want our users have
to produce tracefiles or use wireshark in order to find out they need to increase the
"connect_timeout" value. 

An error message on the client saying "2013 Lost connection to MySQL Server at '<something>', system error: <errno>" most likely indicates that the "connect_timeout" value is too low on the server. There is also "2013 Lost connection to MySQL Server during query" but that is a _different_ problem.
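
To tell the two forms of error 2013 apart in application logs, the message text can be matched against the patterns quoted above. This is a hedged sketch: the regular expression is derived only from the messages shown in this report, and the function name classify and its return labels are invented for the example.

```python
import re

# Pattern for the connect-phase form, e.g.
# "Lost connection to MySQL server at 'reading authorization packet', system error: 0"
CONNECT_PHASE_RE = re.compile(
    r"Lost connection to MySQL server at '(?P<stage>[^']+)', "
    r"system error: (?P<errno>-?\d+)",
    re.IGNORECASE,
)

def classify(message):
    """Return (kind, stage, errno) for a 'Lost connection' client message."""
    if "lost connection to mysql server during query" in message.lower():
        return ("during query", None, None)
    match = CONNECT_PHASE_RE.search(message)
    if match:
        return ("connect phase", match.group("stage"), int(match.group("errno")))
    return ("other", None, None)

print(classify("Lost connection to MySQL server at "
               "'reading authorization packet', system error: 0"))
```

Only the "connect phase" form points at "connect_timeout"; the "during query" form is the different problem mentioned above.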
[21 Sep 2007 14:59] Magnus Blåudd
Talked with Monty about this for an explanation why we have such a low connect_timeout by default. It's primarily a security feature i.e if there are "clients" connecting to the server without doing anything, they will only block the server(i.e one thread in the server) for the 5 seconds.

Since quite a lot of these error messages show up in a web search, we will increase the default value to 10 seconds.

Increasing the connect_timeout value in the servers my.cnf is recommended if this 
problem occurs frequently. It does not really make the server insecure.

Check the value of "aborted_connections" by using "SHOW STATUS like
'aborted_connections'" to easily determine whether this is the problem you experience; it will increase by one for each connection the server aborts.
[21 Sep 2007 15:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/34456

ChangeSet@1.2526, 2007-09-21 17:10:45+02:00, msvensson@shellback.(none) +1 -0
  Bug#28359 Intermitted lost connection at 'reading authorization packet' errors
   - Increase default 'connect_timeout' value to 10 seconds
[18 Oct 2007 21:35] Bugs System
Pushed into 5.1.23-beta
[18 Oct 2007 21:37] Bugs System
Pushed into 5.0.52
[25 Oct 2007 19:24] Paul Dubois
Noted in 5.0.52, 5.1.23 changelogs.

The default value of the connect_timeout system variable was
increased from 5 to 10 seconds. This might help in cases where
clients frequently encounter errors of the form Lost connection to
MySQL server at 'XXX', system error: errno. 

Also will add a new section about the "Lost connection" error at:

http://dev.mysql.com/doc/refman/5.0/en/common-errors.html
[9 Dec 2007 7:00] Ivan Pupkin
Hi!

I'd like to comment on this issue. I think the problem is more complicated.

I have several 5.0.38 mysql servers, on 32-bit and on 64-bit platforms.

On the 64-bit platform I've noticed the problems you describe: randomly lost connections to the mysql server.

The increased timeout helped, but there is at least one case where it doesn't help.

When I'm copying a large file from the mysql server with the rsync protocol, it's impossible to connect to the mysql server; the lost connection error appears.

I was dumping traffic with tcpdump and noticed a lot of packets with checksum errors.

I must add that this doesn't happen when you are loading the network with other protocols (at least those that I tried didn't).

And this issue does not happen on the 32-bit platform.

If You are able to do this please try to reproduce it, the keys are:

1) 64 bit linux
2) copying a large file (>10Gb) with rsync from mysql server
3) maybe master-slave replication (my servers are in master-slave replication)
[25 Sep 2008 16:48] Christian Deligant
I have a very similar bug since I upgraded from MySQL 4 to MySQL 5... Client and server are on the same machine, the PHP logs in with "localhost"

Never had problems with 4, apart of too many connection errors from times to other due to a very busy server...

Hope this helps!
[4 Oct 2008 2:47] Dan Franklin
Some commenters said to use a show status command, but they gave the incorrect command.  It is
  show status like 'aborted_connects'
not "aborted_connections".

This is the case for 5.0.45, anyway.
[20 Oct 2008 8:21] Ondrej Brablc
We thought that the problem might be in 32-bit client and 64-bit server too, we have even increased the connection timeout to 60 seconds, but this helped only partially. The final solution in our case was to switch the ethernet card from 100Mb/s to 1000Mb/s mode. So just in case you found this bug and have similar problems, and increasing timeout does not help, use ethtool to check your card mode and test whether switching to 1000Mb/s would not help in your case as well.
[21 Oct 2008 15:53] Magnus Blåudd
Ever tried with wireshark trying to see how the network packets looked like? Could the first NIC have been faulty?
[5 Apr 2009 15:54] Rent Tor
Oh my God!
I've been looking for the cause of one of my client's problems with MySQL for over a year now, just to find out through this bug report I stumbled upon by pure chance that it's just the 64bit version of MySQL that's broken.

Wow. But what really makes me wonder are two things:

First, why doesn't MySQL acknowledge this as a bug but instead chooses to use the band aid in the form of increased timeouts? On my client's system we upped the connect_timeout to a ridiculous 240 seconds - with the problem still prevailing. Heck, upping the timeout is like adding a second tank when your car starts leaking gas. You'll get the same mileage from one fillup - it's just a hell of a lot more expensive to fill the tank. Why not plumb the leak?

Second: I've been on this issue not "on my own" but thanks to my client having invested in a PLATINUM support contract, I had "help" from MySQL. The quality of which can essentially be seen here: "Must be a network problem.", "Not a MySQL fault.", "Check your my.cnf.", "The server is doing everything right there.". Not one single mentioning of this bug report which would have helped me tremendously to recognize we were chasing ghosts in the mist. I even offered live access to the servers affected but was turned down each and every time.

By now I "fixed" the problem by patching up PHP - which is the major web app platform at my client - so that it'll open connections to the MySQL database servers with the Auto_Reconnect flag set. *This* finally fixed the problems. Still this is less than ideal but then it's better than a web app that keeps crashing because of a bug MySQL obviously neglects to even ack as a bug and therefore won't fix. Sad, sad, sad.

I actually believe that the bug could be found and consequently fixed very easily. Since it's only 64bit systems that are affected, this reeks a lot like a variable overflow/underflow or an unclean initialization somewhere. Most likely in the port setup. Shouldn't be that hard to find. Just by looking at the startup messages of the 64bit mysqld one can see that the source is everything but 64bit aware anyway.
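
A client-side workaround in the same spirit as the reconnect flag mentioned above is to retry the connect when it fails with error 2013. Note this is a different technique from the PHP patch described in this comment, and it hides the symptom rather than explaining it. The sketch below is driver-agnostic: ClientError is a hypothetical stand-in for whatever exception a real driver raises, and connect is any callable supplied by the caller.

```python
import time

CR_SERVER_LOST = 2013  # client error code for "Lost connection to MySQL server"

class ClientError(Exception):
    """Hypothetical stand-in for a driver exception that carries an error code."""
    def __init__(self, code, message):
        super().__init__(code, message)
        self.code = code

def connect_with_retry(connect, attempts=3, delay=0.0):
    """Call connect(); retry only when it fails with CR_SERVER_LOST."""
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ClientError as exc:
            # Re-raise anything that is not a lost connection, and give up
            # once the last attempt has failed.
            if exc.code != CR_SERVER_LOST or attempt == attempts:
                raise
            time.sleep(delay)
```

With a real driver the except clause would name that driver's own exception type and read the error code from wherever that driver stores it.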
[6 Apr 2009 9:17] Magnus Blåudd
I hope you carefully read my previous explanation from "31 Aug 2007 11:21" about the default value of "connect_timeout" being set quite low in MySQL Server as well as describing how a connect works.

1. Would be very interesting to see a trace of a failed connect with for example wireshark.

2. Have you run the query "SHOW STATUS LIKE 'aborted_connections'" on your server? This would tell if it is the mysqld that get a read timeout during connect and thus closes the new connection and increments that counter.

3. What version of MySQL are you using?

4. Yes, "reconnect" has been off by default for quite some time now.
[6 Apr 2009 9:45] Rent Tor
> I hope you carefully read my previous explanation from "31 Aug 2007 11:21"
> about the default value of "connect_timeout" being set quite low in MySQL 
> Server as well as describing how a connect works.

Yes, I indeed did and I found it very interesting. 
However I have the feeling that you missed my statement about we having had connect_timeout set as high as 240 seconds without this making any noticeable difference.

> 1. Would be very interesting to see a trace of a failed connect with for
> example wireshark.

I will see whether I can provide that. Unfortunately the servers are hit by some millions of requests a day so the log may be quite large and a bit hard to read. In case you've got any information as to how to decrease the trace size, let me know.

> 2. Have you run the query "SHOW STATUS LIKE 'aborted_connections'" on your
> server? This would tell if it is the mysqld that get a read timeout during 
> connect and thus closes the new connection and increments that counter.

As mentioned in the comments above and as you've therefore surely read it's "SHOW STATUS LIKE 'aborted_connects'" - not connections. Anyway, of course I did.

From 11:31am my local time today:

mysql> show status like 'abort%';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| Aborted_clients  | 199461 | 
| Aborted_connects | 59490  | 
+------------------+--------+
2 rows in set (0.05 sec)
mysql> 

From 11:34am my local time:
mysql> show status like 'abort%';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| Aborted_clients  | 199501 | 
| Aborted_connects | 59495  | 
+------------------+--------+
2 rows in set (0.05 sec)

As you can see not only the aborted_connects increase but the server also aborts clients which is another, as of yet unresolved problem, my client encounters.
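
The growth of both counters between the two samples above can be computed directly from the pasted output. A small sketch, assuming the table layout printed by the mysql command-line client as shown here; the helper names parse_status and deltas are illustrative.

```python
import re

# Rows copied from the 11:31am and 11:34am samples above.
SAMPLE_1131 = """\
| Aborted_clients  | 199461 |
| Aborted_connects | 59490  |
"""
SAMPLE_1134 = """\
| Aborted_clients  | 199501 |
| Aborted_connects | 59495  |
"""

def parse_status(table):
    """Parse '| name | value |' rows into a {name: int} dictionary."""
    rows = re.findall(r"\|\s*(\w+)\s*\|\s*(\d+)\s*\|", table)
    return {name: int(value) for name, value in rows}

def deltas(before, after):
    """Per-counter increase between two SHOW STATUS samples."""
    first, second = parse_status(before), parse_status(after)
    return {name: second[name] - first[name] for name in first}

print(deltas(SAMPLE_1131, SAMPLE_1134))
# prints {'Aborted_clients': 40, 'Aborted_connects': 5}
```

That is 5 aborted connects and 40 aborted clients in about three minutes.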

> What version of MySQL are you using?
Oh yes. That question again ;) Sorry, that's really not your fault, but by now I'm quite sick and tired of it. That's because I've heard enough "Wait for next release" statements already. Anyway:

mysql> show variables like 'version';
+---------------+---------------------------+
| Variable_name | Value                     |
+---------------+---------------------------+
| version       | 5.0.72-enterprise-gpl-log | 
+---------------+---------------------------+

And yes, I might even consider upgrading yet again as I did the last 5 to 7 times. But since the problem didn't go away the last times, chances are it won't go away this time, too.

> Yes, "reconnect" has been off by default for quite some time now.
To my knowledge, it has never been active in PHP but I may well be wrong there.
[6 Apr 2009 10:59] Rent Tor
From 12:56pm:

+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| Aborted_clients  | 200198 | 
| Aborted_connects | 62007  | 
+------------------+--------+
2 rows in set (0.24 sec)
[6 Apr 2009 11:01] Rent Tor
Operating System is RedHat Linux Enterprise Server 5.

Linux idgverl-li43.privatedomain 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 07:18:46 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
[3 Jul 2009 17:34] mike mike
I'm having this issue with the latest MySQL server from Ubuntu 64-bit Jaunty

mysql  Ver 14.12 Distrib 5.0.75, for debian-linux-gnu (x86_64) using readline 5.2

[root@sql02 ~]# dpkg -l | grep mysql
ii  libdbd-mysql-perl                                4.008-1                          A Perl5 database interface to the MySQL data
ii  libmysqlclient15-dev                             5.1.30really5.0.75-0ubuntu10.2   MySQL database development files
ii  libmysqlclient15off                              5.1.30really5.0.75-0ubuntu10.2   MySQL database client library
ii  mysql-client                                     5.1.30really5.0.75-0ubuntu10.2   MySQL database client (metapackage depending
ii  mysql-client-5.0                                 5.1.30really5.0.75-0ubuntu10.2   MySQL database client binaries
ii  mysql-common                                     5.1.30really5.0.75-0ubuntu10.2   MySQL database common files
ii  mysql-server                                     5.1.30really5.0.75-0ubuntu10.2   MySQL database server (metapackage depending
ii  mysql-server-5.0                                 5.1.30really5.0.75-0ubuntu10.2   MySQL database server binaries
ii  mysql-server-core-5.0                            5.1.30really5.0.75-0ubuntu10.2   MySQL database core server files

It occurs maybe once per day. I have ~ 1000 queries per second on average coming from 4 different machines over TCP and batch jobs running locally over unix socket, and only the unix socket fails - again, only maybe once a day, but it can be an important job to miss if it doesn't work.

I've tried adding in PHP mysqli options to increase the connection timeout, it didn't help.
[7 Aug 2009 20:54] Andrey Vasilishin
I have the same problem.
FATAL ERROR: Connection to database server failed. 
MYSQL Error:Lost connection to MySQL server at 'reading authorization packet', system error: 0

tcpdump show, that packets run between servers
> tcpdump -n -p -v -tttt host 213.186.119.2 and '77.47.134.11 and ( tcp port 3306 )'
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
2009-08-07 23:46:21.436795 IP (tos 0x0, ttl 64, id 44592, offset 0, flags [DF], proto TCP (6), length 60)
    213.186.119.2.40381 > 77.47.134.11.3306: Flags [S], cksum 0x4dfc (correct), seq 2473162537, win 5840, options [mss 1460,sackOK,TS val 887740697 ecr 0,nop,wscale 6], length 0
2009-08-07 23:46:21.438827 IP (tos 0x10, ttl 61, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    77.47.134.11.3306 > 213.186.119.2.40381: Flags [S.], cksum 0xac4b (correct), seq 868070102, ack 2473162538, win 5792, options [mss 1460,sackOK,TS val 995262441 ecr 887740697,nop,wscale 6], length 0
2009-08-07 23:46:21.473055 IP (tos 0x0, ttl 64, id 44593, offset 0, flags [DF], proto TCP (6), length 52)
    213.186.119.2.40381 > 77.47.134.11.3306: Flags [.], cksum 0xf13a (correct), ack 1, win 92, options [nop,nop,TS val 887740729 ecr 995262441], length 0
2009-08-07 23:46:21.475382 IP (tos 0x10, ttl 61, id 13018, offset 0, flags [DF], proto TCP (6), length 112)
    77.47.134.11.3306 > 213.186.119.2.40381: Flags [P.], cksum 0x1914 (correct), seq 1:61, ack 1, win 91, options [nop,nop,TS val 995262477 ecr 887740729], length 60
2009-08-07 23:46:21.509805 IP (tos 0x0, ttl 64, id 44594, offset 0, flags [DF], proto TCP (6), length 52)
    213.186.119.2.40381 > 77.47.134.11.3306: Flags [.], cksum 0xf0b5 (correct), ack 61, win 92, options [nop,nop,TS val 887740766 ecr 995262477], length 0

mysql> show status like 'aborted%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_clients  | 146   |
| Aborted_connects | 399   |
+------------------+-------+
2 rows in set (0.21 sec)

mysql> show variables like 'connect%';
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    999
Current database: *** NONE ***

+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| connect_timeout | 60    |
+-----------------+-------+
1 row in set (0.08 sec)

mysql> show variables like 'version%';
+-------------------------+---------------------------+
| Variable_name           | Value                     |
+-------------------------+---------------------------+
| version                 | 5.0.83-log                |
| version_comment         | Gentoo Linux mysql-5.0.83 |
| version_compile_machine | i686                      |
| version_compile_os      | pc-linux-gnu              |
+-------------------------+---------------------------+
4 rows in set (0.04 sec)

This bug only with remote servers, with localhost no such problem.
[7 Aug 2009 20:59] mike mike
Mine is actually the opposite.

Remote connections don't seem to have this issue but localhost does. Forcing socket and forcing TCP1
[14 Aug 2009 12:32] Andrey Vasilishin
If it can help resolve the bug: I installed cacti, which shows me that the server uses on average 923 Mbit/sec and 997 Mbit/sec at peak on the 1 Gbit/sec Ethernet interface. That's why I think the problem is in the limits of the channel between mysql-server and client.
[12 Nov 2009 13:37] Abhijit Shanbhag
I had the same issue, I am running Microsoft Vista. The Hardware setup was Two Vista PC's were connected to the DLINK Switch . ADSL Modem is also connected to the switch. I encountered this same problem when running a query, despite the fact the connection was made between the two Vista PC's. Both my VISTA Computers are running MySQL. And I use it to tranfer data TO/FRO to both the PC's.

I did not change the timeout or any other variables. 
Luckily, I switched off the router and tried again. BINGO, it worked screaming fast. The connection speed and query transfer were fantastic.
Next I configured the LAN to work on segment 192.168.1.2 & 192.168.1.3.
Next I configured the ADSL modem to 192.168.0.1.
I configured the gateway for both PCs as 192.168.0.1.

This probably indicates that there is some bug in Microsoft's TCP/IP implementation. Maybe the packet is wrongly sent to the Internet and the MyODBC driver waits for a machine on the Internet to reply within 7 hops. However, this is not theoretically possible, because if the reply does not come back from the router after 7 hops the packet is lost.

However, working around this problem makes it clear that if the ADSL/Internet connection is on another segment, such as 192.168.0.xxx, then the packet reaches MySQL ODBC.

I am not a certified graduate in any discipline. This is my own analysis after suffering a lot at the hands of Microsoft. (I fear legal retribution.) I MAY BE WRONG.
[2 Sep 2011 17:47] Andy D
Would you kindly consider opening this bug again?  It is still a very real problem, 4 years after the initial bug report, even with the 10 second timeout.
[15 Sep 2011 14:31] Rickard Andersson
I can also confirm that this still seems to be an issue. I'm connecting through an OpenVPN connection from a Mac mini to a Debian VPS. The connection itself is fine: I'm able to connect to ssh through the tunnel without any problems, and ICMP packets go through just fine.

Debian: 
Linux xxx 2.6.18-194.26.1.el5.028stab070.14 #1 SMP Thu Nov 18 16:34:01 MSK 2010 x86_64 GNU/Linux

ii  mysql-server-5.1                5.1.58-1~dotdeb.1            MySQL database server binaries and system database setup

Mac:
OS X 10.6.5 (32bit kernel)
Sequel Pro 0.9.7

I've been playing around with the connection_timeout value but not even 240 seconds will get me connected. 

show status like 'aborted_connects' shows (just restarted mysqld)

Variable_name	Value
Aborted_connects	1
[1 Mar 2012 18:16] Dani Kaplan
I get this bug as well.
Calling using ODBC (Latest Client) to 5.5.17 server

It happens only when I iterate through the results of one query while issuing many small queries, but I call them one after another,
so I don't see what causes this failure.
[13 Dec 2012 18:34] Michael Bender
I still have this issue too, like everyone else in this thread, using 5.1.62-0ubuntu0.10.04.-1-log on 64-bit hardware.

Please re-open
[17 Dec 2012 3:46] salley zhao
I still have this issue too, like everyone else, on CentOS release 5.2 (Final). The MySQL server version is 5.1.45 and the MySQL client version is 5.0.45-7. I don't know whether the version mismatch causes this issue.

mysql> SHOW global STATUS LIKE 'abort%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_clients  | 9586  |
| Aborted_connects | 1801  |
+------------------+-------+
2 rows in set (0.00 sec)
Ten minutes later:
mysql> SHOW global STATUS LIKE 'abort%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_clients  | 9663  |
| Aborted_connects | 1820  |
+------------------+-------+
2 rows in set (0.00 sec)
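
For readers comparing the two snapshots above, here is a minimal sketch that turns the counter differences into per-minute rates. The numbers are copied from the two outputs; the ten-minute interval is the approximate figure stated in the post, and the function name `per_minute_rates` is only illustrative.

```python
# Counter values copied from the two SHOW GLOBAL STATUS snapshots above.
before = {"Aborted_clients": 9586, "Aborted_connects": 1801}
after = {"Aborted_clients": 9663, "Aborted_connects": 1820}
interval_minutes = 10  # assumption: "ten minutes" taken as exact


def per_minute_rates(before, after, minutes):
    """Return how much each counter grew per minute over the interval."""
    return {name: (after[name] - before[name]) / minutes for name in before}


rates = per_minute_rates(before, after, interval_minutes)
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per minute")
```

On these figures that is 77 aborted clients and 19 aborted connects in roughly ten minutes.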

mysql> show variables like '%wait%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| table_lock_wait_timeout | 50    |
| wait_timeout            | 30    |
+-------------------------+-------+
2 rows in set (0.00 sec)

mysql> show variables like '%version%';
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| protocol_version        | 10                  |
| version                 | 5.1.45-log          |
| version_comment         | Source distribution |
| version_compile_machine | x86_64              |
| version_compile_os      | unknown-linux-gnu   |
+-------------------------+---------------------+
5 rows in set (0.00 sec)
[10 Apr 2013 14:21] Willie Frontera
I can confirm this bug on an up-to-date 64-bit Debian system. The connection seems to drop more often at busy times, but when it happens remains unpredictable.
[14 Jun 2013 8:09] Rob Janssen
Uncaught exception 2 in 'databaseconnection' at line 23: mysqli::real_connect(): (HY000/2013): 
Lost connection to MySQL server at 'reading authorization packet', system error: 0

Variable_name               Value
Aborted_clients             165
Aborted_connects            227

Variable_name               Value
innodb_lock_wait_timeout    50
table_lock_wait_timeout     50
wait_timeout                28800

Variable_name               Value
protocol_version            10
version                     5.0.30-log
version_bdb                 Sleepycat Software: Berkeley DB 4.1.24: (December 16, 2006)
version_comment             Gentoo Linux mysql-5.0.30
version_compile_machine     i686
version_compile_os          pc-linux-gnu
[1 Apr 2015 15:32] José Kleber kleber
I have this problem in a multi-threaded application. Each thread must write a record to the database at the end of its processing. I'm about to give up using MySQL for this application.
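
One possible client-side workaround for a case like this, where each thread opens a connection for a single write, is to retry the connect when it fails with error 2013. The sketch below is not a fix for the underlying bug. It assumes the driver raises exceptions whose first argument is the MySQL client error code, as in the `OperationalError: (2013, ...)` quoted in the original report. The names `connect_with_retry` and `is_lost_connection` are hypothetical helpers, not part of any library.

```python
import random
import time


def connect_with_retry(connect, is_transient, attempts=5, base_delay=0.1,
                       sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect:      zero-argument callable that opens and returns a connection
    is_transient: callable(exc) -> bool, True if the error is worth retrying
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not transient.
            if attempt == attempts or not is_transient(exc):
                raise
            # Exponential backoff with jitter so threads do not retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))


def is_lost_connection(exc):
    """Assumption: exc.args[0] holds the MySQL client error code (2013 here)."""
    return bool(exc.args) and exc.args[0] == 2013
```

With MySQL_python this might be called as `connect_with_retry(lambda: MySQLdb.connect(host="db.example", user="app", passwd="secret", db="appdb"), is_lost_connection)`, where the host, user, password and database names are placeholders.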
[21 Oct 2015 14:49] Alexander Millar
For others who find this issue while trying to resolve the same error: we encountered this problem whenever our hourly backups were running via mysqldump.