Bug #28359 Intermitted lost connection at 'reading authorization packet' errors
Submitted: 10 May 2007 18:25 Modified: 25 Oct 2007 21:24
Reporter: Jon Ribbens
Status: Closed
Category:Server: Errors Severity:S2 (Serious)
Version:5.0.41 OS:Linux (RedHat Enterprise Linux ES release 4)
Assigned to: Magnus Blaudd Target Version:
Tags: Lost connection, authorization, linux, 64-bit

[10 May 2007 18:25] Jon Ribbens
Description:
We are using MySQL 5.0.41, connecting via TCP/IP to the database server. Intermittently we
are getting:

  OperationalError: (2013, "Lost connection to MySQL server at 'reading authorization
packet', system error: 0")

The servers are 64-bit, using the MySQL-server-community-5.0.41-0.rhel4.x86_64.rpm binary
install package provided by yourselves.

The clients are 32-bit, using the MySQL-client-community-5.0.41-0.rhel4.i386.rpm binary
install package provided by yourselves. We are using Python and MySQL_python-1.2.2.

I do not believe this is likely to be a networking issue - by the time the connection
setup is at the 'reading authorization packet' stage, the TCP/IP connection is already
established and has been successfully used for two-way communication. I also don't see how
it can be a configuration problem - the server works fine 99% of the time, and, if a
particular connection initialises successfully it will then work 100% reliably for many
weeks without problem.

How to repeat:
Use our web site a lot.

Suggested fix:
Presumably some sort of bug or race-condition in the authentication code on the server?
[10 May 2007 18:30] Jon Ribbens
Oh, I forgot to mention, if you google for "reading authorization packet" you'll see
thousands of sites where this intermittent error has leaked into the site content (i.e.
usually a PHP site), which provides further evidence it's a general problem/bug and not
just something weird about our setup.
[10 May 2007 21:28] Jon Ribbens
I did some debugging with --debug, below is a partial log.
It shows a single thread from connection to where it dies.
As you can see it is dying because of EINTR on read().
What I don't get is (a) why this doesn't happen at any time other than during connection
setup (connections once established are rock-solid), and (b) why MySQLd isn't coping with
this, since it's completely standard and expected to get EINTR from read().

T@10874248: ?func: info: New connection received on TCP/IP (70)
T@10874248: vio_peer_addr: enter: sd: 70
T@10874248: vio_peer_addr: exit: addr: 192.168.1.74
T@10874248: my_malloc: my: size: 13  my_flags: 0
T@10874248: my_malloc: exit: ptr: 0xf5d5a0
T@10874248: ?func: info: Host: unknown host  ip: 192.168.1.74
T@10874248: vio_keepalive: enter: sd: 70  set_keep_alive: 1
T@10874248: net_write_command: enter: length: 67
T@10874248: vio_is_blocking: exit: 0
T@10874248: vio_write: enter: sd: 70  buf: 0x10016e0  size: 72
T@10874248: vio_write: exit: 72
T@10874248: vio_is_blocking: exit: 0
T@10874248: vio_read: enter: sd: 70  buf: 0x10016e0  size: 4
T@10874248: vio_read: vio_error: Got error 11 during read
T@10874248: vio_read: exit: -1
T@10874248: ?func: info: vio_read returned -1,  errno: 11
T@10874248: thr_alarm: enter: thread: T@10874248  sec: 5
T@10874248: vio_blocking: enter: set_blocking_mode: 1  old_mode: 0
T@10874248: vio_blocking: exit: 0
T@10874248: vio_read: enter: sd: 70  buf: 0x10016e0  size: 4
T@10874248: vio_read: vio_error: Got error 4 during read
T@10874248: vio_read: exit: -1
T@10874248: ?func: info: vio_read returned -1,  errno: 4
T@10874248: ?func: error: Couldn't read packet: remain: 4  errno: 4  length: -1
T@10874248: vio_blocking: enter: set_blocking_mode: 0  old_mode: 1
T@10874248: vio_blocking: exit: 0
T@10874248: net_printf_error: enter: message: 1043
T@10874248: vio_is_blocking: exit: 0
T@10874248: push_warning: enter: code: 1043, msg: Bad handshake
T@10874248: close_connection: enter: fd: TCP/IP (70)  error: ''
[13 May 2007 16:00] Jeff C
I get this often as well.

I'm on 32bit.  I noticed it happening when using 5.1 clients going to 5.0.  I haven't seen
5.0->5.0 clients producing this error as of yet... 

Jon Ribbens, are your clients 5.1 going to 5.0 as well?  

I hope they fix this bug as fast as possible as this bug hurts mysql's image.  All the
errors being generated thru php (search google and stuff)... most people will get a bad
view of Mysql when it is great!
[13 May 2007 16:37] Jon Ribbens
Our clients were 5.0.27 talking to 5.0.27 servers. I upgraded to see if this would fix the
problem (which it didn't) - they are now 5.0.41 clients talking to 5.0.41 servers.
[13 May 2007 19:39] Jon Ribbens
I have discovered what's going on, sort of. The login is sometimes taking over 5 seconds,
and therefore it is timing out. Although the MySQL documentation says that this should
produce a "Bad handshake" error on the client, it doesn't - the server just drops the
connection.

I have worked-around the problem by increasing the connect_timeout global variable to 30
seconds.

What I do not yet know is why the login is taking over 5 seconds. Neither our client
machine or server machine are overloaded, the logins should be completing practically
instantaneously.
[13 May 2007 20:59] Jeff C
Well at least you found a semi-solution that works.  I hope they figure it out soon.  I've
added 30 seconds into connect_timeout...

Thanks for the tip Jon.
[6 Jun 2007 20:31] Gabriel Barazer
I got this error many times on a moderately loaded server as well.
The error string and code (0) isn't clear at all : Reading the source related to  these
errors shows some tests during the authentication process to the mysql server. See
sql-common/client.c:2313 to 2321 : Error "reading authorization packet" is returned when
mysql->net.last_errno == CR_SERVER_LOST, which by the constant CR_SERVER_LOST, obviously
means that the server has dropped the connection. Reasons for this can be various, but I
think it has something to do with timeouts during authentication on the server side. As
wrote Jon, the problem might be a bug in the network error handling (which could explain
the randomness of this bug).

Temporary workaround is indeed setting a long timeout, to avoid the server dropping the
connection. I will check if this is enough to make this bug disappear. But there is
clearly something to do on the server code to avoid that. (somewhere next to
sql/sql_parse.cc:1143 , version 5.0.41)
[9 Aug 2007 23:52] Sveta Smirnova
Thankk you for the report.

Error 2013, "Lost connection to MySQL server" is client error, not server. Even if server
sent Bad Handshake this means client didn't receive anything. Also there is "T@10874248:
push_warning: enter: code: 1043, msg: Bad handshake" in the server stack trace. Connection
outage seems to be somewhere in your environment.

So I don't see bug here. Please explain what is wrong?
[28 Aug 2007 11:56] Jon Ribbens
Sveta, firstly the documentation says you will get the 'Bad handshake' error, and this is
false. You will never see this error. So that's either a bug in the code or a mistake in
the documentation. The only way I could find to diagnose this problem was to either 'gdb'
the mysql server binary, or to run the 'debug' mysql server and try and use the
completely-undocumented debug macros. (The "T@10874248" stuff you mention is not in the
'server stack trace', it is in the debug macro output, and only if you know how to use
that, which I suspect essentially no non-mysql-employees do.)

Secondly, I explained in my original report why the problem is unlikely to be caused by
the network. Please explain why you say "Connection
outage seems to be somewhere in your environment" when all the evidence seems to point to
the contrary.
[28 Aug 2007 15:41] Sveta Smirnova
Thank you for the feedback.

The manual says "The number of seconds that the mysqld  server waits for a connect packet
before responding with Bad handshake" at
http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#option_mysqld_connect_
timeout which means server will send this error and nothing about client will receive this
error. If you mind other quote from manual please provide exact quote and URL.

Also how will client receive any error from server if it lost connection?

> Secondly, I explained in my original report why the problem is unlikely
> to be caused by the network. Please explain why you say "Connection
> outage seems to be somewhere in your environment" when all the evidence
> seems to point to the contrary.

You said "Temporary workaround is indeed setting a long timeout, to avoid the
server dropping the connection." Additionally you get "lost connection" error. This shows
something in your environment make network connection between MySQL client and server slow
and buggy sometimes.

So I don't see MySQL bug here. If you will be able to provide test case which shows
problem is MySQL code and not environment which makes network connection slow and/or buggy
feel free to provide the test case and reopen the report.
[28 Aug 2007 18:21] Jon Ribbens
> The manual says "The number of seconds that the mysqld  server waits
> for a connect packet before responding with Bad handshake" 

Indeed. What do you think the word 'respond' means?

> Also how will client receive any error from server if it lost connection?

It didn't lose the connection, the server deliberately dropped it. The server could quite
easily send an error before dropping the connection.

> Additionally you get "lost connection" error. This shows something in your
> environment make network connection between MySQL client and server slow
> and buggy sometimes.

"Lost connection" shows nothing except that the server has deliberately dropped the
connection. I have already explained why the problem does not appear to be the network.
[28 Aug 2007 18:23] Jon Ribbens
I am setting the bug back to 'open' because, at the very least, the "Bad handshake" thing
is a documentation error.
[31 Aug 2007 11:21] Magnus Blaudd
Hi,

will try to explain how a connection to the MySQL Server works. You have provided a very
nice tracefile that show what happens in the server when the client get the error "2013
Lost connection to MySQL server at 'reading authorization
packet', system error: 0" and I'll comment on that one directly.

It actually starts before the beginning of the trace file where the server accepts the
connection and starts a new thread to handle this incoming connection. It will set the
network read and write timeouts to the value of "connect_timeout", which is 5 seconds by
default in order to quickly be able to disconnect clients that don't respond(try telnet to
the mysqld port...). Then call 'check_connection' in sql_parse.cc

> T@10874248: ?func: info: New connection received on TCP/IP (70)
> T@10874248: vio_peer_addr: enter: sd: 70
> T@10874248: vio_peer_addr: exit: addr: 192.168.1.74

The connection is from 192.168.1.74, we use 'vio_peer_addr' to ask the network about
that.

> T@10874248: my_malloc: my: size: 13  my_flags: 0
> T@10874248: my_malloc: exit: ptr: 0xf5d5a0
> T@10874248: ?func: info: Host: unknown host  ip: 192.168.1.74
> T@10874248: vio_keepalive: enter: sd: 70  set_keep_alive: 1

Call to 'vio_keep_alive', this shows that the ip was allowed to connect du to ACL lists.
At sql_parse.cc:880

> T@10874248: net_write_command: enter: length: 67
> T@10874248: vio_is_blocking: exit: 0
> T@10874248: vio_write: enter: sd: 70  buf: 0x10016e0  size: 72
> T@10874248: vio_write: exit: 72

The server builds the "initial greeting message" and sends it to the client.

> T@10874248: vio_is_blocking: exit: 0
> T@10874248: vio_read: enter: sd: 70  buf: 0x10016e0  size: 4
> T@10874248: vio_read: vio_error: Got error 11 during read
> T@10874248: vio_read: exit: -1
> T@10874248: ?func: info: vio_read returned -1,  errno: 11

The server starts to read from the client in non blocking mode just to see if there is
anything there. errno 11 shows no data was available at this time.

> T@10874248: thr_alarm: enter: thread: T@10874248  sec: 5

Setup alarm that will break the below read in blocking mode after 5 seconds.

> T@10874248: vio_blocking: enter: set_blocking_mode: 1  old_mode: 0
> T@10874248: vio_blocking: exit: 0

Turn on blocking mode.

> T@10874248: vio_read: enter: sd: 70  buf: 0x10016e0  size: 4
> T@10874248: vio_read: vio_error: Got error 4 during read
> T@10874248: vio_read: exit: -1
> T@10874248: ?func: info: vio_read returned -1,  errno: 4
> T@10874248: ?func: error: Couldn't read packet: remain: 4  errno: 4  length: -1

Read from the client. Fails after 5 seconds because the client hasn't said anything(or at
least it hasn't yet reached the server). This is "such a serious error" that the server
decides not to write anything more to the socket, to enforce this it set the flag
"net->error= 2"(this can be seen in function 'my_real_read' in net_serv.cc). This flag has
the effect that even if the server calls a function that writes to the net it will be
thrown away.

Even if the server would write to the client(which might not exist anymore), the client
would most likely be out of sync. Remember that packets can be "in the air" as if throwing
a ball back and forth.

> T@10874248: vio_blocking: enter: set_blocking_mode: 0  old_mode: 1
> T@10874248: vio_blocking: exit: 0
> T@10874248: net_printf_error: enter: message: 1043

Write the "Bad handshake" error message to the client. But since "net->error" is 2 it will
be thrown away and never sent. Sorry!

> T@10874248: vio_is_blocking: exit: 0
> T@10874248: push_warning: enter: code: 1043, msg: Bad handshake
> T@10874248: close_connection: enter: fd: TCP/IP (70)  error: 

Finally close the connection to the client. If the client is still there as in this case,
it will be in the state of reading from the network/server expecting an answer to it's
"Login request" that it has sent(although the server didn't get it in time). So instead of
that reply it'll get an error since the server closed the connection and thus will display
2013 indicating "The client didn't get an error when writing to the server, but it didn't
get a full answer (or any answer) to the question."

How can we then determine this is not a server bug, after all the "thr_alarm" function
could be buggy in some way and thus breaking the read from the client before the 5 seconds
has passed. Or something...

1. Use wireshark/etheral to log the connection between the client and the server. Since it
will include the time of each packet it receive that will effectively show how long the
server have waited before seding out the socket reset(a RST packet). Wireshark is very
nice since it also knows how to interpret the MySQL protocol. If you are running on the
default port 3306 it will interpret automatically, otherwise use "Decode as.." and set the
port being used to MySQL.

Example trace(reduced):
Time        From/To                         What   
0.000251    127.0.0.1:3306->127.0.0.1:10508 MySQL Server Greeting proto=10
version=5.0.50-debug-log
0.004448    127.0.0.1:10508<-127.0.0.1:3306 MySQL Login Request user=root db=test
0.004586    127.0.0.1:3306->127.0.0.1:10508 MySQL Response OK

2. Insert sleep(10) in the appropriate place in server or client code.
 - When doing this in the client code, I get exactly the error described here.
 - When doing it in the server code, it's not possible to get any error at all
   since the client is reading with almost infinite timeout.

3. Check the value of "aborted_connections" by using "SHOW STATUS like
'aborted_connections'"  - it should have increased with one after the client got this
error.

Can the server send ER_BAD_HANDSHAKE as it says in the manual? Yes, it will do
that in case of other errors during this handshake phase(for example if the
length of the clients Login request is wrong). But most likely _never_ in the case
where "connect_timeout" has expired - since it's "such a serious error" ;)

Why don't we have a higher "connect_timeout" value by default? 5 seconds is fairly low in
order to be "secure by default" - the low value allows spurious connects to that port to
be
discarded quickly and free up that thread to do some useful work. But maybe 5 seconds is
too
low?
[31 Aug 2007 11:22] Magnus Blaudd
I'll recommend the manual be updated. We really don't want our users have
to produce tracefiles or use wireshark in order to find out they need to increase the
"connect_timeout" value. 

An error message on the client saying "2013 Lost connection to MySQL Server at
'<something>', system error: <errno>" most likely indicates the the "connect_timeout"
value is to low on the server. There is also "2013 Lost connection to MySQL Server during
query" but that is a _different_ problem.
[21 Sep 2007 16:59] Magnus Blaudd
Talked with Monty about this for an explanation why we have such a low connect_timeout by
default. It's primarily a security feature i.e if there are "clients" connecting to the
server without doing anything, they will only block the server(i.e one thread in the
server) for the 5 seconds.

Since quite a lot of these error messages show up in a web search, we will increase the
default value to 10 seconds.

Increasing the connect_timeout value in the servers my.cnf is recommended if this 
problem occurs frequently. It does not really make the server insecure.

Check the value of "aborted_connections" by using "SHOW STATUS like
'aborted_connections'"  to easily determine this is the problem you experienc, it will
increase with one for each connection the server aborts.
[21 Sep 2007 17:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/34456

ChangeSet@1.2526, 2007-09-21 17:10:45+02:00, msvensson@shellback.(none) +1 -0
  Bug#28359 Intermitted lost connection at 'reading authorization packet' errors
   - Increase default 'connect_timeout' value to 10 seconds
[18 Oct 2007 23:35] Bugs System
Pushed into 5.1.23-beta
[18 Oct 2007 23:37] Bugs System
Pushed into 5.0.52
[25 Oct 2007 21:24] Paul DuBois
Noted in 5.0.52, 5.1.23 changelogs.

The default value of the connect_timeout system variable was
increased from 5 to 10 seconds. This might help in cases where
clients frequently encounter errors of the form Lost connection to
MySQL server at 'XXX', system error: errno. 

Also will add a new section about the "Lost connection" error at:

http://dev.mysql.com/doc/refman/5.0/en/common-errors.html
[9 Dec 2007 8:00] Ivan Pupkin
Hi!

I'd like to comment this issue. U think the problem is more complicated.

I have several 5.0.38 mysql servers. On 32 bit platform and on 64 bit platform.

On 64-bit platform i've noticed those problems You describe - a random connection lost to
mysql server.

The encreased timeout helped, but there is at least one issue when it doesn't help.

When i'm copying a large file from the mysql server with rsync protocol - it's impossible
to connect to mysql server - the connection lost appears.

I was dumping traffic with tcpdump - and noticed a lot of packets with checksum error.

I must add that this doesn't happen when You are loading the network with another
protocols (at least that i tried didnt).

And this issue does not happen on 32 bit pltform.

If You are able to do this please try to reproduce it, the keys are:

1) 64 bit linux
2) copying a large file (>10Gb) with rsync from mysql server
3) maybe master-slave replication (my servers are in master-slave replication)
[25 Sep 2008 18:48] Christian Deligant
I have a very similar bug since I upgraded from Mysql $ to Mysql 5... Client and server
are on the same machine, the PHP logs in with "localhost"

Never had problems with 4, apart of too many connection errors from times to other due to
a very busy server...

Hope this helps!
[4 Oct 2008 4:47] Dan Franklin
Some commenters said to use a show status command, but they gave the incorrect command. 
It is
  show status like 'aborted_connects'
not "aborted_connections".

This is the case for 5.0.45, anyway.
[20 Oct 2008 10:21] Ondrej Brablc
We thought that the problem might be in 32-bit client and 64-bit server too, we have even
increased the connection timeout to 60 seconds, but this helped only partially. The final
solution in our case was to switch the ethernet card from 100Mb/s to 1000Mb/s mode. So
just in case you found this bug and have similar problems, and increasing timeout does not
help, use ethtool to check your card mode and test whether switching to 1000Mb/s would not
help in your case as well.
[21 Oct 2008 17:53] Magnus Blaudd
Ever tried with wireshark trying to see how the network packets looked like? Could the
first NIC have been faulty?
[5 Apr 17:54] Rent Tor
Oh my God!
I've been looking for the cause of one of my client's problems with MySQL for over a year
now, just to find out through this bug report I stumbled upon by pure chance that it's
just the 64bit version of MySQL that's broken.

Wow. But what really makes me wonder are two things:

First, why doesn't MySQL aknowledge this as a bug but instead chooses to use the band aid
in the form of increased timeouts? On my client's system we upped the connect_timeout to a
ridiculous 240 seconds - with the problem still prevailing. Heck, upping the timeout is
like adding a second tank when your car starts leaking gas. You'll get the sime mileage
from one fillup - it's just a hell of a lot more expensive to fill the tank. Why not plumb
the leak?

Second: I've been on this issue not "on my own" but thanks to my client having invested in
a PLATINUM support contract, I had "help" from MySQL. The quality of which can essentially
be seen here: "Must be a network problem.", "Not a MySQL fault.", "Check your my.cnf.",
"The server is doing everything right there.". Not one single mentioning of this bug
report which would have helped me tremendously to recognize we were chasing ghosts in the
mist. I even offered live access to the servers affected but was turned down each and
every time.

By now I "fixed" the problem by patching up PHP - which is the major web app platform at
my client - so that it'll open connections to the MySQL database servers with the
Auto_Reconnect flag set. *This* finally fixed the problems. Still this is less than ideal
but then it's better than a web app that keeps crashing because of a bug MySQL obviously
neglects to even ack as a bug and therefore won't fix. Sad, sad, sad.

I actually believe that the bug could be found and consequentually fixed very easily.
Since it's only 64bit systems that are affected, this reeks a lot like a varibale
overflow/underflow or an unclean initialization somewhere. Most likely in the port setup.
Shouldn't be that hard to find. Just by looking at the startup messages of the 64bit
mysqld one can see that the source is everything but 64bit aware anyway.
[6 Apr 11:17] Magnus Blaudd
I hope you carefully read my previous explanation from "31 Aug 2007 11:21" about the
default value of "connect_timeout" being set quite low in MySQL Server as well as
describing how a connect works.

1. Would be very interesting to see a trace of a failed connect with for example
wireshark.

2. Have you run the query "SHOW STATUS LIKE 'aborted_connections'" on your server? This
would tell if it is the mysqld that get a read timeout during connect and thus closes the
new connection and increments that counter.

3. What version of MySQL are you using?

4. Yes, "reconnect" has been off by default for quite some time now.
[6 Apr 11:45] Rent Tor
> I hope you carefully read my previous explanation from "31 Aug 2007 11:21"
> about the default value of "connect_timeout" being set quite low in MySQL 
> Server as well as describing how a connect works.

Yes, I indeed did and I found it very interesting. 
However I have the feeling that you missed my statement about we having had
connect_timeout set as high as 240 seconds without this making any noticeable difference.

> 1. Would be very interesting to see a trace of a failed connect with for
> example wireshark.

I will see whether I can provide that. Unfortunatley the servers are hit by some millions
of requests a day so the log may be quite large and a bit hard to read. In case you've got
any information as to how to decrease the trace size, let me know.

> 2. Have you run the query "SHOW STATUS LIKE 'aborted_connections'" on your
> server? This would tell if it is the mysqld that get a read timeout during 
> connect and thus closes the new connection and increments that counter.

As mentioned in the comments above and as you've therefore surely read it's "SHOW STATUS
LIKE 'aborted_connects'" - not connections. Anyway, of course I did.

From 11:31am my local time today:

mysql> show status like 'abort%';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| Aborted_clients  | 199461 | 
| Aborted_connects | 59490  | 
+------------------+--------+
2 rows in set (0.05 sec)
mysql> 

From 11:34am my local time:
mysql> show status like 'abort%';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| Aborted_clients  | 199501 | 
| Aborted_connects | 59495  | 
+------------------+--------+
2 rows in set (0.05 sec)

As you can see not only the aborted_connects increase but the server also aborts clients
which is another, as of yet unresolved problem, my client encounters.

> What version of MySQL are you using?
Oh yes. That question again ;) Sorry, that's really not your fault, but by now I'm quite
sick and tired of it. That's because I've heared enough "Wait for next release" statements
already. Anyway:

mysql> show variables like 'version';
+---------------+---------------------------+
| Variable_name | Value                     |
+---------------+---------------------------+
| version       | 5.0.72-enterprise-gpl-log | 
+---------------+---------------------------+

And yes, I might even consider upgrading yet again as I did the last 5 to 7 times. But
since the problem didn't go away the last times, chances are it won't go away this time,
too.

> Yes, "reconnect" has been off by default for quite some time now.
To my knowledge, it has never been active in PHP but I may well be wrong there.
[6 Apr 12:59] Rent Tor
From 12:56pm:

+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| Aborted_clients  | 200198 | 
| Aborted_connects | 62007  | 
+------------------+--------+
2 rows in set (0.24 sec)
[6 Apr 13:01] Rent Tor
Operating System is RedHat Linux Enterprise Server 5.

Linux idgverl-li43.privatedomain 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 07:18:46 EST 2008
x86_64 x86_64 x86_64 GNU/Linux
[3 Jul 19:34] mike mike
I'm having this issue with the latest MySQL server from Ubuntu 64-bit Jaunty

mysql  Ver 14.12 Distrib 5.0.75, for debian-linux-gnu (x86_64) using readline 5.2

[root@sql02 ~]# dpkg -l | grep mysql
ii  libdbd-mysql-perl                                4.008-1                          A
Perl5 database interface to the MySQL data
ii  libmysqlclient15-dev                             5.1.30really5.0.75-0ubuntu10.2  
MySQL database development files
ii  libmysqlclient15off                              5.1.30really5.0.75-0ubuntu10.2  
MySQL database client library
ii  mysql-client                                     5.1.30really5.0.75-0ubuntu10.2  
MySQL database client (metapackage depending
ii  mysql-client-5.0                                 5.1.30really5.0.75-0ubuntu10.2  
MySQL database client binaries
ii  mysql-common                                     5.1.30really5.0.75-0ubuntu10.2  
MySQL database common files
ii  mysql-server                                     5.1.30really5.0.75-0ubuntu10.2  
MySQL database server (metapackage depending
ii  mysql-server-5.0                                 5.1.30really5.0.75-0ubuntu10.2  
MySQL database server binaries
ii  mysql-server-core-5.0                            5.1.30really5.0.75-0ubuntu10.2  
MySQL database core server files

It occurs maybe once per day. I have ~ 1000 queries per second on average coming from 4
different machines over TCP and batch jobs running locally over unix socket, and only the
unix socket fails - again, only maybe once a day, but it can be an important job to miss
if it doesn't work.

I've tried adding in PHP mysqli options to increase the connection timeout, it didn't
help.