MySQL Bugs: #71960: Mysql 5.6.16 on OSX throws connection errors with innodb_file_per

Bug #71960	Mysql 5.6.16 on OSX throws connection errors with innodb_file_per_table option
Submitted:	6 Mar 2014 23:57	Modified:	17 Apr 2015 0:18
Reporter:	Kris Iyer	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Options	Severity:	S1 (Critical)
Version:	5.6.16, 5.6.20	OS:	MacOS (10.8)
Assigned to:		CPU Architecture:	Any
Tags:	innodb_file_per_table, OSX 10.8, OSX 10.9, system error: 0, system error: 32

Description:
Under heavy workload mysql 5.6.16 (replicated on 5.6.13 and higher) throws connection errors "Lost connection to MySQL server at 'sending authentication information', system error: 32". This was tested on OSX 10.8 and 10.9. The issue is not seen on windows or other unix servers like centos.

How to repeat:
This was tested and the issue was repeatable with a simple database import through the mysql client.The database dump had around 1000 tables and 200 MB in size. While the dump is being sourced other mysql client(s) trying to connect (even if it were a different user than the one running the import) you get the authentication error.

"mysql -uroot -p
Enter password: 
ERROR 2013 (HY000): Lost connection to MySQL server at 'sending authentication information', system error: 32"

Also observed that passing in the host causes mysql to throw a different exception:

"mysql -uroot -p -h127.0.0.1
Enter password: 
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 0"

Running mysql under debug (mysql-dubug) and generating a trace shows the below when the issue occurs:

[login_connection: info: login_connection called by thread 11
login_connection: info: New connection received on TCP/IP (1089)
login_connection: info: Host: unknown host  ip: 127.0.0.1
acl_authenticate: info: com_change_user_pkt_len=0
my_message_sql: error: error: 1158  message: 'Got an error reading communication packets'
my_message_sql: error: error: 1043  message: 'Bad handshake'
one_thread_per_connection_end: info: thd 0x7f824d245200 block_pthread 1
trans_rollback: info: clearing SERVER_STATUS_IN_TRANS
intern_plugin_unlock: info: unlocking plugin, name= InnoDB, ref_count= 562
intern_plugin_unlock: info: unlocking plugin, name= InnoDB, ref_count= 561
one_thread_per_connection_end: info: remove_global_thread 0x7f824d245200 current_linfo 0x0
~THD(): info: THD dtor, this 0x7f824d245200
~THD(): info: freeing security context
one_thread_per_connection_end: info: Blocking pthread for reuse
do_command: info: Command on TCP/IP (32) = 3 (Query)
dispatch_command: info: packet: ''; command: 3
st_select_lex::add_item_to_list: info: Item: 0x7f824d24a2a8
JOIN::prepare: info: setup_ref_array this 0x7f82447e8250   10 :    0    0    1    1    0    0
setup_fields: info: thd->mark_used_columns: 1
setup_fields: info: thd->mark_used_columns: 1
setup_conds: info: thd->mark_used_columns: 1
THD::decide_logging_format: info: query: SELECT 1
THD::decide_logging_format: info: variables.binlog_format: 1
THD::decide_logging_format: info: lex->get_stmt_unsafe_flags(): 0x0
THD::decide_logging_format: info: decision: no logging since mysql_bin_log.is_open() = 0 and (options & OPTION_BIN_LOG) = 0x40000 and binlog_format = 1 and binlog_filter->db_ok(db) = 1
JOIN::optimize: info: No tables
handle_select: info: res: 0  report_error: 0
dispatch_command: info: query ready
net_send_eof: info: EOF sent, so no more error sending allowed]

The interesting piece of trace being "login_connection: info: Host: unknown host  ip: 127.0.0.1" where it sees the host as "unknown host" which appears to be the cause of the authentication error.

This could also be seen under the performance schema where the HOST is seen as NULL while multiple clients try to connect:

mysql> select * from hosts;
+-----------+---------------------+-------------------+
| HOST      | CURRENT_CONNECTIONS | TOTAL_CONNECTIONS |
+-----------+---------------------+-------------------+
| NULL      |                  18 |                35 |
| localhost |                   1 |                 2 |
+-----------+---------------------+-------------------+
2 rows in set (0.00 sec)

The counter "Aborted_connects" keeps increasing.

From the JDBC side, tested this with the recent connector/j and mariadb java client and both fail with similar exceptions:

[Caused by: java.io.EOFException: unexpected end of stream, read 0bytes from 4
	at org.mariadb.jdbc.internal.common.packet.buffer.ReadUtil.readFully(ReadUtil.java:84) [mariadb-java-client.jar:]
	at org.mariadb.jdbc.internal.common.packet.buffer.ReadUtil.readFully(ReadUtil.java:92) [mariadb-java-client.jar:]
	at org.mariadb.jdbc.internal.common.packet.RawPacket.nextPacket(RawPacket.java:77) [mariadb-java-client.jar:]
	at org.mariadb.jdbc.internal.common.packet.SyncPacketFetcher.getRawPacket(SyncPacketFetcher.java:67) [mariadb-java-client.jar:]
	at org.mariadb.jdbc.internal.mysql.MySQLProtocol.connect(MySQLProtocol.java:468) [mariadb-java-client.jar:]
	... 69 more"]

Also observed that bouncing the mysql instance clears up the problem. It was also seen that terminating all the mysql clients like jdbc (shutting down the app servers) also helps with the issue and you are able to login once again.

my.cnf that was used to test this:
[
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.6/en/server-configuration-defaults.html

[mysqld]
interactive_timeout=180
wait_timeout=180
max_connect_errors=9999
skip-name-resolve
query_cache_size=0
bind-address=127.0.0.1

# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
innodb_buffer_pool_size = 2G

# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin

# These are commonly set, remove the # and set as required.
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
port = 3306
server_id = kris_dev
socket = /tmp/mysql.sock 

# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
join_buffer_size = 128M
sort_buffer_size = 2M
read_rnd_buffer_size = 2M 

sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES 

[client]
default-character-set=utf8

]

This was tested on machine specs ranging from 4GB, 8GB and 16 GB and 32 GB RAM. Things were a little better on the higher specs (16GB and above) but looked severe on anything 8GB and below.

Further debugging and tracing has led to the mysql option "innodb_file_per_table" which is turned on by default on Mysql 5.6 to be causing this issue. Turning off this option has revealed positive results so far and no occurrence of system error: 32. 

It was also noticed that increasing open file limits (ulimit -n) on the higher spec machines helped with better results while running with "innodb_file_per_table" turned on.

Experiencing what appears to be the exact same bug on OS X 10.7 and 10.8 with similar outcomes (re: innodb_file_per_table = off).

Thanks Kris for finding that little nugget.  I've been trying different things and regularly restarting MySQL for days with no success, but that directive seems to help.

Am affected by an issue with similar symptoms. Have tried workarounds suggested elsewhere (increasing max open files limits) - no avail.

Hello Kris Iyer,

Thank you for the report.

Thanks,
Umesh

I was affected by the same issue. I found that if I kept a MySQL root prompt open until I saw the error, I could perform a FLUSH TABLES command, and immediately I was able to log in again.

The issue was resolved finally by increasing the open files limit per process from 256 files to 65536 files by following the guide at :

http://docs.basho.com/riak/latest/ops/tuning/open-files-limit/#Mac-OS-X

I think that client behaviour, as well as lower default resource limits in Mac OS X Yosemite is responsible for the increase in the number of times I was seeing this issue. Whenever I use a MySQL command-line client, it scans the current database to read the table and field names for auto-complete purposes. This access all of the tables in the default database and brings them into the open table cache. Microsoft Excel 2013 Power Query is even worse; despite asking you for a database name when you connect, it continues to scan every table in every database. This adds a huge number of entries to the open tables cache, breaking the server pretty quickly.

It would be a huge step forward if you could arrange for the client to be sent the message "server ran out of file handles" or similar, instead of the distinctly unhelpful "Lost connection to MySQL server at 'reading authorization packet', system error: 0".

You might also consider keeping a couple of file handles in reserve so that it's always possible to open a connection as root. Currently, when this limit is hit, it's not possible to connect as root which makes it very difficult to diagnose or fix the problem. It's not even possible to shut down the database with mysqladmin in this state, since that relies on a new connection.

Cheers,
Paul

Occurring with 5.6.22 as well as Percona 5.6.21-70  10.8 and 10.9

I'm running Yosemite 10.10.2 / MySQL 5.6.22

We have the same symptoms as above. Strangely we can connect remotely but any connection via localhost, like our web server fails. Connecting using tools like navigate or Sequel pro fail from localhost and 127.0.0.1. Connecting using MySQL Workbench fails at localhost.

This machine is brand new and is NOT under any load. So the high load, large query note on MySQLs page about this issue does not apply here. Also because it is from localhost, long query times / slow connection can also be ruled out. (http://dev.mysql.com/doc/refman/5.0/en/error-lost-connection.html)

ATTEMPETED SOLUTIONS
1. connect_timeout = 10 (failed, problem persists)
2. upping file limits and process limits via plist (failed, problem persists)
3. As noted above if we issue "flush tables" from a remotely connected client then the web server can connect, MySQL Workbench can connect, local clients like navigate can connect. 

Item 3 is NOT a solution because the server will eventually fail again and "flush tables" must be reissued again. This could not be used as a production database.

We installed 5.6.23 on the machine described in the post above and the same bug persists.

I raised the OS X open files limit to 65536 and I think this delayed the problem, but didn't solve it.  Even with the higher limit, in my case the server still starts refusing connections when there are just over 1000 tables in the open table cache.  It could do this fairly repeatably.

I reduced the table_open_cache value to 500 and that seems to have resolved it for me, albeit with a (possibly minor) performance hit.

on our 5.6.23 mysql we tried the table_open_cache = 500 patch and that seems to be working so far for us as well. 

I have not done exhaustive testing yet, but frankly the database would go away frequently without any exhaustive activity so my guess is that this has fixed or at least worked around the bug.

We upgraded our XServe 2009 to Yosemite 2 weeks ago and soon after discovered this MySQL Workbench connections problem. Mostly, we would get Error 32 lost connection while using external MySQL terminal clients, sometimes we would see an authentication Error 0. These errors would generally occur within a few minutes of initiating an external Workbench connection. Our Tomcat server would also cease to function and essentially it looked like a crash. At the time, I did not know that the 'flush tables' command might possibly fix the WEB server issue. 

We are using Yosemite 10.10.3, and I am very thankful to James Kemmerer for his note about setting the table_open_cache value to 500 as it does appear to fix the problem.

Hi George,

I'm glad your workaround is helping, although I think that James was just endorsing the fix I had originially mentioned.  :)

From reading the comments on this bug report, this appears to be a very real issue for many, possible all, Mac OSX users running MySQL servers.  If this platform is to be considered a genuinely supported platform for MySQL, then the repeatable bug when the open table cache number gets to around 1000 should be investigated and addressed.

I'd be delighted if a MySQL representative can give some reassurance that this is being taken seriously.  It's a real issue affecting a lot of users.

Regards,
Paul

Just wrapping up with #perconalive 2015. From talking to people it seems like a lot of folks are running into this and infact i find most of the developers in the community using a macbook.  Likely just working around this bug. Mentioned this to a couple of folks who work closely with the bug database. Hopefully gets some attention soon.

Reliably reproduces the issue described by Bug #71960

Attachment: mysql_issue.sh (application/octet-stream, text), 695 bytes.

I've attached a file (http://bugs.mysql.com/file.php?id=22683&bug_id=71960) that reliably reproduces this issue by creating a large number of tables.

None of the suggested work-arounds remove the problem save for disabling `innodb_file_per_table`, which is not ideal. I have verified I can increase my processes' available file descriptors, and I have attempted to limit the `table_open_cache` size. In any case where innodb_file_per_table is enabled, my mysql server becomes unresponsive usually between creating table #2200 and #2300. After which point mysql is completely unresponsive as it will not accept additional connections.

I'm having the same issue. A workaround would help a ton. Thanks for your time.

Still persists at 5.6.25. Limiting the table_open_cache to 500 has held it off, but if I go above 500 the error returns.

Several quick notes on this bug.

Note 1 of 2
We installed 5.6.26 and the BUG IS STILL PRESENT. 

The bug fixes noted above added to my.cnf still serve as a work around: 
sql_mode=NO_ENGINE_SUBSTITUTION
table_open_cache = 500

Note 2 of 2

With great excitement and anticipation we installed 5.7.9 last week hoping to finally put this bug to rest.

IT APPEARS THIS BUG PERSISTS IN 5.7

We have had results similar to those mentioned above for 5.6.x. Our new install of 5.7.9 seems to go down and refuse connections in a manner similar to 5.6.

We are proceeding with applying and testing the fixes / work arounds used for 5.6. We'll let you know if either or both of these modifications to your my.cnf are necessary.

sql_mode=NO_ENGINE_SUBSTITUTION
table_open_cache = 500

Fingers crossed that these work, otherwise we'll have to develop an entirely new work around for this long lived and nasty bug.

We are seeing on our JAMF Software Server (JSS) running OS X 10.10.5 “Yosemite” with Server application version 5.0.15, we have seen under heavy workload with MySQL version 14.14 distribution 5.6.24 is it throws connection errors “Lost connection to MySQL server at ‘sending authentication information’, system error: 32”. Primarily we noticed errors during scheduled JSS database backup using the JSS Database Utility.

A workaround was modifying the "my.cnf" file.

/usr/local/mysql/my.cnf

And Under the |mysqld| section, add the following three variables:

innodb_file_per_table = OFF
table_definition_cache = 400
table_open_cache = 400

And restarted MySQL and it seems to temporarily resolved the issue.

I have a blog post that gives details:

https://apple.lib.utah.edu/?p=511

This bug is afflicting our environment as well. It has severely impacted performance of our JAMF Software Casper server that is running on OS X.

We're able to raise the table_open_cache to 1000. Anything higher and mySQL will destabilize.

mysqltuner advises against the workarounds we have to put in place due to this issue.