MySQL Bugs: #70735: Lose ability to connect locally and remotely to mysqld

Bug #70735	Lose ability to connect locally and remotely to mysqld
Submitted:	26 Oct 2013 18:06	Modified:	9 Jan 2015 17:50
Reporter:	William Levering	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server	Severity:	S1 (Critical)
Version:	5.6.14	OS:	MacOS (Server, 10.6.8)
Assigned to:		CPU Architecture:	Any

Description:
Perl scripts and mysqldump, which all use the information_schema database, lead to dropping the connection limit to 0, 1, 2 or 3. I have noticed while they were running before making changes, their connections led to blocking any other connections. We had more problems until we had the Perl scripts close the mysqld connection, then sleep for 5 minutes. However, those changes lead to frequently not being able to run batch jobs throughout the day, and during the day, sometimes block users from connecting locally, remotely, and by MySQL Workbench remotely. On occasion, I find that I can cannot connect locally despite mysqld still running, and the shutdown process also does not work. In such cases, I reboot the server. I am waiting for my contact at the IT Center to return so that we can address trying hardware tests to see if the fault is with the server. I am raising this bug here to see if there is any input from the MySQL team, such as with parameter settings, which I have gone over a number of times, adding some to hopefully improve results. No luck. This issue was not present with Mac OS X Server 10.5.8 and mysqld at 5.5.21, if I remember the number correctly. After upgrading the about 1 two months ago, and then upgrading mysqld, these issues arose. max_connections is set at 200. I wonder what the internal limit might be, or memory problems. Any clues or guidance to check anything, including libraries, would be most grateful because I am not certain this is a mysqld problem. I also set show-warnings to 2048, and I see nothing in the log about connection problems...

How to repeat:
Start mysqldump for a sizable database, and see how many new connections are allowed.

Thank you for the report.

Please send full MySQL server error log file.

Latest error log file covering a run started Sunday morning and crashed. I hope to have another log file example soon whereby mysql remains running, but no connections are possible or just one, which is typical for workdays.

131117 06:14:24 mysqld_safe Starting mysqld daemon with databases from /Volumes/CBOMacRaid1/NPIData/mysql/data
2013-11-17 06:14:25 74461 [Note] Plugin 'FEDERATED' is disabled.
2013-11-17 06:14:25 74461 [Note] InnoDB: The InnoDB memory heap is disabled
2013-11-17 06:14:25 74461 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2013-11-17 06:14:25 74461 [Note] InnoDB: Compressed tables use zlib 1.2.3
2013-11-17 06:14:25 74461 [Note] InnoDB: Not using CPU crc32 instructions
2013-11-17 06:14:25 74461 [Note] InnoDB: Initializing buffer pool, size = 4.0G
2013-11-17 06:14:26 74461 [Note] InnoDB: Completed initialization of buffer pool
2013-11-17 06:14:26 74461 [Note] InnoDB: Highest supported file format is Barracuda.
2013-11-17 06:14:26 74461 [Note] InnoDB: The log sequence numbers 36401174437610 and 36401174437610 in ibdata files do not match the log sequence number 36416427476134 in the ib_logfiles!
2013-11-17 06:14:26 74461 [Note] InnoDB: Database was not shutdown normally!
2013-11-17 06:14:26 74461 [Note] InnoDB: Starting crash recovery.
2013-11-17 06:14:26 74461 [Note] InnoDB: Reading tablespace information from the .ibd files...
2013-11-17 06:14:52 74461 [Note] InnoDB: Restoring possible half-written data pages
2013-11-17 06:14:52 74461 [Note] InnoDB: from the doublewrite buffer...
2013-11-17 06:14:54 74461 [Note] InnoDB: 128 rollback segment(s) are active.
2013-11-17 06:14:54 74461 [Note] InnoDB: Waiting for purge to start
2013-11-17 06:14:54 74461 [Note] InnoDB: 5.6.14 started; log sequence number 36416427476134
2013-11-17 06:14:54 74461 [Note] Server hostname (bind-address): '*'; port: 3306
2013-11-17 06:14:54 74461 [Note] IPv6 is available.
2013-11-17 06:14:54 74461 [Note]   - '::' resolves to '::';
2013-11-17 06:14:54 74461 [Note] Server socket created on IP: '::'.
2013-11-17 06:14:55 74461 [Note] Event Scheduler: Loaded 0 events
2013-11-17 06:14:55 74461 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.6.14-log'  socket: '/tmp/mysql.sock'  port: 3306  MySQL Community Server (GPL)
04:02:01 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=536870912
read_buffer_size=4194304
max_used_connections=3
max_threads=200
thread_count=2
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2165408 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x2627d3a00
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 20a4e2e88 thread_stack 0x40000
0   mysqld                              0x00000001002c0fce my_print_stacktrace + 53^@
1   mysqld                              0x0000000100133674 handle_fatal_signal + 704^@
2   libSystem.B.dylib                   0x00007fff8224a1ba _sigtramp + 26^@
3   ???                                 0x5b2074736f686c61 0x0 + 6566376295845358689^@

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 527
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
131117 21:02:02 mysqld_safe mysqld restarted
nohup: can't detach from console: Inappropriate ioctl for device
131117 21:02:02 mysqld_safe mysqld from pid file /Volumes/CBOMacRaid1/NPIData/mysql/data/vhahaccbomac.vha.med.va.gov.pid ended

Thank you for the feedback.

Error 11 in most cases means out of memory error and together with your description of how parallel scripts behave this is expected guess. Please check value of max_connections variable and send us your configuration file.

I updated the script that had used the information_schema database. Even though I was not able to restart mysqld in the afternoon yesterday, the jobs ran without crashing due to failure to connect through the day. This morning I could only connect once, unable to start a parallel connection, so I restarted the server.

With 8 GB at the server, I wonder if any other memory options in /etc/my.cnf could be expanded to help.

Thank you for working with me.

Thank you for the feedback.

Well, with innodb_buffer_pool_size = 4G and key_buffer_size = 512M your MySQL server always uses 4,5G RAM and 3,5 which left can be easily spent if your connections create temporary tables: 200*512 = 100G

I recommend you to watch resource usage in time when MySQL can not accept new connections: I expect it to be very high.

Thank you.

Why would a 100 GB size temporary table consume 3.5 GB RAM if the rule is to write such a table to disk when space runs out?

Also, how do I watch resource usage and which usage? Should I maintain a connection? I run a script every 20 minutes to check some status. But if connections are blocked, the script cannot connect to report the information. Here is the script. Should it wait between max_connections changes to give mysqld time to clean up?:
mysql --user=npiadmin -t -vvv <<EOF
SELECT NOW()\G
SHOW GLOBAL STATUS WHERE Variable_name LIKE '%pen\_%' OR Variable_name LIKE 'Open%';
FLUSH TABLES;
SHOW GLOBAL STATUS WHERE Variable_name LIKE '%pen\_%' OR Variable_name LIKE 'Open%';
SHOW PROCESSLIST;
SELECT @@max_connections;
SET GLOBAL max_connections = 100;
SELECT @@max_connections;
SET GLOBAL max_connections = 200;
SELECT @@max_connections;
SHOW GLOBAL STATUS LIKE '%connect%';
EOF

Thank you for the feedback.

> Why would a 100 GB size temporary table consume 3.5 GB RAM if the rule
is to write such a table to disk when space runs out?

Not single table, but many tables up to 512M each

> Also, how do I watch resource usage and which usage? Should I maintain
a connection?

Not, you should run your operating system tools to watch externally.

The npiadmin account has ALL PRIVILEGES. I read something in the MySQL documentation a bit ago that such a set up used commonly can lead to problems that block root (and npiadmin) from connecting, whereas if the scripts instead of using npiadmin used a new account that does not have ALL PRIVILEGES would help my every-20-minutes script to run and show the information that may help me better understand the issue.

Also, I am unaware of creating a temporary table larger than 10 MB, and far more often less than that.

Thank you for triggering my memory, not that the new account would necessarily lead to the inability to connect with that account and users' accounts. I just wonder if there is an internal mysqld limit for connections at times that drops below (well below) max_connections...

Thank you for the feedback.

What I am trying to get from you is repeatable scenario which leads to not desired behavior of MySQL. What you describe so far is common symptom of resource over-usage. This can be result of the bug in MySQL code or not.

Therefore first thing I want to check is if my guess is correct: MySQL uses too much resources. This is easy test: simply run command `top` or `top -n 1` in parallel terminal when problem occurs. It will show RAM and CPU usage. Copy-paste its output and send to us.

Regarding to npiadmin, looks like you use some third-party product. 

> I read something in the MySQL documentation a bit ago that such a set up used commonly can lead to problems that block root (and npiadmin) from connecting,

This is correct: granting too much privileges can lead to problems. But this database is designed for reporting and verifying bugs in MySQL code only. So we need to stack with possible bug in MySQL code. However, if you need more help with usage of npiadmin and privileges: support on using our products is available both free in our forums at http://forums.mysql.com/ and for a reasonable fee direct from our skilled support engineers at http://www.mysql.com/support/

Thank you for your comments.

Scripts, for the most part, run serially using cron. Given that no temporary table exceeds 10 MB, it would take 350 to add up to 3.5 GB. No script. No single nor parallel running scripts come anywhere close to 350 active temporary tables, and nearly all are much smaller. I have learned to do things in small increments.

I run an ssh tunnel for the MySQL connection from my client computer. The tunnel is set up in its own Terminal window in which I run top -s 5. I have been doing that for months. I have not seen memory exhausted, but with your suggestion, thank you, I will keep a closer eye on the memory. Normally, I'm watching CPU load.

As for a repeatable scenario, I offered one in my original message, running mysqldump for a complete database backup, for a database with at least 1 GB of disk space used, and 50 or more tables. My script runs several backups in a series, which may be a better scenario...

As for MySQL using too much resources, I did not have this issue with Mac OS X 5.5.8 and MySQL 5.5.23. After upgrading to Mac OS X 5.6.8 and MySQL 5.6.13, the problem appeared with no script changes in between. Then I and my partner worked on our backup script to run during the day, one backup at a time, and netted the connectivity loss with maybe only one other script running during those runs. I had to restart mysqld repeatedly. The result is having the backup process be the day's last job.

I do not understand your thought that a MySQL account name npiadmin suggests using a third-party product. Maybe I was not clear enough? I do use Perl and bash scripts. Nonetheless, the MySQL 5.6 documentation has been fine for assigning privileges to accounts. Thank you.

I am switching to another account that does not have SUPER PRIVILEGES. Now a few scripts are having problems, e.g. setting the definer in CREATE OR REPLACE VIEW statements. I will be taking care of that.

Another point. The InnoDB 4 GB space was 2 GB through the upgrade. I changed to 4 GB as one of my attempts to resolve the problem. Sigh. It did not.

This morning, I ran SHOW DATABASES, and maybe a SHOW CREATE TABLE. Whereas the current background job had nothing to do with the information_schema database (yes, I'm still focused on that while keeping eyes open for other causes), the following background job in the series could not connect. I will do a true test of this situation, noting the background job, and see if the problem reoccurs. I will let you know the results.

Could the ibdata1 file be an issue, even though the error log is not so indicating? I'm wondering if a black out period (local users only) with complete backups of all databases (excepting the MySQL supplied databases, mysql, test, information_schema and performance_schema), shutdown mysqld, maybe clear out the data directory or simply remove it, startup mysqld, then import each mysqldump file to recreate the respective database and tables and restore the tables' values, and along the way rebuild ibdata1. I would need to do some digging to see if this makes sense, and what are the proper steps.

I ask because would ibdata1 be updated between 5.5 and 5.6, if need be, and so should not be thought as a problem.

Still puzzling.

Thank you for your help.

This morning during the jobs run, the ability to connect as a user and as subsequent jobs was lost. I found another script that uses the information_schema database. I changed the script to no longer use the information_schema database and posted the changes to production. After the jobs completed just a bit ago, I was able to shutdown mysqld and restart it. I hope the change helps going forward.

I am also looking at the threads global status variables every twenty minutes as part of the script that examines a number of items and changes max_connections down and back up. I will see if there might be a threads issue, like reusing threads.

Thank you for the feedback.

We know about few similar issues with Information Schema. See, for example, bug #72322. I recommend you to wait when that bug is fixed. Meanwhile you can upgrade to version 5.7 and use memory summary tables in Performance Schema (http://dev.mysql.com/doc/refman/5.7/en/memory-summary-tables.html) to track when your server leaks memory.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".