Bug #72796 File descriptors exhaustion gets unreported / threads will starve forever
Submitted: 29 May 2014 8:08 Modified: 30 Mar 2018 11:35
Reporter: Slawomir Pryczek Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.6 OS:Linux (Fedora)
Assigned to: CPU Architecture:Any

[29 May 2014 8:08] Slawomir Pryczek
Description:
I'm reporting as 5.6, because im seeing similar behaviours on many servers ( from 5.6 branch and also on mariadb - https://mariadb.atlassian.net/browse/MDEV-6277 )

It appears that the server is getting deadlocked when running out of linux file  descriptors or table_open_cache is too high?

Basically im getting like 1000 threads lingering in "waiting for metadata lock" running for half hour (or a couple of hours on another ocassion). While just one from these 1000 threads is in "Sending data". Doing multi join on 10 small, indexed tables (by primary IDs)... this normally takes about 2-3 seconds. VMstats report 0 reads and 0 writes during that time.

Then after lowering table_open_cache from 400 to 64 everyting gets executed within a couple of seconds.

How to repeat:
Lower open files limit for mysql process eg. to 8, set table_open_cache to 400 and run some insert/select queries in parallel

Suggested fix:
Add some (even hardcoded) timeout to opening files and show these errors as critical in dmesg. Basically this is impossible to diagnose, because i can't see anything wrong in error log / dmesg.

Can it be some other limit that prevented queries to execute (epoll / etc?)
[29 May 2014 8:49] MySQL Verification Team
the timeout for "waiting for metadata lock"  is --lock-wait-timeout option, which is 1 year by default...
[29 May 2014 8:51] Slawomir Pryczek
But this timeout could kill queries where something will need in fact more time to run, eg. 2 long running queries issued on one table, so it won't fix the starvation problem if there is one, because it can't be set to prevent starvation while allowing long queries to run.
[29 May 2014 9:05] Slawomir Pryczek
Actually you see what my problem is... if im running out of file descriptors, i want to have maybe not even a timeout but some error logged, like "Input error, opening file X is taking more time than expected (## seconds), try increasing file descriptors limit". It took us 3 days to diagnose, because no errors / warnings are reported and the bug just happened when we run some more complex queries in crons. The threads would probably sit there, like you said for a year before mysql giving error message completely unrelated to problem...

The warning for file open taking too long should be like after 2-3 seconds i guess...

Any recommendation for this limit on systems with 128GB of ram and 16/32 cores?
[29 Mar 2018 15:35] MySQL Verification Team
Hi Mr. Pryczek,

Unfortunately, we can not implement what you desire, although it would be nice. Problem is that only some operating systems, like latest macOS, are capable of returning the error code that would be unique to running out of file descriptors. Even those operating systems that have error code, like macOS, do not return it consistently.

Regarding your last question, my opinion is that 16K would be sufficient number for your case. You have to check the limits on your OS and if those are insufficient to increase both the soft and hard limits for those. Manuals for Linux contain all the info required.

Any other questions ......
[29 Mar 2018 16:10] Slawomir Pryczek
Ok i understand, for the limits, etc. i know to increase, thanks. As for another comment maybe you can detect the limit on server startup and at least add a warning to dmesg or log, if the file description limit is very low and it can make the server to operate improperly...

You should be able to compute approximated number of descriptors needed based on config... something like "WARNING: File descriptor limit is considerably lower than required for current configuration, the server can operate improperly, consider increasing the limit from XXX to at least YYY".

Thanks,
Slawomir.
[30 Mar 2018 11:35] MySQL Verification Team
Actually, there is a warning on the startup. If you setup table cache configuration too high and number of the available file descriptors is too low, you will get a warning in the general log.

What we can't do at startup is count how many tables are there, as it would make no sense and would take a long time. You may be using only 10 % of those, for example.