MySQL Bugs: #90581: Unable to open new connections while InnoDB pager cleaner is running

Bug #90581	Unable to open new connections while InnoDB pager cleaner is running
Submitted:	23 Apr 2018 17:33	Modified:	3 Jul 2018 6:36
Reporter:	monty solomon
Status:	Not a Bug
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.7.18, 5.7.21	OS:	CentOS
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
We sometimes experience failures opening new connections to master servers in different clusters while InnoDB is running the page_cleaner.

How to repeat:
For example, the error log contains the message:

2018-04-23T14:11:04.815576Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 9846ms. The settings might not be optimal. (flushed=8, during the time.)

There were application exceptions when trying to use the master:

JdbcCircuitBreakerException
MySQL Circuit breaker is open
2018-04-23 14:10:57

HikariPool.createTimeoutException
2018-04-23 14:10:57
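
To check whether the application errors above overlap with the slow page_cleaner loop, the note can be turned into a time window. The following is only a sketch under two assumptions that the report does not confirm: that the note is written when the slow loop finishes (so the loop covers the reported duration before the log timestamp), and that the application timestamps are in UTC like the error log. The names `stall_window` and `falls_in_window` are made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the InnoDB page_cleaner note quoted in the report.
PAGE_CLEANER_RE = re.compile(
    r"^(?P<ts>\S+Z) \d+ \[Note\] InnoDB: page_cleaner: "
    r"(?P<intended>\d+)ms intended loop took (?P<took>\d+)ms\."
)


def stall_window(log_line):
    """Return (start, end) of the slow loop, or None if the line does not match."""
    m = PAGE_CLEANER_RE.match(log_line)
    if m is None:
        return None
    end = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
    end = end.replace(tzinfo=timezone.utc)
    # Assumption: the note is logged at the end of the slow loop.
    start = end - timedelta(milliseconds=int(m.group("took")))
    return start, end


def falls_in_window(event_time, window):
    """True if event_time lies inside the (start, end) window, inclusive."""
    start, end = window
    return start <= event_time <= end


line = (
    "2018-04-23T14:11:04.815576Z 0 [Note] InnoDB: page_cleaner: 1000ms "
    "intended loop took 9846ms. The settings might not be optimal. "
    "(flushed=8, during the time.)"
)
window = stall_window(line)
# Assumption: the application timestamp 2018-04-23 14:10:57 is UTC.
app_error = datetime(2018, 4, 23, 14, 10, 57, tzinfo=timezone.utc)
print(window[0].isoformat(), falls_in_window(app_error, window))
# prints: 2018-04-23T14:10:54.969576+00:00 True
```

Under those assumptions, both exceptions at 14:10:57 fall inside the 9846 ms loop that ended at 14:11:04.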

Suggested fix:
Make sure new connections are not prevented while running the InnoDB page_cleaner.

Hi,

I'm not sure why you consider this a bug. The connection issue is on the client's side: the client should notice that the server is busy and try to reconnect after a short delay.

Optimizing the server so that these situations are rare is something you can discuss with the MySQL Support team, but I'm going to need something more to accept this as a bug.

kind regards
Bogdan
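
The client-side approach suggested above (retry after a short delay) could look roughly like the sketch below. It is a generic illustration, not HikariCP's or any MySQL driver's actual behavior: `connect_with_retry` and `flaky_connect` are hypothetical names, the delays are arbitrary, and a real driver would raise its own exception type rather than Python's built-in `ConnectionError`.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are used up.

    Waits base_delay, then twice that, and so on (capped at max_delay)
    between failed attempts. The last failure is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)


# Usage with a stand-in for a real driver call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("server too busy")
    return "connection"


waits = []  # record the delays instead of actually sleeping
result = connect_with_retry(flaky_connect, sleep=waits.append)
print(result, calls["count"], waits)
```

A retry loop like this only hides short stalls. With a stall close to ten seconds, as in the log above, the total retry budget would have to exceed the stall for user-facing requests to succeed.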

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

We use the HikariCP connection pool and it does reconnect, but there are user-facing actions that fail because the server is too busy to accept a new connection.

Internal features of the server should monitor themselves and not cause the server to become unreachable.

Hi Monty,

The server is busy, and no internal monitoring can change the fact that a busy server can't handle new connections until it is less busy. Properly sizing and configuring the server for a given task is a must on a serious system. We, of course, work on optimizing the MySQL server to squeeze more from the same hardware and to make the server faster and more responsive, but I don't see how this is a bug.

kind regards
Bogdan

The server is multi-threaded and should be able to monitor its own activity via other threads.  The server is operating the page cleaner and other open connections are continuing to operate properly.  The server has plenty of reserve capacity for CPU, memory, disk, and IOPS. It is not undersized.

It can be a bug when one part of the system is unexpectedly blocking another part of the system.  Why is the server unable to open new connections while the page cleaner is running?  Is the page cleaner grabbing a mutex or some other resource that it shouldn't?  Is it holding on to the mutex or other resource longer than it should?  Is it providing a higher priority to the page cleaner thread over the new connection thread?

The server should not refuse to open a connection unless the connection limit was reached and/or other resources were consumed. The reason(s) for not opening a connection should be logged to the error log.

Thank you.