Description:
I have recently experienced connection pileups on some of our systems. One cause was the larger-than-normal number of user accounts on the server. Eric provided a patch to improve things. See: https://bugs.mysql.com/90244.
However, connections can be slow for a number of reasons:
* number of connections happening at once
* latency of the connection handling processing
* interactions between different threads etc
As far as I'm aware, it's not currently possible to see metrics on connection times: how long connections take to establish and where the latency occurs. Such metrics would be most useful during connection pileups, to measure changes in server behaviour and to see whether the problem is in connection handling or elsewhere in the mysqld code.
How to repeat:
Trigger a pileup of connections. Most apps will show that connecting takes a long time, but the reason may not be clear.
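A minimal client-side sketch of such a pileup, in Python, assuming only that the server speaks first by sending its initial handshake packet once it has accepted the connection. The function names, connection counts and the example host/port are illustrative, not part of any MySQL tooling. It times the TCP connect and the wait for the first bytes from the server separately. It does not authenticate, and it cannot show where inside mysqld the time is spent, which is the gap this report is about.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def time_connection(host, port, timeout=10.0):
    """Return (tcp_connect_seconds, greeting_seconds) for one connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.perf_counter()
        # The server sends the first packet; wait for any bytes to arrive.
        sock.recv(4096)
        greeted = time.perf_counter()
    return connected - start, greeted - connected


def pile_up(host, port, connections=200, workers=50):
    """Open many connections concurrently and collect their timings."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(time_connection, host, port)
                   for _ in range(connections)]
        return [f.result() for f in futures]


def summarise(timings):
    """Summarise the time spent waiting for the server greeting."""
    greet = sorted(t[1] for t in timings)
    return {"count": len(greet),
            "median": greet[len(greet) // 2],
            "max": greet[-1]}


# Example (hypothetical host and port):
#   print(summarise(pile_up("127.0.0.1", 3306)))
```

A long greeting wait with a short TCP connect suggests the delay is on the server side of connection handling, but the client cannot break it down any further.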
Suggested fix:
As stated above: provide P_S (performance_schema) metrics on connection processing so that we can detect where latency occurs in the whole connection flow:
* connection handling
* authentication timings
Beyond this, authorisation timings would be good. While Eric's patch helps reduce the latency when the number of users is high, some people configure users by network and so may have user1@'10.0.0.%', user1@'10.0.1.%', user1@'10.0.2.%' or similar combinations. The code seems to scan through the network part after finding the user part, so if this list is long it may also cause performance issues. (I'm not sure at what point a map/hash might help here.)
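To illustrate the map/hash idea, here is a rough Python sketch, not the server's actual code. It compares a linear scan over user@host entries with a dictionary keyed on (user, /24 prefix). All names are hypothetical, it handles only IPv4 patterns of the form 'a.b.c.%' in the index, and it ignores the server's real ordering rules for account entries.

```python
from fnmatch import fnmatchcase


def host_matches(pattern, client_ip):
    # MySQL host patterns use % (any run of characters) and _ (one character);
    # translate them to fnmatch wildcards for this sketch.
    return fnmatchcase(client_ip, pattern.replace("%", "*").replace("_", "?"))


def find_account_scan(accounts, user, client_ip):
    """Linear scan: cost grows with the number of host entries."""
    for acct_user, pattern in accounts:
        if acct_user == user and host_matches(pattern, client_ip):
            return (acct_user, pattern)
    return None


def build_index(accounts):
    """Index 'a.b.c.%' patterns by (user, 'a.b.c'); keep the rest for a scan."""
    index, fallback = {}, []
    for user, pattern in accounts:
        prefix, _, last = pattern.rpartition(".")
        simple = (last == "%" and prefix.count(".") == 2
                  and "%" not in prefix and "_" not in prefix)
        if simple:
            index.setdefault((user, prefix), (user, pattern))
        else:
            fallback.append((user, pattern))
    return index, fallback


def find_account_indexed(index, fallback, user, client_ip):
    """One dictionary lookup, then a scan of the non-indexable entries only."""
    prefix = client_ip.rpartition(".")[0]
    hit = index.get((user, prefix))
    if hit is not None:
        return hit
    return find_account_scan(fallback, user, client_ip)


# 256 per-network entries for one user, as in the example above.
accounts = [("user1", f"10.0.{n}.%") for n in range(256)]
index, fallback = build_index(accounts)
```

With this layout, a lookup for user1 from 10.0.200.17 is a single dictionary access instead of roughly 200 pattern comparisons. Whether that matters in practice is exactly what the requested timings would show.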
Anyway: more and better monitoring of these timing attributes would be useful.