Bug #83109 mysql_real_connect() is failing due to EINTR errors
Submitted: 22 Sep 2016 23:19 Modified: 22 Nov 2016 1:30
Reporter: Larry Adams Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Connection Handling Severity:S2 (Serious)
Version:5.7.15 OS:Linux
Assigned to: CPU Architecture:Any
Tags: EINTR mysql_real_connect

[22 Sep 2016 23:19] Larry Adams
Description:
After the fix of bug #82019 was commited, I thought we were out of the woods on this 4+ year old issue.  However, a friend of mine indicated that the server error counts were still climbing with the latest client.

So, I decided to test myself by ICMP pinging 1000 hosts with 100 threads.  Each thread during this process attempts to connect to MySQL, and there are periodic failures that mysql_real_connect() function still seems to allow the EINTR's return to the caller, which is a bug.

The other client issues such as mysql_query() EINTR's appear to be solved.  So, it must be in this connection logic that the error is being returned.

Using Cacti's 'spine' binary, this issue can be reproduced 100% of the time on a multi-core machine.

How to repeat:
1) Download/Install cacti 1.0 from github.

2) Download spine 1.0 from github.

3) Setup a large group of hosts that are generally reachable using ICMP

4) Setup spine to poll using ~100 threads

5) Instrument spine to print out a warning every time the mysql_real_connect() returns with MySQL error 2013 and EINTR.  It already there, you just need to add instrumentation.  See sql.c ~ line 263 which looks as follows:
if (error == 2013 && errno == EINTR) {

6) Make and setuid the spine binary

7) Run a debug run as follows: ./spine -R --first=1 --last=2000 -S

8) You will see multiple EINTR's instrumentation errors being thrown by the mysql_real_connect() function call.

Here is a run from my sandbox, every 'Her we go again 5' is an EINTR around mysql_real_connect() polling around 1000 hosts.

[root@hostname source]# ./spine -R --first=1 --last=2000 -S
SPINE: Using spine config file [spine.conf]
SPINE: Version 1.0.0 starting
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Her we go again 5
Device[1642] Hostname[www.somehost.com] ERROR: HOST EVENT: Device is DOWN Message: ICMP: Ping timed out
SPINE: Time: 17.5571 s, Threads: 40, Devices: 750
[root@hostname source]#

Suggested fix:
Not sure yet.  Still poking with your code.
[23 Sep 2016 4:06] MySQL Verification Team
not sure of exact fix,  but i see MYSQL_OPT_RETRY_COUNT isn't in the 5.7 source.
it was only in cluster trees afaik due to compatibility issues...
[11 Oct 2016 13:20] MySQL Verification Team
Thank you very much for your bug report.

You have provided us with a very elaborate test case. Is there a possibility of simplifying it ???

If you could discover the part of code in mysql_real_connect() where EINTR could have occurred,  it would help us very much in repeating the problem ourselves.

Also, is Linux the only OS where this problems occur or it can occur on Windows and macOS as well ???

Thanks in advance for your answers.
[11 Oct 2016 21:59] Roel Van de Paar
Sinisa, I do not believe that 'Need Feedback' is the correct status. A testcase that works was given. This bug should be marked as 'Verified' which still allows the original poster to add a shorter testcase if he chooses to do so. As it stands, this bug could be 'lost forever' if the original poster misses the request, which would be incorrect.
[12 Oct 2016 13:53] MySQL Verification Team
Roel,

In order to make a bug 'Verified", we must be able to repeat it. And this test cases requires hundreds of hosts. None of us has access to hundred hosts. The maximum number of hosts that I can access in a LAN is expressed in a single digit.

If we do not get a proper feedback soon, I will write a program in which I will try to make mysql_real_connect() to fail due to EINTR. But, if I do not succeed, then this will be "Not a bug".  Hence "Need feedback" is much better choice for the time being.
[12 Oct 2016 14:10] Larry Adams
To verify that it was not fixed, I just grabbed ~1500 hosts from the internet and used the ICMP ping.  The ICMP responses interrupt (EINTR) all threads, and sporadically cause the initial connections to fail.

Would you like a list of hostnames to make things easier?
[12 Oct 2016 14:47] MySQL Verification Team
Do I need a static IP on my machine for that purpose ???

My ISP changes my IP couple of times per diem.

More importantly, we have some house rules, which prohibit usage of hosts for bug verification, except for the internal hosts.

Also, ICMP is not the only way to cause EINTR. I can try to repeat the bug with a small program that would first not catch and then catch that interrupt.

Do you have anything against that approach ???
[12 Oct 2016 15:35] MySQL Verification Team
I once had a simple testcase on:
Bug 22336527 - MYSQL_REAL_CONNECT CAN FAIL WITH SYSTEM ERROR: 4 

I'll re-upload it here in case it can inspire a simpler reproduction of the issue.
It basically spams each thread with SIGUSR1 in a loop.
[12 Oct 2016 22:58] Roel Van de Paar
* There may be other ways to emulate hosts also.
* Larry has already verified his own bug. IMHO, I believe the bug status needs to be 'Verified'.
* Oracle does not have 1000 ICMP responding hosts internally? (Network department could be asked if such a test would be acceptable?)
[12 Oct 2016 23:06] Roel Van de Paar
I apologize for the last line above. Let me try again.

* Hopefully Oracle has 1000 ICMP responding hosts internally that could be used for an ICMP ping test.

If Sinisa finds a shorter script, that would be good too.
[13 Oct 2016 14:58] MySQL Verification Team
Fully verified. Without any signals, test program makes all 10.000 connections successfully.

With either SIGUSR1 or SIGINT, all connections fail !!!!!

The program that is written will be kept internally for further QA testings.

Here is the output:

 g++ -I../include -o test nobug83109.cc ../libmysql/Debug/libmysqlclient.a
[sinisa@local tmp]$ ./test 
signals sent: 1000
signals sent: 2000
signals sent: 3000
signals sent: 4000
signals sent: 5000
signals sent: 6000
signals sent: 7000
signals sent: 8000
signals sent: 9000
signals sent: 10000
signals sent: 11000
signals sent: 12000
signals sent: 13000
signals sent: 14000
signals sent: 15000
signals sent: 16000
signals sent: 17000
signals sent: 18000
signals sent: 19000
signals sent: 20000
signals sent: 21000
signals sent: 22000
signals sent: 23000
signals sent: 24000
signals sent: 25000
signals sent: 26000
signals sent: 27000
signals sent: 28000
signals sent: 29000
signals sent: 30000
signals sent: 31000
signals sent: 32000
signals sent: 33000
signals sent: 34000
signals sent: 35000
signals sent: 36000
signals sent: 37000
signals sent: 38000
signals sent: 39000
signals sent: 40000
signals sent: 41000
signals sent: 42000
signals sent: 43000
signals sent: 44000
signals sent: 45000
signals sent: 46000
signals sent: 47000
signals sent: 48000
signals sent: 49000
signals sent: 50000
signals sent: 51000
signals sent: 52000
signals sent: 53000
signals sent: 54000
signals sent: 55000
signals sent: 56000
signals sent: 57000
signals sent: 58000
signals sent: 59000
signals sent: 60000
signals sent: 61000
signals sent: 62000
signals sent: 63000
signals sent: 64000
signals sent: 65000
signals sent: 66000
signals sent: 67000
signals sent: 68000
signals sent: 69000
signals sent: 70000
signals sent: 71000
signals sent: 72000
signals sent: 73000
signals sent: 74000
signals sent: 75000
signals sent: 76000
signals sent: 77000
signals sent: 78000
signals sent: 79000
signals sent: 80000
signals sent: 81000
signals sent: 82000
signals sent: 83000
signals sent: 84000
signals sent: 85000
signals sent: 86000
signals sent: 87000
signals sent: 88000
signals sent: 89000
signals sent: 90000
signals sent: 91000
signals sent: 92000
signals sent: 93000
signals sent: 94000
signals sent: 95000
signals sent: 96000
signals sent: 97000
signals sent: 98000
signals sent: 99000
signals sent: 100000
signals sent: 101000
signals sent: 102000
signals sent: 103000
signals sent: 104000
signals sent: 105000
signals sent: 106000
signals sent: 107000
signals sent: 108000
signals sent: 109000
signals sent: 110000
signals sent: 111000
signals sent: 112000
signals sent: 113000
signals sent: 114000
signals sent: 115000
signals sent: 116000
signals sent: 117000
signals sent: 118000
signals sent: 119000
signals sent: 120000
signals sent: 121000
signals sent: 122000
signals sent: 123000
signals sent: 124000
signals sent: 125000
signals sent: 126000
signals sent: 127000
signals sent: 128000
signals sent: 129000
signals sent: 130000
signals sent: 131000
signals sent: 132000
signals sent: 133000
signals sent: 134000
signals sent: 135000
signals sent: 136000
signals sent: 137000
signals sent: 138000
signals sent: 139000
signals sent: 140000
signals sent: 141000
signals sent: 142000
signals sent: 143000
signals sent: 144000
signals sent: 145000
signals sent: 146000
signals sent: 147000
signals sent: 148000
signals sent: 149000
signals sent: 150000
signals sent: 151000
signals sent: 152000
signals sent: 153000
signals sent: 154000
signals sent: 155000
signals sent: 156000
signals sent: 157000
signals sent: 158000
signals sent: 159000
signals sent: 160000
signals sent: 161000
signals sent: 162000
signals sent: 163000
signals sent: 164000
signals sent: 165000
signals sent: 166000
signals sent: 167000
signals sent: 168000
signals sent: 169000
signals sent: 170000
signals sent: 171000
signals sent: 172000
signals sent: 173000
signals sent: 174000
signals sent: 175000
signals sent: 176000
signals sent: 177000
signals sent: 178000
signals sent: 179000
signals sent: 180000
signals sent: 181000
signals sent: 182000
signals sent: 183000
signals sent: 184000
signals sent: 185000
signals sent: 186000
signals sent: 187000
signals sent: 188000
signals sent: 189000
signals sent: 190000
signals sent: 191000
signals sent: 192000
signals sent: 193000
signals sent: 194000
signals sent: 195000
signals sent: 196000
signals sent: 197000
signals sent: 198000
signals sent: 199000
signals sent: 200000
signals sent: 201000
signals sent: 202000
signals sent: 203000
signals sent: 204000
signals sent: 205000
signals sent: 206000
signals sent: 207000
Successfull connections: 10000
Failed connections:      0
[sinisa@local tmp]$ g++ -I../include -o test bug83109.cc ../libmysql/Debug/libmysqlclient.a
[sinisa@local tmp]$ ./test 
mysql_real_connect: mysql_error: Can't connect to MySQL server on '127.0.0.1' (4) mysql_errno: 2003, errno=4
mysql_real_connect: mysql_error: Lost connection to MySQL server at 'waiting for initial communication packet', system error: 4 mysql_errno: 2013, errno=4
mysql_real_connect: mysql_error: Lost connection to MySQL server at 'waiting for initial communication packet', system error: 4 mysql_errno: 2013, errno=4
mysql_real_connect: mysql_error: Lost connection to MySQL server at 'waiting for initial communication packet', system error: 4 mysql_errno: 2013, errno=4
mysql_real_connect: mysql_error: Can't connect to MySQL server on '127.0.0.1' (4) mysql_errno: 2003, errno=4
mysql_real_connect: mysql_error: Lost connection to MySQL server at 'waiting for initial communication packet', system error: 4 mysql_errno: 2013, errno=4
mysql_real_connect: mysql_error: Lost connection to MySQL server at 'waiting for initial communication packet', system error: 4 mysql_errno: 2013, errno=4
mysql_real_connect: mysqsignals sent: 1000
signals sent: 2000
signals sent: 3000
signals sent: 4000
signals sent: 5000
signals sent: 6000
signals sent: 7000
signals sent: 8000
signals sent: 9000
signals sent: 10000
signals sent: 11000
signals sent: 12000
signals sent: 13000
signals sent: 14000
signals sent: 15000
signals sent: 16000
signals sent: 17000
signals sent: 18000
signals sent: 19000
signals sent: 20000
signals sent: 21000
signals sent: 22000
signals sent: 23000
signals sent: 24000
signals sent: 25000
signals sent: 26000
signals sent: 27000
signals sent: 28000
signals sent: 29000
signals sent: 30000
signals sent: 31000
signals sent: 32000
signals sent: 33000
signals sent: 34000
signals sent: 35000
signals sent: 36000
signals sent: 37000
signals sent: 38000
signals sent: 39000
signals sent: 40000
signals sent: 41000
signals sent: 42000
signals sent: 43000
signals sent: 44000
Successfull connections: 0
Failed connections:      10000
[14 Oct 2016 11:47] Larry Adams
Please keep us posted on the outcome of the bug hunt.
[30 Oct 2016 16:16] Larry Adams
Any updates?
[31 Oct 2016 13:42] MySQL Verification Team
Bugs like this one will take very long time for the proper internal verification, diagnosis of causes, design of solution and scheduling of the work on the fix. We have so many outstanding bugs, so there will be  a queue.
[18 Nov 2016 7:49] Erlend Dahl
Fixed in 8.0.0 under the heading of

Bug#22336527 MYSQL_REAL_CONNECT CAN FAIL WITH SYSTEM ERROR: 4
[22 Nov 2016 1:30] Larry Adams
Erlend,

When would you expect this to be released for people to leverage?  In what releases?

Larry