Bug #88108 A non-blocking socket operation could not be completed immediately
Submitted: 16 Oct 2017 10:14 Modified: 25 Apr 2018 9:16
Reporter: Krish KM
Status: Verified
Category:MySQL Router Severity:S4 (Feature request)
Version:2.1.4 OS:Windows (Server2016 x64)
Assigned to: CPU Architecture:Any
Tags: Copy server-client failed, MySQL Router, non-blocking, systemError

[16 Oct 2017 10:14] Krish KM
Description:
Hi all,

We are moving from a win2008R2x64 server to 2016 x64, and MySQL Router is giving us a very hard time.

We have two servers:
ServerA: Internal no public access
win 2016 x64 server (fresh built)
.net 4.7 installed
c++ redistributable x64 (2013 & 2015) installed
Domain Connected

MySQL server 5.7.20

ServerB: External Server
win 2016 x64 server (fresh built)
.net 4.7 installed
c++ redistributable x64 (2013 & 2015) installed
NOT domain connected
private connection to serverA

Router installed with configuration
[routing:read_write]
bind_address = 0.0.0.0
bind_port = 3306
destinations = ServerA:3306
mode = read-write

Problem:
Routing works when we select 10 records, but throws the error below when we select more than 100 records:
[routing:read_write] Routing stopped (up:2905680b;down:1719b) Copy server-client failed: SystemError: A non-blocking socket operation could not be completed immediately.

Catch:
* We moved to Microsoft Azure.

We have another private ServerC with the same setup as ServerB. Between ServerB and ServerC, routing works even if we select 50k records.
Not sure what's going on. It would be really helpful if you could shed some light.

Many thanks
krish

How to repeat:
Have two servers.
ServerA: domain connected (internal connection only)
ServerB: public access, private connection to ServerA
Install MySQL Router on ServerB pointing to ServerA

Try selecting a table with 50k records. Selection fails with an error "A non-blocking socket operation could not be completed immediately"
[17 Oct 2017 16:21] MySQL Verification Team
Hi,

So you have mysqld and router on Azure, you connect to the router, and if you query more than 50k records it fails? What happens if you connect directly to mysqld, going around the router?

I don't think this is a bug. I think this is either some Azure issue or a misconfiguration of your mysqld, but either way not a bug.

all best
Bogdan
[19 Oct 2017 0:48] Krish KM
Hi Bogdan,

I have eventually got it working. Here are some observations from my side.

When I connect directly using Workbench, I'm able to select and retrieve >50k rows, so neither the MySQL server nor the network between ServerA and ServerB was the issue.
I've reinstalled the x86 and x64 C++ runtimes 2013 and 2015 (especially version 14.0.24215 of 2015) on ServerB.

At this point the error was still there. I also realised the server handshake times out 3 times per packet before it receives an invalid packet.
I then recompiled a fresh copy of MySQL Router from source and changed the "MySQLRouting::start_acceptor()" procedure, altering " if (service_tcp_ >0) routing::set_socket_blocking(service_tcp_, true);", thinking the socket should block until it reads all the packets instead of using UDPs/non-blocking?

This fixed the problem (temporarily), but I'm not sure whether this is the correct approach or whether it would have any side effects on concurrent connections/executions.

Can you help me understand why this would happen, and would it be possible to include an option in the .conf to specify whether TCP should use a blocking socket?

Many thanks and have a nice day
krish
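
For context on the error text: "A non-blocking socket operation could not be completed immediately" is the Winsock message for WSAEWOULDBLOCK, which a non-blocking socket returns when a send or receive cannot proceed right now. It is not fatal by itself; the usual response is to wait until the socket is ready and retry. The following is a minimal sketch of that pattern, not MySQL Router's code. It assumes POSIX sockets, where the counterpart is EAGAIN/EWOULDBLOCK, and the helper names make_nonblocking and send_all are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (POSIX only): switch a descriptor to non-blocking mode.
bool make_nonblocking(int fd) {
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags == -1) return false;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical helper: send all `len` bytes on a non-blocking socket.
// When send() reports EAGAIN/EWOULDBLOCK, wait up to `timeout_ms` for the
// socket to become writable and retry, instead of treating it as a failure.
// Returns true if everything was sent, false on a real error or a timeout.
bool send_all(int fd, const char *buf, std::size_t len, int timeout_ms) {
  assert(fd >= 0 && buf != nullptr);
  std::size_t sent = 0;
  while (sent < len) {
    ssize_t n = send(fd, buf + sent, len - sent, 0);
    if (n > 0) {
      sent += static_cast<std::size_t>(n);  // partial sends are normal
      continue;
    }
    if (n == -1 && errno == EINTR) continue;  // interrupted, just retry
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
      struct pollfd pfd;
      pfd.fd = fd;
      pfd.events = POLLOUT;
      pfd.revents = 0;
      int ready = poll(&pfd, 1, timeout_ms);
      if (ready <= 0) return false;  // timed out or poll() failed
      continue;                      // writable again, retry the send
    }
    return false;  // any other error is treated as fatal in this sketch
  }
  return true;
}
```

With a helper like this, a large result set that fills the send buffer makes the copy loop wait rather than stop the route. Making the socket blocking, as in the change described above, sidesteps the error differently: send() itself waits until it can proceed.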
[19 Oct 2017 14:38] MySQL Verification Team
Hi,

I still can't reproduce this in a test environment that is not on Azure, so for now I still call it an "Azure problem" and not a "MySQL problem". But since you changed the severity to feature request, the "should it use blocking or non-blocking" question kind of makes sense, so I'll let the router lead decide.

all best
Bogdan
[25 Apr 2018 9:06] MySQL Verification Team
Does mysqlrouter 2.1.5 solve this?  I see a related fix in 
https://dev.mysql.com/doc/relnotes/mysql-router/en/mysql-router-news-2-1-5.html
[25 Apr 2018 9:16] Krish KM
I haven't tried the latest version, as my compiled version, which changes the socket from non-blocking to blocking, worked and is still working. I didn't have the time to take down a working system.

Having said that, the changes in 2.1.5 say:
"Router assumed that a resulting socket from accept()ing a socket would be always blocking. On Solaris and Windows this assumption is not valid, and this resulted in broken connections with large result sets. (Bug #26834769)"

This was definitely my issue, and I think the dev team have recognised this bug and fixed it. Thanks for pointing this out. All the best
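
The release note quoted above comes down to not assuming the mode of a socket returned by accept(). A minimal sketch of setting the mode explicitly after accept() follows. It assumes POSIX sockets and fcntl(); on Windows the equivalent call would be ioctlsocket() with FIONBIO. The names set_blocking_mode, is_blocking and accept_blocking are hypothetical and are not the Router's own functions.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (POSIX only): set or clear O_NONBLOCK on a descriptor.
bool set_blocking_mode(int fd, bool blocking) {
  assert(fd >= 0);
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags == -1) return false;
  flags = blocking ? (flags & ~O_NONBLOCK) : (flags | O_NONBLOCK);
  return fcntl(fd, F_SETFL, flags) != -1;
}

// Hypothetical helper: true when O_NONBLOCK is clear on the descriptor.
bool is_blocking(int fd) {
  int flags = fcntl(fd, F_GETFL, 0);
  return flags != -1 && (flags & O_NONBLOCK) == 0;
}

// Accept one connection and force the new socket into blocking mode,
// whatever mode it inherited (or did not inherit) from the listener.
// Returns the client descriptor, or -1 on failure.
int accept_blocking(int listen_fd) {
  int client = accept(listen_fd, nullptr, nullptr);
  if (client == -1) return -1;
  if (!set_blocking_mode(client, true)) {
    close(client);
    return -1;
  }
  return client;
}
```

The design point is that the mode is stated once, right after accept(), so the rest of the connection code can rely on it on every platform.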