Bug #84847 Calls that access the network hang at 100% CPU in bad connectivity environments
Submitted: 6 Feb 2017 17:44 Modified: 10 Apr 2017 15:07
Reporter: Martin Bonac Email Updates:
Status: Can't repeat Impact on me:
None 
Category:Connector / C Severity:S1 (Critical)
Version:5.7 OS:MacOS (10.12.2)
Assigned to: CPU Architecture:Any

[6 Feb 2017 17:44] Martin Bonac
Description:
do a mysql_fetch_row, or mysql_real_connect, or any call that accesses the network with a bad connection like poor wifi. randomly some internal ssl functions will hang at 100% CPU.

How to repeat:
it seems to easily be repro'ed by doing a large query, do a mysql_use_result, then loop with mysql_fetch_rows, hit pause in the debugger randomly so its inside the mysql lib, wait 15 secs, unpause and watch your CPU be at 100% indefinitely with no progress.

Suggested fix:
don't hang and spin the CPU
[9 Feb 2017 12:14] Chiranjeevi Battula
Hello Martin Bonac,

Thank you for the bug report.
Could you please provide repeatable test case (exact steps/sample code, create table statements/database details etc. - please make it as private if you prefer) to confirm this issue at our end?

Thanks,
Chiranjeevi.
[10 Feb 2017 2:22] Martin Bonac
{
      auto mySql = mysql_init(nullptr);
      
      mysql_real_connect(mySql, "host", "user", "password", nullptr, 0, nullptr, 0);
  
      mySql->reconnect = 1;
  
      mysql_select_db(mySql, "schema");
  
      auto statement =
        "SELECT * FROM table";
      mysql_real_query(mySql, statement, strlen(statement));/// Try to be in a bad wifi area, and/or pause the debugger during the download, wait for 10 secs, then resume and watch it hang at 100% CPU
  
      mysql_close(mySql);
    }
[10 Feb 2017 2:26] Martin Bonac
i've even added this after i connect:

      int readTimeout = 15;
      mysql_options(mySql, MYSQL_OPT_READ_TIMEOUT, &readTimeout);

and it won't timeout either
[16 Mar 2017 22:24] Martin Bonac
Is there any progress on this?
[24 Mar 2017 11:09] Chiranjeevi Battula
Hello Martin Bonac,

Thank you for your feedback.
Could you please provide more details like does the program always hang or is it intermittent and full call stack for the program when it hangs (- please make it as private if you prefer) to confirm this issue at our end?

Thanks,
Chiranjeevi.
[27 Mar 2017 15:40] Martin Bonac
repro is 100% if you pause the debugger during a download and wait like 10 secs.

* thread #1: tid = 0x73fef2, 0x000000010617d541 liblvphp.so`yaSSL::SSL::GetError(this=0x0000000103025800) const + 1 at yassl_int.cpp:1514, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x000000010617d541 liblvphp.so`yaSSL::SSL::GetError(this=0x0000000103025800) const + 1 at yassl_int.cpp:1514 [opt]
    frame #1: 0x000000010616c3e9 liblvphp.so`yaSSL::receiveData(ssl=0x0000000103025800, data=0x00007fff5fbfdd30, peek=false) + 25 at handshake.cpp:1045 [opt]
    frame #2: 0x000000010616eb2e liblvphp.so`::yaSSL_read(ssl=<unavailable>, buffer=<unavailable>, sz=<unavailable>) + 62 at ssl.cpp:363 [opt]
    frame #3: 0x000000010615286e liblvphp.so`vio_ssl_read(vio=<unavailable>, buf=<unavailable>, size=<unavailable>) + 46 at viossl.c:186 [opt]
    frame #4: 0x000000010612cef0 liblvphp.so`net_read_packet(st_net*, unsigned long*) + 24 at net_serv.cc:670 [opt]
    frame #5: 0x000000010612ced8 liblvphp.so`net_read_packet(st_net*, unsigned long*) [inlined] net_read_packet_header(net=<unavailable>) + 20 at net_serv.cc:760 [opt]
    frame #6: 0x000000010612cec4 liblvphp.so`net_read_packet(net=<unavailable>, complen=<unavailable>) + 36 at net_serv.cc:820 [opt]
    frame #7: 0x000000010612cd88 liblvphp.so`::my_net_read(net=<unavailable>) + 440 at net_serv.cc:1004 [opt]
    frame #8: 0x000000010611feac liblvphp.so`cli_safe_read_with_ok(mysql=0x000000010301b400, parse_ok='\0', is_data_packet="") + 172 at client.c:1054 [opt]
    frame #9: 0x0000000106122409 liblvphp.so`read_one_row [inlined] cli_safe_read(mysql=0x000000010301b400, is_data_packet=<unavailable>) + 7 at client.c:1187 [opt]
    frame #10: 0x0000000106122402 liblvphp.so`read_one_row(mysql=0x000000010301b400, fields=2, row=0x0000000103b597c0, lengths=0x0000000103b59a90) + 34 at client.c:2414 [opt]
    frame #11: 0x0000000106127513 liblvphp.so`mysql_fetch_row(res=0x0000000103b599f0) + 99 at client.c:5370 [opt]
    ...

call stack is not the same for every time i pause, but seems to diverge at the first "net_read_packet"
[28 Mar 2017 11:37] Chiranjeevi Battula
Hello Martin Bonac,

Thank you for your feedback.
Application is waiting after call to net_read_packet() this means that the application is waiting for server to send data and server has not sent anything.

Thanks,
Chiranjeevi.
[28 Mar 2017 15:01] Martin Bonac
Sure. The bug is, calls that access the network hang at 100% CPU in bad connectivity environments. You can simulate bad connectivity by pausing the debugger for like 5 secs during the query, then resume, and then the CPU jumps to 100% and NEVER recovers. Even with a timeout set with MYSQL_OPT_READ_TIMEOUT. It is stuck in some infinite loop in the sqlclient.
[28 Mar 2017 15:05] Martin Bonac
the call stack i gave you was when it was already in this 100% CPU non recoverable state. once in this state, i then randomly paused the debugger to print out a backtrace for you as per requested
[10 Apr 2017 13:18] Chiranjeevi Battula
Hello Martin Bonac,

Thank you for the feedback.
I tried to reproduce the issue at my end using MySQL Connector / C 6.1.9, MySQL 5.7.17 but not seeing any hanging in weak network.
If you can provide more information, feel free to add it to this bug and change the status back to 'Open'.

Thank you for your interest in MySQL.

Thanks,
Chiranjeevi.
[10 Apr 2017 15:07] Martin Bonac
did you pause the debugger during mysql_real_query? for me its 100% repro