Bug #94219 libmysqlclient enters an infinite loop and consumes 100% CPU
Submitted: 6 Feb 2019 12:07
Modified: 21 Feb 2019 4:04
Reporter: Masaaki HIROSE
Status: Not a Bug
Category: MySQL Server: C API (client library)
Severity: S2 (Serious)
Version: 5.7.25
OS: Linux
Assigned to:
CPU Architecture: x86

[6 Feb 2019 12:07] Masaaki HIROSE
Description:
A process that satisfies the following conditions will eventually get stuck in an infinite loop in libmysqlclient.so when calling mysql_real_query(), consuming 100% CPU (user%).

- calls mysql_real_connect() with the reconnect option disabled
- receives any signal
- stays idle longer than `wait-timeout`, so the MySQL server closes the connection
- calls mysql_real_query()
- -> enters an infinite loop and consumes CPU

How to repeat:
Set mysqld's wait-timeout to 10, then compile and run the following code.

https://gist.github.com/hirose31/3440065a7bdc9d77f1c70ec8bc007ad5
[6 Feb 2019 12:46] Masaaki HIROSE
gdb backtrace:
https://gist.github.com/hirose31/3440065a7bdc9d77f1c70ec8bc007ad5#file-backtrace
[7 Feb 2019 14:10] MySQL Verification Team
Working on it .....
[7 Feb 2019 14:21] MySQL Verification Team
Hi,

I have analysed your example code and I need some feedback from you.

First of all, 5.7.25 is not yet out, so I am interested whether this occurs on 5.7.24 or not ????

Second and much more important, have I seen well, but you are forking your program using our C API ??? If that is the case, then we must inform you that we do not support that. We recommend multi-threading of your programs. Due to the forking, you get 100 % CPU usage. You can achieve the same result without our C API and without our server.

Third, if your program sleeps longer than wait_timeout, connection will be broken. Hence, your program is not written well, because you need to re-connect.  I do not see in your program where do you re-connect !!!

In short you can write a program with any API that will crash any time, but it does not make it a bug.

In my opinion, this is definitely not a bug, but I am leaving to you to change your program and write it properly and show us a bug.
[8 Feb 2019 0:57] Tsubasa Tanaka
MySQL 5.7.25 has been released at 2019-01-21.

https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-25.html
[8 Feb 2019 5:58] Masaaki HIROSE
Hi,

Thanks for your response.

> Second and much more important, have I seen well, but you are forking your program using our C API ???

The fork in the program is only there to reproduce the problem easily. Even without fork, if you send the USR1 signal (kill -USR1 PID) from another terminal within 20 seconds, the result is the same (it enters an infinite loop).
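
The no-fork variant can be sketched without any MySQL calls. The following is a minimal sketch, not the code from the gist: it only shows a process installing a SIGUSR1 handler so that the signal can be delivered while the process is idle. The names `got_usr1`, `on_usr1`, `install_usr1_handler` and `usr1_received` are hypothetical and were chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag; sig_atomic_t can be written safely from a handler. */
static volatile sig_atomic_t got_usr1 = 0;

/* Handler does nothing except record that the signal arrived. */
static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;
}

/* Install the SIGUSR1 handler. Returns 0 on success, -1 on failure. */
static int install_usr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* SA_RESTART not set, so blocking calls can fail with EINTR */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Report whether SIGUSR1 has been delivered since the handler was installed. */
static int usr1_received(void)
{
    return got_usr1 != 0;
}
```

A test program would call install_usr1_handler() at startup, connect, and then sleep past wait-timeout before issuing the query; the signal itself comes from `kill -USR1 PID` in another terminal.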

> Third, ... I do not see in your program where do you re-connect !!!

If the connection is closed, I want to detect the disconnection and terminate the program with an error rather than reconnect, because reconnection has various side effects. https://dev.mysql.com/doc/refman/5.7/en/c-api-auto-reconnect.html

I expected mysql_real_query() to return CR_SERVER_GONE_ERROR if the connection was closed, but mysql_real_query() does not return. As gdb's backtrace shows, it loops infinitely and consumes almost 100% CPU time.

In addition, mysql_ping() goes into an infinite loop in the same way, so there is no way to check whether the connection has been closed.
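
The expected handling (detect the lost connection and stop, without reconnecting) can be sketched as a small check on the client error code. This is a sketch under stated assumptions: the two constants mirror the values of CR_SERVER_GONE_ERROR and CR_SERVER_LOST from MySQL's <errmsg.h> and are redefined under hypothetical `SKETCH_` names only so the snippet compiles without libmysqlclient; `connection_is_gone` is likewise a hypothetical helper.

```c
#include <stdbool.h>

/* Values mirror the client error codes in MySQL's <errmsg.h>.
 * A real program should include that header instead of redefining them. */
enum {
    SKETCH_CR_SERVER_GONE_ERROR = 2006, /* "MySQL server has gone away" */
    SKETCH_CR_SERVER_LOST       = 2013  /* "Lost connection to MySQL server during query" */
};

/* Hypothetical helper: true when err (a value as returned by mysql_errno())
 * means the connection can no longer be used. */
static bool connection_is_gone(unsigned int err)
{
    return err == SKETCH_CR_SERVER_GONE_ERROR || err == SKETCH_CR_SERVER_LOST;
}
```

In a real client this would be used after a failed call, roughly as: if mysql_real_query() returns non-zero and connection_is_gone(mysql_errno(conn)) is true, print the error and exit. That path is only reachable if mysql_real_query() actually returns, which is the point of this report.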

> In my opinion, this is definitely not a bug, but I am leaving to you to change your program and write it properly and show us a bug.

- It is the documented behavior that mysql_real_query() and mysql_ping() return CR_SERVER_GONE_ERROR when the connection is disconnected.
  https://dev.mysql.com/doc/refman/5.7/en/mysql-real-query.html
  https://dev.mysql.com/doc/refman/5.7/en/mysql-ping.html
- MySQL 5.6 (libmysqlclient 18) and 8.0 (libmysqlclient 21) work as documented
- MySQL 5.7 (libmysqlclient 20) behaves differently from the documentation: it enters an infinite loop and consumes CPU time
- Eating up CPU time is a very big problem for a system

For the above reasons I think this is definitely a bug. If you do not think so, please ask your team colleagues to run my reproduction program and give their opinions.
[8 Feb 2019 13:13] MySQL Verification Team
Hi,

First of all, I do not need to ask my colleagues to do anything, since I was involved in designing and coding the entire client-server protocol.

Second, in which fork of your program do you not get the proper error? Hence, remove the fork.

Third and last, look at our example programs in the source distribution and see how these programs should be written.

Not a bug.
[12 Feb 2019 9:43] Masaaki HIROSE
* How to reproduce:

1. connect to mysqld via TCP/IP (not a unix domain socket) with the reconnect option disabled
2. receive a signal
3. disconnect from mysqld
  - stop mysqld
  - exceed wait-timeout or interactive_timeout
  - and so on...
# It also reproduces when step 3 (disconnect from mysqld) happens before step 2 (receive signal)
4. run a query (mysql_real_query() or mysql_ping())
5. -> enters an infinite loop and consumes almost 100% CPU time
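
The steps above combine two events that a client I/O loop has to tell apart: a read interrupted by a signal (EINTR, safe to retry) and a peer that has closed the connection (read returns 0, retrying is pointless). This thread does not identify why libmysqlclient 20 spins, and the sketch below is not taken from its source. It is a generic POSIX illustration of a read that retries on EINTR a bounded number of times and stops on end-of-file; `read_once_bounded` and `max_retries` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from fd.
 * Returns the number of bytes read (> 0), 0 if the peer closed the
 * connection, or -1 on error with errno set.
 * A read interrupted by a signal is retried at most max_retries times. */
static ssize_t read_once_bounded(int fd, void *buf, size_t len, int max_retries)
{
    for (int attempt = 0; ; attempt++) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n; /* data, or 0 meaning the peer closed: stop either way */
        if (errno != EINTR || attempt >= max_retries)
            return -1; /* real error, or too many interruptions */
        /* interrupted by a signal before any data arrived: try again */
    }
}
```

With a loop shaped like this, a connection closed by the server surfaces as a return value of 0 that the caller can turn into an error, instead of being retried indefinitely.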

* How to reproduce with "mysql" command

$ mysql --version
mysql  Ver 14.14 Distrib 5.7.25, for Linux (x86_64) using  EditLine wrapper

$ mysql --disable-reconnect --sigint-ignore -uroot -h127.0.0.1

(type Control-C or kill -INT <PID of mysql> from another terminal)
mysql> ^C

(stop mysqld from another terminal to disconnect from mysqld)
$ sudo service mysql stop

(query)
mysql> select 1;

This query does not return, and the client consumes almost 100% CPU time.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28191 hirose31  20   0   32504   6080   5476 R  99.7  0.3   0:17.94 mysql
[12 Feb 2019 13:33] MySQL Verification Team
Hi,

If you disable reconnect and exceed wait/interactive timeout, during which MySQL server is shutdown, then what would you expect to happen ????

If client is disconnected and is forbidden to re-connect, or to react to signals, what should be the correct behaviour, in your opinion ????
[12 Feb 2019 14:35] Masaaki HIROSE
> If you disable reconnect and exceed wait/interactive timeout, during which MySQL server is shutdown, then what would you expect to happen ????
> If client is disconnected and is forbidden to re-connect, or to react to signals, what should be the correct behaviour, in your opinion ????

I expect it to return CR_SERVER_GONE_ERROR immediately. That is the documented behavior <https://dev.mysql.com/doc/refman/5.7/en/mysql-real-query.html>, and libmysqlclient 18 (MySQL 5.6) and libmysqlclient 21 (MySQL 8.0) behave that way; only libmysqlclient 20 (MySQL 5.7) behaves differently (it enters an infinite loop). That is why I reported this.
[12 Feb 2019 17:57] MySQL Verification Team
Hi,

I was unable to repeat hanging with 5.7.25.

I used both mysql CLI and your program. 

I have removed fork() and the signals. I test only what we support.

I have connected via TCP/IP.

With reconnect disabled, I just get the error:

ERROR 2013 (HY000): Lost connection to MySQL server during query

I can send you a revised version of your program, without signals and forks.

We can verify only what we support ......
[13 Feb 2019 5:16] Masaaki HIROSE
Please try to reproduce on Ubuntu 18.04 or 18.10.
[13 Feb 2019 7:22] Masaaki HIROSE
I've confirmed reproduction on:
- Ubuntu 18.10, mysql-community-client 5.7.25-1ubuntu18.10 
- Ubuntu 18.04, mysql-community-client 5.7.25-1ubuntu18.04
- Ubuntu 16.04, mysql-community-client 5.7.25-1ubuntu16.04
- CentOS 7.6, mysql-community-client-5.7.25-1.el7.x86_64
- Fedora 29, mysql-community-client-5.7.25-1.fc29.x86_64

And I've found an additional condition to reproduce:
- connect to a mysqld running on the same host as the client (-h127.0.0.1 or -hMY_LOCAL_IP_ADDRESS)
[20 Feb 2019 14:49] MySQL Verification Team
Hi,

I have access only to Oracle Linux and macOS. But this is irrelevant.

I do not see why this report would be OS-specific.
[21 Feb 2019 4:04] Masaaki HIROSE
It was on Ubuntu 18.04 that I first found this problem.

As mentioned in my previous comment, I could reproduce it on other Linux distributions as well.

Today, I've confirmed that this problem reproduces on Oracle Linux 7.6 (an EC2 instance created from the official AMI "OL7.6-x86_64-HVM-2019-01-29 (ami-054e85339904efdef)").

$ cat /etc/oracle-release
Oracle Linux Server release 7.6

$ mysql --version
mysql  Ver 14.14 Distrib 5.7.25, for Linux (x86_64) using  EditLine wrapper

Now I think this problem is not distribution-specific.

Since you have access to Oracle Linux, I hope you can reproduce this problem.
[21 Feb 2019 13:58] MySQL Verification Team
Thank you for your feedback.

As I wrote, I have tested this on Oracle Linux. The version that I am using is 7.2.

I must emphasise that I did not use your original program. I used a slightly changed version, without any fork() and without signal handlers. Anyway, your signal handler is not properly written, as it does no cleanup, such as closing connections.

I have followed all of your other steps to reproduce to the letter.
[30 Oct 2019 7:15] Naoki Inada
It seems this bug is a duplicate of #88428
[4 Nov 2019 14:13] MySQL Verification Team
Hi,

We do not mark reports as duplicates of other reports that turned out not to be bugs. We mark a report as a duplicate only if the original bug is verified or fixed.