Bug #97276 MySQL startup hangs on /dev/random
Submitted: 17 Oct 2019 19:48 Modified: 18 Oct 2019 15:09
Reporter: cPanel, LLC Senior Tech's Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.7.28 OS:CentOS
Assigned to: CPU Architecture:Any

[17 Oct 2019 19:48] cPanel, LLC Senior Tech's
Description:
This issue has been seen on:
Centos 7.7.1908 running via virtuozzo
Centos 6.10 running via bare metal

The issue is more pronounced on virtualized systems since they tend to have lower entropy than bare metal

After upgrading from MySQL 5.7.27 to 5.7.28, MySQL was unable to fully start, with log lines like below:
====
2019-10-14T12:07:42.575432Z 0 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.
2019-10-14T12:07:42.575437Z 0 [Note] InnoDB: Unable to open the first data file
2019-10-14T12:07:42.575454Z 0 [ERROR] InnoDB: Operating system error number 11 in a file operation.
2019-10-14T12:07:42.575471Z 0 [ERROR] InnoDB: Error number 11 means 'Resource temporarily unavailable'
2019-10-14T12:07:42.575477Z 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2019-10-14T12:07:42.575481Z 0 [ERROR] InnoDB: Cannot open datafile './ibdata1'
2019-10-14T12:07:42.575485Z 0 [ERROR] InnoDB: Could not open or create the system tablespace. If you tried to add new data files to the system tablespace, and it failed here, you should now edit innodb_data_file_path in my.cnf back to what it was, and remove the new ibdata files InnoDB created in this failed attempt. InnoDB only wrote those files full of zeros, but did not yet use them in any way. But be careful: do not remove old data files which contain your precious data!
2019-10-14T12:07:42.575497Z 0 [ERROR] InnoDB: Plugin initialization aborted with error Cannot open a file
2019-10-14T12:07:42.876183Z 0 [ERROR] Plugin 'InnoDB' init function returned error.
2019-10-14T12:07:42.876239Z 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2019-10-14T12:07:42.876246Z 0 [ERROR] Failed to initialize builtin plugins.
2019-10-14T12:07:42.876250Z 0 [ERROR] Aborting

2019-10-14T12:07:42.876256Z 0 [Note] Binlog end
2019-10-14T12:07:42.876297Z 0 [Note] Shutting down plugin 'CSV'
2019-10-14T12:07:42.876307Z 0 [Note] Shutting down plugin 'MyISAM'
2019-10-14T12:07:42.876526Z 0 [Note] /usr/sbin/mysqld: Shutdown complete
====

It appears this change has forced the use of /dev/random. Is it possible to use /dev/urandom instead?

/dev/random and /dev/urandom use the same CSPRNG(cryptographically secure pseudorandom number generator). CSPRNG should be suitable for computationally secure RNG; information-theoretic security seems overkill, especially when it can lead to a failure to start the service, a change introduced in a point release.

Strangely enough, on the same system, generating an x509 SSL CRT via openssl command line does not use /dev/random, it uses /dev/urandom. I am not familiar enough with the openssl-libs to know if there is a function/flag that could be used to force the use of /dev/urandom over /dev/random.

Additionally, it is strange to me that frame #1 is using wait_random_seeded(), a function I believe is introduced in opnessl 1.1.1d. This system has no openssl 1.1.1 installed:
====
# rpm -qa openssl\*
openssl-devel-1.0.2k-19.el7.x86_64
openssl-1.0.2k-19.el7.x86_64
openssl-libs-1.0.2k-19.el7.x86_64
====

Does MySQL intentionally use /dev/random?
Are there any options that MySQL can take to force the use of /dev/urandom, or is this considered an upstream issue with openssl?
Can you shed any light on the wait_random_seeded() function from openssl 1.1.1d when that library isn't currently installed?

How to repeat:
I don't know how to force 0 entropy on a system, but the fact that MySQL 5.7.28 reads from /dev/random should be enough to understand this bug report.

I also have the coredump from the above gdb session that I can upload somewhere if needed.

Suggested fix:
Read from /dev/urandom
[17 Oct 2019 19:51] cPanel, LLC Senior Tech's
Attaching to the hung process with gdb

Attachment: gdb.txt (text/plain), 8.03 KiB.

[17 Oct 2019 19:59] cPanel, LLC Senior Tech's
coredump uploaded to /support/incoming/mysql-bug-core.15097
[17 Oct 2019 20:04] cPanel, LLC Senior Tech's
When submitting this bug was blocked due to the 8K limit, I removed more than I expected; further context

Entropy on the systems was exhausted:
====
# time head -c 10 < /dev/random
^C

real	2m4.353s

# cat /proc/sys/kernel/random/entropy_avail
0
====
[18 Oct 2019 14:00] MySQL Verification Team
Hi cPanel,

Thank you for your bug report.

I however fail to see what is MySQL bug that you are reporting. This all seems to me like expected behaviour.

First of all, one of required files is locked. Could be another instance of our server , but also could be some other program that you run there.

Next, our server works only with OpenSSL version & release that is bundled with our source and binary. We do not intend to change the version until we are forced to upgrade OpenSSL for all users.

Hence, not a bug.
[18 Oct 2019 15:09] cPanel, LLC Senior Tech's
I believe there is confusion due to the misleading error output. The EAGAIN is not coming from ibdata1, but from an unresponsive /dev/random. Did you review the attached coredump(thread 25) or gdb backtrace? 

https://docs.oracle.com/cd/E17952_01/mysql-5.7-relnotes-en/news-5-7-28.html
====
It is now possible to compile MySQL 5.7 using OpenSSL 1.1.1, enabling compilation support for MySQL 5.7 against OpenSSL even when OpenSSL 1.0.2 reaches End of Life status at the end of 2019.

All MySQL 5.7 builds now use OpenSSL. MySQL no longer supports using yaSSL as the SSL library, and source distributions no longer include yaSSL.

The WITH_SSL CMake option no longer permits bundled (use yaSSL) as a valid value, and the default option value has changed from bundled to system (use the version of OpenSSL installed on the host system).
====

It seems clear the change from yaSSL to OpenSSL has lead to depending on /dev/random, for which these systems had 0 entropy available for, leading to a hang in do_auto_cert_generation(), mysql-5.7.28/sql/auth/sql_authentication.cc. 

IMHO, reading from /dev/random should not be necessarry, /dev/urandom can be used just as well to provide computationally secure RNG. Even OpenSSL command line x509 generation uses /dev/urandom

On a relatively quiet system(disable all services you can), you should be able to replicate this by exhausting the entropy on the system:
====
# for i in {1..5}; do dd if=/dev/random of=/dev/null count=99999 & done
[1] 11790
[2] 11791
[3] 11792
[4] 11793
[5] 11794

# systemctl start mysqld
Job for mysqld.service failed because the control process exited with error code. See "systemctl status mysqld.service" and "journalctl -xe" for details.

# systemctl list-jobs
 JOB UNIT           TYPE  STATE
2524 mysqld.service start running
====

Then attaching with gdb, you will see the same output as attached in gdb.txt
[21 Oct 2019 12:04] MySQL Verification Team
Hi,

First of all, we do know very well that OpenSSL is the only choice in MySQL.

Second, do note that we support only those MySQL builds that are built with OpenSSL included in the source tarball. We also support our binaries, which are built with correct version of OpenSSL.

Last and most important, we just use OpenSSL, we do not maintain it at all. Hence, we do not fix its bugs nor do we add any features to it.