Bug #103563 upgrade to 8.0.24 with large number of tables fails
Submitted: 4 May 2021 6:07 Modified: 7 May 2021 7:34
Reporter: Simon Mudd (OCA) Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:8.0.24 OS:Any
Assigned to: CPU Architecture:Any
Tags: file descriptors, upgrade

[4 May 2021 6:07] Simon Mudd
Description:
Upgrading from 8.0.21 to 8.0.23 failed as the server (mysqld) ran out of file descriptors because it tried to open all tablespace files (at the same time?) as part of the upgrade process.

The server had 260k tables over 20k databases.

This left the server in a stuck state unable to serve queries and restarting it made it go through the same process and get stuck the same way.

How to repeat:
Relevant server settings:

max_connections = 30000
open_files_limit = 20000
innodb_open_files = 3000

Systemd overrides config:

[Service]
# See adjust_open_files_limit() in sql/mysqld.cc for the calculation
LimitNOFILE=150000

With these settings the server works fine.

The specific error reported (by 8.0.23) was:

2021-04-30T17:54:34.298688Z 4 [System] [MY-013381] [Server] Server upgrade from '80021' to '80023' started.
2021-04-30T17:54:34.315845Z 4 [Note] [MY-013386] [Server] Running queries to upgrade MySQL server.
2021-04-30T17:54:42.301174Z 4 [ERROR] [MY-012592] [InnoDB] Operating system error number 24 in a file operation.
2021-04-30T17:54:42.310191Z 4 [ERROR] [MY-012596] [InnoDB] Error number 24 means 'Too many open files'

Suggested fix:
I consider this serious as:
* an upgrade fails, the reason is not 100% obvious to the user.
* it should not be necessary to have to configure a special value for the number of file descriptors in order for the upgrade to work
* the upgrade fails as there are insufficient descriptors leaving the server in an unusable state, however, it has partially completed so it's impossible to go back to the older version

Suggested fix:
* avoid the need for special settings during an 8.0 upgrade
* ensure that the server respects the settings provided by: max_connections, open_files_limit, innodb_open_files which seem to influence the number of needed file descriptors
* should such an issue occur provide a better error message to the user to help him/her resolve the issue.
* improve when scanning the tablespaces the reporting to indicate the number of tables scanned and the total number. Reporting by "thread" makes it harder to figure out the progress as this process can take quite a while (took 30 minutes with just under 300k tables a process which is considerably longer than the normal time to scan tables during normal server startup)
[4 May 2021 6:29] Simon Mudd
Example of the thread scanning seen was:

2021-04-30T18:33:40.793006Z 0 [Note] [MY-012350] [InnoDB] Thread# 5 - Validated 14938/16367 tablespaces so far.
2021-04-30T18:33:45.910946Z 0 [Note] [MY-012350] [InnoDB] Thread# 6 - Validated 15044/16367 tablespaces so far.
2021-04-30T18:33:50.790274Z 0 [Note] [MY-012350] [InnoDB] Thread# 5 - Validated 16324/16367 tablespaces so far.

In this case there were 17 threads and this was close to the end of the process but since logging switches between different threads it far harder to "see" the percentage of progress or estimate the expected time to completion.

Perhaps something along the lines of:

2021-04-30T18:33:45.910946Z 0 [Note] [MY-012350] [InnoDB] 10 of 17 threads scanning - Validated 200000/260123 tablespaces (76.9%) tablespaces so far. Estimated 3 min 10 sec to completion.
[5 May 2021 9:31] MySQL Verification Team
Hi Simon,

I am pretty sure you covered this already but asking just to be sure it is not missed. You also set the system limits for the mysql user ( /etc/security/limits/ nofile settings) and checked the limits you have set on the mysqld from proc fs?

grep "Max open files"  /proc/`pidof -s mysqld`/limits

If you configured everything correctly procfs should show that mysqld is running with limit of 150,000 files and this is bug on our side. I'm running a test to reproduce this as I'm writing this note, will check for additional info in the SR too.

Thanks
Bogdan
[5 May 2021 14:31] MySQL Verification Team
update, 8.0.24, 100k tables and very low limits

innodb_open_files = 300
innodb-thread-concurrency=30

and hard limit also very low at 8k:

$ grep "Max open files"  /proc/`pidof -s mysqld`/limits
Max open files            8161                 8161                 files

Verification passes ok:

2021-05-05T14:28:16.616484Z 0 [Note] [MY-012200] [InnoDB] Thread# 0 - Checked 101651/105001 files
2021-05-05T14:28:24.600191Z 0 [Note] [MY-012200] [InnoDB] Thread# 1 - Checked 104587/105001 files
2021-05-05T14:28:26.601345Z 0 [Note] [MY-012200] [InnoDB] Thread# 0 - Checked 102145/105001 files
2021-05-05T14:28:33.068949Z 0 [Note] [MY-012201] [InnoDB] Checked 105001 files
2021-05-05T14:28:36.604397Z 0 [Note] [MY-012200] [InnoDB] Thread# 0 - Checked 102798/105001 files
2021-05-05T14:28:46.605148Z 0 [Note] [MY-012200] [InnoDB] Thread# 0 - Checked 103747/105001 files
2021-05-05T14:28:56.601724Z 0 [Note] [MY-012200] [InnoDB] Thread# 0 - Checked 104694/105001 files
2021-05-05T14:28:59.855737Z 0 [Note] [MY-012201] [InnoDB] Checked 105001 files
2021-05-05T14:28:59.877921Z 1 [Note] [MY-012208] [InnoDB] Completed space ID check of 210005 files.

This is on single core machine so running with only 2 threads... Will retest now with 12 cores to see if it will make any difference
[5 May 2021 16:38] Simon Mudd
I think I saw as I ran in another shell a command to count the number of used file descriptors that the problem happens from:

2021-04-30T18:41:56.161045Z 4 [System] [MY-013381] [Server] Server upgrade from '80021' to '80023' started.
2021-04-30T18:41:56.173508Z 4 [Note] [MY-013386] [Server] Running queries to upgrade MySQL server.
... to ....
2021-04-30T19:03:56.283816Z 4 [Note] [MY-013387] [Server] Upgrading system table data.
2021-04-30T19:03:57.060699Z 4 [Note] [MY-013385] [Server] Upgrading the sys schema.
2021-04-30T19:03:57.905854Z 4 [Note] [MY-013400] [Server] Upgrade of help tables started.
2021-04-30T19:03:58.167141Z 4 [Note] [MY-013400] [Server] Upgrade of help tables completed.
2021-04-30T19:03:58.255277Z 4 [Note] [MY-013394] [Server] Checking 'mysql' schema.
2021-04-30T19:03:58.582438Z 4 [Note] [MY-013394] [Server] Checking 'sys' schema.
2021-04-30T19:03:58.699950Z 4 [System] [MY-013381] [Server] Server upgrade from '80021' to '80023' completed.

Certainly prior to this step the number of used file descriptors is incredibly low.
[5 May 2021 16:42] MySQL Verification Team
Update, with 6 core and 12 thread verifying it.. 200k tables.. works ok.

I will try to upgrade from .21 to .24 now but I'm not sure this will be much different as I see Shane did the similar test and looks like it requires much more cores so that verification is done with much more threads to reproduce this. I hoped reducing the limits significantly would get the job done but looks like I'm wrong. Lookin at the code it also makes no sense... we are not doing that many threads that open files in threads break the limit and if a single thread is not closing files less threads would open more files in single go.

Continuing to investigate so we'll see where we get.

all best
Bogdan

2021-05-05T15:56:41.791646Z 0 [Note] [MY-012200] [InnoDB] Thread# 11 - Checked 17456/17500 files
2021-05-05T15:56:41.806005Z 0 [Note] [MY-012200] [InnoDB] Thread# 6 - Checked 17163/17500 files
2021-05-05T15:56:41.817655Z 0 [Note] [MY-012200] [InnoDB] Thread# 4 - Checked 17466/17500 files
2021-05-05T15:56:41.830142Z 0 [Note] [MY-012200] [InnoDB] Thread# 9 - Checked 17304/17500 files
2021-05-05T15:56:41.851599Z 0 [Note] [MY-012200] [InnoDB] Thread# 1 - Checked 17346/17500 files
2021-05-05T15:56:44.660506Z 0 [Note] [MY-012201] [InnoDB] Checked 17500 files
2021-05-05T15:56:45.011570Z 0 [Note] [MY-012201] [InnoDB] Checked 17500 files
2021-05-05T15:56:46.878056Z 0 [Note] [MY-012201] [InnoDB] Checked 17500 files
2021-05-05T15:56:49.076333Z 0 [Note] [MY-012201] [InnoDB] Checked 17500 files
2021-05-05T15:56:51.759209Z 0 [Note] [MY-012200] [InnoDB] Thread# 9 - Checked 17468/17500 files
2021-05-05T15:56:51.766201Z 0 [Note] [MY-012200] [InnoDB] Thread# 6 - Checked 17363/17500 files
2021-05-05T15:56:51.772920Z 0 [Note] [MY-012200] [InnoDB] Thread# 0 - Checked 17452/17500 files
2021-05-05T15:56:51.779508Z 0 [Note] [MY-012200] [InnoDB] Thread# 1 - Checked 17485/17500 files
2021-05-05T15:56:52.414900Z 0 [Note] [MY-012201] [InnoDB] Checked 17500 files
2021-05-05T15:56:53.026562Z 0 [Note] [MY-012201] [InnoDB] Checked 17500 files
2021-05-05T15:56:53.428553Z 0 [Note] [MY-012201] [InnoDB] Checked 17500 files
2021-05-05T15:56:54.257514Z 0 [Note] [MY-012201] [InnoDB] Checked 17500 files
2021-05-05T15:56:54.286891Z 1 [Note] [MY-012208] [InnoDB] Completed space ID check of 210005 files.
[5 May 2021 16:52] MySQL Verification Team
I just started the test on a 12 core (24 logical cpu's) so expecting results soon

Bogdan
[5 May 2021 17:01] MySQL Verification Team
Reproduced on my main machine (12 cores, 24 logical cpu's) rather quickly, I started with .21, upgraded to .24 total failurem .24 won't start any more.
[5 May 2021 18:43] MySQL Verification Team
Not 100% if the reproduction is valid as I'm running with too low limits (300 open files). I'm retrying this now a bit differently both with special 23 and with normal 8.0.24.

Will post new updates soon

Bogdan
[6 May 2021 9:11] MySQL Verification Team
Bug#103562 marked as duplicate of this one
[7 May 2021 7:32] MySQL Verification Team
duplicate of internal:

Bug 32634620 - INNODB STARTED HITTING OPERATING SYSTEM'S OPEN_FILES LIMIT

fixed in 8.0.26.