Bug #89001 MySQL aborts without proper error message on startup if grant tables are corrupt
Submitted: 21 Dec 2017 9:01 Modified: 10 Apr 2018 17:35
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Security: Privileges Severity:S3 (Non-critical)
Version:5.7.19, 5.7.20 OS:Any
Assigned to: CPU Architecture:Any
Tags: crash

[21 Dec 2017 9:01] Daniël van Eeden
Description:
MySQL kept on crashing with this line in the error-log, but no other error indicating about what happened.
2017-12-21T07:02:15.901291Z 0 [ERROR] Aborting

This message comes from unireg_abort in sql/mysqld.cc:
1110 extern "C" void unireg_abort(int exit_code)
1111 {
1112   DBUG_ENTER("unireg_abort");
1113 
1114   // At this point it does not make sense to buffer more messages.
1115   // Just flush what we have and write directly to stderr.
1116   flush_error_log_messages();
1117 
1118   if (opt_help)
1119     usage();
1120   if (exit_code)
1121     sql_print_error("Aborting\n");

Which was called here:
sql/mysqld.cc
4872   if (mysql_rm_tmp_tables() || acl_init(opt_noacl) ||
4873       my_tz_init((THD *)0, default_tz_name, opt_bootstrap) ||
4874       grant_init(opt_noacl))
4875   {
4876     abort_loop= true;
4877 
4878     delete_pid_file(MYF(MY_WME));
4879 
4880     unireg_abort(MYSQLD_ABORT_EXIT);
4881   }

And this in turn was caused by acl_init() returning 1.
(gdb) finish
Run till exit from #0  acl_init (dont_read_acl_tables=false)
    at /export/home2/pb2/build/sb_1-23948522-1498141788.79/rpm/BUILD/mysql-5.7.19/mysql-5.7.19/sql/auth/sql_auth_cache.cc:1382
0x0000000000e9aa8a in mysqld_main (argc=120, argv=0x2ee2688) at /export/home2/pb2/build/sb_1-23948522-1498141788.79/rpm/BUILD/mysql-5.7.19/mysql-5.7.19/sql/mysqld.cc:4872
4872	/export/home2/pb2/build/sb_1-23948522-1498141788.79/rpm/BUILD/mysql-5.7.19/mysql-5.7.19/sql/mysqld.cc: No such file or directory.
Value returned is $2 = 1 '\001'

Which in turn was caused by acl_reload()
1426   return_val= acl_reload(thd);
1427 
1428   thd->release_resources();
1429   delete thd;
1430 
1431   DBUG_RETURN(return_val);
1432 }

(gdb) finish
Run till exit from #0  acl_reload (thd=0x127e2fbb0) at /export/home2/pb2/build/sb_1-23948522-1498141788.79/rpm/BUILD/mysql-5.7.19/mysql-5.7.19/sql/auth/sql_auth_cache.cc:2098
0x0000000000ec2914 in acl_init (dont_read_acl_tables=false)
    at /export/home2/pb2/build/sb_1-23948522-1498141788.79/rpm/BUILD/mysql-5.7.19/mysql-5.7.19/sql/auth/sql_auth_cache.cc:1426
1426	in /export/home2/pb2/build/sb_1-23948522-1498141788.79/rpm/BUILD/mysql-5.7.19/mysql-5.7.19/sql/auth/sql_auth_cache.cc
Value returned is $1 = 1 '\001'

Which was caused by an error reading the grant tables:
1849   end_read_record(&read_record_info);
1850   if (read_rec_errcode > 0)
1851     goto end;

So I started mysqld with --skip-slave-start

And then I was able to check the grant tables,

mysql> check table user;
+------------+-------+----------+------------------------------------------------+
| Table      | Op    | Msg_type | Msg_text                                       |
+------------+-------+----------+------------------------------------------------+
| mysql.user | check | warning  | Size of datafile is: 5312       Should be: 384 |
| mysql.user | check | error    | Record-count is not ok; is 41   Should be: 3   |
| mysql.user | check | warning  | Found 42 key parts. Should be: 3               |
| mysql.user | check | error    | Corrupt                                        |
+------------+-------+----------+------------------------------------------------+
4 rows in set (0.00 sec)

After checkin all tables it turned out these were corrupt:
user
db
event
tables_priv

after running REPAIR TABLE on them I was able to restart the server.

When comparing grants between this server and another server everything looked fine.

How to repeat:
See description

Suggested fix:
Instead of this:
2017-12-21T07:02:15.901291Z 0 [ERROR] Aborting

It should look like this:
2017-12-21T07:02:15.901290Z 0 [ERROR] Failed to read grant tables
2017-12-21T07:02:15.901291Z 0 [ERROR] Aborting

This would make it a lot easier to locate the issue. Then one can either fix the issue or start without grant tables and dump the data (similar to --innodb-force-recovery)
[21 Dec 2017 9:32] MySQL Verification Team
Hello Daniël,

Thank you for the report and feedback!
Verifying for the proper error message.

Thanks,
Umesh
[27 Dec 2017 9:20] Daniël van Eeden
Correction:

This: "So I started mysqld with --skip-slave-start"
Should be: "So I started mysqld with --skip-grant-tables"
[11 Jan 2018 12:01] Daniël van Eeden
https://github.com/mysql/mysql-server/pull/189
[12 Jan 2018 15:51] OCA Admin
Contribution submitted via Github - Fix for Bug #89001 
(*) Contribution by Daniël van Eeden (Github dveeden, mysql-server/pull/189#issuecomment-357019287): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_162364222.txt (text/plain), 629 bytes.

[9 Apr 2018 6:48] Arun Kuruvila
Posted by developer:
 
Hello Daniël,

Thank you for your contribution. We have used this as
reference while coming up with the fix for the issue
reported.

Thanks,
Arun.
[10 Apr 2018 17:35] Paul DuBois
Posted by developer:
 
Fixed in 5.7.23, 8.0.12.

If system tables could not be read or initialized, the server wrote
an incomplete error message. Thanks to Daniël van Eeden for a patch
related to the fix.