Bug #106942 alter user may lead replication broken with error 1396
Submitted: 7 Apr 2022 10:44 Modified: 19 Apr 2022 9:40
Reporter: dennis GAO (OCA) Email Updates:
Status: Not a Bug Impact on me:
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:8.0.28 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution

[7 Apr 2022 10:44] dennis GAO
Alter user may lead replication broken with error 1396, see the following case:

1. setup a replication m1->s1 with RBR

2. Init test user on m1:
   1) create user u1@'%' identified with 'mysql_native_password';
   2) grant all on *.* to u1@'%';

3. Modify the mysql.user table to set a invalid authentication_string for test user on m1:
   1) update mysql.user set authentication_string=MD5("password") where user="u1";
   2) flush privileges;

4. Reset authentication_string to empty for test user on m1:
   1) update mysql.user set authentication_string="" where user="u1";

5. Do the alter user twice on m1:
   1) alter user u1@'%' identified with mysql_native_password by 'abc123' password expire never;
      // this one will fail with "ERROR 1396 (HY000): Operation ALTER USER failed for 'u1'@'%'"
   2) alter user u1@'%' identified with mysql_native_password by 'abc123' password expire never;
     // this one will success

6. check slave status on s1, we will see the following error:

Last_SQL_Errno: 1396, Operation ALTER USER failed for...

How to repeat:
see description

Suggested fix:
The slave fail reason:
1. The first fail alter user command implicit to the reload_acl_caches, just like the "flush privileges". The stack is:

(gdb) bt
#0  reload_acl_caches (thd=0x7fff40000da0, mdl_locked=true) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/auth/sql_auth_cache.cc:3650
#1  0x00000000035eb843 in acl_end_trans_and_close_tables (thd=0x7fff40000da0, rollback_transaction=true) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/auth/sql_user_table.cc:653
#2  0x00000000035ebcf2 in log_and_commit_acl_ddl (thd=0x7fff40000da0, transactional_tables=true, extra_users=0x7fffa85f0260, rewrite_params=0x7fffa85f0120, extra_error=false, write_to_binlog=true)
    at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/auth/sql_user_table.cc:766
#3  0x00000000035fd773 in mysql_alter_user (thd=0x7fff40000da0, list=..., if_exists=false) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/auth/sql_user.cc:3483
#4  0x000000000330c688 in mysql_execute_command (thd=0x7fff40000da0, first_level=true) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/sql_parse.cc:4676
#5  0x000000000330df96 in dispatch_sql_command (thd=0x7fff40000da0, parser_state=0x7fffa85f5b60) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/sql_parse.cc:5174
#6  0x00000000033045e0 in dispatch_command (thd=0x7fff40000da0, com_data=0x7fffa85f6c40, command=COM_QUERY) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/sql_parse.cc:1938
#7  0x0000000003302867 in do_command (thd=0x7fff40000da0) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/sql_parse.cc:1352
#8  0x00000000034df9e7 in handle_connection (arg=0xa785660) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/conn_handler/connection_handler_per_thread.cc:302
#9  0x000000000502d937 in pfs_spawn_thread (arg=0xaa0b950) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/storage/perfschema/pfs.cc:2947
#10 0x00007ffff7bc16ba in start_thread (arg=0x7fffa85f7700) at pthread_create.c:333
#11 0x00007ffff5e3b41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

2. But this fail alter user command write nothing to the binlog. And the second alter user command, which success, will be written to the binlog.

3. So the slave only see one alter user, which will execute fail just like the first alter user on m1.

So for suggestion:
1. Invoke reload_acl_caches at the begging of mysql_alter_user.
[7 Apr 2022 11:13] dennis GAO
Cause the alter user command may invoke the reload_acl_caches if it execute fail, so it is better to invoke the reload_acl_cache

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: alter_user_1396.diff (text/x-patch), 2.69 KiB.

[7 Apr 2022 12:49] MySQL Verification Team
Hello dennis GAO,

Thank you for the report and contribution.

[8 Apr 2022 8:23] huahua xu
Hi dennis GAO,

I don't think your suggested fix is the best, which may affect the performance.

You can study the implementation of mysql 5.7, and will have a better idea.
[15 Apr 2022 3:28] dennis GAO
Hi huahua xu,

Thanks for the suggestion.
In 5.7 the acl_end_trans_and_close_tables will only check thd->transaction_rollback_request and rollback_whole_statement to decide whether do the acl reload.
But in 8.0, the acl_end_trans_and_close_tables will also check thd->is_error(), which in some case really need to restore correct state of in-memory structures by reloading privileges.

Cause the alter user command may still need do the acl reload in some cases, I prefer to always do the acl reload at the beginning.

So the behavior of alter user can be more predicatable
[19 Apr 2022 9:40] Georgi Kodinov
Posted by developer:
Manual updates to system tables using DML statements should be avoided. Exactly for the reason of the ACL cache and the persisted values going out of sync. When it absolutely must be done a *MANUAL* FLUSH PRIVILEGES should be executed (and binlogged) on all affected mysqlds.
Even with that, we try to fix the bugs that can cause server crashes etc. But since there's no crash here,  closing as "won't fix".