Bug #93085 Stall when concurrently execute create/alter user with flush privilege
Submitted: 5 Nov 2018 14:24 Modified: 16 Nov 2018 13:32
Reporter: dave do Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Security: Privileges Severity:S2 (Serious)
Version:8.0.12 OS:Any
Assigned to: CPU Architecture:Any
Tags: dcl, DDL, dead lock, regression

[5 Nov 2018 14:24] dave do
Description:
The server will stall if a create/alter user concurrently executed with flush privilege statements.

How to repeat:
 concurrently execute some statements cases include create/alter user and  flush privilege.

Suggested fix:
1. Why it happen ?
the mysql_alter_user and mysql_create_user acquired mdl lock through acl_cache_lock.lock() and not released while calling reset_mqh, which will need mysql_mutex_lock(&LOCK_user_conn) in it.
The LOCK_user_conn could hold by a flush privilege thread , but will wait a in get_mqh this thread will call.
So it will cause a dead-lock to stall the server.

2. suggestion of fix
compare with mysql 5.7, the acl_cache_lock will explicit release before call the reset_mqh.
so I suggest we can also do it in mysql 8.
[6 Nov 2018 13:10] Sinisa Milivojevic
Hi,

Thank you for your report.

However, this is not a bug. This is expected behaviour which is fully described in our Reference Manual.

Privileges are stored both, in tables and in the memory. FLUSH PRIVILEGES command takes a lock over the entire privilege system, releases the memory, allocates it and fills it up from the tables on disk. After this process is done, lock is released. This command should NOT be used, unless you are running direct DML commands on the privilege tables, which is not recommended.

As CREATE / ALTER USER has to work in memory in memory, it can not do anything during the entire time when (the above described) lock is held.

Once again, this is expected behaviour and not a bug .....
[15 Nov 2018 7:00] Jean-François Gagné
Please consider re-opening.  From [1], "A followup to this: @dveeden pointed out that modifying thousands of users with CREATE, GRANT etc. is very slow.  There is a use-case still for INSERT/UPDATE to privs tables directly."

[1]: https://twitter.com/morgo/status/1062706379092312066
[15 Nov 2018 13:44] Sinisa Milivojevic
Jean-Francois,

I truly do not see what could be done here.

First of all, we are recommending to all not to access privilege tables directly. 

Second, if direct access is here to stay, then all data that we need in memory have to be protected, so that you can not do concurrently GRANT commands and direct DML commands. So, if you wish to use direct access to privilege tables,  you need to flush privileges. By flushing them, you would get changed privileges in the memory, so that they can be used. In order to maintain this dual access, the lock must be there for the protection of data in memory.

I do not see what can be done there.

Not a Bug.
[16 Nov 2018 8:25] Valeriy Kravchuk
I can tell you what you can do, dear Sinisa. 

You can first check that the problem is repeatable with 8.0.x and that the same scenario does NOT cause stalls with 5.7.x. If this is really so (as bug reporter states), then you can add "regression" tag, verify this bug and ask developers to check if there is a way to make 8.0 work the same as older MySQL versions worked.

If your check shows that there is no stall, then you can share the details of it and set this as "Can't repeat". If the same stall actually happens in 5.7, you can add one more version affected and state it's not a recent regression.

You can also just say "Won't fix" after checking how and when this really happens (assuming you have the authority for such decisions). You can also file a documentation request (separate one) to have safe way of using FLUSH PRIVILEGES properly documented or explained, while it is not deprecated to change mysql.* privilege tables directly.

You can initiate deprecation of FLUSH PRIVILEGES statement, and make sure this decision is documented in the manual. You can 9and should) do many things.

What you should NOT do is simply closing this as "Not a bug" after sharing your "explanations". Nowhere in the manual it is stated that one can NOT run FLUSH PRIVILEGES in concurrent environment or that it is supposed to hang everything. Correct me if I am wrong, with a link.

This is a bug, from common sense point of view (and especially if this worked before 8.0), like many others involving various FLUSH statements that caused hangs and deadlocks (that were verified and fixed, do you want a list?). Surely there is either a way to serialize this with concurrent CREATE/ALTER USER properly, or it's time to explicitly deprecate the feature and/or implement it differently for a new data dictionary.
[16 Nov 2018 13:32] Sinisa Milivojevic
Hi Dave, Jean-Francois,

I have decided to revisit this bug. I analysed the code more thoroughly and I found that analysis is correct. I test both 5.7 and 8.0, both experience waits, but 8.0 has much longer wait. I did not experience deadlock, but it might be possible to have one with much larger number of users.

Hence, I verify it as a regression bug.

Verified as reported.