Bug #97825 | dd_mdl_acquire in dd_table_open with dict_sys->mutex hold may cause deadlock | ||
---|---|---|---|
Submitted: | 28 Nov 2019 9:27 | Modified: | 9 Mar 2020 17:07 |
Reporter: | dave do | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 8.0.18 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[28 Nov 2019 9:27]
dave do
[28 Nov 2019 13:45]
MySQL Verification Team
Hello Mr. do, Thank you for your bug report. However, we do not think that this is a bug. MDL works at one level, regardless of which storage engine i used, while InnoDB locks work only on storage engine level. Hence, as long as our developers stick to the discipline of the order in which each lock is taken, there should be no deadlocks between the two. If you manage to reproduce a deadlock between MDL and mutex locks, we would then reconsider your report.
[28 Nov 2019 14:21]
dave do
Hi Sinisa, As you mentioned, the code should stick to the discipline of the order in which each lock is taken. So let's say if a table is to be opened, first the MDL on this table should be taken, then it dives into InnoDB, and InnoDB probably take the dict_sys mutex. This is a correct locking order. However, in above dd_table_open_on_name(), if dict_locked is true, which means dict_sys mutex is now being held, it is really dangerous to call dd_mdl_acquire(), which acquires the MDL. This indeed break the locking order. If we go through other calls to dd_mdl_acquire() in dict0dd.cc, we will find out that this function always gets called after releasing the dict_sys mutex. Please re-consider this bug, and hopefully fix it if it is confirmed. Thanks!
[28 Nov 2019 16:24]
MySQL Verification Team
I see no problem in what you report. The outcome of the your scenario is that MDL will be taken and mutex lock will be free. That is expected behaviour. If you want to know why MDLs have been introduced, read carefully bug #989.
[2 Dec 2019 4:28]
dave do
Hi Sinisa, Maybe i didn't make the issue clear enough, let me try again. First, my concern is the lock order in dd_table_open_on_name is not correct, or let`s say it is different from other similar functions in the same source file. Second, there is some examples and lock order analysis of the function similar with dd_table_open_on_name in dict0dd.cc. eg1) 4278 static void dd_open_table_one_on_name(const char *name, bool dict_locked, 4279 dict_names_t &fk_list, THD *thd) { 4280 dict_table_t *table = nullptr; 4281 const dd::Table *dd_table = nullptr; 4282 MDL_ticket *mdl = nullptr; 4283 4284 if (!dict_locked) { 4285 mutex_enter(&dict_sys->mutex); ==========> make the dict_sys->mutex locked anyway. 4286 } 4287 4288 table = dict_table_check_if_in_cache_low(name); 4289 4290 if (table != NULL) { 4291 /* If the table is in cached already, do nothing. */ 4292 if (!dict_locked) { 4293 mutex_exit(&dict_sys->mutex); 4294 } 4295 4296 return; 4297 } else { 4298 /* Otherwise, open it by dd obj. */ 4299 char db_buf[MAX_DATABASE_NAME_LEN + 1]; 4300 char tbl_buf[MAX_TABLE_NAME_LEN + 1]; 4301 4302 /* Exit sys mutex to access server info */ 4303 mutex_exit(&dict_sys->mutex); ==========> make the dict_sys->mutex unlocked before dd_mdl_acquire. 4304 4305 if (!dd_parse_tbl_name(name, db_buf, tbl_buf, nullptr, nullptr, nullptr)) { 4306 goto func_exit; 4307 } 4308 4309 if (dd_mdl_acquire(thd, &mdl, db_buf, tbl_buf)) { 4310 goto func_exit; 4311 } 4312 4313 dd::cache::Dictionary_client *client = dd::get_dd_client(thd); 4314 dd::cache::Dictionary_client::Auto_releaser releaser(client); 4315 4316 if (client->acquire(db_buf, tbl_buf, &dd_table) || dd_table == nullptr) { 4317 goto func_exit; 4318 } ... 4350 func_exit: 4351 if (table != NULL) { 4352 dd_table_close(table, thd, &mdl, false); 4353 } else { 4354 dd_mdl_release(thd, &mdl); 4355 } 4356 4357 if (dict_locked) { 4358 mutex_enter(&dict_sys->mutex); ==========> make the dict_sys->mutex re-locked if it is locked before call this function. 4359 } 4360 } eg2) 611 dict_table_t *dd_table_open_on_id(table_id_t table_id, THD *thd, 612 MDL_ticket **mdl, bool dict_locked, 613 bool check_corruption) { 614 dict_table_t *ib_table; 615 const ulint fold = ut_fold_ull(table_id); 616 char db_buf[MAX_DATABASE_NAME_LEN + 1]; 617 char tbl_buf[MAX_TABLE_NAME_LEN + 1]; 618 char full_name[MAX_FULL_NAME_LEN + 1]; 619 620 if (!dict_locked) { 621 mutex_enter(&dict_sys->mutex); ==========> make the dict_sys->mutex locked anyway. 622 } 623 ... 627 reopen: 628 if (ib_table == nullptr) { 629 #ifndef UNIV_HOTBACKUP 630 if (dict_table_is_sdi(table_id)) { 653 } else { 654 mutex_exit(&dict_sys->mutex); ==========> make the dict_sys->mutex unlocked because dd_mdl_acquire will called in dd_table_open_on_id_low (L446 dict0dd.cc). 655 656 ib_table = dd_table_open_on_id_low(thd, mdl, table_id); 657 658 if (dict_locked) { 659 mutex_enter(&dict_sys->mutex); 660 } 661 } ... 678 } else { 679 for (;;) { 680 #ifndef UNIV_HOTBACKUP 681 bool ret = dd_parse_tbl_name(ib_table->name.m_name, db_buf, tbl_buf, 682 nullptr, nullptr, nullptr); 683 #endif /* !UNIV_HOTBACKUP */ 684 685 memset(full_name, 0, MAX_FULL_NAME_LEN + 1); 686 687 strcpy(full_name, ib_table->name.m_name); 688 689 ut_ad(!ib_table->is_temporary()); 690 691 mutex_exit(&dict_sys->mutex); ==========> make the dict_sys->mutex unlocked before dd_mdl_acquire. 692 693 #ifndef UNIV_HOTBACKUP 694 if (ret == false) { 695 if (dict_locked) { 696 mutex_enter(&dict_sys->mutex); 697 } 698 return (nullptr); 699 } 700 701 if (dd_mdl_acquire(thd, mdl, db_buf, tbl_buf)) { 702 if (dict_locked) { 703 mutex_enter(&dict_sys->mutex); ==========> make the dict_sys->mutex re-locked if it is locked before call this function. 704 } 705 return (nullptr); 706 } 707 #endif /* !UNIV_HOTBACKUP */ 708 709 /* Re-lookup the table after acquiring MDL. */ 710 mutex_enter(&dict_sys->mutex); ==========> make the dict_sys->mutex re-locked if it is locked before call this function. 711 712 HASH_SEARCH(id_hash, dict_sys->table_id_hash, fold, dict_table_t *, 713 ib_table, ut_ad(ib_table->cached), ib_table->id == table_id); 714 We can find that all of them will release dict_sys->mutex before dd_mdl_acquire. And from the comments of the code ( /* Exit sys mutex to access server info */, /* Re-lookup the table after acquiring MDL. */ ), we can see some clue of the author`s thinking about the lock order between MDL(server layer lock) and dict_sys->mutex(storage engine layer lock). Since the locking order in above examples is always MDL->dict_sys, and this is consistent with how server access a table as I mentioned in previous comment, it looks the locking order in dd_table_open_on_name violates the orders in other functions, which can be obviously and easily observed by reading the code. From the design point of view, this is somehow odd, and in practice, we ran into a deadlock which is caused by this unexpected locking order(please to my comment in Bug #95743). I would really appreciate if this bug can be re-evaluated.
[2 Dec 2019 13:34]
MySQL Verification Team
Hi Mr. do, Lock order could be different, but it is irrelevant, since these are locks on totally different levels and can't, in themselves, cause any deadlock.
[20 Jan 2020 8:57]
Marc ALFF
Bug report verified by the LOCK ORDER tool. There is indeed a problematic dependency, detected as: ARC FROM "mutex/innodb/dict_sys_mutex" TO "prlock/sql/MDL_lock::rwlock" OP "W" This report can not be closed as "not a bug" by claiming that locks are independent and taken in independent layers, because this is precisely what the report is about: there is a dependency from the innodb layer back to the sql layer, that breaks the general flow of sql -> innodb.
[9 Mar 2020 17:07]
Daniel Price
Posted by developer: Fixed as of the upcoming 8.0.20 release, and here's the proposed changelog entry from the documentation team: A data dictionary table open function was implemented with incorrect lock ordering.
[10 Mar 2020 13:01]
MySQL Verification Team
Thank you, Daniel.