Bug #96412 Mess usages of latch meta data for InnoDB latches (mutex and rw_lock)
Submitted: 2 Aug 2019 12:06 Modified: 5 Aug 2019 12:46
Reporter: Fungo Wang Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S6 (Debug Builds)
Version:8.0.17 OS:Any
Assigned to: CPU Architecture:Any
Tags: latch, latch meta, mutex, rw_lock

[2 Aug 2019 12:06] Fungo Wang
Description:
I found some wrong usages (IIUC) of latch meta data while reading code about InnoDB latches implementation (mutex and rw_lock).

1. mixed usage of LATCH_ADD_MUTEX() and LATCH_ADD_RWLOCK() in sync_latch_meta_init()

a) latch_id for rw_lock is used for mutex_create

in sync0debug.cc
1443  LATCH_ADD_RWLOCK(INDEX_ONLINE_LOG, SYNC_INDEX_ONLINE_LOG,
1444                   index_online_log_key);

in row0log.cc
3063  mutex_create(LATCH_ID_INDEX_ONLINE_LOG, &log->mutex);

LATCH_ID_INDEX_ONLINE_LOG is for a mutex

b) sync_level for mutex is used for rw_lock_create

in sync0debug.cc
1374  LATCH_ADD_MUTEX(SRV_DICT_TMPFILE, SYNC_DICT_OPERATION,
1375                  srv_dict_tmpfile_mutex_key);

1464  LATCH_ADD_RWLOCK(DICT_OPERATION, SYNC_DICT, dict_operation_lock_key);

in dict0dict.cc
1034  rw_lock_create(dict_operation_lock_key, dict_operation_lock,
1035                 SYNC_DICT_OPERATION);

dict_operation_lock rw lock use pfs_key and sync_level from a rwlock and mutex meta separately 

2. different rw lock use same sync level

in sync0debug.cc
 1486  LATCH_ADD_RWLOCK(INDEX_TREE, SYNC_INDEX_TREE, index_tree_rw_lock_key);
 1487
 1488  LATCH_ADD_RWLOCK(DICT_TABLE_STATS, SYNC_INDEX_TREE, dict_table_stats_key);

in rw_lock_create_func(), latch_id is determined based on latch level
241  lock->m_id = sync_latch_get_id(sync_latch_get_name(level));

This will cause all table->stats_latch and table->stats_latch use LATCH_ID_INDEX_TREE

323:  rw_lock_create(dict_table_stats_key, table->stats_latch, SYNC_INDEX_TREE);
2447:  rw_lock_create(index_tree_rw_lock_key, &new_index->lock, SYNC_INDEX_TREE);

How to repeat:
Read the code
[2 Aug 2019 18:24] Sinisa Milivojevic
Hi Mr. Wang,

Thank you for your bug report.

We do require some clarifications.

How do you know that LATCH_ID_INDEX_ONLINE_LOG is for a mutex  ONLY ???

How do you know that dict_operation_lock can be used for mutex only ???

How do you know that SYNC_INDEX_TREE can not be used for different sync levels ????

How do you know that  table->stats_latch and table->stats_latch can not  use LATCH_ID_INDEX_TREE ???

Last, but not least, seems that S6 would be a correct severity here, since you are using some debug-only commands.
[4 Aug 2019 8:25] Fungo Wang
Hi Sinisa,

Let me explain it more clearly (I should have put all these info in my initial bug report).

The current usage of latch_id and sync level will not cause crash, but I think it will make trouble for mysqld diagnose under some circumstances.

latch_meta is used to maintain a relation for latch_id, latch_level, pfs_key and latch_counter, each element in latch_meta vector contains:
(latch_id, latch_name, latch_level, latch_level_name, pfs_key, latch_counter)

InnoDB Mutex and RW lock are implemented differently, and for the latch_meta vector initialization (i.e.. sync_latch_meta_init()), two different macros are used to add mutex and RW lock.

1. LATCH_ADD_MUTEX() 
2. LATCH_ADD_RWLOCK()

The defination of these two macros are controlled by #ifdef UNIV_PFS_MUTEX and #ifdef UNIV_PFS_RWLOCK, which decide whether mutex or rw lock is instrumented in P_S. 

> How do you know that LATCH_ID_INDEX_ONLINE_LOG is for a mutex  ONLY ???

1443  LATCH_ADD_RWLOCK(INDEX_ONLINE_LOG, SYNC_INDEX_ONLINE_LOG,
1444                   index_online_log_key);

LATCH_ID_INDEX_ONLINE_LOG is initialized through RW_LOCK, so if you want to use create a mutex, it's not a good idea to use a rw_lock latch_id.

If you check the code to initialzed a mutex, you can see the latch_id is used fetch the corresponding PFS key.
So a pfs_key for rw_lock will be used for a mutex, which will lead confusing if the user check P_S table.

And think in such situation:
the rw_lock P_S instrumentation is disabled (PFS_NOT_INSTRUMENTED), while the mutex P_S instrumentation is not, if a mutex initialized with a rw_lock latch id, you will not see info for such mutex in P_S table.

815  @param[in]    id              Mutex ID
816  @param[in]    filename        file where created
817  @param[in]    line            line number in file where created */
818  void init(latch_id_t id, const char *filename, uint32_t line) UNIV_NOTHROW {
819#ifdef UNIV_PFS_MUTEX
820    pfs_add(sync_latch_get_pfs_key(id));
821#endif /* UNIV_PFS_MUTEX */
822
823    m_impl.init(id, filename, line);
824  }

> How do you know that dict_operation_lock can be used for mutex only ???

I guess you mean "How do you know that SYNC_DICT_OPERATION (a sync level) can be used for mutex only ???"

Different kind of mutex or rw lock used different sync level, thus the debug code can detect sync_level confliction. 

1374  LATCH_ADD_MUTEX(SRV_DICT_TMPFILE, SYNC_DICT_OPERATION,
1375                  srv_dict_tmpfile_mutex_key);

1464  LATCH_ADD_RWLOCK(DICT_OPERATION, SYNC_DICT, dict_operation_lock_key);

Through the above two initialiaztion, I assume that
SYNC_DICT_OPERATION is for LATCH_ID_SRV_DICT_TMPFILE mutex, and SYNC_DICT is used for LATCH_ID_DICT_OPERATION RW lock.

So I guess dict_operation_lock rw_lock should use sync level SYNC_DICT, instead of SYNC_DICT_OPERATION.
1034  rw_lock_create(dict_operation_lock_key, dict_operation_lock,
1035                 SYNC_DICT_OPERATION);

> How do you know that SYNC_INDEX_TREE can not be used for different sync levels ????

As SYNC_INDEX_TREE is a sync level, I guess you mean "How do you know that SYNC_INDEX_TREE can not be used for different rw lock ????"

 1486  LATCH_ADD_RWLOCK(INDEX_TREE, SYNC_INDEX_TREE, index_tree_rw_lock_key);
 1487
 1488  LATCH_ADD_RWLOCK(DICT_TABLE_STATS, SYNC_INDEX_TREE, dict_table_stats_key);

If you check the rw_lock_create_func() function:

241  lock->m_id = sync_latch_get_id(sync_latch_get_name(level));

This will cause both table->stats_latch and new_index->lock initialized latch id (lock->m_id) with LATCH_ID_INDEX_TREE, even if new_index->lock should be new_index->lock->m_id should be LATCH_ID_DICT_TABLE_STATS.

323:  rw_lock_create(dict_table_stats_key, table->stats_latch, SYNC_INDEX_TREE);
2447:  rw_lock_create(index_tree_rw_lock_key, &new_index->lock, SYNC_INDEX_TREE);

This will not cause too much trouble, but will cause LatchDebug::print_latches and LatchDebug::crash print wrong info if invoked:

537  ib::error(ER_IB_MSG_1163)
538#endif /* UNIV_NO_ERR_MSGS */
539      << "Thread " << os_thread_get_curr_id() << " already owns a latch "
540      << sync_latch_get_name(latch->m_id) << " at level"
541      << " " << latched->m_level << " (" << latch_level_name
542      << " ), which is at a lower/same level than the"
543      << " requested latch: " << level << " (" << in_level_name << "). "
544      << latch->to_string();

The sync_latch_get_name(latch->m_id) will give unexpected name ("INDEX_TREE") for all table->stats_latch (which should be "DICT_TABLE_STATS") .

Hope the above could also answer the 4th question.
> How do you know that  table->stats_latch and table->stats_latch can not  use LATCH_ID_INDEX_TREE ???

Thanks.
[4 Aug 2019 8:30] Fungo Wang
A correction of previous comment:

This will cause both table->stats_latch and new_index->lock initialized latch id (lock->m_id) with LATCH_ID_INDEX_TREE, even if new_index->lock should be new_index->lock->m_id should be LATCH_ID_DICT_TABLE_STATS.

==== should be ===>

This will cause both table->stats_latch and new_index->lock initialized latch id (lock->m_id) with LATCH_ID_INDEX_TREE, even if table->stats_latch->m_id should be LATCH_ID_DICT_TABLE_STATS.
[5 Aug 2019 12:46] Sinisa Milivojevic
Thank you Mr. Wang,

I have analysed all the info provided by you and I agree with your conclusion. I am changing only the severity level.

Verified.