MySQL Bugs: #53046: dict_update_statistics_low can still be run concurrently on same table

Bug #53046	dict_update_statistics_low can still be run concurrently on same table
Submitted:	22 Apr 2010 5:58	Modified:	10 Jan 2011 22:30
Reporter:	Shane Bester (Platinum Quality Contributor)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB Plugin storage engine	Severity:	S5 (Performance)
Version:	plugin 1.0.7	OS:	Any
Assigned to:	Vasil Dimov	CPU Architecture:	Any

Description:
this is a continuation of bug #38996

calls from ha_innobase::info can still invoke dict_update_statistics_low concurrently on the same table.  this is especially true if innodb_stats_on_metadata=ON which is the default.

How to repeat:
set a breakpoint, or sleep in dict_update_statistics_low then run
show table status like 't1' in multiple connections.

Suggested fix:
use a per table mutex, as this could save wasted disk i/o on concurrent
index dives.  

stop using the global analyze_mutex mutex, so that multiple analyze tables can be run on different tables at the same time.

Grab!

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/106765

3427 Vasil Dimov	2010-04-28
      Fix Bug#53046 dict_update_statistics_low can still be run concurrently
      on same table
      
      Protect dict_index_t::stat_n_diff_key_vals[] with an array of
      mutexes.
      
      Testing: tested all code paths under UNIV_SYNC_DEBUG
      for the one in dict_print() one has to enable the InnoDB table monitor:
      CREATE TABLE innodb_table_monitor (a int) ENGINE=INNODB;

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/106786

3430 Vasil Dimov	2010-04-28
      Bug#53046 dict_update_statistics_low can still be run concurrently
      on same table
      
      Followup to vasil.dimov@oracle.com-20100428102033-dt3caf531rs3lidr :
      
      Add more asserions, which I forgot.

Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:kristofer.pettersson@sun.com-20100503172109-f9hracq5pqsaomb1) (merge vers: 5.1.47) (pib:16)

Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:alik@sun.com-20100512070920-xgpmqeytp0gc183c) (pib:16)

Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:alik@sun.com-20100507093037-7cykrx1n73v0tetc) (merge vers: 6.0.14-alpha) (pib:16)

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:alexey.kopytov@sun.com-20100507164602-8w09samq3mpvbxbn) (merge vers: 5.5.5-m3) (pib:16)

The race is still there. dict_table_get() checks table->stat_initialized without locking and then calls dict_update_statistics() when stat_initialized == 0.

dict_update_statistics_low() can then do too much work when several sessions call it concurrently for the same index.

The facebook patch has a fix without this race:
http://bazaar.launchpad.net/~mysqlatfacebook/mysqlatfacebook/5.1/revision/3470

I think your change can work if you increase the duration for which dict_index_stat_mutex_enter/exit are called. Your change allows stats to be concurrently collected for different indexes on a table. I am not sure that is worth supporting.

This fix also added use of the mutex for reads done by ha_innobase::info

I am wary of adding another source of mutex contention

The assignments at the end of dict_update_statistics_low are another race. There are no pthread mutex calls that follow them. Another thread can observe stat_initialized=TRUE without observing the other writes. Read the C++ double-checked locking paper for details.

        index = dict_table_get_first_index(table);

        dict_index_stat_mutex_enter(index);

        table->stat_n_rows = index->stat_n_diff_key_vals[
                dict_index_get_n_unique(index)];

        dict_index_stat_mutex_exit(index);

        table->stat_clustered_index_size = index->stat_index_size;

        table->stat_sum_of_other_index_sizes = sum_of_index_sizes
                - index->stat_index_size;

        table->stat_initialized = TRUE;

        table->stat_modified_counter = 0;
}

Hi Mark,

The original problem was about concurrently read/writing the index->stat_n_diff_key_vals[] array. Yes, dict_update_statistics_low() can be called concurrently on the same index. But is this a real problem, these statistics are only estimates anyway? I think eventually the whole body of dict_update_statistics_low() could be protected by dict_index_stat_mutex_enter/exit if this is a real problem.

Using a few mutexes instead of a single one (see dict_index_stat_mutex_enter()) is exactly to avoid another source of mutex contention.

You write "Another thread can observe stat_initialized=TRUE without observing the other writes", I do not understand how this could happen since stat_initialized = TRUE is done at the end, can you elaborate? What is "the C++ double-checked locking paper"?

Thanks!

Vasil, see section 6 on page 11 of this article: http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf . I'm assuming that Mark is referring to the cache coherency and reordering issue that makes it possible for table->stat_clustered_index_size and table->stat_sum_of_other_index_sizes assignment to happen in CPU caches visible to other threads after table->stat_initialized = TRUE is set instead of before.

Writes can be reordered on architectures other than x86. This is the code at the end of dict_update_statistics_low():

table->stat_clustered_index_size = index->stat_index_size;
table->stat_sum_of_other_index_sizes = sum_of_index_sizes-index->stat_index_size;
table->stat_initialized = TRUE;
table->stat_modified_counter = 0;

With write reordering it can be observed by another thread as:
table->stat_initialized = TRUE;
table->stat_clustered_index_size = index->stat_index_size;
table->stat_sum_of_other_index_sizes = sum_of_index_sizes-index->stat_index_size;
table->stat_modified_counter = 0;

When that is done another thread can see that table->stat_initialized==TRUE and then read other fields that have yet to be initialized.

I don't want concurrent threads running dict_update_statistics_low() for the same table because this can cause huge pileups at startup. In my case that can be hundreds of threads trying to open the same table.

Hi Mark,

This patch would fix the ordering issue:

--- cut ---
--- storage/innodb_plugin/dict/dict0dict.c	revid:jimmy.yang@oracle.com-20100617021204-x3790up7eeezox2c
+++ storage/innodb_plugin/dict/dict0dict.c	2010-06-17 07:01:47 +0000
@@ -4235,22 +4235,22 @@ dict_update_statistics_low(
 
 	dict_index_stat_mutex_enter(index);
 
 	table->stat_n_rows = index->stat_n_diff_key_vals[
 		dict_index_get_n_unique(index)];
 
-	dict_index_stat_mutex_exit(index);
-
 	table->stat_clustered_index_size = index->stat_index_size;
 
 	table->stat_sum_of_other_index_sizes = sum_of_index_sizes
 		- index->stat_index_size;
 
 	table->stat_initialized = TRUE;
 
 	table->stat_modified_counter = 0;
+
+	dict_index_stat_mutex_exit(index);
 }
--- cut ---

What do you think?

Thank you!

Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:martin.skold@mysql.com-20100616204905-jxjg342w35ks9vfy) (merge vers: 5.1.47-ndb-7.0.16) (pib:16)

Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:martin.skold@mysql.com-20100615090726-jotpykke96le59w5) (merge vers: 5.1.47-ndb-6.2.19) (pib:16)

Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:martin.skold@mysql.com-20100616120453-jh7wr05z1vf7r8pm) (merge vers: 5.1.47-ndb-6.3.35) (pib:16)

I thought I replied elsewhere. The latest change does not prevent concurrent callers from calling dict_update_statistics for initialization and all repeating the work which may mean extra IO.

Code in dict_table_get reads table->stat_initialized without holding a mutex.
Concurrent threads can all see a value 0 and then all do the work in dict_update_statistics.

        if (table != NULL) {
                if (!table->stat_initialized) {
                        /* If table->ibd_file_missing == TRUE, this will
                        print an error message and return without doing
                        anything. */
                        dict_update_statistics(table);
                }
        }

I wrote a more consistent high level patch that protects all table and index stats, now in review process...

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/122571

3650 Vasil Dimov	2010-11-02
      Fix Bug#53046 dict_update_statistics_low can still be run concurrently on same table
      
      Replace the array of mutexes that used to protect
      dict_index_t::stat_n_diff_key_vals[] with an array of rw locks that protects
      all the stats related members in dict_table_t and all of its indexes.
      
      Approved by:	Jimmy (rb://503)

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/122625

3651 Vasil Dimov	2010-11-03
      Add ChangeLog entry for Bug#53046

5.1.52 has code in dict_update_statistics_low that is different from 5.1.50. Is this a race as it is done without calling dict_index_stat_mutex?

                } else {
                        /* If we have set a high innodb_force_recovery
                        level, do not calculate statistics, as a badly
                        corrupted index can cause a crash in it.
                        Initialize some bogus index cardinality
                        statistics, so that the data can be queried in
                        various means, also via secondary indexes. */
                        ulint   i;

                        sum_of_index_sizes++;
                        index->stat_index_size = index->stat_n_leaf_pages = 1;

                        for (i = dict_index_get_n_unique(index); i; ) {
                                index->stat_n_diff_key_vals[i--] = 1;
                        }
                }

                index = dict_table_get_next_index(index);
        } while (index);

        index = dict_table_get_first_index(table);

        dict_index_stat_mutex_enter(index);

        table->stat_n_rows = index->stat_n_diff_key_vals[
                dict_index_get_n_unique(index)];

        dict_index_stat_mutex_exit(index);

        table->stat_clustered_index_size = index->stat_index_size;

        table->stat_sum_of_other_index_sizes = sum_of_index_sizes
                - index->stat_index_size;

        table->stat_initialized = TRUE;

        table->stat_modified_counter = 0;
}

Mark, yes, this is a race but I have committed a patch that fixes it in vasil.dimov@oracle.com-20101102165720-164z3666y3tut4c2 in 5.1/plugin.

From the commit message:
  Replace the array of mutexes that used to protect
  dict_index_t::stat_n_diff_key_vals[] with an array of rw locks that protects
  all the stats related members in dict_table_t and all of its indexes.

I will attach the patch to this bug because somehow the commit email did not make it to the mailing list.

Thanks!

Sorry, the patch did made it to the mailing list, it is here: http://lists.mysql.com/commits/122571

Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)

Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (pib:21)

Added to change log:

Improved concurrency when several ANALYZE TABLE or SHOW TABLE STATUS
statements are run simultaneously for InnoDB tables.

Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (version source revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (merge vers: 5.6.1) (pib:23)

Pushed into mysql-5.1 5.1.55 (revid:sunanda.menon@oracle.com-20101215054055-vgwki317xg1wphhh) (version source revid:sunanda.menon@oracle.com-20101215054055-vgwki317xg1wphhh) (merge vers: 5.1.55) (pib:23)

Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)