Bug #53046 dict_update_statistics_low can still be run concurrently on same table
Submitted: 22 Apr 2010 5:58 Modified: 10 Jan 2011 22:30
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB Plugin storage engine Severity:S5 (Performance)
Version:plugin 1.0.7 OS:Any
Assigned to: Vasil Dimov CPU Architecture:Any

[22 Apr 2010 5:58] Shane Bester
Description:
this is a continuation of bug #38996

calls from ha_innobase::info can still invoke dict_update_statistics_low concurrently on the same table.  this is especially true if innodb_stats_on_metadata=ON which is the default.

How to repeat:
set a breakpoint, or sleep in dict_update_statistics_low then run
show table status like 't1' in multiple connections.

Suggested fix:
use a per table mutex, as this could save wasted disk i/o on concurrent
index dives.  

stop using the global analyze_mutex mutex, so that multiple analyze tables can be run on different tables at the same time.
[22 Apr 2010 7:40] Vasil Dimov
Grab!
[28 Apr 2010 8:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/106765

3427 Vasil Dimov	2010-04-28
      Fix Bug#53046 dict_update_statistics_low can still be run concurrently
      on same table
      
      Protect dict_index_t::stat_n_diff_key_vals[] with an array of
      mutexes.
      
      Testing: tested all code paths under UNIV_SYNC_DEBUG
      for the one in dict_print() one has to enable the InnoDB table monitor:
      CREATE TABLE innodb_table_monitor (a int) ENGINE=INNODB;
[28 Apr 2010 10:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/106786

3430 Vasil Dimov	2010-04-28
      Bug#53046 dict_update_statistics_low can still be run concurrently
      on same table
      
      Followup to vasil.dimov@oracle.com-20100428102033-dt3caf531rs3lidr :
      
      Add more asserions, which I forgot.
[5 May 2010 15:15] Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:kristofer.pettersson@sun.com-20100503172109-f9hracq5pqsaomb1) (merge vers: 5.1.47) (pib:16)
[28 May 2010 5:55] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:alik@sun.com-20100512070920-xgpmqeytp0gc183c) (pib:16)
[28 May 2010 6:24] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:alik@sun.com-20100507093037-7cykrx1n73v0tetc) (merge vers: 6.0.14-alpha) (pib:16)
[28 May 2010 6:52] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:alexey.kopytov@sun.com-20100507164602-8w09samq3mpvbxbn) (merge vers: 5.5.5-m3) (pib:16)
[8 Jun 2010 22:59] Mark Callaghan
The race is still there. dict_table_get() checks table->stat_initialized without locking and then calls dict_update_statistics() when stat_initialized == 0.

dict_update_statistics_low() can then do too much work when several sessions call it concurrently for the same index.

The facebook patch has a fix without this race:
http://bazaar.launchpad.net/~mysqlatfacebook/mysqlatfacebook/5.1/revision/3470

I think your change can work if you increase the duration for which dict_index_stat_mutex_enter/exit are called. Your change allows stats to be concurrently collected for different indexes on a table. I am not sure that is worth supporting.
[8 Jun 2010 23:11] Mark Callaghan
This fix also added use of the mutex for reads done by ha_innobase::info

I am wary of adding another source of mutex contention
[8 Jun 2010 23:26] Mark Callaghan
The assignments at the end of dict_update_statistics_low are another race. There are no pthread mutex calls that follow them. Another thread can observe stat_initialized=TRUE without observing the other writes. Read the C++ double-checked locking paper for details.

        index = dict_table_get_first_index(table);

        dict_index_stat_mutex_enter(index);

        table->stat_n_rows = index->stat_n_diff_key_vals[
                dict_index_get_n_unique(index)];

        dict_index_stat_mutex_exit(index);

        table->stat_clustered_index_size = index->stat_index_size;

        table->stat_sum_of_other_index_sizes = sum_of_index_sizes
                - index->stat_index_size;

        table->stat_initialized = TRUE;

        table->stat_modified_counter = 0;
}
[9 Jun 2010 6:57] Vasil Dimov
Hi Mark,

The original problem was about concurrently read/writing the index->stat_n_diff_key_vals[] array. Yes, dict_update_statistics_low() can be called concurrently on the same index. But is this a real problem, these statistics are only estimates anyway? I think eventually the whole body of dict_update_statistics_low() could be protected by dict_index_stat_mutex_enter/exit if this is a real problem.

Using a few mutexes instead of a single one (see dict_index_stat_mutex_enter()) is exactly to avoid another source of mutex contention.

You write "Another thread can observe stat_initialized=TRUE without observing the other writes", I do not understand how this could happen since stat_initialized = TRUE is done at the end, can you elaborate? What is "the C++ double-checked locking paper"?

Thanks!
[9 Jun 2010 12:43] James Day
Vasil, see section 6 on page 11 of this article: http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf . I'm assuming that Mark is referring to the cache coherency and reordering issue that makes it possible for table->stat_clustered_index_size and table->stat_sum_of_other_index_sizes assignment to happen in CPU caches visible to other threads after table->stat_initialized = TRUE is set instead of before.
[9 Jun 2010 14:55] Mark Callaghan
Writes can be reordered on architectures other than x86. This is the code at the end of dict_update_statistics_low():

table->stat_clustered_index_size = index->stat_index_size;
table->stat_sum_of_other_index_sizes = sum_of_index_sizes-index->stat_index_size;
table->stat_initialized = TRUE;
table->stat_modified_counter = 0;

With write reordering it can be observed by another thread as:
table->stat_initialized = TRUE;
table->stat_clustered_index_size = index->stat_index_size;
table->stat_sum_of_other_index_sizes = sum_of_index_sizes-index->stat_index_size;
table->stat_modified_counter = 0;

When that is done another thread can see that table->stat_initialized==TRUE and then read other fields that have yet to be initialized.

I don't want concurrent threads running dict_update_statistics_low() for the same table because this can cause huge pileups at startup. In my case that can be hundreds of threads trying to open the same table.
[17 Jun 2010 7:05] Vasil Dimov
Hi Mark,

This patch would fix the ordering issue:

--- cut ---
--- storage/innodb_plugin/dict/dict0dict.c	revid:jimmy.yang@oracle.com-20100617021204-x3790up7eeezox2c
+++ storage/innodb_plugin/dict/dict0dict.c	2010-06-17 07:01:47 +0000
@@ -4235,22 +4235,22 @@ dict_update_statistics_low(
 
 	dict_index_stat_mutex_enter(index);
 
 	table->stat_n_rows = index->stat_n_diff_key_vals[
 		dict_index_get_n_unique(index)];
 
-	dict_index_stat_mutex_exit(index);
-
 	table->stat_clustered_index_size = index->stat_index_size;
 
 	table->stat_sum_of_other_index_sizes = sum_of_index_sizes
 		- index->stat_index_size;
 
 	table->stat_initialized = TRUE;
 
 	table->stat_modified_counter = 0;
+
+	dict_index_stat_mutex_exit(index);
 }
--- cut ---

What do you think?

Thank you!
[17 Jun 2010 11:57] Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:martin.skold@mysql.com-20100616204905-jxjg342w35ks9vfy) (merge vers: 5.1.47-ndb-7.0.16) (pib:16)
[17 Jun 2010 12:36] Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:martin.skold@mysql.com-20100615090726-jotpykke96le59w5) (merge vers: 5.1.47-ndb-6.2.19) (pib:16)
[17 Jun 2010 13:23] Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:martin.skold@mysql.com-20100616120453-jh7wr05z1vf7r8pm) (merge vers: 5.1.47-ndb-6.3.35) (pib:16)
[2 Jul 2010 16:07] Mark Callaghan
I thought I replied elsewhere. The latest change does not prevent concurrent callers from calling dict_update_statistics for initialization and all repeating the work which may mean extra IO.

Code in dict_table_get reads table->stat_initialized without holding a mutex.
Concurrent threads can all see a value 0 and then all do the work in dict_update_statistics.

        if (table != NULL) {
                if (!table->stat_initialized) {
                        /* If table->ibd_file_missing == TRUE, this will
                        print an error message and return without doing
                        anything. */
                        dict_update_statistics(table);
                }
        }
[28 Oct 2010 13:16] Vasil Dimov
I wrote a more consistent high level patch that protects all table and index stats, now in review process...
[2 Nov 2010 17:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/122571

3650 Vasil Dimov	2010-11-02
      Fix Bug#53046 dict_update_statistics_low can still be run concurrently on same table
      
      Replace the array of mutexes that used to protect
      dict_index_t::stat_n_diff_key_vals[] with an array of rw locks that protects
      all the stats related members in dict_table_t and all of its indexes.
      
      Approved by:	Jimmy (rb://503)
[3 Nov 2010 9:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/122625

3651 Vasil Dimov	2010-11-03
      Add ChangeLog entry for Bug#53046
[5 Nov 2010 20:47] Mark Callaghan
5.1.52 has code in dict_update_statistics_low that is different from 5.1.50. Is this a race as it is done without calling dict_index_stat_mutex?

                } else {
                        /* If we have set a high innodb_force_recovery
                        level, do not calculate statistics, as a badly
                        corrupted index can cause a crash in it.
                        Initialize some bogus index cardinality
                        statistics, so that the data can be queried in
                        various means, also via secondary indexes. */
                        ulint   i;

                        sum_of_index_sizes++;
                        index->stat_index_size = index->stat_n_leaf_pages = 1;

                        for (i = dict_index_get_n_unique(index); i; ) {
                                index->stat_n_diff_key_vals[i--] = 1;
                        }
                }

                index = dict_table_get_next_index(index);
        } while (index);

        index = dict_table_get_first_index(table);

        dict_index_stat_mutex_enter(index);

        table->stat_n_rows = index->stat_n_diff_key_vals[
                dict_index_get_n_unique(index)];

        dict_index_stat_mutex_exit(index);

        table->stat_clustered_index_size = index->stat_index_size;

        table->stat_sum_of_other_index_sizes = sum_of_index_sizes
                - index->stat_index_size;

        table->stat_initialized = TRUE;

        table->stat_modified_counter = 0;
}
[11 Nov 2010 11:54] Vasil Dimov
Mark, yes, this is a race but I have committed a patch that fixes it in vasil.dimov@oracle.com-20101102165720-164z3666y3tut4c2 in 5.1/plugin.

From the commit message:
  Replace the array of mutexes that used to protect
  dict_index_t::stat_n_diff_key_vals[] with an array of rw locks that protects
  all the stats related members in dict_table_t and all of its indexes.

I will attach the patch to this bug because somehow the commit email did not make it to the mailing list.

Thanks!
[11 Nov 2010 11:55] Vasil Dimov
Sorry, the patch did made it to the mailing list, it is here: http://lists.mysql.com/commits/122571
[13 Nov 2010 16:13] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[13 Nov 2010 16:35] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (pib:21)
[30 Nov 2010 23:26] John Russell
Added to change log:

Improved concurrency when several ANALYZE TABLE or SHOW TABLE STATUS
statements are run simultaneously for InnoDB tables.
[5 Dec 2010 12:40] Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (version source revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (merge vers: 5.6.1) (pib:23)
[15 Dec 2010 5:52] Bugs System
Pushed into mysql-5.1 5.1.55 (revid:sunanda.menon@oracle.com-20101215054055-vgwki317xg1wphhh) (version source revid:sunanda.menon@oracle.com-20101215054055-vgwki317xg1wphhh) (merge vers: 5.1.55) (pib:23)
[16 Dec 2010 22:30] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)