Bug #53046 | dict_update_statistics_low can still be run concurrently on same table | ||
---|---|---|---|
Submitted: | 22 Apr 2010 5:58 | Modified: | 10 Jan 2011 22:30 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB Plugin storage engine | Severity: | S5 (Performance) |
Version: | plugin 1.0.7 | OS: | Any |
Assigned to: | Vasil Dimov | CPU Architecture: | Any |
[22 Apr 2010 5:58]
Shane Bester
[22 Apr 2010 7:40]
Vasil Dimov
Grab!
[28 Apr 2010 8:50]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/106765 3427 Vasil Dimov 2010-04-28 Fix Bug#53046 dict_update_statistics_low can still be run concurrently on same table Protect dict_index_t::stat_n_diff_key_vals[] with an array of mutexes. Testing: tested all code paths under UNIV_SYNC_DEBUG for the one in dict_print() one has to enable the InnoDB table monitor: CREATE TABLE innodb_table_monitor (a int) ENGINE=INNODB;
[28 Apr 2010 10:36]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/106786 3430 Vasil Dimov 2010-04-28 Bug#53046 dict_update_statistics_low can still be run concurrently on same table Followup to vasil.dimov@oracle.com-20100428102033-dt3caf531rs3lidr : Add more asserions, which I forgot.
[5 May 2010 15:15]
Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:kristofer.pettersson@sun.com-20100503172109-f9hracq5pqsaomb1) (merge vers: 5.1.47) (pib:16)
[28 May 2010 5:55]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:alik@sun.com-20100512070920-xgpmqeytp0gc183c) (pib:16)
[28 May 2010 6:24]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:alik@sun.com-20100507093037-7cykrx1n73v0tetc) (merge vers: 6.0.14-alpha) (pib:16)
[28 May 2010 6:52]
Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:alexey.kopytov@sun.com-20100507164602-8w09samq3mpvbxbn) (merge vers: 5.5.5-m3) (pib:16)
[8 Jun 2010 22:59]
Mark Callaghan
The race is still there. dict_table_get() checks table->stat_initialized without locking and then calls dict_update_statistics() when stat_initialized == 0. dict_update_statistics_low() can then do too much work when several sessions call it concurrently for the same index. The facebook patch has a fix without this race: http://bazaar.launchpad.net/~mysqlatfacebook/mysqlatfacebook/5.1/revision/3470 I think your change can work if you increase the duration for which dict_index_stat_mutex_enter/exit are called. Your change allows stats to be concurrently collected for different indexes on a table. I am not sure that is worth supporting.
[8 Jun 2010 23:11]
Mark Callaghan
This fix also added use of the mutex for reads done by ha_innobase::info I am wary of adding another source of mutex contention
[8 Jun 2010 23:26]
Mark Callaghan
The assignments at the end of dict_update_statistics_low are another race. There are no pthread mutex calls that follow them. Another thread can observe stat_initialized=TRUE without observing the other writes. Read the C++ double-checked locking paper for details. index = dict_table_get_first_index(table); dict_index_stat_mutex_enter(index); table->stat_n_rows = index->stat_n_diff_key_vals[ dict_index_get_n_unique(index)]; dict_index_stat_mutex_exit(index); table->stat_clustered_index_size = index->stat_index_size; table->stat_sum_of_other_index_sizes = sum_of_index_sizes - index->stat_index_size; table->stat_initialized = TRUE; table->stat_modified_counter = 0; }
[9 Jun 2010 6:57]
Vasil Dimov
Hi Mark, The original problem was about concurrently read/writing the index->stat_n_diff_key_vals[] array. Yes, dict_update_statistics_low() can be called concurrently on the same index. But is this a real problem, these statistics are only estimates anyway? I think eventually the whole body of dict_update_statistics_low() could be protected by dict_index_stat_mutex_enter/exit if this is a real problem. Using a few mutexes instead of a single one (see dict_index_stat_mutex_enter()) is exactly to avoid another source of mutex contention. You write "Another thread can observe stat_initialized=TRUE without observing the other writes", I do not understand how this could happen since stat_initialized = TRUE is done at the end, can you elaborate? What is "the C++ double-checked locking paper"? Thanks!
[9 Jun 2010 12:43]
James Day
Vasil, see section 6 on page 11 of this article: http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf . I'm assuming that Mark is referring to the cache coherency and reordering issue that makes it possible for table->stat_clustered_index_size and table->stat_sum_of_other_index_sizes assignment to happen in CPU caches visible to other threads after table->stat_initialized = TRUE is set instead of before.
[9 Jun 2010 14:55]
Mark Callaghan
Writes can be reordered on architectures other than x86. This is the code at the end of dict_update_statistics_low(): table->stat_clustered_index_size = index->stat_index_size; table->stat_sum_of_other_index_sizes = sum_of_index_sizes-index->stat_index_size; table->stat_initialized = TRUE; table->stat_modified_counter = 0; With write reordering it can be observed by another thread as: table->stat_initialized = TRUE; table->stat_clustered_index_size = index->stat_index_size; table->stat_sum_of_other_index_sizes = sum_of_index_sizes-index->stat_index_size; table->stat_modified_counter = 0; When that is done another thread can see that table->stat_initialized==TRUE and then read other fields that have yet to be initialized. I don't want concurrent threads running dict_update_statistics_low() for the same table because this can cause huge pileups at startup. In my case that can be hundreds of threads trying to open the same table.
[17 Jun 2010 7:05]
Vasil Dimov
Hi Mark, This patch would fix the ordering issue: --- cut --- --- storage/innodb_plugin/dict/dict0dict.c revid:jimmy.yang@oracle.com-20100617021204-x3790up7eeezox2c +++ storage/innodb_plugin/dict/dict0dict.c 2010-06-17 07:01:47 +0000 @@ -4235,22 +4235,22 @@ dict_update_statistics_low( dict_index_stat_mutex_enter(index); table->stat_n_rows = index->stat_n_diff_key_vals[ dict_index_get_n_unique(index)]; - dict_index_stat_mutex_exit(index); - table->stat_clustered_index_size = index->stat_index_size; table->stat_sum_of_other_index_sizes = sum_of_index_sizes - index->stat_index_size; table->stat_initialized = TRUE; table->stat_modified_counter = 0; + + dict_index_stat_mutex_exit(index); } --- cut --- What do you think? Thank you!
[17 Jun 2010 11:57]
Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:martin.skold@mysql.com-20100616204905-jxjg342w35ks9vfy) (merge vers: 5.1.47-ndb-7.0.16) (pib:16)
[17 Jun 2010 12:36]
Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:martin.skold@mysql.com-20100615090726-jotpykke96le59w5) (merge vers: 5.1.47-ndb-6.2.19) (pib:16)
[17 Jun 2010 13:23]
Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:martin.skold@mysql.com-20100616120453-jh7wr05z1vf7r8pm) (merge vers: 5.1.47-ndb-6.3.35) (pib:16)
[2 Jul 2010 16:07]
Mark Callaghan
I thought I replied elsewhere. The latest change does not prevent concurrent callers from calling dict_update_statistics for initialization and all repeating the work which may mean extra IO. Code in dict_table_get reads table->stat_initialized without holding a mutex. Concurrent threads can all see a value 0 and then all do the work in dict_update_statistics. if (table != NULL) { if (!table->stat_initialized) { /* If table->ibd_file_missing == TRUE, this will print an error message and return without doing anything. */ dict_update_statistics(table); } }
[28 Oct 2010 13:16]
Vasil Dimov
I wrote a more consistent high level patch that protects all table and index stats, now in review process...
[2 Nov 2010 17:04]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/122571 3650 Vasil Dimov 2010-11-02 Fix Bug#53046 dict_update_statistics_low can still be run concurrently on same table Replace the array of mutexes that used to protect dict_index_t::stat_n_diff_key_vals[] with an array of rw locks that protects all the stats related members in dict_table_t and all of its indexes. Approved by: Jimmy (rb://503)
[3 Nov 2010 9:31]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/122625 3651 Vasil Dimov 2010-11-03 Add ChangeLog entry for Bug#53046
[5 Nov 2010 20:47]
Mark Callaghan
5.1.52 has code in dict_update_statistics_low that is different from 5.1.50. Is this a race as it is done without calling dict_index_stat_mutex? } else { /* If we have set a high innodb_force_recovery level, do not calculate statistics, as a badly corrupted index can cause a crash in it. Initialize some bogus index cardinality statistics, so that the data can be queried in various means, also via secondary indexes. */ ulint i; sum_of_index_sizes++; index->stat_index_size = index->stat_n_leaf_pages = 1; for (i = dict_index_get_n_unique(index); i; ) { index->stat_n_diff_key_vals[i--] = 1; } } index = dict_table_get_next_index(index); } while (index); index = dict_table_get_first_index(table); dict_index_stat_mutex_enter(index); table->stat_n_rows = index->stat_n_diff_key_vals[ dict_index_get_n_unique(index)]; dict_index_stat_mutex_exit(index); table->stat_clustered_index_size = index->stat_index_size; table->stat_sum_of_other_index_sizes = sum_of_index_sizes - index->stat_index_size; table->stat_initialized = TRUE; table->stat_modified_counter = 0; }
[11 Nov 2010 11:54]
Vasil Dimov
Mark, yes, this is a race but I have committed a patch that fixes it in vasil.dimov@oracle.com-20101102165720-164z3666y3tut4c2 in 5.1/plugin. From the commit message: Replace the array of mutexes that used to protect dict_index_t::stat_n_diff_key_vals[] with an array of rw locks that protects all the stats related members in dict_table_t and all of its indexes. I will attach the patch to this bug because somehow the commit email did not make it to the mailing list. Thanks!
[11 Nov 2010 11:55]
Vasil Dimov
Sorry, the patch did made it to the mailing list, it is here: http://lists.mysql.com/commits/122571
[13 Nov 2010 16:13]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[13 Nov 2010 16:35]
Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (pib:21)
[30 Nov 2010 23:26]
John Russell
Added to change log: Improved concurrency when several ANALYZE TABLE or SHOW TABLE STATUS statements are run simultaneously for InnoDB tables.
[5 Dec 2010 12:40]
Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (version source revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (merge vers: 5.6.1) (pib:23)
[15 Dec 2010 5:52]
Bugs System
Pushed into mysql-5.1 5.1.55 (revid:sunanda.menon@oracle.com-20101215054055-vgwki317xg1wphhh) (version source revid:sunanda.menon@oracle.com-20101215054055-vgwki317xg1wphhh) (merge vers: 5.1.55) (pib:23)
[16 Dec 2010 22:30]
Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)