Bug #60071 | QUERYING I_S.PARTITIONS CHANGES THE CARDINALITY OF THE PARTITIONS. | ||
---|---|---|---|
Submitted: | 10 Feb 2011 0:50 | Modified: | 18 Sep 2013 12:25 |
Reporter: | Matthew Montgomery | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Partitions | Severity: | S3 (Non-critical) |
Version: | 5.5.8 | OS: | Any |
Assigned to: | Mattias Jonsson | CPU Architecture: | Any |
[10 Feb 2011 0:50]
Matthew Montgomery
[7 Mar 2011 17:52]
Vasil Dimov
Simpler test case (this is not related to innodb_stats_on_metadata): DROP TABLE IF EXISTS t1; CREATE TABLE t1 ( c1 int, c2 int, PRIMARY KEY (c1, c2) ) ENGINE = InnoDB PARTITION BY LIST (c1) (PARTITION p0 VALUES IN (0) ENGINE = InnoDB, PARTITION p1 VALUES IN (1) ENGINE = InnoDB); INSERT INTO t1 VALUES (0, 0); INSERT INTO t1 SELECT 0, c2+1 FROM t1; INSERT INTO t1 SELECT 0, c2+2 FROM t1; INSERT INTO t1 SELECT 0, c2+4 FROM t1; INSERT INTO t1 SELECT 0, c2+8 FROM t1; ANALYZE TABLE t1; SELECT cardinality FROM information_schema.statistics WHERE table_name = 't1'; SELECT partition_name, table_rows FROM information_schema.partitions WHERE table_name = 't1'; SELECT cardinality FROM information_schema.statistics WHERE table_name = 't1'; ANALYZE TABLE t1; SELECT cardinality FROM information_schema.statistics WHERE table_name = 't1';
[7 Mar 2011 18:14]
Vasil Dimov
The data for information_schema.statistics.cardinality is fetched by sql_show.cc/get_schema_stat_record() (line numbers from 5.5 tree): 4886 KEY *key=show_table->key_info+i; 4887 if (key->rec_per_key[j]) 4888 { 4889 ha_rows records=(show_table->file->stats.records / 4890 key->rec_per_key[j]); 4891 table->field[9]->store((longlong) records, TRUE); show_table->key_info[].rec_per_key[] is initialized by ha_innobase::info(), called from ha_partition.cc/ha_partition::get_dynamic_partition_info(): 5396 void ha_partition::get_dynamic_partition_info(PARTITION_STATS *stat_info, 5397 uint part_id) 5398 { 5399 handler *file= m_file[part_id]; 5400 file->info(HA_STATUS_CONST | HA_STATUS_TIME | HA_STATUS_VARIABLE | 5401 HA_STATUS_VARIABLE_EXTRA | HA_STATUS_NO_LOCK); In our test case we have a single index (primary key) on two columns and the table is partitioned in two partitions where all 16 rows are in the first partition. When SELECT partition_name, table_rows FROM information_schema.partitions WHERE table_name = 't1'; is executed then ha_partition::get_dynamic_partition_info() is called two times and we have: Breakpoint 5, ha_partition::get_dynamic_partition_info (this=0x803e53410, stat_info=0x7ffffddac150, part_id=0) at /bzrroot/server/mysql-5.5-innodb/sql/ha_partition.cc:5401 5401 HA_STATUS_VARIABLE_EXTRA | HA_STATUS_NO_LOCK); (gdb) p file $278 = (handler *) 0x803e53810 (gdb) p file->table $279 = (TABLE *) 0x8037cd700 (gdb) p file->table->alias $280 = 0x803e99184 "t1" (gdb) p ((ha_innobase*)file)->prebuilt->table->name $281 = 0x803638370 "test/t1#P#p0" (gdb) p file->table->key_info[0].rec_per_key $282 = (ulong *) 0x803e53330 (gdb) Continuing. Breakpoint 5, ha_partition::get_dynamic_partition_info (this=0x803e53410, stat_info=0x7ffffddac150, part_id=1) at /bzrroot/server/mysql-5.5-innodb/sql/ha_partition.cc:5401 5401 HA_STATUS_VARIABLE_EXTRA | HA_STATUS_NO_LOCK); (gdb) p file $283 = (handler *) 0x803e52810 (gdb) p file->table $284 = (TABLE *) 0x8037cd700 (gdb) p file->table->alias $285 = 0x803e99184 "t1" (gdb) p ((ha_innobase*)file)->prebuilt->table->name $286 = 0x803638360 "test/t1#P#p1" (gdb) p file->table->key_info[0].rec_per_key $287 = (ulong *) 0x803e53330 (gdb) Notice that the handler object is different in both calls but the ::table member is the same. So the second call to ::info() overwrites the rec_per_key array with the data from the second partition (which from InnoDB pov is a separate table). This is a bug in the partitioning code. I am not sure how it should be fixed.
[7 Mar 2011 18:30]
Vasil Dimov
Workaround: don't query information_schema.partitions. This may turn out to be a more widespread problem because the storage engine (at least innodb) provides stats for each partition separately (because from its point of view a partition is a separate table) but stats are stored in table->key_info[] which is per-table, not per partition. So the partitioning code should aggregate somehow stats from all partitions and store them in the common table->key_info[].
[7 Mar 2011 22:18]
Mattias Jonsson
I_S seems to query each partition in partitioning order. Resulting in it will always set key statistics from the last partition. So the bug is not related to innodb_stat_on_metadata, but that I_S.PARTITIONS will always set the key distribution to the last partition. (Related to bug#44059). Is key distribution/cardinality (as SQL-layer table->key_info[].rec_per_key[]) a different thing than 'innodb table/index statistics'/innodb_stats_on_metadata (as in dict_update_statistics())? It seems so!? The fix should be to always set the key distribution to the partitions with the most rows in it. There are two solutions as I see it: 1) HA_STATUS_CONST is never called per partition, only on the ha_partition/table level (i.e called on the partition with the highest number of rows). This will result in block_size, create_time and ref_length will always be taken from the partition with the most rows in it, and be set the same for all partitions. Will be faster since HA_STATUS_CONST may be 'expensive' 2) The last thing in get_schema_partitions_record table->file->info(HA_STATUS_CONST) is called, to reset the statistics from the partitions with the most rows. Always, error or not! This will be a bit slower, but always give correct result for I_S.PARTITIONS.CREATE_TIME column. Note however that InnoDB sets stats.create_time on HA_STATUS_TIME, and my_base.h says it should be set on HA_STATUS_CONST :) I.e InnoDB will show correct create_time anyway! Another note is that ha_innodb::analyze is implemented through an info() call, resulting in 'ALTER TABLE t ANALYZE PARTITION p' will set the key statistics of the table from the specified partition! Yet another note is that after WL#4443 is done, one could store key statistics per partition and set the table's key statistic to the USED partition with the most rows per statement, instead of always having it set to the partition with the most rows regardless if that partition is used or not.
[18 Feb 2013 13:25]
Joe Grasse
Just curious about the status of this bug. I would like to be able to query the i_s.partitions table without the cardinality of the table being changed. Could at least fix this bug first, and then open a separate bug for the issue of using the stats from the last partition.
[28 May 2013 17:24]
Justin Swanhart
Please fix the title of this bug. It is incorrect. It has nothing to do with innodb_stats_on_metadata and instead on a) the info() call on the handlerton b) picking a partition to pick stats from
[18 Sep 2013 12:25]
Jon Stephens
Fixed in 5.5+. Documented as follows in the 5.5.34, 5.6.14, and 5.7.2 changelogs: Following any query on the INFORMATION_SCHEMA.PARTITIONS table, InnoDB index statistics as shown in the output of statements such as SHOW INDEX were updated, even with innodb_stats_on_metadata=OFF. Closed.
[19 Sep 2013 9:11]
Jon Stephens
Updated changelog entry, which now reads: Following any query on the INFORMATION_SCHEMA.PARTITIONS table, InnoDB index statistics as shown in the output of statements such as SELECT * FROM INFORMATION_SCHEMA.STATISTICS were read from the last partition, instead of from the partition containing the greatest number of rows. Status unchanged.
[23 Sep 2013 15:08]
Jon Stephens
BUG#69179 is a duplicate of this bug.
[24 Sep 2013 14:54]
Laurynas Biveinis
5.5$ bzr log -r 4423 ------------------------------------------------------------ revno: 4423 committer: Aditya A <aditya.a@oracle.com> branch nick: mysql-5.5 timestamp: Mon 2013-07-29 11:41:13 +0530 message: Bug #11766851 QUERYING I_S.PARTITIONS CHANGES THE CARDINALITY OF THE PARTITIONS. ANALYSIS -------- Whenever we query I_S.partitions, ha_partition::get_dynamic_partition_info() is called which resets the cardinality according to the number of rows in last partition. Fix --- When we call get_dynamic_partition_info() avoid passing the flag HA_STATUS_CONST to info() since HA_STATUS_CONST should ideally not be called for per partition. [Approved by mattiasj rb#2830 ]