Bug #35745 SELECT COUNT(*) is not correct for some partitioned tables.
Submitted: 1 Apr 2008 16:21 Modified: 23 Jul 2008 12:41
Reporter: Tim Clark Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Partitions Severity:S2 (Serious)
Version:5.1.23 OS:Any
Assigned to: Mattias Jonsson
Tags: count, partition, records
Triage: D2 (Serious)

[1 Apr 2008 16:21] Tim Clark
Description:
SELECT COUNT(*) is incorrect for a table that is partitioned using a storage engine that has HA_HAS_RECORDS but not HA_STATS_RECORDS_IS_EXACT.

The partition handler bases its table flags on the underlying storage engines' flags. Because the underlying storage engine sets HA_HAS_RECORDS, the partition handler also reports HA_HAS_RECORDS. However, HA_STATS_RECORDS_IS_EXACT is not set. Under these conditions, when opt_sum_query() optimizes the SELECT COUNT(*) query, it chooses to call get_exact_record_count(), which in turn calls table->file->records().

However, the partition handler does not implement records(), so instead of accumulating the value of underlying records() implementations (as it should), the default handler::records() function is called, which returns the value of stats.records. This is unreliable for a storage engine that does not have HA_STATS_RECORDS_IS_EXACT.

How to repeat:
This was discovered on a storage engine that is under development. None of the included storage engines appear to have the necessary combination of table flags to recreate this. However, it could probably be recreated by manually tweaking the results of handler::table_flags() with a debugger during runtime and executing the following query:
CREATE TABLE t1 (a int not null,b int not null,c int not null,
primary key(a,b)) partition by key (a);
insert into t1 values (1,1,1),(1,2,3), (2,3,4);
select count(*) from t1; 
+----------+
| count(*) |
+----------+
|        2 |
+----------+

Suggested fix:
ha_partition should implement a records function which, when (cached_table_flags & HA_HAS_RECORDS) is true, should use each partition's underlying handler's records() function to accumulate the total number of records in the table.
[1 Apr 2008 18:17] Miguel Solorzano
Thank you for the bug report. I wasn't able to repeat with server built from
current source tree, so I assume it was fixed somehow. Please build from
source or wait the next release. Thanks in advance.

[miguel@amanhecer dbs]$ 5.0/bin/mysqladmin -uroot create db9
[miguel@amanhecer dbs]$ 5.0/bin/mysql -uroot db9
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.24-rc-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> CREATE TABLE t1 (a int not null,b int not null,c int not null,
    -> primary key(a,b)) partition by key (a);
Query OK, 0 rows affected (0.04 sec)

mysql> insert into t1 values (1,1,1),(1,2,3), (2,3,4);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(*) from t1; 
+----------+
| count(*) |
+----------+
|        3 | 
+----------+
1 row in set (0.00 sec)
[1 Apr 2008 19:20] Tim Clark
It appears that the archive storage engine is the only engine shipped in the base source that can be used to reproduce this. Please set a breakpoint in ha_archive::table_flags and try the following query:

CREATE TABLE t1 (a int not null,b int not null,c int not null) engine=archive
partition by range(a)
( partition p0 values less than (2), partition p1 values less than (3));
insert into t1 values (1,1,1),(1,2,3), (2,3,4);
select count(*) from t1;

Whenever the breakpoint is hit, modify the value returned by table_flags so that it does not have HA_STATS_RECORDS_IS_EXACT (1 << 13) set. The result will be that the SELECT COUNT statement will return 2, as explained above.
[2 Apr 2008 9:14] Mattias Jonsson
Tim, is it possible to get some more information about your storage engine? (i.e. sources).

I have verified that a partitioned archive table will give wrong result to count(*) when it does not support HA_STATS_RECORDS_IS_EXACT.

If you have a support contract, please report this, since it will add more attention for the problem.
[2 Apr 2008 9:53] Mattias Jonsson
I have a patch that implement ha_partition::records(), will commit soon
[2 Apr 2008 12:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/44780

ChangeSet@1.2576, 2008-04-02 14:09:46+02:00, mattiasj@client-10-129-10-147.upp.off.mysql.com +3 -0
  Bug#35745: SELECT COUNT(*) is not correct for some partitioned tables.
  
  problem was that ha_partition::records was not implemented, thus
  using the default handler::records, which is not correct if the engine
  does not support HA_STATS_RECORDS_IS_EXACT.
  
  Solution was to implement ha_partition::records as a wrapper around
  the underlying partitions records, including checking pruning.
[7 Apr 2008 10:30] Mattias Jonsson
I am removing the bitmap_is_set from ha_partition::records()
[8 Apr 2008 8:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45039

ChangeSet@1.2576, 2008-04-08 10:02:44+02:00, mattiasj@witty. +8 -0
  Bug#35745: SELECT COUNT(*) is not correct for some partitioned tables.
  
  problem was that ha_partition::records was not implemented, thus
  using the default handler::records, which is not correct if the engine
  does not support HA_STATS_RECORDS_IS_EXACT.
  
  Solution was to implement ha_partition::records as a wrapper around
  the underlying partitions records.
  
  The rows column in explain partitions will now include the total
  number of records in the partitioned table.
[5 Jun 2008 18:12] Sveta Smirnova
There is related bug #37219
[6 Jun 2008 11:37] Sveta Smirnova
There is related bug #37235
[23 Jun 2008 10:26] Mattias Jonsson
bug#37219 and bug#37235 is not related to this, they are duplicates of bug#35931.
[7 Jul 2008 20:14] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49130

2690 Mattias Jonsson	2008-07-07
      Bug#35745: SELECT COUNT(*) is not correct for some partitioned tables.
      
      problem was that ha_partition::records was not implemented, thus
      using the default handler::records, which is not correct if the engine
      does not support HA_STATS_RECORDS_IS_EXACT.
      
      Solution was to implement ha_partition::records as a wrapper around
      the underlying partitions records.
      
      The rows column in explain partitions will now include the total
      number of records in the partitioned table.
[8 Jul 2008 10:51] Mattias Jonsson
Pushed into mysql-6.0-bugteam and mysql-5.1-bugteam
[8 Jul 2008 10:54] Mattias Jonsson
The part that changed the rows column in EXPLAIN PARTITION for partitioned tables should be handled by bug#37532.
[23 Jul 2008 9:16] Georgi Kodinov
Pushed into 5.1.28 and 6.0.7-alpha
[23 Jul 2008 12:41] Jon Stephens
Documented bugfix in the 5.1.28 and 6.0.7 changelogs as follows:

        SELECT COUNT(*) was not correct for some partitioned tables using a 
        storage engine that did not support HA_STATS_RECORDS_IS_EXACT. Tables 
        using the ARCHIVE storage engine were known to be affected.
        
        This was because ha_partition::records was not implemented, and so
        the default handler::records was used in its place. However, this is 
        not correct behavior if the storage engine does not support 
        HA_STATS_RECORDS_IS_EXACT.
        
        The ssolution was to implement ha_partition::records as a wrapper 
        around the underlying partition records.
        
        As a result of this fix, the rows column in the output of EXPLAIN 
        PARTITIONS now includes the total number of records in the partitioned 
        table.
[28 Jul 2008 16:46] Bugs System
Pushed into 5.1.28  (revid:joerg@mysql.com-20080714105031-88hmr2baz5di9xej) (version source revid:joerg@mysql.com-20080714105031-88hmr2baz5di9xej) (pib:3)