Bug #29493 Falcon: searches fail after partition alteration
Submitted: 2 Jul 2007 16:52 Modified: 22 Nov 2007 11:16
Reporter: Peter Gulutzan Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Partitions Severity:S3 (Non-critical)
Version:6.0.1-alpha-debug OS:Linux (SUSE 10.0 / 64-bit)
Assigned to: Assigned Account CPU Architecture:Any

[2 Jul 2007 16:52] Peter Gulutzan
Description:
I create a partitioned Falcon table.
I insert three rows. Two of them are between 0 and 1.
I search for 'between 0 and 1'. I see two rows.
I alter the table to change the partition count.
I search for 'between 0 and 1'. I see one row.

How to repeat:
mysql> create table t1 (s1 decimal(10,1) unique)
    -> engine=falcon partition by key(s1) partitions 3;
Query OK, 0 rows affected (0.05 sec)

mysql> insert into t1 values (0.5),(0.6),(1.1);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(*) from t1 where s1 between 0 and 1;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> alter table t1 partition by key(s1) partitions 7;
Query OK, 3 rows affected (0.08 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(*) from t1 where s1 between 0 and 1;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)
[2 Jul 2007 18:46] MySQL Verification Team
Thank you for the bug report. Verified as described.

c:\dev\6.0>bin\mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 6.0.1-alpha-nt Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table t1 (s1 decimal(10,1) unique)
    -> engine=falcon partition by key(s1) partitions 3;
Query OK, 0 rows affected (0.54 sec)

mysql> insert into t1 values (0.5),(0.6),(1.1);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(*) from t1 where s1 between 0 and 1;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> alter table t1 partition by key(s1) partitions 7;
Query OK, 3 rows affected (0.17 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(*) from t1 where s1 between 0 and 1;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.01 sec)

mysql>
*************************************************************************
c:\dev\5.1>bin\mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.1.21-beta-nt Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table t1 (s1 decimal(10,1) unique)
    -> engine=MyISAM partition by key(s1) partitions 3;
Query OK, 0 rows affected (0.12 sec)

mysql> insert into t1 values (0.5),(0.6),(1.1);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(*) from t1 where s1 between 0 and 1;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql>  alter table t1 partition by key(s1) partitions 7;
Query OK, 3 rows affected (0.19 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(*) from t1 where s1 between 0 and 1;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.02 sec)

mysql>
[12 Jul 2007 22:53] Kevin Lewis
I highly suspect that this problem is in the server because the Falcon response seems to be the same whether it succeeds or not.  The Falcon index is correct and is usable on other queries.  

Queries like this;
'select * from t1 where s1 < n;' 
have a very strange success range.
  0 <= n <= 0.5    success
  0.5 < n < 1.05   fails
  1.05 <= n < 1.1  succeeds
  1.1 = n          fails
  1.1 < n          succeeds
 
I cannot catch any place in the Falcon code where the comparison is failing like this so it must be in the server code.  It may also be related to the fact that the records are not evenly distributed between the 7 partitions;
1.1 and 0.5 are in partition 0 and 0.6 is in partition 5.
This uneven distribution may also be a symptom of a problem with comparing these decimal numbers with the partition engine.
[13 Jul 2007 15:08] Calvin Sun
It works fine with InnoDB.

mysql> create table t1 (s1 decimal(10,1) unique)
    -> engine=innodb partition by key(s1) partitions 3;
Query OK, 0 rows affected (0.33 sec)

mysql> insert into t1 values (0.5),(0.6),(1.1);
Query OK, 3 rows affected (0.05 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(*) from t1 where s1 between 0 and 1;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> alter table t1 partition by key(s1) partitions 7;
Query OK, 3 rows affected (1.05 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(*) from t1 where s1 between 0 and 1;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.05 sec)
[17 Jul 2007 17:47] Ingo Strüwing
The problem is that ha_partition expects handler::index_read(...key...) to return the smallest key greater or equal to 'key'. But Falcon returns another key, which in this test case is by chance beyond the search range (1.1, range 0..1). This makes ha_partition believe there are no (more) keys in range in this partition.

IMHO the expectations of ha_partition are correct.

Falcon has to deliver the smalles key first.

Just for testing, I disabled the check in ha_partition so that it reads all keys with handler::index_next() until HA_ERR_END_OF_FILE. This made the test case succeed. But I think this is very wrong. Falcon must return the keys in order from smallest to biggest.
[17 Jul 2007 20:25] Kevin Lewis
>From: Jim Starkey
>I hate to disagree, strongly.
>
>Falcon returns HA_REC_NOT_IN_SEQ from handler::table_flags().  This
>explicitly means that Falcon does not return records in index order.
>The bug is that Partition does not respect this flag.

Another problem here is that ha_partition is not sending the selection criteria to Falcon.  If it had, 1.1 would not have been returned.  I saw this while debugging it.  So the partition engine is reading all records from the storage engine and down-selecting by the search criteria based on the assumption that the returned records are in index order in each partition.  I think that this is a performance issue as well as the wrong approach.  The partition engine should allow the storage engine to apply the selection criteria.  If it did, this test would work, I think. 

Kevin Lewis
[18 Jul 2007 9:07] Ingo Strüwing
Can someone please point me at the documentation for the handler
interface, especially the HA_REC_NOT_IN_SEQ flag?

I suspect a misunderstanding here: We are talking about index_next(),
not rnd_next().

In some engines handlers I found comments which seem to say that this
flag is relevant for rnd_next(). None of them mentions index_next() in
this context.

I wonder how the selection criteria could be given to the engine.

The failing statement is
SELECT COUNT(*) FROM t1 WHERE c1 BETWEEN 0 AND 1

It uses an index scan. handler::index_read() accepts one key value only.
In this case it is set to the value 0 (the lowest value of the search
range (0..1)) and the find flag is HA_READ_KEY_OR_NEXT. So it wants all
keys that are greater or equal to 0.

The signature of handler::index_read() does not allow to pass a second
key value. So ha_partition cannot pass the upper bound too.

The only question is if the handler interface is specified so that
::index_read() returns the lowest key value in this context and
::index_next() returns the next key value in ascending order, or if
::index_read() (and hence index_next()) are allowed to return the key
values in random order.
[20 Jul 2007 18:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/31282

ChangeSet@1.2636, 2007-07-20 20:27:16+02:00, istruewing@chilla.local +5 -0
  Bug#29493 - Falcon: searches fail after partition alteration
  
  Incomplete result sets could be returned from a partitioned
  table on a storage engine that does not return keys in order
  in an index scan.
  
  The partition engine assumed that the first call of an index
  scan (handler::index_read()) returns the lowest key from the
  index. This is not guaranteed for engines that set HA_REC_NOT_IN_SEQ
  in their table flags.
  
  Solved in ha_partition::handle_unordered_scan_next_partition():
  When an engine with HA_REC_NOT_IN_SEQ returns a key above range
  from index_read(), continue with index_next() until a key in range
  is returned or no more keys are available.
[22 Nov 2007 11:16] Mattias Jonsson
Duplicate of Bug#30480, will be verified in test case of that bug
[12 Dec 2007 23:00] Bugs System
Pushed into 6.0.5-alpha
[12 Dec 2007 23:02] Bugs System
Pushed into 5.1.23-rc