Bug #40607 Falcon unsigned indexes broken
Submitted: 10 Nov 2008 0:43 Modified: 15 May 2009 14:16
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version:6.0-falcon-team OS:Any
Assigned to: Lars-Erik Bjørk
Tags: F_ENCODING
Triage: Triaged: D2 (Serious)

[10 Nov 2008 0:43] Philip Stoev
Description:
Falcon decimal unsigned indexes are broken -- even very simple updates are not reflected in the database.

How to repeat:
DROP TABLE IF EXISTS t1;
CREATE TABLE `t1` (
        `decimal_unsigned_key` decimal unsigned,
        key (`decimal_unsigned_key` )
);

INSERT IGNORE INTO t1 VALUES ('8');
UPDATE `t1` SET `decimal_unsigned_key` = 9 WHERE `decimal_unsigned_key` > 7;
SELECT * FROM t1;

The update will not update any records and the SELECT will return 8 rather than the correct value of 9.
[10 Nov 2008 0:52] Philip Stoev
This issue is also observed with bigint unsigned columns. Maybe all integer types are affected?
[10 Nov 2008 15:11] Philip Stoev
Here is an example with bigint unsigned. The constant in WHERE must be large for this to show up, so please file a separate bug if this is not the same issue.

CREATE TABLE `table10` (
        `bigint_unsigned_key` bigint unsigned,
        key (`bigint_unsigned_key` )
);

INSERT IGNORE INTO table10 VALUES ('9');

UPDATE `table10` SET `bigint_unsigned_key` = 10 WHERE `bigint_unsigned_key` < 1.65025801745927e+19;
SELECT * FROM table10;

The update fails to match any rows in Falcon, and matches correctly in Innodb.
[22 Nov 2008 9:16] Lars-Erik Bjørk
These are two unrelated bugs. The case for unsigned bigint now has its own bug report, bug#40950.
[22 Nov 2008 9:39] Lars-Erik Bjørk
First, let's start with the key format as described in sql/opt_range.cc:

/*
<snip>

The tuple is a sequence of key part values. The length of key part value
depends only on its type (and not depends on the what value is stored)

KeyTuple: keypart1-data, keypart2-data, ...

The value of each keypart is stored in the following format:

keypart_data: [isnull_byte] keypart-value-bytes

If a keypart may have a NULL value (key_part->field->real_maybe_null()
can be used to check this), then the first byte is a NULL indicator with
the following valid values:
   1  - keypart has NULL value.
   0  - keypart has non-NULL value.

<questionable-statement> 
If isnull_byte==1 (NULL value), then the following keypart->length bytes
must be 0.
</questionable-statement>

<snip>
*/

I have debugged this, using the following query:

select decimal_unsigned_key from t1 where decimal_unsigned_key > 5;

and

insert into table t1 values (5);

1) Searhing for records:
-------------------------
In StorageInterface::read_range_first we receive a 'key_range' struct, which among other things, has the member 'key'. This key is passed on as an argument to 
StorageTable::setIndexBounds(..) -> StorageDatabase::makeKey(...)

Here we do: const UCHAR *p = key , and then examine our pointer p a little more closely.

We check the segment to see if there is a possible nullByte here and in that case advance the pointer (later on when we copy the values, we check that the length of the key is 5 btw ...).  Before doing this, I
have been looking more closely at the pointer:

(gdb) x/tb p
0x37aa890:	00000000  

Ok, this is the null-byte. We don't have any null-value in our key, so
this seems correct

(gdb) x/tb p+1
0x37aa891:	10000000

This, to me, is a strange mystery byte. First I believed that this byte should not be here, but after debugging the insertion of a record and exchanging some mail with Svoj, Sergei and Guilhem, this seems to be some special MySQL format to say that the value is positive.

(gdb) x/tb p+2
0x37aa892:	00000000

Ok, part of key

(gdb) x/tb p+3
0x37aa893:	00000000

Ok, part of key

(gdb) x/tb p+4
0x37aa894:	00000000

Ok, part of key

(gdb) x/tb p+5
0x37aa895:	00000101

Ok, the actual value of the key, finally... 5, as expected. This means that the key in a binary (almost) human readable (big-endian) form is (given the length is 5 as I claimed):

10000000 00000000 00000000 00000000 00000101

Then we call getSegmentValue(...) passing the ptr 'p' as an argument. Here we have the following snippet of code:

case HA_KEYTYPE_BINARY:
	if <snip>
	else if (segment->isUnsigned)
		{
		int64 number = 0;
				
		for (int n = 0; n < length; ++n)
			number = number << 8 | *ptr++;
			
		value->setValue(number);

[ (gdb) p length  $3 = 5, btw]

Here we start extracting the value from the key. This should end up with 'number' being 5, but because we have an unexpected bit (at least it does not look like Falcon expects it) at the very beginning, this will not be 5, but is instead 549755813893 (ooops, a tad too high...)
No wonder we don't find any records when we search for values greater than 549755813893.

2) Inserting a record
----------------------

In StorageInterface::encodeRecord, I had a look at field->ptr which is what the value we are inserting looks like when it is coming from the server:

(gdb) x/tb field->ptr
0x37b97d9:	10000000
(gdb) x/tb field->ptr+1
0x37b97da:	00000000
(gdb) x/tb field->ptr+2
0x37b97db:	00000000
(gdb) x/tb field->ptr+3
0x37b97dc:	00000000
(gdb) x/tb field->ptr+4
0x37b97dd:	00000101

This does indeed look familiar ... It has the same "strange" bit set. This is the way this value is treated before being inserted into Falcon:

<snip>
case MYSQL_TYPE_NEWDECIMAL:
{
	int precision = ((Field_new_decimal *)field)->precision;
	int scale = ((Field_new_decimal *)field)->dec;

	if (precision < 19)
		{
		int64 value = ScaledBinary::getInt64FromBinaryDecimal((const char *)
field->ptr,								precision, scale);   <--- This is interesting ...

dataStream->encodeInt64(value, scale);

Here Falcon does something interesting with the value pointed to by ptr. I did not quite follow the intention of everything here, but I noticed the following in a method that is called further down the stack:

uint ScaledBinary::getByte(int position, const char* ptr)
{
	char c = ptr[position];
	
	if (!(ptr[0] & 0x80))
		c = ~c;
	
	if (position == 0)
		c ^= 0x80;    <-- This one
	
	return (unsigned char) c;	
}

The marked line actually flips the mysterious bit. Falcon handles the bit when inserting the value, but not in the search key. We store values in a different format than what we search for later.

A possible way to solve this is to either:

* Handle the key the same way as the value, or
* Stop flipping the bit when insert the value in Falcon

The latter is probably the best choice.
[11 Dec 2008 8:00] Lars-Erik Bjørk
After discussing with Jim, this seems to be what is happening (slightly edited copy&paste from e-mails):

   1. The server maps "decimal unsigned" to "decimal (10, 0) unsigned"
      (why it didn't map to 64 bit binary is a mystery)

   2. StorageInterface::genTypes maps this to Falcon "numeric (10, 0)"

   3. Falcon says, "hey, numeric (10,x) fits in a 64 bit int.  Cool."

   4. StorageInterface::encodeRecord and StorageInterface::decodeRecord
      map data between the server type MYSQL_TYPE_NEWDECIMAL and the
      internal Falcon type (no problem here).

   5. When StorageDatabase::getSegmentValue tries to translate a key
      from server format to Falcon format, it sees of key of type
      HA_KEYTYPE_BINARY with a Falcon type of 64 bit binary and does the
      obvious but wrong thing.

We should get the server key decode logic out of the runtime 
(StorageDatabase::getSegmentValue) and into metadata handling 
(StorageInterface::getKeyDesc), which has the information necessary to 
figure out the actual key format.  It should map the the key into a set 
of key formats that is saved in StorageSegment and used in a single switch statement in StorageDatabase::getSegmentValue.

This, happily, will both get the right answer and be faster than the 
existing runtime test.  Unfortunately, it probably won't be sufficiently 
faster to measure
[21 Jan 2009 9:47] Lars-Erik Bjørk
Was reviewed by Jim
[17 Apr 2009 9:30] Lars-Erik Bjørk
Pushed into 6.0.11
[15 May 2009 14:16] MC Brown
An entry has been added to the 6.0.11 changelog: 

Indexes on Falcon tables using numeric columns could return incorrect information.