Bug #38304 Data contents becomes NULL if column name in Falcon table uses accented letters
Submitted: 23 Jul 2008 1:05
Modified: 9 Jan 2009 14:02
Reporter: Hema Sridharan
Status: Closed
Category: MySQL Server: Falcon storage engine
Severity: S3 (Non-critical)
Version: mysql-6.0, 6.0.7-bzr
OS: Linux
Assigned to: Lars-Erik Bjørk
CPU Architecture: Any

[23 Jul 2008 1:05] Hema Sridharan
Description:
1) I create a database and a table using the Falcon storage engine.
2) I name a column in the table using accented letters.
3) I insert some data into the table.
4) When I run a SELECT, all of the data comes back as NULL.

USE test;
CREATE TABLE t5(`ë`  char(20) CHARACTER SET utf8)ENGINE=FALCON;
INSERT INTO  t5 VALUES
('á'),( 'Ë'),('Ö'),('Ä'),('Üä'),('¶'),('П'),('Ф'),('щҖ'),('βϋ');
SELECT * FROM t5;
CREATE TABLE t6(`Üä`  char(20) CHARACTER SET utf8)ENGINE=FALCON;
INSERT INTO t6 VALUES('a'),('b');
SELECT * FROM t6;

How to repeat:
mysql> USE test;
Database changed
mysql> CREATE TABLE t5(`ë`  char(20) CHARACTER SET utf8)ENGINE=FALCON;
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO  t5 VALUES
    -> ('á'),( 'Ë'),('Ö'),('Ä'),('Üä'),('¶'),('П'),('Ф'),('щҖ'),('βϋ');
Query OK, 10 rows affected (0.01 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM t5;
+------+
| ë    |
+------+
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
+------+
10 rows in set (0.00 sec)

mysql> CREATE TABLE t6(`Üä`  char(20) CHARACTER SET utf8)ENGINE=FALCON;
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO t6 VALUES('a'),('b');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM t6;
+------+
| Üä   |
+------+
| NULL |
| NULL |
+------+
2 rows in set (0.00 sec)

Note: This does not happen with other storage engines (MyISAM, InnoDB, MEMORY).
[23 Jul 2008 3:25] Valeriy Kravchuk
Thank you for the problem report. Exactly which 6.0.x version of MySQL server did you use? I see different results with 6.0.5, for example.
[25 Jul 2008 17:54] Valeriy Kravchuk
Verified just as described with recent 6.0.7 from bzr on Linux.
[4 Sep 2008 13:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53262

2814 lars-erik.bjork@sun.com	2008-09-04
      This is a fix for bug#38304 - Data contents becomes NULL if
      column name in Falcon table uses accented letters.
      
      In StorageInterface::encodeRecord, the values of fields with
      international character(s) in their names are encoded as
      NULL (encodeNULL). This is because there is no entry for these
      fields in the fieldMap. The values for these fields are therefore
      never written to the database.
      
      When populating the fieldMap (in StorageInterface::mapFields),
      the fieldId of every field is looked up by calling
      StorageTableShare::getFieldId, which in turn calls
      Table::findField.
      
      Table::findField is not able to match the field name looked up
      in the SymbolManager (SymbolManager::getSymbol) with the field
      name in the Table.
      
      This is because SymbolManager::getSymbol tries to uppercase
      every Symbol it looks up, but the uppercasing does not handle
      international characters. The name it tries to look up, and
      eventually returns, is the partial name of the field, up to the
      first international character.
      
      Because we "can't find" the correct field, a NULL entry is added
      to the fieldMap for this field.
      
      According to Jim Starkey, the fix for this is to replace
      the call to SymbolManager::getSymbol with a call to
      SymbolManager::getString, which does not uppercase the input.
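
The failure chain described above can be illustrated with a small model. This is only a sketch in Python, not Falcon code: the function names, the two-column table, and the assumption that the table stores field names exactly as declared are all hypothetical. `ascii_upper_truncating` stands in for the uppercasing lookup that stops at the first international character, and the identity function stands in for a lookup that leaves the name untouched.

```python
from typing import Callable, Dict, List, Optional


def ascii_upper_truncating(name: str) -> str:
    """Hypothetical stand-in for the uppercasing symbol lookup.

    Uppercases ASCII characters and stops at the first non-ASCII one,
    so only the partial name up to that character is returned.
    """
    out = []
    for ch in name:
        if ord(ch) > 127:
            break
        out.append(ch.upper())
    return "".join(out)


def build_field_map(
    columns: List[str],
    engine_fields: Dict[str, int],
    normalize: Callable[[str], str],
) -> List[Optional[int]]:
    """Map each server column to an engine field id, or None if not found."""
    return [engine_fields.get(normalize(col)) for col in columns]


def encode_record(values: list, field_map: List[Optional[int]]) -> list:
    """Keep a value only if its column has a field-map entry, else store NULL."""
    return [v if fid is not None else None for v, fid in zip(values, field_map)]


# Assumed table layout: names are stored exactly as declared.
engine_fields = {"ID": 0, "\u00eb": 1}   # "\u00eb" is the column name ë
columns = ["ID", "\u00eb"]
row = [7, "\u00e1"]                      # "\u00e1" is the value á

# Lookup that uppercases and truncates: the accented column is not found.
buggy_map = build_field_map(columns, engine_fields, ascii_upper_truncating)
buggy_row = encode_record(row, buggy_map)

# Lookup that leaves the name untouched: every column is found.
fixed_map = build_field_map(columns, engine_fields, lambda name: name)
fixed_row = encode_record(row, fixed_map)

print(buggy_map, buggy_row)
print(fixed_map, fixed_row)
```

In this model the ASCII column survives both lookups, while the accented column only gets a field-map entry when the name is looked up unchanged, which matches the symptom in the report.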
      
      This seems to work, and I have added a test case that tests this
      for different combinations of the SQL modes (also suggested by
      Jim).
      
      There are some twenty different SQL modes, which would result in
      millions of different combinations, so I have picked the modes
      that, to the best of my knowledge, are the most interesting ones.
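
To put a number on the combinations mentioned above: if each of twenty modes can independently be on or off, there are 2^20 possible settings. The sketch below computes that and enumerates every subset of a small selection. The four mode names are real MySQL SQL modes, but choosing them here is purely illustrative; the commit message does not say which modes the test case actually uses.

```python
from itertools import combinations
from typing import Iterator, List

# Twenty modes, each either on or off.
total_settings = 2 ** 20

# Hypothetical selection, NOT the list used by the actual test case.
selected_modes = [
    "ANSI_QUOTES",
    "IGNORE_SPACE",
    "PIPES_AS_CONCAT",
    "NO_BACKSLASH_ESCAPES",
]


def mode_combinations(modes: List[str]) -> Iterator[str]:
    """Yield every subset of modes as a comma-separated sql_mode value."""
    for size in range(len(modes) + 1):
        for subset in combinations(modes, size):
            yield ",".join(subset)


settings = list(mode_combinations(selected_modes))

print(total_settings)   # number of on/off settings for twenty modes
print(len(settings))    # number of settings for the four selected modes
for value in settings:
    print(f"SET sql_mode = '{value}';")
```

Restricting the test to four modes brings the run down to 16 settings, which is the kind of trade-off the commit message describes.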
[8 Sep 2008 11:51] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53517

2812 lars-erik.bjork@sun.com	2008-09-08
[8 Sep 2008 11:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53519

2812 lars-erik.bjork@sun.com	2008-09-08
[21 Oct 2008 8:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56657

2872 lars-erik.bjork@sun.com	2008-10-21
      This is a fix for bug#38304 (YEAR '=' comparison fails when index is 
      present)
      
      According to the reference documentation, YEARs can be declared as
      YEAR(2) or YEAR(4) to specify a display width of two or four
      characters. It turns out that YEARs are stored differently in Falcon
      depending on this. In the case of a YEAR(4), when searching
      an index, the search keys given were off by -1900 years, and the
      search therefore did not find the expected result.
      This patch ensures that both types of YEARs are stored equally 
      inside Falcon, in a format that matches the search keys given by 
      the server. The change is limited to StorageInterface::encodeRecord
      and StorageInterface::decodeRecord.
      
      A regression test is also added as a part of this commit.
[21 Oct 2008 8:59] Lars-Erik Bjørk
Oops, wrong bug# on the previous commit; please disregard it.
[9 Jan 2009 14:02] MC Brown
A note has been added to the 6.0.8 changelog: 

Inserting data into a Falcon table that contains columns with names containing accented characters would cause the data to be NULL (empty).