Bug #26058 Falcon: warnings with yen sign and overline in ujis
Submitted: 4 Feb 2007 1:08
Modified: 21 Dec 2007 10:29
Reporter: Peter Gulutzan
Status: Can't repeat
Category: MySQL Server: Falcon storage engine
Severity: S3 (Non-critical)
Version: 5.2.2-falcon-alpha-debug-log
OS: Linux (SUSE 10.0 / 64-bit)
Assigned to: Kevin Lewis
CPU Architecture: Any

[4 Feb 2007 1:08] Peter Gulutzan
Description:
I create a Falcon table with one unindexed ujis (Japanese character set) column.
I insert a yen sign U+00A5 and an overline U+203E.
This is junk, but I hope for the same results with MyISAM and Falcon.
I select from the table using a WHERE clause, any WHERE clause will do.
I see warnings.
If I say engine=myisam instead of engine=falcon, I don't see warnings.

How to repeat:
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> create table tujis (s1 varchar(5) character set ujis) engine=falcon;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into tujis values ('¥'),('‾');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select count(*) from tujis where s1 is not null;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set, 2 warnings (0.00 sec)

mysql> show warnings;
+---------+------+----------------------------------------------------------+
| Level   | Code | Message                                                  |
+---------+------+----------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\x8E\' for column 's1' at row 0 |
| Warning | 1366 | Incorrect string value: '\x8E~' for column 's1' at row 1 |
+---------+------+----------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> drop table tujis;
Query OK, 0 rows affected (0.01 sec)

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> create table tujis (s1 varchar(5) character set utf8) engine=myisam;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into tujis values ('¥'),('‾');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select count(*) from tujis where s1 is not null;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> show warnings;
Empty set (0.00 sec)
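
A note on reading the warning text: the values '\x8E\' and '\x8E~' are consistent with the two-byte sequences 0x8E 0x5C and 0x8E 0x7E, because 0x5C is the ASCII backslash and 0x7E is the ASCII tilde. The C sketch below reproduces that rendering. It assumes that printable ASCII bytes are shown as-is and all other bytes as \xNN escapes; the helper name render_bytes is hypothetical and this is not the server's actual formatting code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not MySQL code: copies `len` bytes of `src` into
   `dst`, keeping printable ASCII (0x20..0x7E) unchanged and writing every
   other byte as a \xNN escape. Output is truncated if `dst` is too small
   and is always NUL-terminated when dst_size > 0. */
void render_bytes(const unsigned char *src, size_t len,
                  char *dst, size_t dst_size)
{
    size_t pos = 0;
    size_t i;

    if (dst_size == 0)
        return;

    for (i = 0; i < len; i++) {
        if (src[i] >= 0x20 && src[i] <= 0x7E) {
            if (pos + 1 >= dst_size)      /* need 1 char + NUL */
                break;
            dst[pos++] = (char) src[i];
        } else {
            if (pos + 4 >= dst_size)      /* need 4 chars + NUL */
                break;
            pos += (size_t) snprintf(dst + pos, dst_size - pos,
                                     "\\x%02X", (unsigned) src[i]);
        }
    }
    dst[pos] = '\0';
}
```

Calling render_bytes on the bytes {0x8E, 0x5C} and {0x8E, 0x7E} gives the two strings shown in the warnings above.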
[5 Feb 2007 16:29] MySQL Verification Team
Thank you for the bug report. Verified as described.
[3 May 2007 19:17] Hakan Küçükyılmaz
Added test case falcon_bug_26058.test.
[10 May 2007 20:11] Kevin Lewis
I am not sure that this is a valid character in the ujis collation.  Please verify.

When I run the test against a MyISAM file, no errors occur. The test case references the characters in ASCII format like this:
# '‾' == 0xE2 0x80 0xBE == U+00E2 U+20AC U+00BE
# 'Â¥'  == 0xC2 0xA5 == U+00C2 U+00A5
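
The byte values in these comments are the UTF-8 encodings of U+00A5 (yen sign) and U+203E (overline). A minimal C sketch of the standard UTF-8 encoding for code points below U+10000 shows where they come from; the function name utf8_encode_bmp is made up for illustration and is not part of the test case or of MySQL.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes a code point below U+10000 as UTF-8 into `out`.
   Returns the number of bytes written (1 to 3), or 0 if the code point
   is a surrogate or lies outside the range this sketch handles. */
size_t utf8_encode_bmp(uint32_t cp, unsigned char out[3])
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;
}
```

For U+00A5 this yields 0xC2 0xA5, and for U+203E it yields 0xE2 0x80 0xBE. Displaying those bytes one at a time in a Western single-byte code page gives the garbled forms quoted above.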

The MySQL results are like this;
SET NAMES utf8;
CREATE TABLE tm (a varchar(5) character set ujis) engine=myisam;
INSERT INTO tm VALUES ('¥');
INSERT INTO tm VALUES ('‾');
SELECT hex(a) FROM tm WHERE a IS NOT NULL;
hex(a)
8E5C
8E7E
SELECT hex(a) FROM tm;
hex(a)
8E5C
8E7E

Everything seems to work without error, but on this path the byte sequences are never passed through the validation function my_well_formed_len_ujis() in ctype-ujis.c.

When the same SQL is run against a Falcon table, the code path does reach my_well_formed_len_ujis(): it is called from well_formed_copy_nchars() in sql_string.cc, which in turn is called from Field_varstring::store() in field.cc.

The warning originates in the following code in my_well_formed_len_ujis(), line 8276;
    if (ch == 0x8E)                 /* [x8E][xA0-xDF] */
    {
      if (*b >= 0xA0 && *b <= 0xDF)
        continue;
      *error= 1;
      return (uint) (chbeg - beg);  /* invalid sequence */
    }

According to this code, 0x8E5C and 0x8E7E are invalid ujis characters.

I do not understand how '¥' and '‾' got converted to 0x8E5C and 0x8E7E, but if that is correct, then this seems to be an invalid test case.
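
The rule in the quoted branch can be checked in isolation. The C sketch below restates only that one condition (the byte after 0x8E must lie in 0xA0..0xDF) as a standalone function; the name ss2_sequence_is_valid is hypothetical, and the rest of my_well_formed_len_ujis() is deliberately not modelled.

```c
#include <stddef.h>

/* Mirrors only the 0x8E branch of the excerpt above: returns 1 when
   `lead` is 0x8E and `trail` lies in [0xA0, 0xDF], otherwise 0.
   Any other lead byte is outside the scope of this sketch and also
   yields 0. */
int ss2_sequence_is_valid(unsigned char lead, unsigned char trail)
{
    if (lead != 0x8E)
        return 0;
    return trail >= 0xA0 && trail <= 0xDF;
}
```

With this check, both (0x8E, 0x5C) and (0x8E, 0x7E) are rejected, which matches the two warnings, while (0x8E, 0xA0) through (0x8E, 0xDF) are accepted.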
[14 May 2007 15:18] Peter Gulutzan
First, I must apologize for a small error in the "how to repeat" section.
I should have used 'character set ujis' in both examples.
But the error does not affect the results or the bug description.
This is the corrected "how to repeat":

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> create table tujis (s1 varchar(5) character set ujis) engine=falcon;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into tujis values ('¥'),('‾');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select count(*) from tujis where s1 is not null;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set, 2 warnings (0.00 sec)

mysql> show warnings;
+---------+------+----------------------------------------------------------+
| Level   | Code | Message                                                  |
+---------+------+----------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\x8E\' for column 's1' at row 0 |
| Warning | 1366 | Incorrect string value: '\x8E~' for column 's1' at row 1 |
+---------+------+----------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> drop table tujis;
Query OK, 0 rows affected (0.01 sec)

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> create table tujis (s1 varchar(5) character set ujis) engine=myisam;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into tujis values ('¥'),('‾');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select count(*) from tujis where s1 is not null;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> show warnings;
Empty set (0.00 sec)

Second, I must emphasize what I said in the original: "this is junk".
It is not proper to do these insertions. The complaint is solely:
if I do these insertions anyway, I should get the same results for
Falcon and MyISAM. I do not.

Third, I acknowledge that this is a "Linux only" bug.
If I can't type in these UTF8 characters, I can't reproduce.
[14 May 2007 16:11] Kevin Lewis
Well, I must apologize for not being clear myself. I think that the warning should occur in MyISAM!

I get the exact same warning running the test case on Windows, even though I cannot reproduce it within MYSQL.EXE.

So the real question is: should values ('¥'),('‾') get converted to 0x8E5C and 0x8E7E? These are the byte sequences that the MySQL engine warns about when Falcon calls Field_varstring::store(), which MyISAM does not call.
[14 Jun 2007 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[21 Jun 2007 20:40] Peter Gulutzan
I expect that the symptoms will disappear after the fix to
Bug#28600 Yen sign and overline ujis conversion change
Then it should be possible to close this bug with no
further work.
[19 Oct 2007 18:00] Ann Harrison
Miguel,
   Now that bug 28600 has been fixed, this bug should also go away.
Would you retest it please?

Thanks,

Ann
[20 Nov 2007 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[21 Dec 2007 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[21 Dec 2007 10:29] MySQL Verification Team
I was not able to repeat this anymore with the current source.