Bug #314 GROUP BY forces functions to treat multy-byte charset values as single-byte
Submitted: 22 Apr 2003 17:01 Modified: 26 Apr 2003 4:22
Reporter: Alexander Keremidarski Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.0 OS:
Assigned to: Alexander Barkov CPU Architecture:Any

[22 Apr 2003 17:01] Alexander Keremidarski
Description:
Looks like when GROUP BY is used for some reason values are Displayed wrong assuming they are single byte while they contain correct multibyte content.

How to repeat:
Start mysqld with --default-character-set=ujis

insert into x values ('あ'),('い'),('う'),('え'),('お'),('あいうえお'),('あかさたな');

/* Same as above
insert into x values ('\xa4\xa2'),('\xa4\xa4'),('\xa4\xa6'),('\xa4\xa8'),  ('\xa4\xaa'), ('\xa4\xa2\xa4\xa4\xa4\xa6\xa4\xa8\xa4\xaa'),
('\xa4\xa2\xa4\xab\xa4\xb5\xa4\xbf\xa4\xca');
*/
insert into x values ('a'),('b'),('c'),('d'),('abcd'),('aut4hgf');

/* Observe result of following queries. Might be unreadable in HTML */

mysql> select c, length(c) as lc1, left(c,1) as c1, length(left(c,1)) as lc1, ord(left(c,1)), char(ord(left(c,1))) oc1 from x order by c;
+------------+------+------+------+----------------+-----+
| c          | lc1  | c1   | lc1  | ord(left(c,1)) | oc1 |
+------------+------+------+------+----------------+-----+
| a          |    1 | a    |    1 |             97 | a   |
| abcd       |    4 | a    |    1 |             97 | a   |
| aut4hgf    |    7 | a    |    1 |             97 | a   |
| b          |    1 | b    |    1 |             98 | b   |
| c          |    1 | c    |    1 |             99 | c   |
| d          |    1 | d    |    1 |            100 | d   |
| あ         |    2 | あ   |    2 |          42146 | あ  |
| あいうえお |   10 | あ   |    2 |          42146 | あ  |
| あかさたな |   10 | あ   |    2 |          42146 | あ  |
| い         |    2 | い   |    2 |          42148 | い  |
| う         |    2 | う   |    2 |          42150 | う  |
| え         |    2 | え   |    2 |          42152 | え  |
| お         |    2 | お   |    2 |          42154 | お  |
+------------+------+------+------+----------------+-----+
13 rows in set (0.00 sec)

mysql> select c, length(c) as lc1, left(c,1) as c1, length(left(c,1)) as lc1, ord(left(c,1)), char(ord(left(c,1))) oc1 from x group by c order by c;
+------------+------+------+------+----------------+-----+
| c          | lc1  | c1   | lc1  | ord(left(c,1)) | oc1 |
+------------+------+------+------+----------------+-----+
| a          |    1 | a    |    1 |             97 | a   |
| abcd       |    4 | a    |    1 |             97 | a   |
| aut4hgf    |    7 | a    |    1 |             97 | a   |
| b          |    1 | b    |    1 |             98 | b   |
| c          |    1 | c    |    1 |             99 | c   |
| d          |    1 | d    |    1 |            100 | d   |
| あ         |    2 |    |    2 |          42146 |   |
| あいうえお |   10 |    |    2 |          42146 |   |
| あかさたな |   10 |    |    2 |          42146 |   |
| い         |    2 |    |    2 |          42148 |   |
| う         |    2 |    |    2 |          42150 |   |
| え         |    2 |    |    2 |          42152 |   |
| お         |    2 |    |    2 |          42154 |   |
+------------+------+------+------+----------------+-----+
13 rows in set (0.00 sec)

mysql> select c, length(c) as lc, left(c,2) as c2, length(left(c,2)) as lc2, ord(left(c,2)), char(ord(left(c,2))) oc2 from x group by c order by c;
+------------+------+------+------+----------------+-----+
| c          | lc   | c2   | lc2  | ord(left(c,2)) | oc2 |
+------------+------+------+------+----------------+-----+
| a          |    1 | a    |    1 |             97 | a   |
| abcd       |    4 | ab   |    2 |             97 | a   |
| aut4hgf    |    7 | au   |    2 |             97 | a   |
| b          |    1 | b    |    1 |             98 | b   |
| c          |    1 | c    |    1 |             99 | c   |
| d          |    1 | d    |    1 |            100 | d   |
| あ         |    2 | あ   |    2 |          42146 |   |
| あいうえお |   10 | あ   |    4 |          42146 |   |
| あかさたな |   10 | あ   |    4 |          42146 |   |
| い         |    2 | い   |    2 |          42148 |   |
| う         |    2 | う   |    2 |          42150 |   |
| え         |    2 | え   |    2 |          42152 |   |
| お         |    2 | お   |    2 |          42154 |   |
+------------+------+------+------+----------------+-----+
[25 Apr 2003 9:09] Michael Widenius
This is a display problem in the mysql command line client that is probably not likely to happen in a client that support ujis.

The test case is also not repeatable as it doesn't contain all information to repeat it.
[25 Apr 2003 9:24] Alexander Keremidarski
It is not display problem only. Here is variation of above test with using HEX() values:

mysql> select HEX(c), length(c) as lc1, HEX(left(c,1)) as c1 from x order by c;
+----------------------+------+------+
| HEX(c)               | lc1  | c1   |
+----------------------+------+------+
| 61                   |    1 | 61   |
| 61626364             |    4 | 61   |
| 61757434686766       |    7 | 61   |
| 62                   |    1 | 62   |
| 63                   |    1 | 63   |
| 64                   |    1 | 64   |
| A4A2                 |    2 | A4A2 |
| A4A2A4A4A4A6A4A8A4AA |   10 | A4A2 |
| A4A2A4ABA4B5A4BFA4CA |   10 | A4A2 |
| A4A4                 |    2 | A4A4 |
| A4A6                 |    2 | A4A6 |
| A4A8                 |    2 | A4A8 |
| A4AA                 |    2 | A4AA |
+----------------------+------+------+
13 rows in set (0.01 sec)
 
mysql> select HEX(c), length(c) as lc1, HEX(left(c,1)) as c1 from x group by c order by c;
+----------------------+------+------+
| HEX(c)               | lc1  | c1   |
+----------------------+------+------+
| 61                   |    1 | 61   |
| 61626364             |    4 | 61   |
| 61757434686766       |    7 | 61   |
| 62                   |    1 | 62   |
| 63                   |    1 | 63   |
| 64                   |    1 | 64   |
| A4A2                 |    2 | A4   |
| A4A2A4A4A4A6A4A8A4AA |   10 | A4   |
| A4A2A4ABA4B5A4BFA4CA |   10 | A4   |
| A4A4                 |    2 | A4   |
| A4A6                 |    2 | A4   |
| A4A8                 |    2 | A4   |
| A4AA                 |    2 | A4   |
+----------------------+------+------+
[26 Apr 2003 4:22] Michael Widenius
Thank you for your bug report. This issue has been fixed in the latest
development tree for that product. You can find more information about
accessing our development trees at 
    http://www.mysql.com/doc/en/Installing_source_tree.html

Fixed a bug concerning RIGHT, LEFT & MID when used with multi-byte-character-sets and GROUP BY