MySQL Bugs: #29583: derived column names from ucs2 data

Bug #29583	derived column names from ucs2 data
Submitted:	5 Jul 2007 19:24	Modified:	4 Nov 2008 21:28
Reporter:	Martin Friebe (Gold Quality Contributor) (OCA)	Email Updates:
Status:	Won't fix	Impact on me:	None
Category:	MySQL Server: Views	Severity:	S3 (Non-critical)
Version:	5.1.20 5.0.44	OS:	Any
Assigned to:	Evgeny Potemkin	CPU Architecture:	Any
Tags:	names, qc

Description:
"create view" and "create table" can derive column names from the value of the column;

create table t1 select 'A'; desc t1;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| A     | varchar(1) | NO   |     |         |       |
+-------+------------+------+-----+---------+-------+

create view v1 as select 'A'; desc v1;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| A     | varchar(1) | NO   |     |         |       |
+-------+------------+------+-----+---------+-------+

If the value is given in ucs2 ( _ucs2 0x0041 ), then the column ends up with an almost random name.

In fact the name seems to be created as follows:
- takee the ucs2 string as byte sequence
- remove leading 00 bytes (usually just the first byte)
- take the remining (shifted) string as valid ucs2, and convert to utf8
- the utf8 value is the derived column name

This deosnt appear very sensible? 

In how to repeat the column name is displayed (on the client) as "?" or "??". Trying to "select `??` from v1" does of course not work

How to repeat:
set names latin1;
set character_set_connection = latin1;

drop view if exists v1;  create view v1 as select _ucs2 0x004100420043;
select * from v1;
+-----+
| ??  |
+-----+
| ABC | # this is actually a ucs2 string, but it got converted to charset_client
+-----+

hexdump -C v1.frm

00000000  54 59 50 45 3d 56 49 45  57 0a 71 75 65 72 79 3d  |TYPE=VIEW.query=|
00000010  73 65 6c 65 63 74 20 5f  75 63 73 32 5c 27 5c 5c  |select _ucs2\'\\|
00000020  30 41 5c 5c 30 42 5c 5c  30 43 5c 27 20 41 53 20  |0A\\0B\\0C\' AS |
00000030  60 e4 84 80 e4 88 80 60  0a 6d 64 35 3d 32 63 39  |`......`.md5=2c9|
00000040  62 31 36 34 32 31 62 31  34 37 66 65 33 62 66 38  |b16421b147fe3bf8|
00000050  38 61 36 36 36 32 30 65  62 36 32 61 30 0a 75 70  |8a66620eb62a0.up|
00000060  64 61 74 61 62 6c 65 3d  30 0a 61 6c 67 6f 72 69  |datable=0.algori|
00000070  74 68 6d 3d 30 0a 64 65  66 69 6e 65 72 5f 75 73  |thm=0.definer_us|
00000080  65 72 3d 72 6f 6f 74 0a  64 65 66 69 6e 65 72 5f  |er=root.definer_|
00000090  68 6f 73 74 3d 6c 6f 63  61 6c 68 6f 73 74 0a 73  |host=localhost.s|
000000a0  75 69 64 3d 32 0a 77 69  74 68 5f 63 68 65 63 6b  |uid=2.with_check|
000000b0  5f 6f 70 74 69 6f 6e 3d  30 0a 72 65 76 69 73 69  |_option=0.revisi|
000000c0  6f 6e 3d 31 0a 74 69 6d  65 73 74 61 6d 70 3d 32  |on=1.timestamp=2|
000000d0  30 30 37 2d 30 37 2d 30  35 20 32 30 3a 31 35 3a  |007-07-05 20:15:|
000000e0  30 34 0a 63 72 65 61 74  65 2d 76 65 72 73 69 6f  |04.create-versio|
000000f0  6e 3d 31 0a 73 6f 75 72  63 65 3d 73 65 6c 65 63  |n=1.source=selec|
00000100  74 20 5f 75 63 73 32 20  30 78 30 30 34 31 30 30  |t _ucs2 0x004100|
00000110  34 32 30 30 34 33 0a                              |420043.|

At 00000031: e4 84 80 e4 88 80
select hex(convert( _utf8 0xe48480e48880 using ucs2));
+------------------------------------------------+
| hex(convert( _utf8 0xe48480e48880 using ucs2)) |
+------------------------------------------------+
| 41004200                                       |
+------------------------------------------------+
# the original ucs2 string, but the 1st 00 byte is missing

drop table if exists t1;
create table t1 select _ucs2 0x00410042;
select * from t1;
show create table t1;

hexdump -C t1.frm # only relevant line
00002150  04 00 04 e4 84 80 00 04  04 04 04 00 01 00 00 00  |................|

contains  e4 84 80 as name:
select hex(convert( _utf8 0xe48480 using ucs2));
+------------------------------------------+
| hex(convert( _utf8 0xe48480 using ucs2)) |
+------------------------------------------+
| 4100                                     |
# agian, one leading zero missing, and then cut off at the end

Suggested fix:
if converting ucs2 int utf8 for a column name, do not drop the leading 00

Thank you for the bug report.

Workaround: Use 'AS' in the select clause