Description:
It is possible to insert invalid characters data in utf8 8and more probably) columns.
Server does not seem to validate at all when x'..' format is used. Not even a warning is returned.
How to repeat:
set names utf8;
CREATE TABLE `t` (
`pk` int(11) NOT NULL auto_increment,
`d` varbinary(20) default NULL,
PRIMARY KEY (`pk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
select hex('æøå'); -- returns 'C3A6C3B8C3A5'
insert into t (d) values(x'C3A6C3B8C3A5'); -- this succeeds of course and inserts 'æøå'
insert into t (d) values(x'83A6C3B8C3A5'); -- also succeeds
-- but note that neither '83', 'A6' nor '83A6' are a valid utf8 characters
show warnings; -- empty set
select d from t;
/*returns
(how it prints will depend on the client of course - this is SQLyog)
d
------
(NULL)
æøå
��øå
*/
Suggested fix:
1) I do not think it should be posible to insert in an encoded column an invalid byte-pattern for the charset.
But I think I have asked something related before and was turned down!
Argument was : sometimes people would want to insert biary strings.
My Counter-argument: That is what (VAR)BINARY type and binary charset is for! Data in last statement are invalid in utf8! They should not be allowed there!
Also it is totally arbitrary that there is 2 invalid characters .. it might as well be one! But this is a client problem of course - and am am not sure how this involves libmysql and 3rd party code though!
2) At least a warning should be raised! (but that is a dirty hack only IMHO!)
Also see attached images from SQLyog and Query Browser.
(Command line client on windows is not appropriate for testing this)
Query Browser goes completely nuts one it fails to identify 1st character!
It displays everything as latin1/ANSI, I think! But I won't reaally blame it.
What can the poor client do when data are invalid for both character set for both connection and table ...