Bug #42081 Bug in table collation and character set conversion.
Submitted: 13 Jan 2009 13:06 Modified: 16 Jan 2009 6:17
Reporter: Jitendra Rathod Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:5.0.27 OS:Linux
Assigned to: CPU Architecture:Any

[13 Jan 2009 13:06] Jitendra Rathod
Description:
Hello,

When I will convert my table field collation and field type from latin1 to utf8 then data from field which comes after some special character are removed from new changed field which character set is utf8.

Like my field name is `jobtitle`.
And it contains the value 'Services français – MON01547'.
When I execute below sql for changing field collation and character set.

ALTER TABLE `gjcrawling_job44` CHANGE `jobtitle` `jobtitle` varchar(255) CHARACTER SET binary  NOT NULL;
ALTER TABLE `gjcrawling_job44` CHANGE _utf8`jobtitle` `jobtitle` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

Then my that filed value change(removed the data) to 'Services fran';

Means my whole data are not coming in new converted fields.

Waiting your positive reply ASAP.

Thanks,
Jitendra

Thanks,
Jitendra

How to repeat:
Please read above
[13 Jan 2009 14:39] Sveta Smirnova
Thank you for the report.

Query "ALTER TABLE `gjcrawling_job44` CHANGE _utf8`jobtitle` `jobtitle` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;" fails with error. Please provide correct query you use.

Please also indicate accurate version of MySQL server you use.

Also please be sure you use correct client encoding after converting column. For example:

ALTER TABLE `gjcrawling_job44` CHANGE `jobtitle` `jobtitle` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
set names latin1;
select * from gjcrawling_job44;
jobtitle
Services fran###s # MON01547
set names utf8;
select * from gjcrawling_job44;
jobtitle
Services français – MON01547
[14 Jan 2009 9:33] Jitendra Rathod
Hello,
Thank for your quick reply.

Sorry for the wrong query.
This is the correct query which I used.

ALTER TABLE `gjcrawling_job44` CHANGE `jobtitle` `jobtitle` varchar(255) CHARACTER
SET utf8 COLLATE utf8_general_ci NOT NULL;

My server version is 5.0.27.

Waiting you reply.

Thanks,
Jitendra

Thanks,
Jitendra
[14 Jan 2009 9:42] Sveta Smirnova
Thank you for the feedback.

But version 5.0.27 is old and many bugs were fixed since. Please upgrade to current version 5.0.67 (or 5.0.75) and try with it. Also please be sure you issue SET NAMES utf8 before running SELECT after last conversion.
[14 Jan 2009 10:04] Jitendra Rathod
Thanks for your quick reply.

But right now I am not able to change my server version.

Also I am every time execute set names utf8; before my each statement.
Still I am not able to get the my correct whole data after converting from latin1 character set to utf8.

Thanks,
Jitendra
[14 Jan 2009 10:20] Sveta Smirnova
Thank you for the feedback.

I will set status of the report to "Need Feedback": please test with newer version when you can as I can not repeat the problem with test data. If you still can repeat with newer version please provide output of SHOW CREATE TABLE gjcrawling_job44`. Also ALTER TABLE `gjcrawling_job44` CHANGE `jobtitle` `jobtitle` varchar(255) CHARACTER SET binary  NOT NULL; is not needed, although should make no harm.
[15 Jan 2009 7:58] Jitendra Rathod
Below is the my 'SHOW CREATE TABLE gjcrawling_job44' output.

CREATE TABLE `gjcrawling_job44` (\n  `jobid` bigint(64) NOT NULL auto_increment,\n  `siteid` bigint(64) NOT NULL default '0',\n  `rssurlid` bigint(64) NOT NULL default '0',\n  `spiderbotresulturlid` bigint(64) NOT NULL default '0',\n  `md5rss` varchar(100) NOT NULL default '',\n  `referencejobid` varchar(255) character set utf8 NOT NULL default '0',\n  `refmd5` varchar(32) NOT NULL default '',\n  `titleid` bigint(64) NOT NULL default '0',\n  `titlemergeid` bigint(64) NOT NULL default '0',\n  `titledesc` varchar(255) NOT NULL,\n  `titlemergedesc` varchar(255) NOT NULL default '',\n  `jobtitle` varchar(255) NOT NULL,\n  `categoryid` bigint(64) NOT NULL default '0',\n  `categorymergeid` bigint(64) NOT NULL default '0',\n  `categorydesc` varchar(100) NOT NULL default '',\n  `categorymergedesc` varchar(100) NOT NULL default '',\n  `firmid` bigint(64) NOT NULL default '0',\n  `firmmergeid` bigint(64) NOT NULL default '0',\n  `firmname` varchar(100) NOT NULL default '',\n  `firmmergename` varchar(255) NOT NULL default '',\n  `educationid` bigint(64) NOT NULL default '0',\n  `educationmergeid` bigint(64) NOT NULL default '0',\n  `educationdesc` varchar(255) NOT NULL default '',\n  `educationmergedesc` varchar(255) NOT NULL default '',\n  `jobtypeid` bigint(64) NOT NULL default '0',\n  `jobtypemergeid` bigint(64) NOT NULL default '0',\n  `jobtype` varchar(100) NOT NULL default '',\n  `jobtypemergedesc` varchar(100) NOT NULL default '',\n  `jobdescription` text NOT NULL,\n  `joburl` varchar(255) NOT NULL default '',\n  `regionid` bigint(64) NOT NULL default '0',\n  `city` varchar(50) NOT NULL default '',\n  `state` varchar(50) NOT NULL default '',\n  `country` varchar(50) NOT NULL default '',\n  `cityid` bigint(64) NOT NULL default '0',\n  `stateid` bigint(64) NOT NULL default '0',\n  `countryid` bigint(64) NOT NULL default '0',\n  `latitude` bigint(64) NOT NULL default '0',\n  `longitude` bigint(64) NOT NULL default '0',\n  `postalcode` varchar(25) NOT NULL default '',\n  `salary` varchar(30) NOT NULL default '',\n  `experience` varchar(25) NOT NULL default '',\n  `posteddate` varchar(100) NOT NULL default '',\n  `posted_dt` datetime NOT NULL default '0000-00-00 00:00:00' COMMENT 'job posted date converted to mysql datetime',\n  `contactperson` varchar(100) NOT NULL default '',\n  `contactemail` varchar(100) NOT NULL default '',\n  `contactphone` varchar(100) NOT NULL default '',\n  `contactfax` varchar(100) NOT NULL default '',\n  `contacttype` varchar(100) NOT NULL default '',\n  `applyonlineurl` varchar(255) NOT NULL default '',\n  `closedate` varchar(100) NOT NULL default '',\n  `close_dt` datetime NOT NULL default '0000-00-00 00:00:00',\n  `created_dt` datetime NOT NULL default '0000-00-00 00:00:00',\n  `created_dt_old` datetime NOT NULL default '0000-00-00 00:00:00',\n  `updated_dt` datetime NOT NULL default '0000-00-00 00:00:00',\n  `inactivated_dt` datetime NOT NULL default '0000-00-00 00:00:00',\n  `lastcheck_dt` datetime NOT NULL default '0000-00-00 00:00:00',\n  `exported_dt` datetime NOT NULL default '0000-00-00 00:00:00',\n  `upload_status` char(1) NOT NULL default '',\n  `staffingagency` char(1) NOT NULL default '',\n  `jobsitetype` char(1) NOT NULL default '',\n  `keywords` text NOT NULL COMMENT 'Keywords for searching jobs',\n  `location` varchar(255) NOT NULL default '' COMMENT 'location for searching jobs',\n  `md5keywordtitle` varchar(32) character set utf8 NOT NULL default '' COMMENT 'md5 of title keywords',\n  `omittedjob` tinyint(2) NOT NULL default '0' COMMENT 'identifier of omitted jobs',\n  `sc_industry_type_ids` mediumtext,\n  `sc_job_category_ids` mediumtext,\n  `duprssurlid` bigint(64) NOT NULL default '0',\n  `dupjobid` bigint(64) NOT NULL default '0',\n  `sc_omittedjob` tinyint(2) NOT NULL default '0',\n  `sc_duprssurlid` bigint(64) NOT NULL default '0',\n  `sc_dupjobid` bigint(64) NOT NULL default '0',\n  PRIMARY KEY  (`jobid`),\n  UNIQUE KEY `refmd5` (`refmd5`),\n  KEY `firmname` (`firmname`),\n  KEY `categorydesc` (`categorydesc`),\n  KEY `educationdesc` (`educationdesc`),\n  KEY `jobtype` (`jobtype`),\n  KEY `titleid` (`titleid`),\n  KEY `md5keywordtitle` (`md5keywordtitle`),\n  KEY `omittedjob` (`omittedjob`),\n  KEY `city` (`city`),\n  KEY `state` (`state`),\n  KEY `country` (`country`),\n  KEY `cityid` (`cityid`),\n  KEY `stateid` (`stateid`),\n  KEY `countryid` (`countryid`),\n  KEY `lastcheck_dt` (`lastcheck_dt`),\n  KEY `spiderbotresulturlid` (`spiderbotresulturlid`)\n) ENGINE=MyISAM AUTO_INCREMENT=17 DEFAULT CHARSET=latin1

Thanks,
Jitendra
[15 Jan 2009 11:04] Sveta Smirnova
Thank you for the feedback.

Actually problem is statement ALTER TABLE `gjcrawling_job44` CHANGE `jobtitle` `jobtitle` varchar(255) CHARACTER SET binary  NOT NULL;

This statement removes character set information from the column jobtitle and later statement which convert character set to UTF8 just rejects these characters which don't exists in UTF8 as invalid:

mysql> drop table if exists t1;
Query OK, 0 rows affected (0.12 sec)

mysql> create table t1(f1 varchar(255) CHARACTER SET latin1) CHARACTER SET latin1;
Query OK, 0 rows affected (0.03 sec)

mysql> insert into t1 values('ç');
Query OK, 1 row affected (0.00 sec)

mysql> select hex(f1) from t1;
+---------+
| hex(f1) |
+---------+
| E7      | 
+---------+
1 row in set (0.03 sec)

E7 is hex code of 'ç' in latin1 character set. This code does not change after converting to binary:

mysql> alter table t1 change f1 f1 varchar(255) CHARACTER SET binary not null;
Query OK, 1 row affected (0.14 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select hex(f1) from t1;
+---------+
| hex(f1) |
+---------+
| E7      | 
+---------+
1 row in set (0.00 sec)

But the same letter is C3A7 in UTF8. This is why you get corruption error.

So I close the report as "Not a Bug".
[16 Jan 2009 6:17] Jitendra Rathod
Hello,

Thanks for reply.

But I think this is bug..

Because in this type of cases how I have to change the collation and character set from latin1 to utf8.

I have total 178K records and near about 1272 records are currpted while I will converted them to utf8 from latin1.

If this is 'Not a Bug', then why I am not able to change their collation type correctly.

Please, Can you please run with the test case with below data.

insert into t1 values('abçdef');

I have tried with this data, and its give me the output 'ab' of 'select f1 from t1'.

Thanks,
Jitendra