Bug #31740 ñ (n-tilde) not imported correctly
Submitted: 22 Oct 2007 3:47 Modified: 23 Oct 2007 6:43
Reporter: Aren Cambre Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:5.0.38-Ubuntu_0ubuntu1.1-log OS:Any
Assigned to: CPU Architecture:Any

[22 Oct 2007 3:47] Aren Cambre
Description:
MySql 5.0 is unable to propery import a table when the key has two words that are identical except one has the letter "n" and the other has an encoded version of the letter "ñ" (n-tilde).

How to repeat:
Run this SQL:

CREATE TABLE `x` (
  `word` varchar(50) NOT NULL default '',
  `count` float default NULL,
  PRIMARY KEY  (`word`)
) TYPE=MyISAM;

INSERT INTO `x` VALUES ('pinata', 0.30103);
INSERT INTO `x` VALUES ('piñata', 0.177487);

Instead of keeping the literal text piñata, MySql converts piñata to piñata before attempting the import. This causes a key collision in the above SQL code.

What you are seeing above is part of a PhpMyAdmin-generated export for the table. The original source DB has piñata and not piñata in the key field. This happens even if I specify a CHARACTER SET:
CREATE TABLE `lpenasearch_total` (
  `word` varchar(50) NOT NULL default '',
  `count` float default NULL,
  PRIMARY KEY  (`word`)
) TYPE=MyISAM CHARACTER SET utf8;

And additionally, it still happens even if I change the entire database's collation, even to the spanish collations that purportedly differentiate n and ñ.

Suggested fix:
Don't convert piñata to piñata.
[22 Oct 2007 3:48] Aren Cambre
This is actually a serious bug that will impact international users.
[22 Oct 2007 6:31] Sveta Smirnova
Thank you for the report.

Please indicate accurate version of MySQL you use. Also please provide output of SHOW VARIABLES LIKE '%char%' and SHOW VARIABLES LIKE 'collation%'
[22 Oct 2007 13:48] Aren Cambre
Version updated. Note that the output below is taken from phpMyAdmin 2.9.1.1-Debian-2ubuntu1:

SHOW VARIABLES LIKE '%char%':
Variable_name 	Value
character_set_client 	utf8
character_set_connection 	utf8
character_set_database 	latin1
character_set_filesystem 	binary
character_set_results 	utf8
character_set_server 	latin1
character_set_system 	utf8
character_sets_dir 	/usr/share/mysql/charsets/

SHOW VARIABLES LIKE 'collation%':
Variable_name 	Value
collation_connection 	utf8_unicode_ci
collation_database 	latin1_swedish_ci
collation_server 	latin1_swedish_ci

---------------
Here's the same queries run against the database in concern:
SHOW VARIABLES LIKE '%char%':
Variable_name 	Value
character_set_client 	utf8
character_set_connection 	utf8
character_set_database 	utf8
character_set_filesystem 	binary
character_set_results 	utf8
character_set_server 	latin1
character_set_system 	utf8
character_sets_dir 	/usr/share/mysql/charsets/

SHOW VARIABLES LIKE 'collation%':
Variable_name 	Value
collation_connection 	utf8_unicode_ci
collation_database 	utf8_spanish_ci
collation_server 	latin1_swedish_ci
[23 Oct 2007 6:43] Sveta Smirnova
Thank you for the feedback.

I can not repeat described behaviour:

CREATE TABLE `x` (
`word` varchar(50) NOT NULL default '',
`count` float default NULL,
PRIMARY KEY  (`word`)
) engine=MyISAM CHARACTER SET utf8;
INSERT INTO `x` VALUES ('pinata', 0.30103);
INSERT INTO `x` VALUES ('piñata', 0.177487);
select * from x;
word    count
pinata  0.30103
piñata 0.177487

So I'll close the case as "Can't repeat"

Looks like you get these results from PhpMyAdmin. Additionally version 5.0.38 is  old. Please upgrade to current version 5.0.45, run same queries using command line MySQL client and if you can repeat the problem feel free to reopen the report.