Bug #19567 | Case insensitive multilingual unicode collations | ||
---|---|---|---|
Submitted: | 5 May 2006 17:19 | Modified: | 27 Feb 2009 7:48 |
Reporter: | Lutz Schwarz | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S4 (Feature request) |
Version: | 5.0.22, 4.1.15 | OS: | Windows (windows) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[5 May 2006 17:19]
Lutz Schwarz
[5 May 2006 17:48]
Lutz Schwarz
files for ucs2_bin_but_ci solution
Attachment: m_ctype.h (application/octet-stream, text), 18.37 KiB.
[5 May 2006 17:48]
Lutz Schwarz
files for ucs2_bin_but_ci solution
Attachment: charset-def.c (application/octet-stream, text), 6.97 KiB.
[5 May 2006 17:48]
Lutz Schwarz
files for ucs2_bin_but_ci solution
Attachment: ctype-ucs2.c (application/octet-stream, text), 41.56 KiB.
[5 May 2006 17:49]
Lutz Schwarz
files for ucs2_bin_but_ci solution
Attachment: charset-def.c (application/octet-stream, text), 6.97 KiB.
[5 May 2006 17:49]
Lutz Schwarz
files for ucs2_bin_but_ci solution
Attachment: ctype-ucs2.c (application/octet-stream, text), 41.50 KiB.
[20 Jun 2006 12:46]
Valeriy Kravchuk
Thank you for a problem report and patches. Verified just as described, also with 5.0.22 on Windows: mysql> select version(); +---------------------+ | version() | +---------------------+ | 5.0.22-community-nt | +---------------------+ 1 row in set (0.00 sec) mysql> CREATE TABLE `person` ( -> `name` varchar(255) collate latin1_general_ci NOT NULL default '' -> PRIMARY KEY (`name`) -> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci; Query OK, 0 rows affected (0.05 sec) mysql> INSERT INTO `person` VALUES ('A'), ('ä'); Query OK, 2 rows affected (0.03 sec) Records: 2 Duplicates: 0 Warnings: 0 mysql> alter table person convert to CHARSET utf8 COLLATE utf8_general_ci; ERROR 1062 (23000): Duplicate entry 'ä' for key 1
[21 Jun 2006 12:08]
Valeriy Kravchuk
As you just want to add another, new collation, it is a feature request.
[22 Jun 2006 10:40]
Lutz Schwarz
A feature? Sorry, but I do not agree with that. The canonical fix is to correct behaviour of case insensitive multilingual unicode collations (namely utf8_general_ci and ucs2_general_ci) in a way, that they do what their names propose. My suggestion to introduce new collations that behave like utf8_general_ci/ucs2_general_ci should behave is a poor hack - not a feature. Since workaround is not to use unicode at all, I set severity to S3 as a compromise. In fact it is a serious bug because it prevents upgrading databases to unicode for use in international environments and markets.
[23 Jun 2006 9:09]
Valeriy Kravchuk
In fact, latin1_general_ci and utf8_general_ci are not similar at all. And this is know fact. So, utf8_general_ci is just inappropriate name for current collation as it is implemented. At least this is a bug. "Proper" implementation of "correct" utf8_general_ci is a reasonable feature request.
[28 Jun 2006 6:40]
Alexander Barkov
After discussing with the team, we agreed that: - this is not a bug. This is a feature request a new accent sensitive collation for utf8 and ucs2. - latin1_general_ci and utf8_general_ci behave differently, and this is confusing. We're sorry about that. - When we added latin1_general_ci, its name was chosen in a mistake. - we won't fix latin1_general_ci to behave accent insensitivly, like utf8_general_ci does. Changing order at this point would be confusing too.
[26 Feb 2009 20:15]
Marc Larue
I don't get what you guys are doing. This is what we need: A way to search for strings in both case in- and sensitive ways with complete UTF-8 characterset on innodb, preferably with full text index. Now how hard can that be? ;) But seriously, now I have to collate the text to the language just because you don't know how to order the characters. Who orders by text anyways, everyone orders by date!
[27 Feb 2009 7:48]
Valeriy Kravchuk
Still, this is a reasonable feature request at least, IMHO. Should be implemented in 6.x.
[3 Dec 2009 16:37]
Michael Baldwin
Any news on when this might be implemented? This thread was started two and a half years ago... I too need accent sensitivity in UTF-8...
[23 Jan 2010 21:47]
ete88 ete88
So, the real problem is that there are no accent sensitive unicode collations available for MySQL. There really should be. To stop the confusion, I suggest information about the accent sensitivity should be added to the name of the collation, e.g. utf8_german_ci_ai (case insensitive / accent insensitive).
[20 Jul 2010 23:19]
André Cruz
I too need an UTF8 collation that's case insensitive and accent sensitive.
[12 Jun 2012 10:47]
Hartmut Holzgraefe
Workaround: adding a case insensitive, accent sensitive collation to the charsets/Index.xml file: http://www.skysql.com/blogs/hartmut/adding-case-insensitive-distinct-unicode-collation
[26 Mar 2017 19:37]
monty solomon
New features will be forthcoming Sushi = Beer ?! An introduction of UTF8 support in MySQL 8.0 http://mysqlserverteam.com/sushi-beer-an-introduction-of-utf8-support-in-mysql-8-0/ New collations in MySQL 8.0.0 http://mysqlserverteam.com/new-collations-in-mysql-8-0-0/
[9 Sep 2022 21:17]
Theodore Phillips
Thanks for the new collations in MySQL 8. But I'm still not seeing any unicode collations like as_ci (accent-sensitive but case-insensitive), which is what this was all about. What version can these be found in?