Bug #50909 | utf8_general_cs collation is not case sensitive | ||
---|---|---|---|
Submitted: | 4 Feb 2010 12:04 | Modified: | 12 Feb 2016 6:24 |
Reporter: | Paul Tobias | Email Updates: | |
Status: | Won't fix | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 5.1.43 | OS: | Linux |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[4 Feb 2010 12:04]
Paul Tobias
[4 Feb 2010 12:56]
Susanne Ebrecht
Many thanks for writing a bug report. Verified as described.
[5 Feb 2010 5:11]
Alexander Barkov
Hi Paul, you're right that LIKE operator does not work well in utf8_general_cs. However, as far as this is an experimental collation, we won't do anything with it at this point. Note, the '=' operator works case sensitively as expected. As a simple workaround you can use this query: select * from collationtest where text collate utf8_general_cs like binary 'test'; or even you don't need the collate clause: select * from collationtest where text like binary 'test'; Note, instead of utf8_general_cs, you can similary use the "standard" collation utf8_general_ci, with the "binary" keyword in LIKE. It will have the same effect. It's not recommended to use utf8_general_cs, because there's no guarantee it will stay in the code in the future.
[10 Feb 2010 16:55]
Paul Tobias
Thank you for your reply. My motivation with this bugreport was to get utf8_general_cs collation out of experimental status. Because without a case sensitive utf8 collation there is no way to be case sensitive _and_ use accents at the same time. Unfortunately utf8_bin collation or binary keyword workarounds does not work in this case. I'll try to show why. Let's assume the following table: INSERT INTO collationtest VALUES ('test'),('Test'),('TEST'),('tést'),('Tést'),('TÉST'); SELECT * FROM collationtest; +-------+ | text | +-------+ | test | | Test | | TEST | | tést | | Tést | | TÉST | +-------+ 6 rows in set (0.00 sec) SELECT * FROM collationtest WHERE text collate utf8_general_ci like 'test'; Actual resuts: +-------+ | test | | Test | | TEST | | tést | | Tést | | TÉST | +-------+ Expected results: Same as actual results. SELECT * FROM collationtest WHERE text collate utf8_general_cs like 'test'; Actual results: +------+ | test | | Test | | TEST | +------+ Expected results: +------+ | test | | tést | +------+ SELECT * FROM collationtest WHERE text collate utf8_general_ci like binary 'test'; Actual results: +-------+ | test | | Test | | TEST | | tést | | Tést | | TÉST | +-------+ Expected results: +-------+ | test | +-------+ SELECT * FROM collationtest WHERE text collate utf8_general_cs like binary 'test'; Actual results: +------+ | test | | Test | | TEST | +------+ Expected results: +------+ | test | +------+ SELECT * FROM collationtest WHERE text like binary 'test'; Actual results: +------+ | text | +------+ | test | +------+ Expected results: Same as actual results. SELECT * FROM collationtest WHERE text collate utf8_general_ci = 'test'; Actual results: +-------+ | test | | Test | | TEST | | tést | | Tést | | TÉST | +-------+ Expected results: Same as actual results. SELECT * FROM collationtest WHERE text collate utf8_general_cs = 'test'; Actual results: +------+ | test | +------+ Expected results: +-------+ | test | | tést | +-------+ Currently there is no way to get the following results: +-------+ | test | | tést | +-------+ In other words be case sensitive and use accents at the same time; All the tests were performed with mysql version 5.5.1-m2.
[7 Dec 2011 5:15]
Jehan Hysseo
Hi, my comments on all this: 1/ That would be indeed good if the case sensitive UTF-8 collation went out of experimental. 2/ I don't agree with Paul on the fact that accented should be considered like the non-accented character in a comparison. If I want "test", I don't want "tést" to be returned, that's different! After obviously we may have another collation which does this kind of comparison, but for the basic utf8_general_cs and utf8_general_ci, I would prefer it to differentiate 'e' and 'é' (as for any other accented character, and any character in general).
[12 Feb 2016 6:24]
Erlend Dahl
Posted by dev manager: [9 Feb 2016 2:18] Manyi Lu The experimental code for utf8_general_cs collation will be removed from the repository. We are cleaning up dead code. Proper case and accent sensitive collations will be provided once secondary and tertiary sort is implemented. We will not use the experimental code.