Bug #18749 Normalize Decomposed Characters in FULLTEXT Indexes
Submitted: 3 Apr 2006 16:26 Modified: 18 Jan 13:06
Reporter: Chris Calender
Status: Closed
Category: MySQL Server: Charsets Severity: S4 (Feature request)
Version: 4.1, 5.0, 5.1 OS: Any
Assigned to: Assigned Account CPU Architecture: Any
Triage: Triaged: D5 (Feature request)

[3 Apr 2006 16:26] Chris Calender
Description:
UTF-8 characters with diacritics can be stored in two different forms:
- composed: Ö (one UTF-8 character)
- decomposed: O" (two UTF-8 characters: the base letter O followed by a combining diaeresis)
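
The two forms can be illustrated with a short Python sketch. It assumes the decomposed form written above as O" stands for the letter O followed by U+0308 COMBINING DIAERESIS; the variable names are illustrative only and are not part of the original report.

```python
import unicodedata

# Assumption: "O"" in the report means O + U+0308 COMBINING DIAERESIS.
composed = "\u00D6"      # Ö as a single code point
decomposed = "O\u0308"   # O followed by the combining diaeresis

# Same visible character, different number of code points and bytes.
print(len(composed), len(decomposed))        # 1 2
print(composed.encode("utf-8").hex())        # c396
print(decomposed.encode("utf-8").hex())      # 4fcc88

# A plain code-point comparison treats them as different strings.
print(composed == decomposed)                # False

# Normalizing to NFC (composed) or NFD (decomposed) makes them equal.
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
```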

If some values are stored in 'decomposed' form (two characters) and others in 'composed' form, the table ends up with a mixture of composed and decomposed characters.

Then, in searches, you cannot find the values that contain 'decomposed' UTF-8 characters.

This is because decomposed characters are not normalized when they are added to a full-text index. The temporary workaround is to store normalized characters in the table and/or provide decomposed characters in the query.
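
A minimal sketch of the first workaround, done on the application side before the data reaches the server. The helper `to_nfc` and the sample values are hypothetical; the sketch only shows that normalizing both the stored values and the search term to one form (NFC here) makes them comparable, and it does not model how FULLTEXT itself matches words.

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)

# Hypothetical input: the same word, once composed and once decomposed.
rows = ["K\u00F6ln", "Ko\u0308ln"]

# Normalize values before storing them, and normalize the search term too.
stored = [to_nfc(value) for value in rows]
term = to_nfc("Ko\u0308ln")

# After normalization both rows match the term exactly.
matches = [value for value in stored if value == term]
print(len(matches))   # 2
```

Without the `to_nfc` step, only the row that happens to be in the same form as the search term would match.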

The customer has requested a feature that automatically normalizes decomposed characters when they are put in a FULLTEXT index.

How to repeat:
See above description.
[24 Apr 2006 9:45] Valeriy Kravchuk
Thank you for a reasonable feature request. I hope it will be implemented some day.
[17 May 2006 9:12] Sergei Golubchik
This is unlikely to be implemented in the FULLTEXT index itself.
The index compares strings according to collation rules.
The correct solution is to fix the Unicode collation so that it compares Ö and O" as equal.
[18 Jan 13:06] Erlend Dahl
[28 Dec 2017 0:16] Xing Z Zhang 

As of 5.6, with UCA collations, a normalized (composed) character compares equal to its decomposed form.