Description:
This issue relates Bug #76553.
[MySQL Bugs: #76553: Sushi-Beer issue of MySQL with utf8mb4](https://bugs.mysql.com/bug.php?id=76553)
This explains that Bug #76553's another side problem for Japanese FullText Search.
* Japanese user use some types for expressing word.
* For example, the word means "mother", is expressed "はは" (Hiragana Case) and "ハハ" (Wide-size Katakana Case) and "ハハ" (Narrow-size Katakana Case).
* They have same sound and same mean. We would like to take these as same character, in most of all case.
* utf8mb4_bin, utf8mb4_general_ci are Hiragana-Katakana Case Sensitive, utf8mb4_unicode_ci is Hiragana-Katakana Case Insensitive.
* The other hand, we had Youon Case.
* For example, we write "hospital" as "びょういん", and we write "heir salon" as "びよういん".
* They have only one difference of character "ょ" and "よ", and these two characters are very similar form.
* But, they have different sound and different mean in most of all cases. We'd like to distinct them.
* utf8mb4_bin and uft8mb4_general_ci are Youon Case Sensitive, utf8mb4_unicode_ci is Youon Case Insensitive.
* There's more expression in Japanese, we have Dakuten and Handakuten.
* As explaining in Bug #76553, "ハハ" (Plain Case) means "mother", "ババ" (Dakuten Case) means "grand mother", "パパ" (Handakuten Case) means "daddy".
* These have to be distincted.
* utf8mb4_bin and utf8mb4_general_ci are Dakuten-Handakuten Case Sensitive, utf8mb4_unicode_ci is Dakuten-Handakuten Case Insensitive.
* Last one is Wide Case and Narrow Case.
* For example, "MySQL" (Narrow Case) and "MySQL" (Wide Case), of course they have same sound and same mean.
* So we'd like to take them same character.
* utf8mb4_bin and utf8mb4_general_ci are Wide-Narrow Case Sensitive, utf8mb4_unicode_ci is Wide-Narrow Case Insensitive.
* Hiragana-Katakana cs and Wide-Narrow cs are incombinient but we can be patient.
* But Youon ci and Dakuten-Handakuten ci are critical issue on FTS query(simple query using LIKE operator too)
* Is anyone who expects search results of "hair salon"(びよういん) when they type "hospital"(びょういん) ?
* And these ci includes potential of breaking UNIQUE KEY constraint.
* When `UNIQUE KEY(relation_from_me)`, I have a mother and INSERT INTO "mother"(ハハ), and I have a father and INSERT INTO "daddy"(パパ), second query will fail because of duplicate key error.
Discussion in [Planning the defaults for MySQL 5.8 | MySQL Server Blog](http://mysqlserverteam.com/planning-the-defaults-for-mysql-5-8/) says "server default collation will be utf8mb4_unicode_520_ci".
But it will make Japanese(and other countries?) users to be confusing.
Honestly, we really need utf8mb4_japanese_ci.
But at least, utf8mb4_unicode*_ci are critical for Japanese.
| | utf8mb4_bin | utf8mb4_general_ci | utf8mb4_unicode_ci | utf8mb4_unicode_520_ci|
|--------------------|-------------|--------------------|--------------------|-----------------------|
| Hiragana-Katakana | cs (unkind) | cs (unkind) | ci (good) | ci(good) |
| Youon | cs (good) | cs (good) | ci (critical) | ci(critical) |
| Dakuten-Handakuten | cs (good) | cs (good) | ci (critical) | ci(critical) |
| Wide-Narrow | cs (unkind) | cs (unkind) | ci (good) | ci(good) |
| Sushi-Beer | cs | ci | ci | cs |
How to repeat:
See description.
Suggested fix:
If you change default value, that should not be utf8mb4_unicode*_ci, would be utf8mb4_general_ci or utf8mb4_bin.