Bug #84341 MySQL utf8.xml missing in share\charset
Submitted: 26 Dec 2016 19:45 Modified: 10 Feb 2017 14:52
Reporter: Rafael Diaz
Status: Closed
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.6 OS:Windows (7)
Assigned to: CPU Architecture:Any
Tags: fulltext, utf8

[26 Dec 2016 19:45] Rafael Diaz
Description:
I'm following this tutorial in order to add a collation that will treat "-" (hyphen) as a word character for fulltext searches: http://dev.mysql.com/doc/refman/5.5/en/full-text-adding-collation.html

However, I can't get past step 2. My table uses the utf8 character set with the utf8_general_ci collation, but the utf8.xml file is missing from C:\Program Files\MySQL\MySQL Server 5.6\share\charsets. Is it not possible to add this collation for utf8? If it is possible, why is this XML file missing, and how should it be generated?

How to repeat:
Install MySQL 5.6

Suggested fix:
Generate utf8.xml in order to be able to add a collation
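
To confirm which character sets actually ship with an XML definition file, the directory can be listed with a short script. This is only a minimal sketch: the function names are hypothetical, and the directory you pass in is assumed to be the share\charsets folder of your own installation.

```python
from pathlib import Path


def list_charset_xml(charset_dir):
    """Return the sorted base names of the *.xml files in charset_dir."""
    return sorted(p.stem for p in Path(charset_dir).glob("*.xml"))


def has_charset_xml(charset_dir, charset):
    """Return True if <charset>.xml exists in charset_dir."""
    return (Path(charset_dir) / f"{charset}.xml").is_file()
```

For example, has_charset_xml(r"C:\Program Files\MySQL\MySQL Server 5.6\share\charsets", "utf8") would be expected to return False on the installation described in this report.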
[28 Dec 2016 8:59] MySQL Verification Team
Hello Rafael Diaz,

Thank you for the report.
In my opinion, you can use a copy of latin1.xml as the basis and create a utf8.xml file in the sql/share/charsets directory. Adding a collation for other character sets is possible in the same way as for latin1. But because utf8 has so many characters, adding a collation that way is not a good approach.

Quoting from http://dev.mysql.com/doc/refman/5.7/en/adding-character-set.html 

This section discusses the procedure for adding a character set to MySQL.
The proper procedure depends on whether the character set is simple or complex:
 - If the character set does not need special string collating routines for sorting and does not need multibyte character support, it is simple.
 - If the character set needs either of those features, it is complex.
 
Please note the word "simple". utf8 is not simple, and for that case the documentation says: "For a complex character set, create a C source file that describes the character set properties and defines the support routines necessary to properly perform operations on the character set."
So the suggested way to add a utf8 collation is to add C/C++ source files instead of an XML file.
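
The simple/complex rule quoted above can be written as a small predicate. This is only an illustrative sketch of the rule as worded in the manual, not anything MySQL itself provides; the function and parameter names are hypothetical.

```python
def is_simple_charset(needs_special_collating_routines, needs_multibyte_support):
    """A character set is simple only if it needs neither feature."""
    return not (needs_special_collating_routines or needs_multibyte_support)


# latin1 is a single-byte character set; assuming no special collating
# routines are needed, it counts as simple.
latin1_is_simple = is_simple_charset(False, False)

# utf8 needs multibyte character support, so it is complex regardless
# of the first argument.
utf8_is_simple = is_simple_charset(False, True)
```

Under those assumptions, latin1_is_simple is True and utf8_is_simple is False, which matches the conclusion that the XML route applies to latin1 but not to utf8.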
 
After discussing with the developers, I'm converting this issue to a documentation request so that the section quoted above is referenced in http://dev.mysql.com/doc/refman/5.6/en/full-text-adding-collation.html

Thanks,
Umesh
[10 Feb 2017 14:52] Paul DuBois
Posted by developer:
 
Revision to http://dev.mysql.com/doc/refman/5.5/en/full-text-adding-collation.html

To add a collation for full-text indexing, use the following procedure. The
instructions here add a collation for a simple character set, which, as
discussed in
http://dev.mysql.com/doc/refman/5.5/en/adding-character-set.html, can be
created using a configuration file that describes the character set
properties. For a complex character set such as Unicode, create collations
using C source files that describe the character set properties.