Bug #78278 UTF-8 string w/ BOM and w/o BOM are not equal
Submitted: 30 Aug 2015 17:52 Modified: 30 Aug 2015 17:55
Reporter: Daniël van Eeden (OCA)
Status: Open
Category:MySQL Server: Charsets Severity:S4 (Feature request)
Version:5.7.8 OS:Any
Assigned to: CPU Architecture:Any
Tags: bom, character set, charset, utf8

[30 Aug 2015 17:52] Daniël van Eeden
Description:
MySQL should either:
- remove the UTF-8 BOM before comparing strings, or
- provide an option that does so.

Note that the BOM almost always appears at the start of a string, but that is not required.
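A minimal Python 3 sketch of the comparison being requested, assuming that "remove the BOM" means dropping every U+FEFF code point (leading or embedded) before comparing. `strip_bom` and `bom_insensitive_equal` are hypothetical helpers for illustration only, not MySQL functions; the byte strings are the two rows from "How to repeat" below.

```python
# U+FEFF (BOM / zero-width no-break space) is encoded in UTF-8 as EF BB BF.
BOM = "\ufeff"


def strip_bom(s: str) -> str:
    """Hypothetical helper: remove every U+FEFF from s, wherever it occurs."""
    return s.replace(BOM, "")


def bom_insensitive_equal(a: bytes, b: bytes) -> bool:
    """Hypothetical helper: decode two UTF-8 byte strings and compare them with all BOMs removed."""
    return strip_bom(a.decode("utf-8")) == strip_bom(b.decode("utf-8"))


# X'c3ab' vs X'efbbbfc3ab' (BOM at the start)
print(bom_insensitive_equal(b"\xc3\xab", b"\xef\xbb\xbf\xc3\xab"))  # True
# BOM not at the start of the string
print(bom_insensitive_equal(b"\xc3\xab", b"\xc3\xab\xef\xbb\xbf"))  # True
```

Removing every U+FEFF rather than only a leading one is a design choice made here to cover the non-leading case; an option that strips only a leading BOM would behave differently for the second call.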

How to repeat:
mysql-5.7.8-rc> SHOW CREATE TABLE u1\G
*************************** 1. row ***************************
       Table: u1
Create Table: CREATE TABLE `u1` (
  `name` char(100) NOT NULL,
  PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

mysql-5.7.8-rc> INSERT INTO u1 VALUES (X'c3ab');
Query OK, 1 row affected (0.00 sec)

mysql-5.7.8-rc> INSERT INTO u1 VALUES (X'efbbbfc3ab');
Query OK, 1 row affected (0.01 sec)

mysql-5.7.8-rc> SELECT * FROM u1;
+-------+
| name  |
+-------+
| ë     |
| ë    |
+-------+
2 rows in set (0.00 sec)

In Python 3, decoding with the utf-8-sig codec strips the BOM, so the two strings compare equal:

>>> b'\xef\xbb\xbf\xc3\xab'.decode('utf-8-sig') == b'\xc3\xab'.decode('utf-8-sig')
True
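The utf-8-sig codec only strips a BOM at the very start of the input, so it does not cover the embedded case mentioned in the description. A short Python 3 check illustrating this (the variable names are illustrative only):

```python
leading = b"\xef\xbb\xbf\xc3\xab"   # BOM + ë
trailing = b"\xc3\xab\xef\xbb\xbf"  # ë + BOM
plain = b"\xc3\xab"                 # ë

# Leading BOM is removed by utf-8-sig, so this comparison succeeds.
print(leading.decode("utf-8-sig") == plain.decode("utf-8-sig"))   # True
# A BOM that is not at the start is kept as U+FEFF, so this comparison fails.
print(trailing.decode("utf-8-sig") == plain.decode("utf-8-sig"))  # False
```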
[30 Aug 2015 17:55] Daniël van Eeden
Related:
Bug #66537  BOM problem
Bug #71563  Handling of combining characters.
Bug #71564  Combining characters in mysql monitor
Bug #14271638  CHARACTER SET DUPLICATE HANDLING SEEMS BROKEN