MySQL Bugs: #78278: UTF-8 string w/ BOM and w/o BOM are not equal

Bug #78278	UTF-8 string w/ BOM and w/o BOM are not equal
Submitted:	30 Aug 2015 17:52	Modified:	30 Aug 2015 17:55
Reporter:	Daniël van Eeden (OCA)
Status:	Open
Category:	MySQL Server: Charsets	Severity:	S4 (Feature request)
Version:	5.7.8	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	bom, character set, charset, utf8

Description:
MySQL should either
- remove the UTF-8 BOM before comparing strings
- OR provide an option that does so

Note that a BOM is almost always at the start of a string, but that is not required.
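
Until the server offers something like this, one possible workaround is to normalize strings in the application before they are stored or compared. The following is only a sketch of that idea: `strip_bom` and its `everywhere` parameter are hypothetical names, not part of MySQL or of any connector. It shows the difference between removing only a leading BOM and removing U+FEFF wherever it appears.

```python
BOM = "\ufeff"  # U+FEFF; encoded in UTF-8 as the bytes EF BB BF


def strip_bom(text: str, everywhere: bool = False) -> str:
    """Hypothetical helper: drop a leading BOM, or every U+FEFF if everywhere=True."""
    if everywhere:
        return text.replace(BOM, "")
    return text[1:] if text.startswith(BOM) else text


with_bom = b"\xef\xbb\xbf\xc3\xab".decode("utf-8")  # BOM followed by "ë"
without_bom = b"\xc3\xab".decode("utf-8")           # just "ë"

print(with_bom == without_bom)                         # False
print(strip_bom(with_bom) == without_bom)              # True
print(strip_bom("a\ufeffb") == "ab")                   # False: BOM is not leading
print(strip_bom("a\ufeffb", everywhere=True) == "ab")  # True
```

Whether a U+FEFF in the middle of a string should be removed is a design choice: there it formally acts as a zero-width no-break space, so the sketch leaves it in place by default.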

How to repeat:
mysql-5.7.8-rc> SHOW CREATE TABLE u1\G
*************************** 1. row ***************************
       Table: u1
Create Table: CREATE TABLE `u1` (
  `name` char(100) NOT NULL,
  PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

mysql-5.7.8-rc> INSERT INTO u1 VALUES (X'c3ab');
Query OK, 1 row affected (0.00 sec)

mysql-5.7.8-rc> INSERT INTO u1 VALUES (X'efbbbfc3ab');
Query OK, 1 row affected (0.01 sec)

mysql-5.7.8-rc> SELECT * FROM u1;
+-------+
| name  |
+-------+
| ë     |
| ë    |
+-------+
2 rows in set (0.00 sec)

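For comparison, in Python 3 the utf-8-sig codec strips a leading BOM, so both byte strings decode to the same string: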

>>> b'\xef\xbb\xbf\xc3\xab'.decode('utf-8-sig') == b'\xc3\xab'.decode('utf-8-sig')
True
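
For contrast, a small sketch of what the plain `utf-8` codec does with the same bytes. It keeps the BOM as the character U+FEFF, which matches what MySQL stores in the two rows above. The variable names are illustrative only. The last lines show that `utf-8-sig` strips a BOM only at the very start of the input.

```python
raw_with_bom = b"\xef\xbb\xbf\xc3\xab"  # same bytes as X'efbbbfc3ab'
raw_plain = b"\xc3\xab"                 # same bytes as X'c3ab'

# Plain utf-8 decoding keeps the BOM as a U+FEFF character.
decoded_with_bom = raw_with_bom.decode("utf-8")
decoded_plain = raw_plain.decode("utf-8")
print(len(decoded_with_bom), len(decoded_plain))  # 2 1
print(decoded_with_bom == decoded_plain)          # False

# utf-8-sig only removes a BOM at the start; a trailing one survives.
trailing = b"\xc3\xab\xef\xbb\xbf".decode("utf-8-sig")
print(trailing == "\u00eb\ufeff")                 # True
```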

Related:
Bug #66537 	BOM problem
Bug #71563 	Handling of combining characters.
Bug #71564 	Combining characters in mysql monitor
Bug #14271638   CHARACTER SET DUPLICATE HANDLING SEEMS BROKEN