| Bug #78278 | UTF-8 string w/ BOM and w/o BOM are not equal | ||
|---|---|---|---|
| Submitted: | 30 Aug 2015 17:52 | Modified: | 30 Aug 2015 17:55 |
| Reporter: | Daniël van Eeden (OCA) | Email Updates: | |
| Status: | Open | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S4 (Feature request) |
| Version: | 5.7.8 | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
| Tags: | bom, character set, charset, utf8 | ||
[30 Aug 2015 17:55]
Daniël van Eeden
Related:
- Bug #66537 BOM problem
- Bug #71563 Handling of combining characters.
- Bug #71564 Combining characters in mysql monitor
- Bug #14271638 CHARACTER SET DUPLICATE HANDLING SEEMS BROKEN

Description:
MySQL should
- remove the UTF-8 BOM before comparing strings
- OR have some option which does that

Note that the BOM is almost always at the start of a string, but that is not required.

How to repeat:

    mysql-5.7.8-rc> SHOW CREATE TABLE u1\G
    *************************** 1. row ***************************
           Table: u1
    Create Table: CREATE TABLE `u1` (
      `name` char(100) NOT NULL,
      PRIMARY KEY (`name`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
    1 row in set (0.00 sec)

    mysql-5.7.8-rc> INSERT INTO u1 VALUES (X'c3ab');
    Query OK, 1 row affected (0.00 sec)

    mysql-5.7.8-rc> INSERT INTO u1 VALUES (X'efbbbfc3ab');
    Query OK, 1 row affected (0.01 sec)

    mysql-5.7.8-rc> SELECT * FROM u1;
    +-------+
    | name  |
    +-------+
    | ë     |
    | ë     |
    +-------+
    2 rows in set (0.00 sec)

Both rows are accepted by the primary key, so the server treats the value with the BOM (EF BB BF) and the value without it as different strings.

In Python 3:

    >>> b'\xef\xbb\xbf\xc3\xab'.decode('utf-8-sig') == b'\xc3\xab'.decode('utf-8-sig')
    True
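As a rough illustration of the requested behaviour, the sketch below compares the same two byte strings with and without BOM removal. It is not how MySQL compares strings; `strip_bom` and its `everywhere` parameter are hypothetical names made up for this example. It also covers the case mentioned above where the BOM is not at the start of the string, which the `utf-8-sig` codec does not handle because it only drops a leading BOM.

```python
BOM = "\ufeff"  # U+FEFF, encoded in UTF-8 as the bytes EF BB BF


def strip_bom(raw: bytes, everywhere: bool = False) -> str:
    """Decode UTF-8 bytes and drop the BOM (hypothetical helper)."""
    text = raw.decode("utf-8")  # the plain utf-8 codec keeps the BOM
    if everywhere:
        # Remove every U+FEFF, not only a leading one.
        return text.replace(BOM, "")
    # Remove a single leading BOM, like the utf-8-sig codec does.
    return text[1:] if text.startswith(BOM) else text


with_bom = b"\xef\xbb\xbf\xc3\xab"     # same bytes as X'efbbbfc3ab'
without_bom = b"\xc3\xab"              # same bytes as X'c3ab'
trailing_bom = b"\xc3\xab\xef\xbb\xbf"  # BOM not at the start

# Plain decoding keeps the BOM, so the strings differ.
print(with_bom.decode("utf-8") == without_bom.decode("utf-8"))  # False

# Dropping a leading BOM makes them equal.
print(strip_bom(with_bom) == strip_bom(without_bom))  # True

# A BOM elsewhere in the string needs the broader removal.
print(strip_bom(trailing_bom) == strip_bom(without_bom))  # False
print(strip_bom(trailing_bom, everywhere=True) == strip_bom(without_bom))  # True
```

Until the server offers something similar, an application could apply this kind of normalization before inserting or comparing values, at the cost of losing any U+FEFF that was intended as a zero-width no-break space.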