Bug #62620 Binary collations should not be PADSPACE
Submitted: 4 Oct 2011 18:36 Modified: 5 Oct 2011 7:03
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S4 (Feature request)
Version:any OS:Any
Assigned to: CPU Architecture:Any

[4 Oct 2011 18:36] Peter Laursen
Description:
With reference to http://bugs.mysql.com/bug.php?id=62616

and http://dev.mysql.com/doc/refman/5.1/en/char.html

"All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces."

How to repeat:
The current behavior is easily reproduced

DROP TABLES IF EXISTS `names`;
CREATE TABLE `names` (myname CHAR(10) COLLATE utf8_bin);
INSERT INTO NAMES VALUES ('Monty');
SELECT myname = 'Monty  ' FROM NAMES; -- 1
SELECT myname = 'Monty' FROM NAMES; -- 1

Suggested fix:
I find it completely wrong that binary collations ignore trailing blanks.  Binary data may be populated by random generators, industrial sampling hardware, encryption algorithms, random sampling from communication streams - whatever.  The trailing space may occur completely randomly and without any human knowledge or interference (and even programming).

Please (just this time!) try to consider the reasonability of this even though the current behavior is documented.  It is in my opinion not *not a bug* but rather a *documented bug*. *BINARY* means *BITWISE*.
[4 Oct 2011 19:00] Peter Laursen
Even if data are stored with a non-binary collation it would be nice to be able to use a "COLLATE *_bin" in a query to identify strings that only differ in trailing blanks.

DROP TABLES IF EXISTS `names`;
CREATE TABLE `names` (myname CHAR(10) COLLATE utf8_unicode_ci);
INSERT INTO `names` VALUES ('Monty');
SELECT myname = 'Monty ' COLLATE utf8_bin FROM NAMES; -- returns 1 but should rather return 0 IMHO.

Also see: http://bugs.mysql.com/bug.php?id=62621
[4 Oct 2011 19:25] Peter Laursen
I realize that BINARY/VARBINARY types would be a better option for sampled industrial data and similar.

But an obfuscation/encryption/hash algorithm and similar may very well generate a string with a trailing space where a charset still makes sense (though most such will stick to ASCII) and BINARY/VARBINARY/BLOB types are not an option
[4 Oct 2011 19:28] Peter Laursen
There is also an omission as regards TEXT in the docs:

"All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces."

>>

"All MySQL collations are of type PADSPACE. This means that all CHAR, VARCHAR *and TEXT* values in MySQL are compared without regard to any trailing spaces."
[5 Oct 2011 4:02] Valeriy Kravchuk
Manual described intentions by design (that software just implements), so "documented bug" is a feature, and you ask for different, that is, new feature.

So, I see feature request here (sounds reasonable) and small documentation request (about TEXT). What do you want to make out of this specific report?
[5 Oct 2011 6:54] Peter Laursen
I accept changing to 'feature request'.

There are also workarounds.  Use HEX() for instance:
.. WHERE HEX(column) = HEX('matchstring');
[5 Oct 2011 7:03] Peter Laursen
As per discussion in http://bugs.mysql.com/bug.php?id=62621 using LIKE operator is also a workaround.
[5 Apr 2012 15:28] Timur Rubeko
As of today the status of this issue is "Closed" and severity is "S4 (Feature request)". Does it mean it was implemented? :) It doesn't look like so according to the 5.6 docs.

So, is this request actually planned to be implemented in some future versions? What is the planned target version for it?