Bug #14131 Report PAD attribute in SHOW COLLATION and INFORMATION_SCHEMA.COLLATIONS
Submitted: 19 Oct 2005 6:06 Modified: 9 Oct 2008 12:14
Reporter: Jan Lindström Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Charsets Severity:S4 (Feature request)
Version:5.0, 5.1, 6.0 OS:Linux (any)
Assigned to: Assigned Account CPU Architecture:Any
Triage: D5 (Feature request)

[19 Oct 2005 6:06] Jan Lindström
Description:
Manual says that before 5.0.3, trailing spaces were removed when VARCHAR  values were stored, which differs from the standard SQL specification. But based on test below trailing spaces are still removed when VARCHAR values with trailing spaces are used in colum prefix indexes:

jplindst@innobase06:~/mysql-5.0/client$ ./mysql test -u root
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2 to server version: 5.0.15-debug

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> drop table if exists t1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> drop table if exists t2;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> create table t1(a int not null, b varchar(10), c varchar(10), primary key (a,b(2)), unique key(a,c(2))) engine= innodb default charset=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> create table t2(a int not null, b varchar(10), c varchar(10), primary key (a,b(2)), unique key(a,c(2))) engine= innodb default charset=ucs2;
Query OK, 0 rows affected (0.29 sec)

mysql> insert into t1 values(1,'c','c');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values(1,'c ','c ');
ERROR 1062 (23000): Duplicate entry '1-c ' for key 1
mysql> insert into t1 values(2,'d','d');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values(2,'q','d ');
ERROR 1062 (23000): Duplicate entry '2-d ' for key 2
mysql> insert into t2 values(1,'c','c');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t2 values(1,'c ','c ');
ERROR 1062 (23000): Duplicate entry '1-' for key 1
mysql> insert into t2 values(2,'d','d');
Query OK, 1 row affected (0.02 sec)

mysql> insert into t2 values(2,'q','d ');
ERROR 1062 (23000): Duplicate entry '2-' for key 2

Similarly in MyIsam:

mysql> create table t1(a int not null, b varchar(10), c varchar(10), primary key (a,b(2)), unique key(a,c(2))) engine= myisam default charset=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> create table t2(a int not null, b varchar(10), c varchar(10), primary key (a,b(2)), unique key(a,c(2))) engine= myisam default charset=ucs2;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t1 values(1,'c','c');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values(1,'c ','c ');
ERROR 1062 (23000): Duplicate entry '1-c ' for key 1
mysql> insert into t1 values(2,'d','d');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values(2,'q','d ');
ERROR 1062 (23000): Duplicate entry '2-d ' for key 2
mysql> insert into t2 values(1,'c','c');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t2 values(1,'c ','c ');
ERROR 1062 (23000): Duplicate entry '1-' for key 1
mysql> insert into t2 values(2,'d','d');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t2 values(2,'q','d ');
ERROR 1062 (23000): Duplicate entry '2-' for key 2

Regards,
    Jan

How to repeat:
drop table if exists t1;
drop table if exists t2;
create table t1(a int not null, b varchar(10), c varchar(10), primary key (a,b(2)), unique key(a,c(2))) engine= innodb default charset=utf8;
create table t2(a int not null, b varchar(10), c varchar(10), primary key (a,b(2)), unique key(a,c(2))) engine= innodb default charset=ucs2;
insert into t1 values(1,'c','c');
insert into t1 values(1,'c ','c ');
insert into t1 values(2,'d','d');
insert into t1 values(2,'q','d ');
insert into t2 values(1,'c','c');
insert into t2 values(1,'c ','c ');
insert into t2 values(2,'d','d');
insert into t2 values(2,'q','d ');
drop table if exists t1;
drop table if exists t2;
create table t1(a int not null, b varchar(10), c varchar(10), primary key (a,b(2)), unique key(a,c(2))) engine= myisam default charset=utf8;
create table t2(a int not null, b varchar(10), c varchar(10), primary key (a,b(2)), unique key(a,c(2))) engine= myisam default charset=ucs2;
insert into t1 values(1,'c','c');
insert into t1 values(1,'c ','c ');
insert into t1 values(2,'d','d');
insert into t1 values(2,'q','d ');
insert into t2 values(1,'c','c');
insert into t2 values(1,'c ','c ');
insert into t2 values(2,'d','d');
insert into t2 values(2,'q','d ');

Suggested fix:
In file ha_innodb.cc at function innobase_mysql_cmp replace (at least for column prefix index case) call 

               ret = charset->coll->strnncollsp(charset,
                                  a, a_length,
                                                 b, b_length, 0);

with call

               ret = charset->coll->strnncollsp(charset,
                                  a, a_length,
                                                 b, b_length, 1);

And test that this works for all of following column types in both utf8 and ucs2:

        case MYSQL_TYPE_BIT:
        case MYSQL_TYPE_STRING:
        case MYSQL_TYPE_VAR_STRING:
        case FIELD_TYPE_TINY_BLOB:
        case FIELD_TYPE_MEDIUM_BLOB:
        case FIELD_TYPE_BLOB:
        case FIELD_TYPE_LONG_BLOB:
        case MYSQL_TYPE_VARCHAR:
[19 Oct 2005 9:22] Jan Lindström
Changed synopsis because column prefix index on normal varchar fields causes also 'Duplicate key error':

mysql> create table t1 (a int, b varchar(10), c varchar(10), filler varchar(10), primary key(a, b(2)),unique key (a, c(2))) engine = innodb;
Query OK, 0 rows affected (0.01 sec)

mysql> create table t2 (a int, b varchar(10), c varchar(10), filler varchar(10), primary key(a, b(2)),unique key (a, c(2))) engine = myisam;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1 values(1,'c','c','c');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t2 values(1,'c','c','c');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t2 values(1,'c ','c ','c ');
ERROR 1062 (23000): Duplicate entry '1-c ' for key 1
mysql> insert into t1 values(1,'c ','c ','c ');
ERROR 1062 (23000): Duplicate entry '1-c ' for key 1

SQL 2003 standard says:

The comparison of two character string expressions depends on collation used for the comparison. When values of unequal size are compared, if the collation of the comparison has the NOPAD characteristic and the sorter value is equal to some prefix of the longer value, then the sorter value is considered less than the longer value. If the collation for the comparison has the PADSPACE characteristic,for the purpose of comparison,  the shorter values is effectively extended to the length of the longer by concatenation of <spcace>s on the right.

Manual could state which character set collations have NOPAD or PADSPACE characteristics. In addition to this SHOW FULL COLUMNS FROM x could also print this information.

Regards,
    Jan
[19 Oct 2005 13:24] Valeriy Kravchuk
Thank you for a problem report. Verified on latest 5.0.15 (ChangeSet@1.2007.4.1, 2005-10-18 18:51:07-07:00, patg@krsna.patg.net) just as described in the last comment.
[21 Oct 2005 8:43] Alexander Barkov
Jan, I'm sorry I didn't understand what's the problem?
Everything seems to work as expected. All MySQL
collations are  "PAD SPACE collations".

I think it is a good idea to add "PAD SPACE" reporting into "SHOW" output.
Thanks for the idea! "SHOW COLLATION" seems to be the best place.
We also can add missing PAD_ATTRIBUTE column
into INFORMATION_SCHEMA.COLLATIONS, defined by the standard.

I'm going to put this bug into "Documenting" status to let the doc
team check whether something should be added into the manual
about "PAD SPACE", and ask them to move this report into
"Feature request" afterwards, for extending SHOW and INFORMATION_SCHEMA.

What do you think? Thanks!
[21 Oct 2005 8:52] Jan Lindström
IMHO if you have varchar field then 'c' != 'c ' because their length is different. This should be true also when comparing keys. i.e.

create table t1(a varchar(2)) engine=innodb;
insert into t1 values('c'),('c ');
select * from t1 where a = 'c';

should return one row not two

similarly 
select * from t1 where a = 'c ';

and
select * from t1 where a = 'c  '; // this should not return any rows, now returns 2

Regards,
    Jan
[21 Oct 2005 9:30] Alexander Barkov
According to the standard, comparison of CHAR/VARCHAR fields
behaviour depends solely on collation PADSPACE/NOPAD attribute.
Data type doesn't matter.

All MySQL collations are PADSPACE.

This is not a bug. Everything works as expected.
[21 Oct 2005 9:37] Alexander Barkov
Dear documentation team:

Please check if the manial mentions that MySQL collations
are "PAD SPACE" collations, and thus trailing spaces
in CHAR/VARCHAR do not matter during comparison.
This is currently true for all collations.

Note, for  BINARY/VARBINARY datatypes trailing spaces do matter.

In the future we'll possibly also have NOPAD collations for CHAR/VARCHAR.

Afterwards, please change status of this bug to feature request.
It is a good idea to report PAD attribute in SHOW COLLATION
and INFORMATION_SCHEMA.

Thanks!
[27 Oct 2005 8:49] Jon Stephens
Thank you for your bug report. This issue has been addressed in the
documentation. The updated documentation will appear on our website
shortly, and will be included in the next release of the relevant
product(s).

Additional info:

Updated the CHAR/VARCHAR section of the Column Types chapter in the 4.1/5.0/5.1 versions of the Manual with this info after I verified the behaviour in 4.0.25, 5.0.15, and 5.1.2-bk. Changed status to S3/Fix Later.

Please advise if this behaviour is not true for any version prior to 4.0 and I'll amend the 4.1 Manual accordingly. Thanks!