Bug #24424 CHAR data-type not space-padding
Submitted: 19 Nov 2006 4:28 Modified: 4 Jan 2008 15:08
Reporter: Michael Austin Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Data Types Severity:S1 (Critical)
Version:5.1.11 OS:Any (ALL)
Assigned to: Assigned Account CPU Architecture:Any
Tags: cast, char, datatype

[19 Nov 2006 4:28] Michael Austin
Description:
CHAR data-type should space-fill when data is less than the defined length.
"CAST"ing from varchar to char does not produce expected results.

FROM THE DOCS:
 The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR  values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.

Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 65,535. (The maximum effective length of a VARCHAR is determined by the maximum row size and the character set used. The maximum column length is subject to a row size of 65,532 bytes.) 

How to repeat:
mysql> create table b (b char(10));
Query OK, 0 rows affected (4.08 sec)

mysql> insert into b values ('ABC');
Query OK, 1 row affected (0.20 sec)

mysql> select '~'||b||'~' from b;
+-------------+
| '~'||b||'~' |
+-------------+
| ~ABC~       |<<<-- should be ~ABC       ~
+-------------+
1 row in set (0.01 sec)

create table a (a varchar(10));
insert into a values ('ABC');
insert into a values(cast('ABC' as char(10)));
select '~'||a||'~' from a;

RESULTS:
mysql> select '~'||a||'~' from a;
+-------------+
| '~'||a||'~' |
+-------------+
| ~ABC~       |
| ~ABC~       |<<<-- should be ~ABC       ~
+-------------+
2 rows in set (0.01 sec)

mysql> select '~'||cast(a as char(10))||'~' from a;
+-------------------------------+
| '~'||cast(a as char(10))||'~' |
+-------------------------------+
| ~ABC~                         |<<<-- should be ~ABC       ~
| ~ABC~                         |<<<-- should be ~ABC       ~
+-------------------------------+

Oracle Rdb: (First ANSI-STANDARD COMPLIANT database engine)
$ sql
SQL> att'fi testdb';
SQL> create table b (b char(10));
SQL> insert into b values ('ABC');
1 row inserted
SQL> select '~'||b||'~' from b;

 ~ABC       ~
1 row selected
SQL>

SQL> insert into a values ('ABC');
1 row inserted
SQL> insert into a values(cast('ABC' as char(10)));
1 row inserted
SQL> select '~'||a||'~' from a;

 ~ABC~
 ~ABC       ~
2 rows selected

SQL>  select '~'||cast(a as char(10))||'~' from a;

 ~ABC       ~
 ~ABC       ~
2 rows selected

Suggested fix:
I am not a developer to that level... Make CAST work as it was intended?
[20 Nov 2006 14:47] Valeriy Kravchuk
Thank you for a problem report. Sorry, but it is not a bug, as manual (http://dev.mysql.com/doc/refman/5.1/en/char.html) says:

"The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR  values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed."

and, later:

"VARCHAR values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL."

So, it is documented behaviour.
[20 Nov 2006 20:52] Michael Austin
I clearly know what data should be returned when using char vs. varchar data (DBA and Field Engineering Support for Rdb (Version 5 timeframe).

As you can see from my example - On this platform, the CHAR datatype is clearly NOT space-padding as documented and as defined by the  ANSI Standard.  this also affects the CAST function. Is this not the behavior of other platforms?  

On what basis is failing to space-pad a CHAR field not considered a bug?

mysql> create table b (b char(10));
Query OK, 0 rows affected (4.08 sec)

mysql> insert into b values ('ABC');
Query OK, 1 row affected (0.20 sec)

mysql> select '~'||b||'~' from b;
+-------------+
| '~'||b||'~' |
+-------------+
| ~ABC~       |<<<-- should be ~ABC       ~
+-------------+

It is NOT Documented behavior and is WRONG!
[21 Nov 2006 10:39] Valeriy Kravchuk
I agree that the behaviour is counter-intuitive and different from other RDBMSes, so I mark this report as verified feature request (you want the results as in Oracle and other databases). But current behaviour is documented:

"When CHAR values are retrieved, trailing spaces are removed."

Have you read that in the quote I gave you? Look:

mysql> set sql_mode=oracle;
Query OK, 0 rows affected (0.00 sec)

mysql> select '~'||'abc   '||'~';
+--------------------+
| '~'||'abc   '||'~' |
+--------------------+
| ~abc   ~           |
+--------------------+
1 row in set (0.00 sec)

So, if we have trailing spaces in the value you are inserting to the database, for example, they are preserved. But when you RETRIEVE CHAR value from the table, they are removed, as described in the manual:

mysql> create table d(b char(10));
Query OK, 0 rows affected (0.03 sec)

mysql> insert into d values('ABC');
Query OK, 1 row affected (0.00 sec)

mysql> select length(b) from d;
+-----------+
| length(b) |
+-----------+
|         3 |
+-----------+
1 row in set (0.00 sec)

mysql> select '~'||b||'~' from d;
+-------------+
| '~'||b||'~' |
+-------------+
| ~ABC~       |
+-------------+
1 row in set (0.00 sec)

But they are STORED in the table (look at the row length):

mysql> show table status like 'd'\G
*************************** 1. row ***************************
           Name: d
         Engine: MyISAM
        Version: 10
     Row_format: Fixed
           Rows: 1
 Avg_row_length: 11
    Data_length: 11
Max_data_length: 3096224743817215
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2006-11-21 12:12:57
    Update_time: 2006-11-21 12:13:04
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

So, this behaviour (also strange for me, who started to work with Oracle in 1994) can not be declared a bug because it is explicitly documented. I do not know the exact reasons why MySQL work with CHAR columns in this way, though. So, for me, it is a reasonable feature request (at least, in ORACLE mode char columns should behave like in Oracle!).
[21 Nov 2006 21:47] Michael Austin
My sincerest apologies -- I misread this statement: "When CHAR values are retrieved, trailing spaces are removed."

Now that I have read them, I will have to say that even though this is documented behavior - it is NOT ANSI-Compliant and therefore incorrect. This would mean that there is no real difference in CHAR vs. VARCHAR data.  The two data-types were created for specific purposes. Because of this, there is no real workaround for the problem.

This was discovered while attempting to show someone who needed to have records where the fields needed to be fixed-length for exporting data to external systems.  (looks like this would be input data for FORTRAN or COBOL applications)

example:
select concat(cast(a as char(10)),cast(b as char(20))...etc...) from tablex;
producing records where the fields are fixed-length.

Or the ANSI-Compliant syntax:

select cast(a as char(10))||cast(b as char(20)) from tablex;

I would therefore propose that the development team sit down and revisit why they chose to remove those "trailing" spaces.  In my professional opinion, it makes no sense to have a definition that does not meet ANSI-standards and be different from every other database with regards to data-types.
[22 Nov 2006 18:27] Jerry Stuckle
I have to agree with Michael on this one. 

I've been working with RDB's for around 20 years, and the SQL standard has always been that CHAR columns are returned with trailing blanks.  If the user wants to remove the blanks, that is their option.

Unfortunately, I do not have a copy of the SQL standards available at this time, so I can't quote you the chapter and verse.  But I'm sure if you look at it you will find this is true.
[19 Dec 2006 1:30] Michael Austin
I consider a datatype behavior that does not adhere to any of the SQL standards for those datatypes to be a bug. Period.  How do you expect anyone who knows databases to take you seriously.

This is a bug pure and simple.  According to ALL of the SQL standards since 1989, CHAR is a FIXED-LENGTH datatype. When any "real" database developer, administrator, designer uses this datatype they expect to retrieve fixed-length data of the length defined by the database designer/developer.  Even if they insert 4 characters in a 20 character field, they expect to retrieve 20 characters.  If they want to retrieve variable length data, they would use VARCHAR.

If you want to be a world-class database engine, should you not follow the standards that have been in place since SQL was invented?
[25 Sep 2007 18:04] Mark Callaghan
I wish that char were changed to not strip trailing blanks. The current behavior surprises everyone.
[4 Jan 2008 15:08] Michael Austin
Since it has been more than a year since I submitted this, I just noticed this was changed from a "bug" to a "feature request" because it is "documented".  I will reiterate - EVERYONE who knows anything about databases and the proper use of data types would consider this erroneous behavior - documented or not.
[10 Oct 2008 11:21] Alexander Barkov
See sql_mode=PAD_CHAR_TO_FULL_LENGTH 
http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html
[28 May 2009 2:47] Mulyadi Pasaribu
How's the progress of this bug http://itmagazines.info