Bug #4048 problem with spaces in RPAD in query that uses 5+ tables
Submitted: 8 Jun 2004 0:21 Modified: 17 Jan 2005 14:46
Reporter: Martijn Korse Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:3.23 OS:Linux (Linux)
Assigned to: Guilhem Bichot CPU Architecture:Any

[8 Jun 2004 0:21] Martijn Korse
Description:
I've tried to use RPAD to add spaces when the length of a value from a column was less then 30 characters.
This, however, doesn't seem to work when there's 5 or more tables involved in the query.

This:
***
		SELECT
			RPAD(C.Lastname, 30, ' ')
		FROM Booking AS B, FlightDetails AS FD
			LEFT JOIN Bill ON Bill.BookingID=B.BookingID
			LEFT JOIN Clients AS C ON C.ClientID=Bill.ClientID
		WHERE
			B.FlightDetailsID = FD.FlightDetailsID
		AND FD.DateTo >= NOW()
		ORDER BY B.BookingID
***
works. It gives the last name and adds spaces until it is a string of 30 characters.

This however:
***
		SELECT
			RPAD(C.Lastname, 30, ' ')
		FROM Booking AS B, FlightDetails AS FD, Destinations AS D
			LEFT JOIN Bill ON Bill.BookingID=B.BookingID
			LEFT JOIN Clients AS C ON C.ClientID=Bill.ClientID
		WHERE
			B.FlightDetailsID = FD.FlightDetailsID AND
			B.DestinationID = D.DestinationID
		AND FD.DateTo >= NOW()
		ORDER BY B.BookingID
***
doesn't work. It just gives the last name, without any spaces.

- the problem seems to occur with the joining of any other table
- the problem seems to occur only with spaces (' ')

How to repeat:
See above

Suggested fix:
should be clear
[8 Jun 2004 0:43] Matthew Lord
Hi Marty,

Could you provide me with your table structure so that I can population some dummy rows and see if 
it's different than other tables with the same type of rpad usage?  I also want to test against newer 
versions of mysql.  I haven't seen any similar bugs in the DB but it could have been noticed and fixed 
internally.

Thank You!
[8 Jun 2004 7:53] Martijn Korse
No Problem:

********************
#
# Table structure for table 'Bill'
#

CREATE TABLE Bill (
  BillID bigint(20) unsigned NOT NULL auto_increment,
  BookingID bigint(20) unsigned default NULL,
  ClientID bigint(20) unsigned default NULL,
  PRIMARY KEY  (BillID),
  UNIQUE KEY BillID (BillID),
  KEY BookingID (BookingID,ClientID)
) TYPE=MyISAM;

#
# Dumping data for table 'Bill'
#

INSERT INTO Bill VALUES("1345", "1388", "1321");
INSERT INTO Bill VALUES("1346", "1389", "1322");
INSERT INTO Bill VALUES("1347", "1390", "1323");
INSERT INTO Bill VALUES("1348", "1391", "1324");
INSERT INTO Bill VALUES("1350", "1392", "1326");
INSERT INTO Bill VALUES("1351", "1393", "1327");
INSERT INTO Bill VALUES("1352", "1394", "1328");
INSERT INTO Bill VALUES("1353", "1395", "1329");

#
# Table structure for table 'Booking'
#

CREATE TABLE Booking (
  BookingID bigint(20) unsigned NOT NULL auto_increment,
  DestinationID mediumint(8) unsigned default NULL,
  ClientID bigint(20) unsigned default NULL,
  FlightDetailsID int(10) unsigned default NULL,
  HotelID smallint(5) unsigned default NULL,
  PRIMARY KEY  (BookingID),
  UNIQUE KEY BookingID (BookingID),
  KEY DestinationID (DestinationID,ClientID),
  KEY FlightDetailsID (FlightDetailsID),
  KEY HotelID (HotelID)
) TYPE=MyISAM;

#
# Dumping data for table 'Booking'
#

INSERT INTO Booking VALUES("1388", "3", NULL, "954", "3");
INSERT INTO Booking VALUES("1389", "3", NULL, "955", "3");
INSERT INTO Booking VALUES("1390", "3", NULL, "956", "3");
INSERT INTO Booking VALUES("1391", "3", NULL, "957", "3");
INSERT INTO Booking VALUES("1392", "3", NULL, "958", "3");
INSERT INTO Booking VALUES("1393", "3", NULL, "959", "3");
INSERT INTO Booking VALUES("1394", "3", NULL, "960", "3");
INSERT INTO Booking VALUES("1395", "3", NULL, "961", "3");

#
# Table structure for table 'Clients'
#

CREATE TABLE Clients (
  ClientID bigint(20) unsigned NOT NULL auto_increment,
  BookingID bigint(20) unsigned NOT NULL default '0',
  Sex enum('m','v') NOT NULL default 'm',
  FirstName varchar(50) default NULL,
  LastName varchar(60) default NULL,
  PRIMARY KEY  (ClientID),
  UNIQUE KEY ClientID (ClientID)
) TYPE=MyISAM;

#
# Dumping data for table 'Clients'
#

INSERT INTO Clients VALUES("1320", "1388", "v", "Chasey", "Lain");
INSERT INTO Clients VALUES("1321", "1388", "v", "Heather", "Hunter");
INSERT INTO Clients VALUES("1322", "1389", "v", "Janine", "Lindemuller");
INSERT INTO Clients VALUES("1323", "1390", "v", "Jenna", "Jameson");
INSERT INTO Clients VALUES("1324", "1391", "v", "Heather", "Brooke");
INSERT INTO Clients VALUES("1326", "1392", "v", "Nici", "Sterling");
INSERT INTO Clients VALUES("1327", "1393", "m", "Brianna", "Banks");
INSERT INTO Clients VALUES("1328", "1394", "v", "Chloe", "Jones");
INSERT INTO Clients VALUES("1329", "1395", "v", "Asia", "Carrera");

#
# Table structure for table 'Destinations'
#

CREATE TABLE Destinations (
  DestinationID mediumint(8) unsigned NOT NULL auto_increment,
  Destination varchar(100) default NULL,
  DS char(2) default NULL,
  PRIMARY KEY  (DestinationID),
  UNIQUE KEY DestinationID (DestinationID)
) TYPE=MyISAM;

#
# Dumping data for table 'Destinations'
#

INSERT INTO Destinations VALUES("1", "Algarve", "PA");
INSERT INTO Destinations VALUES("2", "Porto Santo", "PV");
INSERT INTO Destinations VALUES("3", "Turkije", "TT");

#
# Table structure for table 'FlightDetails'
#

CREATE TABLE FlightDetails (
  FlightDetailsID int(10) unsigned NOT NULL auto_increment,
  DateTo date default NULL,
  PRIMARY KEY  (FlightDetailsID),
  UNIQUE KEY FlightDetailsID (FlightDetailsID)
) TYPE=MyISAM;

#
# Dumping data for table 'FlightDetails'
#

INSERT INTO FlightDetails VALUES("954", "2004-06-15");
INSERT INTO FlightDetails VALUES("955", "2004-06-29");
INSERT INTO FlightDetails VALUES("956", "2004-06-29");
INSERT INTO FlightDetails VALUES("957", "2004-09-07");
INSERT INTO FlightDetails VALUES("958", "2004-06-08");
INSERT INTO FlightDetails VALUES("959", "2004-06-22");
INSERT INTO FlightDetails VALUES("960", "2004-06-22");
INSERT INTO FlightDetails VALUES("961", "2004-06-29");

#
# Table structure for table 'Hotels'
#

CREATE TABLE Hotels (
  HotelID mediumint(8) unsigned NOT NULL auto_increment,
  DestinationID mediumint(8) unsigned default NULL,
  Hotel varchar(100) default NULL,
  PRIMARY KEY  (HotelID),
  UNIQUE KEY HotelID (HotelID),
  KEY DestinationID (DestinationID)
) TYPE=MyISAM;

#
# Dumping data for table 'Hotels'
#

INSERT INTO Hotels VALUES("1", "2", "Vila Baleira");
INSERT INTO Hotels VALUES("2", "3", "Tashan");
INSERT INTO Hotels VALUES("3", "3", "Antiphellos");
INSERT INTO Hotels VALUES("4", "1", "Ria Park");
********************

btw, here's the same query, with the table Hotels joined to it (instead of Destinations). Same bug appears.

***
		SELECT
			RPAD(C.Lastname, 30, ' ')
		FROM Booking AS B, FlightDetails AS FD, Hotels AS H
			LEFT JOIN Bill ON Bill.BookingID=B.BookingID
			LEFT JOIN Clients AS C ON C.ClientID=Bill.ClientID
		WHERE
			B.FlightDetailsID = FD.FlightDetailsID AND
			H.HotelID=B.HotelID
		AND FD.DateTo >= NOW()
		ORDER BY B.BookingID
***
[10 Jun 2004 9:57] Martijn Korse
Someone suggested to try this:

   LEFT(CONCAT(C.Lastname, '                              '),30)

instead of RPAD().

The same bug appears!!
[10 Jun 2004 16:12] Matthew Lord
Marty,

I cannot repeat this in any current version of MySQL.  You are using a very old version.  Is there a 
reason that you cannot upgrade to 4.0.20 or at least 3.23.58?  There have been many, many bugs fixed 
between 3.23.41 and 3.23.58.  The only reason those releases were made (42-58) were for bug fixes.

Best Regards
[10 Jun 2004 16:30] Sjors van der Pluijm
Upgrading won't help since I'm running V 4.0.2 and having the same results.
The first query gives the names with spaces, the second only gives the names.
[10 Jun 2004 16:44] Matthew Lord
Hi,

Again, 4.0.2 is a very old version.  I cannot repeat this in 4.0.20.  Here is what I got below:

mysql> SELECT
    ->                         RPAD(C.Lastname, 30, ' ')
    ->                 FROM Booking AS B, FlightDetails AS FD, Destinations AS D
    ->                         LEFT JOIN Bill ON Bill.BookingID=B.BookingID
    ->                         LEFT JOIN Clients AS C ON C.ClientID=Bill.ClientID
    ->                 WHERE
    ->                         B.FlightDetailsID = FD.FlightDetailsID AND
    ->                         B.DestinationID = D.DestinationID
    ->                 AND FD.DateTo >= NOW()
    ->                 ORDER BY B.BookingID;
+---------------------------+
| RPAD(C.Lastname, 30, ' ') |
+---------------------------+
| Hunter                    |
| Lindemuller               |
| Jameson                   |
| Brooke                    |
| Banks                     |
| Jones                     |
| Carrera                   |
+---------------------------+
7 rows in set (0.41 sec)

Best Regards
[10 Jun 2004 23:41] Sjors van der Pluijm
sorry, I lost a 0 in my previous post. I'm running 4.0.20, latest production release at this time
[10 Jun 2004 23:56] Matthew Lord
Hi,

I went through the steps that you listed to repeat the problem and I couldn't in 4.0.20.  Could you go 
through the steps that you listed in 4.0.20 so that the tables and everything are created from scratch in 
your 4.0.20 instance.

Thanks
[11 Jun 2004 7:52] Martijn Korse
Hi Matthew

You wrote that the output of your query is:

+---------------------------+
| RPAD(C.Lastname, 30, ' ') |
+---------------------------+
| Hunter                    |
| Lindemuller               |
| Jameson                   |
| Brooke                    |
| Banks                     |
| Jones                     |
| Carrera                   |
+---------------------------+

But i think that's prove that the bug appears in your version too. If you count the amount of characters in "RPAD(C.Lastname, 30, ' ')" -> that's 25 and not 30. If the output had been 30 indeed, it would've looked like this:

+--------------------------------+
| RPAD(C.Lastname, 30, ' ')      |
+--------------------------------+
| Hunter                         |
| Lindemuller                    |
| Jameson                        |
| Brooke                         |
| Banks                          |
| Jones                          |
| Carrera                        |
+--------------------------------+

Otherwise, just try it with 150 instead of 30 -> that should make it more clear.

btw, the reason i'm not using a more recent mysql-version is that my own server is down and i depend on hosts at the moment :(
[11 Jun 2004 16:21] Matthew Lord
Hi Marty,

You were most correct!  It was an oversight on my point as the 30 ended so visually close to the length 
of the column header.

+----------------------------+
| RPAD(C.Lastname, 230, ' ') |
+----------------------------+
| Hunter                     |
| Lindemuller                |
| Jameson                    |
| Brooke                     |
| Banks                      |
| Jones                      |
| Carrera                    |
+----------------------------+

Thank you very much for your submission!
[4 Aug 2004 16:24] Guilhem Bichot
verified again with 4.0.20. Assigning it to me.
[4 Aug 2004 23:58] Guilhem Bichot
The fact that it happens only with > 2 tables is probably because the query execution plan changes starting from 3 tables:
explain
		SELECT
			RPAD(C.Lastname, 30, ' ')
		FROM Booking AS B, FlightDetails AS FD
			LEFT JOIN Bill ON Bill.BookingID=B.BookingID
			LEFT JOIN Clients AS C ON C.ClientID=Bill.ClientID
		WHERE
			B.FlightDetailsID = FD.FlightDetailsID
		AND FD.DateTo >= NOW()
		ORDER BY B.BookingID
;
yields:
table   type    possible_keys   key     key_len ref     rows    Extra
B       ALL     FlightDetailsID NULL    NULL    NULL    8       Using filesort
FD      eq_ref  PRIMARY,FlightDetailsID PRIMARY 4       B.FlightDetailsID       1       Using where
Bill    ref     BookingID       BookingID       9       B.BookingID     9       Using index
C       eq_ref  PRIMARY,ClientID        PRIMARY 8       Bill.ClientID   1

whereas
explain
		SELECT
			RPAD(C.Lastname, 30, ' ')
		FROM Booking AS B, FlightDetails AS FD, Hotels AS H
			LEFT JOIN Bill ON Bill.BookingID=B.BookingID
			LEFT JOIN Clients AS C ON C.ClientID=Bill.ClientID
		WHERE
			B.FlightDetailsID = FD.FlightDetailsID AND
			H.HotelID=B.HotelID
		AND FD.DateTo >= NOW()
		ORDER BY B.BookingID
yields:

table   type    possible_keys   key     key_len ref     rows    Extra
H       index   PRIMARY,HotelID PRIMARY 3       NULL    4       Using index; Using temporary; Using filesort
B       ALL     FlightDetailsID,HotelID NULL    NULL    NULL    6       Using where
FD      eq_ref  PRIMARY,FlightDetailsID FlightDetailsID 4       B.FlightDetailsID       1       Using where
Bill    ref     BookingID       BookingID       9       B.BookingID     9       Using index
C       eq_ref  PRIMARY,ClientID        PRIMARY 8       Bill.ClientID   1

So in the 3-table case a temporary table is used. It could be then the content of the temporary table is read with end-space trimming. I'll check.
[5 Aug 2004 1:32] Guilhem Bichot
When I remove the ORDER BY, problem goes away; that's consistent with the idea that use of tmp table by mysqld to resolve the query is what triggers the problem.
I noticed that when the query fails, the RPAD() field which is sent to client is of type Field_string, hence the end space trimming. When query succeeds, it is of type Item_func_rpad. When it fails, tmp table is used: to copy the data to the tmp table, copy_funcs() calls Item_result_field::save_in_result_field() which calls Item::save_in_field() which calls Item_func_rpad::val_str(). So the rpadded row goes into the tmp table (of type HEAP, so no space truncation when row is stored). Then sort is done and result is ready, somewhere in select_send::send_data() the field to send is Item_field (which means "column of table") and so space is trimmed. Whereas the field to send should be "function" and so not space-trimmed. We should change that when tmp table is used internally by SELECT, it is possible to not space-trim columns upon sending.
To put it another way:
SELECT RPAD(...) FROM ... should not be treated like:
CREATE TEMPORARY TABLE tmp SELECT RPAD(...) FROM ...;
SELECT * FROM tmp;
[5 Aug 2004 16:00] Guilhem Bichot
A workaround:
ALTER TABLE Clients MODIFY LastName TEXT;
then it works.
[8 Nov 2004 10:37] Guilhem Bichot
Need to have Monty look at the patch.
[8 Nov 2004 10:38] Guilhem Bichot
I note here that however, this bad bug is listed in the known deficiencies in our manual:
"When using the @code{RPAD()} function in a query that has to be
resolved by using a temporary table, all resulting strings will
have rightmost spaces removed."
[16 Jan 2005 22:21] Guilhem Bichot
In 5.0 we support VARCHAR in HEAP tables, so this bug may be solvable in 5.0 - need to check.
[17 Jan 2005 14:46] Guilhem Bichot
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Indeed it's fixed in 5.0.3: I can repeat the bug on 4.1.10 but not in 5.0.3. 5.0.3 uses temp table to resolve the query like 4.1.10 does, so it's not a plan change, it's really fixed :)
This is because in 5.0.3, HEAP/MEMORY tables support VARCHAR (and btw in 5.0.3 VARCHAR can contain up to 65k bytes instead of 255 in older versions).