Bug #47047 incorrect return type length
Submitted: 1 Sep 2009 15:36 Modified: 5 Oct 2009 15:52
Reporter: Matthew Lord Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S2 (Serious)
Version:5.1.37, 5.0.85 OS:Any
Assigned to: Paul DuBois CPU Architecture:Any

[1 Sep 2009 15:36] Matthew Lord
Description:
The wrong data type length is returned when the same column is used multiple times in nested functions.  This sometimes results in the type becoming TEXT rather than VARCHAR in the various connectors and can cause problems
there (not to mention the unnecessary memory usage and network traffic). 
Here's an example:

mysql -u root --column-type-info test 
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.1.37-log MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select SUBSTRING( c1,1, 6 ) as col1 from btest;
Field   1:  `col1`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  latin1_swedish_ci (8)
Length:     6                               <--------------------------
Max_length: 6
Decimals:   31
Flags:      

+--------+
| col1   |
+--------+
| foobar | 
+--------+
1 row in set (0.00 sec)

mysql> select SUBSTRING( c1,1, (INSTR(c1, 'mike')-1) ) as col1 from btest;
Field   1:  `col1`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  latin1_swedish_ci (8)
Length:     254                       <-------------------------------
Max_length: 6
Decimals:   31
Flags:      

+--------+
| col1   |
+--------+
| foobar | 
+--------+
1 row in set (0.00 sec)

----------------------------------

It seems like the full column length is used, rather than the length
of the actual result, when we reference the column multiple times in nested functions.

How to repeat:
mysql -u root --column-type-info test

create table btest (c1 varchar(254) DEFAULT NULL);

insert into btest values ("foobarmike");

select SUBSTRING( c1,1, 6 ) as col1 from btest;

select SUBSTRING( c1,1, (INSTR(c1, 'mike')-1) ) as col1 from btest;

Suggested fix:
We should return a datatype that has the length of the actual result in
both cases.  For this specific example both functions should return a data type
with a length of 6.
[9 Sep 2009 10:16] Sergei Glukhov
According to manual
(http://dev.mysql.com/doc/refman/5.1/en/c-api-datatypes.html):

# unsigned long length
The width of the field. This corresponds to the display length, in bytes.

#unsigned long max_length

The maximum width of the field for the result set (the length in bytes of the longest field value for the rows actually in the result set).
---------

In your example Length=254, Max_length=6. So display length is 254,
it's correct as field length is 254 and Max_length has correct value too
as maximum width of the field value in result set is 6 in your example.

if you add new record, for example:
insert into t1 values ("foobarfoobarmike");
Max_length becomes 12 as maximum width of the field value is 12 for 
"foobarfoobarmike".

It's expected behaviour as described in manual.
[9 Sep 2009 13:02] Matthew Lord
Hi Sergey,

Well, either way the two statements should be consistent.  Based on what you've said then the bug actually lies in this return type length:

mysql> select SUBSTRING( c1,1, 6 ) as col1 from btest;
Field   1:  `col1`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  latin1_swedish_ci (8)
Length:     6                               <--------------------------
Max_length: 6
Decimals:   31
Flags:      

+--------+
| col1   |
+--------+
| foobar | 
+--------+
1 row in set (0.00 sec)

The problem is in the inconsistency.  both of these functions should return
the same result and data type lengths as the connectors use these to map
the results to local data types.
[9 Sep 2009 14:42] Sergei Golubchik
Matt, perhaps I'm missing something, but for

   SUBSTRING( c1,1, 6 )

MySQL can tell in advance, without looking at the data, that the length of the result will be at most 6. It could be shorter, though.

But for

   SUBSTRING( c1,1, (INSTR(c1, 'mike')-1) )

the length of the result is not known in advance.
[9 Sep 2009 15:31] Matthew Lord
How can we say that this isn't a bug because it follows the behavior in the manual.  Then say that it's not a bug when it doesn't actually follow
the behavior in the manual?

According to manual
(http://dev.mysql.com/doc/refman/5.1/en/c-api-datatypes.html):

# unsigned long length
The width of the field. This corresponds to the display length, in bytes.

#unsigned long max_length

The maximum width of the field for the result set (the length in bytes of the longest
field value for the rows actually in the result set).

-----------------------------

How does knowing the value of max_length in advance change the value
for the length column?  Either the docs need to be updated or the behavior
needs to be made consistent.
[10 Sep 2009 14:59] Paul DuBois
If connectors are using the metadata to determine local data types, they won't be acting any different from how the server itself uses the metadata:

DROP TABLE IF EXISTS t;
CREATE TABLE t (c1 CHAR(254));
INSERT INTO t SET c1 = 'abcdefghij';
INSERT INTO t SET c1 = 'abcmikehij';
DROP TABLE IF EXISTS t2;
CREATE TABLE t2
SELECT
  SUBSTRING(c1,1,6) AS a1,
  SUBSTRING(c1,1,INSTR(c1,'mike')-1) AS a2
FROM t;
SHOW CREATE TABLE t2\G

Result:

*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `a1` varchar(6) DEFAULT NULL,
  `a2` varchar(254) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

If you believe that a change to the documentation is necessary, please specify what change you would like to see. Thanks.
[5 Oct 2009 15:52] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

Added to length description:

The server determines the length value before it generates the result
set, so this is the minimum length required for a data type capable
of holding the largest possible value from the result column, without
knowing in advance the actual values that will be produced by the
query for the result set.