Bug #6399 getColumnDisplaySize wrong
Submitted: 3 Nov 2004 11:46 Modified: 14 Jun 2013 0:11
Reporter: [ name withheld ] Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server Severity:S4 (Feature request)
Version:4.1.x OS:Any (All)
Assigned to: Rafal Somla CPU Architecture:Any

[3 Nov 2004 11:46] [ name withheld ]
Description:
I use mysql 4.1.7 on redhat 7.3
I converted my existing database to utf8 with

ALTER DATABASE Database CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE Table CONVERT TO CHARACTER SET utf8;

I changed the JDBC Connection String to

jdbc:mysql://192.168.7.11/Database?autoReconnect=true&characterEncoding=utf8&useUnicode=true

Now the getColumnDisplaySize from ResultSetMetaData ist wrong.
For some columns I get the right size, for some I get the size multiplied by 3.
If I enter a german umlaut in a column with right size, it returns the wrong size the next time.

I don't know if this is a JDBC Connector or a server problem.

How to repeat:
as described.
[5 Nov 2004 16:35] Mark Matthews
Currently, the server returns the 'byte[]' length of columns. The protocol needs to be extended so that both values (byte[] and char[]) are returned.
[26 Dec 2004 20:37] Mark Matthews
This is fixed in 3.0.17/3.1.2, however it really is a server-shortcoming and should be addressed there, as it is returning incorrect metadata.
[24 Mar 2005 13:21] Konstantin Osipov
Mark,
there are three length attributes of a character string:
- number of bytes (SQL function LENGTH())
- number of characters (SQL function CHAR_LENGTH())
- number of display spaces (curently unexistent DISPLAY_LENGTH() and DISPLAY_WIDTH()) function.

Currently result set metadata contains only the maximum number of bytes
a column can occupy. You also can evaluate number of bytes in the column
when reading the row. Apart from that, the metadata doesn't contain any of the 
three mentioned above characteristics of result set rows, as these lengths
depend on actual content of the result set columns.
So, generally speaking, LENGTH(), CHAR_LENGTH() and DISPLAY_LENGTH()
are not result set metadata, but rather describe contents of every column/field
in the result set.
It's possible to extend the client-server protocol so that upon client's request it
sends such rows descriptions in addition to the actual contents of the rows. 
We may consider adding that. The question is, however, what's the purpose
of having it in the protocol when the client can evaluate it using SQL 
when it's needed?
[24 Mar 2005 13:44] Mark Matthews
Konstantin,

I figured out a workaround, but the reason it's needed are twofold:

1). Because many specifications require it (JDBC/ODBC, probably the SQL CLI)
2.) Because MySQL often rewrites column types and sizes when performing queries (by creating 'approximate' temporary tables), you can't always get the _correct_ value after issuing the query.
[24 Mar 2005 13:53] Alexander Barkov
Perhaps a new function in libmysqlclient can solve this,
something like mysql_fetch_display_lengths().
What do you thinks?
[24 Mar 2005 13:53] Alexander Barkov
Perhaps a new function in libmysqlclient can solve this,
something like mysql_fetch_display_lengths().
What do you think?
[24 Mar 2005 23:14] Konstantin Osipov
Sorry for putting my foot in it, but I personally can't think up how a function in
libmysql can help a JDBC driver problem. This is why I am on the position of not having character-set related functionality in the client. Alexander knows that already (a point of our disagreement ;)
[25 Apr 2005 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[21 Jul 2010 12:32] Ulf Wendel
As I see it the underlying problem exists still today. We've run into it for C/C++ and its getColumnDisplaySize() method.

There is no safe and water-proof way for a driver to return the column display size in characters (not bytes) for a certain result set. The C API meta data field one would like to use for computing the display width in characters is max_length. The max_length value tells you how many bytes there are for a given result set. And getColumnDisplaySize is per given result set.

"The maximum width of the field for the result set (the length in bytes of the longest field value for the rows actually in the result set).",
http://dev.mysql.com/doc/refman/5.1/en/c-api-data-structures.html

Unfortunately max_length gives the number of bytes. And there is no easy way to get from bytes to number of characters. You must not divide max_length by the maximum number of bytes per character for the active character. 

Example - take a uft8 value consisting of three characters: two cyrillic and one ASCII. max_len = (2 cyrillic chars x <number_of_bytes_per_cyrillic_char>) + (1 ascii char x <number_of_bytes_per_acsii_char>) = 2 * 2 + 1 = 5.

There is no way for the driver to get from 5 to 3 characters.

Could the driver use length from http://dev.mysql.com/doc/refman/5.1/en/c-api-data-structures.html . Length gets us a little closer to a solution but it is no solution. Because length is not from the actual result set:

" The width of the field. This corresponds to the display length, in bytes.

The server determines the length value before it generates the result set, so this is the minimum length required for a data type capable of holding the largest possible value from the result column, without knowing in advance the actual values that will be produced by the query for the result set. ",
http://dev.mysql.com/doc/refman/5.1/en/c-api-data-structures.html

However, getColumnDisplaySize() clearly is about the result set. It is not about the column specification. 

So, in my eyes, the server delivers insufficient meta data. Six years after the issue had been reported.
[24 Aug 2010 16:22] Ulf Wendel
There seem to be different things being discussed, if I am not mistaken.

Let's stick to the very subject of the bug report. The starting point of the discussion has been the question how a client shall determine the display length (maximum with in characters [not bytes] for a column). The JDBC specification requests that value. And JDBC specs come from ODBC.

Consequently this affects at least:

  Connector/J                                 --> Java, implements protocol
  Connector/ODBC                              --> C,    wraps libmysql
  Connector/C++ (which minics the JDBC API)   --> C++,  wraps libmysql

Whatever solution there is, it must not be limited to libmysql and the C API. It must be a solution based on protocol data or SHOW/I_S, preferrably protocol data because SHOW/I_S means extra calls.

The proposal to add/expose a C API function is nice but not a solution. It does not help us with Connector/J. Connector/J does not use the C library.

The proposal to have the client loop over a result set seems not to address the actual problem. I fail to see how a client can compute a proper display length for any of the two examples using existing meta data:

First example problem to solve:

"For example, for some Japanese characters in utf8 context:
- byte_length= 3
- char_length= 1
- display_length= 2 (one character occupies two screen cells)."

Second example problem to solve:

"Example - take a uft8 value consisting of three characters: two cyrillic and one ASCII. max_len = (2 cyrillic chars x <number_of_bytes_per_cyrillic_char>) + (1 ascii char x<number_of_bytes_per_acsii_char>) = 2 * 2 + 1 = 5.

There is no way for the driver to get from 5 to 3 characters [display length]."

Even if I would be able to see how the client could compute display length when looping over a result set and inspecting existing meta data, I would strongly discourage doing it. Given a sufficiently large result set it will be slow. 

To sum up: either no solution has been proposed so far or I fail to understand it. 

Ulf
[25 Aug 2010 15:14] Alexander Barkov
I Ulf,

I fail to understand the requirement.

Imagine we have a table with 3 records:

CREATE TABLE t1 (a VARCHAR(30) CHARACTER SET utf8);
INSERT INTO t1 VALUES ('a'),('й'),('㐁');

-- U+0061 LATIN SMALL LETTER A           (1 byte, 1 screen cell)
-- U+0439 CYRILLIC SMALL LETTER SHORT I  (2 bytes, 1 screen cells)
-- U+3401 CJK UNIFIED IDEOGRAPH          (3 bytes, 2 screen cells)

Now we do:
SELECT a FROM t1;

+-----+
| a   |
+-----+
| a   | 
| й   | 
| 㐁  | 
+-----+
3 rows in set (0.00 sec)

Please explain what should getColumnDisplaySize() return:

1. Maximum number of characters in result set, according to data type (i.e. 30)
2. Real maximum number of characters in this particular result set, (i.e. 1)
3. Maximum number of screen cells, according to data type (i.e. 60).
4. Real maximum number of screen cells in this particular result set, (i.e. 2)
[25 Aug 2010 17:46] Ulf Wendel
You are close to it. It goes like this:

Imagine we have a table with *one* record:

CREATE TABLE t1 (a VARCHAR(30) CHARACTER SET utf8);
INSERT INTO t1 VALUES ('aй㐁');

-- U+0061 LATIN SMALL LETTER A           (1 byte, 1 screen cell)
-- U+0439 CYRILLIC SMALL LETTER SHORT I  (2 bytes, 1 screen cells)
-- U+3401 CJK UNIFIED IDEOGRAPH          (3 bytes, 2 screen cells)

                                         --------------------------
                                         6 bytes, 4 screen cells, 3 "characters"

Now we do:
SELECT a FROM t1;

How do we get from 6 bytes to 3 "characters", which getColumnDisplaySize() is supposed to report in this example?

The client can only make an estimation. 

If the client does not loop over the result set meta data, the estimation can be rather rough. A C based client could, for example, use "length" from MYSQL_FIELD C API structure. Trouble with "length" is that it is an estimation from the server according to http://dev.mysql.com/doc/refman/5.1/en/c-api-data-structures.html . 

A C client gets closer to the solution and the "real" value for actual result set, if he uses max_length from MYSQL_FIELD C API structure and a loop, as you proposed earlier.

"unsigned long max_length

The maximum width of the field for the result set (the length in bytes of the longest field value for the rows actually in the result set). If you use mysql_store_result() or mysql_list_fields(), this contains the maximum length for the field. If you use mysql_use_result(), the value of this variable is zero. ", http://dev.mysql.com/doc/refman/5.1/en/c-api-data-structures.html . 

(As an aside note: see what the docus say about "mysql_use_result()". I think Mark got puzzled about it. However, its not relevant for the bug at the moment.).

Because both "length" and "max_length" are measured in bytes we need to take the character set into account to get from bytes to number of characters. You propose we divide the number of bytes by the maximum lenght of a character in bytes for the given character set.

  length / cs->mbmaxlen resp. max_length / cs -> mbmaxlen

Now, with the nasty example insert the client should get:

 length        = 30 (bytes)
 max_length    =  6 (bytes)
 cs->mbmaxlen  =  3 (a utf8 character can require 1..3 bytes, we need max)

And there you have the problem:

  30 / 3 -> 10 characters
   6 / 3  -> 2 characters 

... but not 3 "characters", which is what getColumnDisplaySize() should return.

How do we get to "3"? 

We need it for many drivers not only the C based ones. For example, we need it for Connector/J which has its own client server protocol implentation, just like Connector/NET or mysqlnd have. 

And, none of us really wants to loop over potentially huge result sets, to calculate meta data which the server could potentially collect in a central place as the result set gets created.
[25 Aug 2010 18:09] Alexander Barkov
Does getColumnDisplaySize() return size for the current record in fetch cursor?

I thought it is SQL result metadata. I found an example here:

http://developer.mimer.com/documentation/latest_jdbcguide_html/programming.html

Statement stmt;
 ResultSet rs;
 ResultSetMetaData rsmd;
 
 stmt = con.createStatement();
 
 rs = stmt.executeQuery("SELECT *"
                      + "   FROM mimer_store.currencies");
 
 rsmd = rs.getMetaData();
 for (int i = 1; i <= rsmd.getColumnCount(); i++) {
     System.out.println(rsmd.getColumnName(i));
     System.out.println(" Type: " + rsmd.getColumnTypeName(i));
     System.out.println(" Size: " + rsmd.getColumnDisplaySize(i));
 }

I interpret it as follows:

In case of our example, it should return "30", because
datatype is VARCHAR(30). It should not return 3!
The datatype only matters. Real data is not important.
[25 Aug 2010 18:18] Alexander Barkov
> And, none of us really wants to loop over potentially huge result sets,
> to calculate meta data which the server could potentially collect in a
> central place as the result set gets created.

Server does not collect any data in central place.
It sends results row by row, as soon as a new row is found.

So complete results statistics (like maximum lengths) can be
known only after all rows are sent to the client side
and are fetched by client.
[26 Aug 2010 9:07] Ulf Wendel
Even if the server does not collect data in a central place: question from Mark remains if you want to do the very same math in a one logical "central place " (the server) or in multiple places, duplicating the match logic in multiple places (the drivers).

I leave the other question to Mark. Mark probably knows better about the JDBC requirement than I do.
[2 Sep 2010 6:31] Alexander Barkov
Java documentation at
http://download.oracle.com/javase/1.3/docs/api/java/sql/ResultSetMetaData.html?is-external...
says:

> int getColumnDisplaySize(int column)
> Indicates the designated column's normal maximum width in characters.

It says *characters*. It does not mention screen cells.
I interpret it as: in case of a varchar(30) it should return 30.

This is how this function should look like:

int getColumnDisplaySize(int column)
{
  MYSQL_FIELD *field= mysql_fetch_field(mysql_result, column);
  CHARSET_INFO *cs= get_charset(field);

  return
    (field->type == MYSQL_TYPE_STRING || field->type == MYSQL_TYPE_VAR_STRING)?
      mysql_field->length / cs->mbmaxlen : /* convert byte length to character length */
      mysql_field->length; /* This part is for numbers, and what's important - for blobs */
}
[9 Nov 2010 13:12] Lars Thalmann
Rafal will investigate this.
[17 Nov 2010 16:26] Rafal Somla
Both Oracle and Microsoft (http://msdn.microsoft.com/en-us/library/ms378162.aspx) JDBC specifications for getColumnDisplaySize() say only that it should return: "normal maximum width, in characters, for the designated column".

I do not really understand what is "normal maximum witdth" of a column, but at least it is clear (as Bar noted) that it should be in characters, not bytes or screen cells.

From reading the specs it is not clear to me whether the maximum is taken over the given result set or it is the maximum width for the column type as Bar suggests (i.e., for VARCHAR(30) it is always 30).

I guess that nobody knows what it really means, so it should be OK to go with the 2nd alternative, which is easier to implement. I will dig a little bit more before proposing final solution.

Note: The ODBC specs from Microsoft do not make it more clear. I guess that the semantics of getColumnDisplaySize() should correspond to SQL_DESC_DISPLAY_SIZE descriptor field of a result set, which can be read with ODBC function SQLColAttribute (http://msdn.microsoft.com/en-us/library/ms713558(v=VS.85).aspx). There we read:

  SQL_DESC_DISPLAY_SIZE (ODBC 1.0) 
  Maximum number of characters required to display data from the column. For 
  more information about display size, see Column Size, Decimal Digits, Transfer 
  Octet Length, and Display Size in Appendix D: Data Types.

A related paragraph from the appendix does not explain to me whether the maximum is taken over the result set or the whole type:

  Descriptor fields describe the characteristics of a result set. Descriptor 
  fields do not contain valid values about data before statement execution. 
  (...)
  Likewise, in its result set, SQLColAttribute returns the column size, decimal 
  digits, and transfer octet length of columns at the data source; these values 
  are not necessarily the same as the values in the SQL_DESC_PRECISION, 
  SQL_DESC_SCALE, and SQL_DESC_OCTET_LENGTH descriptor fields.
[23 Nov 2010 17:37] Rafal Somla
After digging a bit more in the Internet, it is pretty clear that no one expects getColumnDisplaySize() to return maximal column width over a particular result set, but rather the maximum appropriate for the data type of the column.

That is, if column a of table t has type VARCHAR(30) then getColumnDisplaySize() for the result of SELECT a FROM t should always return 30, even if all values stored in t are shorter than 10 characters.

Still, few questions remain open and seems that many implementations get confused about these:

- what should getColumDisplaySize() return for columns of type DATE, or INTEGER (see discussion at Oracle: http://forums.oracle.com/forums/thread.jspa?threadID=518259)

- what should it return for result columns which do not correspond to any column in the table but are either constant or computed from an expression (see discusion at Postgress: http://archives.postgresql.org/pgsql-hackers/2007-06/msg00428.php)

- probably more...
[1 Dec 2010 20:25] Rafal Somla
PROPOSED SEMANTICS OF GetColumnDisplaySize()
============================================

Here is a proposition which defines what is unspecified in the JDBC standard. I think this proposition is consistent with the official specifications and will meet most expectations.

1. The value returned is in characters for most types of values. It is in bytes only for binary types (see below).

2. The value returned does not depend on the values in the result set but only on the type of the column.

3. Below is a table proposing what should be the returned value for each data type documented in MySQL 5.5 manual (http://dev.mysql.com/doc/refman/5.5/en/data-types.html).

4. Note that for very large blob types I propose to return 0, which according to JDBC specs means "unknown width".

5. This proposition deliberately ignores the question of how to obtain the correct value and whether it is possible to do so without changing client-server protocol etc. The idea is as follow: let's first agree what we would like GetColumnDisplaySize() to return in an ideal world where all is possible. Once we agree on that, let confront it with the reality of what information is present in the protocol and what we can actually compute. If some proposed semantics is impossible to implement at the moment, we will change it, understanding that this is a deviation from what we would like to have. We can also add long term plan to introduce changes which will enable us to implement the semantics we want.

String Types
------------
CHAR              1
CHAR(N)           N
VARCHAR(N)        N

BINARY            1
BINARY(N)         N
VARBINARY(N)      N

TINY{BLOB,TEXT}   0 or 255  (a,b)
*{BLOB,TEXT}      0         (a)

ENUM(s1,...,sK)   max(N1,..,NK)         (c)
SET(s1,...,sK)    (K-1) + N1 + ... + NK (c)

--
a) According to JDBC specs, value of 0 means "unknown width"
b) For TINYBLOB we can return 0 as for other *BLOB types or
   return 255 since this maximum width also makes sense
c) Where Ni is the length of the i-th string si.

Numeric Types
-------------
BIT(1)           4     (b'x')
BIT(N)         3+N     (b'x...x')

Note: for the following types 1 can be subtracted
from display width for UNSIGNED variants (no sign)

TINYINT          4     (sign + 3 digits)
SMALLINT         6     (sign + 5 digits)
MEDIUMINT        8     (sign + 7 digits)
INTEGER         11     (sign + 10 digits)
BIGINT          20     (sign + 19 digits)
*INT*(N)       1+N     (sign + N digits)

FLOAT            8     (sign + dot + 6 digits)  (d)
DOUBLE          19     (sign + dot + 17 digits) (e)
REAL            19 or 8                              (f)
FLOAT(N,0)     1+N     (sign + N digits)
FLOAT(N,M)     2+N     (sign + dot + N digits)
DECIMAL         11
DECIMAL(N)     1+N     (sign + N digits)
DECIMAL(N,0)   1+N     (sign + N digits)
DECIMAL(N,M)   2+N     (sign + dot + N digits)

NUMERIC*       -"-     (as DECIMAL)

--
d) 4 byte single-precision, IEEE number - see http://en.wikipedia.org/wiki/IEEE_754-2008
e) 8 byte double-precision IEEE number
f) Normally REAL=DOUBLE, if  REAL_AS_FLOAT SQL mode is set, REAL=FLOAT

Date and Time Types
-------------------

DATE           10   (YYYY-MM-DD)
TIME            8   (HH:MM:SS)
TIMESTAMP      19   (YYYY-MM-DD HH:MM:SS)
DATETIME       19   (-"-)
YEAR            4   (YYYY)
[3 Feb 2011 9:48] Tonci Grgin
Maybe specs for TIMESTAMP should include microseconds part too. After all, that's most wanted feature by our users.
[9 Apr 2019 18:50] Yanlong He
Hi
  Is there any update on this Bug? I try to update driver version to 8.0, however, it looks like bug still existed.