Bug #69630 MYSQL_FIELD structure has incorrect information.
Submitted: 1 Jul 2013 15:15 Modified: 22 Jul 2013 17:48
Reporter: Roland Bouman Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S1 (Critical)
Version:5.6.10 OS:Any
Assigned to: Paul DuBois CPU Architecture:Any

[1 Jul 2013 15:15] Roland Bouman
Description:
I am on MySQL 5.6.10 using the MySQL C API.

I have this:

set names utf8;

mysql> show variables like '%char%';
+--------------------------+------------------------------------------------------------------------+
| Variable_name            | Value                                                                  |
+--------------------------+------------------------------------------------------------------------+
| character_set_client     | utf8                                                                   |
| character_set_connection | utf8                                                                   |
| character_set_database   | latin1                                                                 |
| character_set_filesystem | binary                                                                 |
| character_set_results    | utf8                                                                   |
| character_set_server     | latin1                                                                 |
| character_set_system     | utf8                                                                   |
| character_sets_dir       | /home/rbouman/mysql/mysql-5.6.10-linux-glibc2.5-x86_64/share/charsets/ |
+--------------------------+------------------------------------------------------------------------+

create table charset(
  t1 varchar(5) charset ascii,
  t2 varchar(10) charset utf8
);

mysql> select * from information_Schema.columns where table_schema = schema() and table_name = 'charset' \G
*************************** 1. row ***************************
           TABLE_CATALOG: def
            TABLE_SCHEMA: test
              TABLE_NAME: charset
             COLUMN_NAME: t1
        ORDINAL_POSITION: 1
          COLUMN_DEFAULT: NULL
             IS_NULLABLE: YES
               DATA_TYPE: varchar
CHARACTER_MAXIMUM_LENGTH: 5
  CHARACTER_OCTET_LENGTH: 5
       NUMERIC_PRECISION: NULL
           NUMERIC_SCALE: NULL
      DATETIME_PRECISION: NULL
      CHARACTER_SET_NAME: ascii
          COLLATION_NAME: ascii_general_ci
             COLUMN_TYPE: varchar(5)
              COLUMN_KEY: 
                   EXTRA: 
              PRIVILEGES: select,insert,update,references
          COLUMN_COMMENT: 
*************************** 2. row ***************************
           TABLE_CATALOG: def
            TABLE_SCHEMA: test
              TABLE_NAME: charset
             COLUMN_NAME: t2
        ORDINAL_POSITION: 2
          COLUMN_DEFAULT: NULL
             IS_NULLABLE: YES
               DATA_TYPE: varchar
CHARACTER_MAXIMUM_LENGTH: 10
  CHARACTER_OCTET_LENGTH: 30
       NUMERIC_PRECISION: NULL
           NUMERIC_SCALE: NULL
      DATETIME_PRECISION: NULL
      CHARACTER_SET_NAME: utf8
          COLLATION_NAME: utf8_general_ci
             COLUMN_TYPE: varchar(10)
              COLUMN_KEY: 
                   EXTRA: 
              PRIVILEGES: select,insert,update,references
          COLUMN_COMMENT: 
2 rows in set (0.00 sec)

Now I use the C API to do this query:

SELECT * FROM charset;

I am using mysql_store_result. Then when I use mysql_fetch_field, this is the relevant info I get back:

 {
  "name": "t1",
  "type": "varstring",
  "charsetnr": 8,
  "org_name": "t1",
  "table": "charset",
  "org_table": "charset",
  "schema": "test",
  "decimals": 0,
  "length": 5,
  "max_length": 0,
  "maybe_null": true,
  "primary_key": false,
  "unique_key": false,
  "multiple_keys": false,
  "unsigned": false,
  "zerofill": false,
  "binary": false,
  "auto_increment": false,
  "enum": false,
  "set": false,
  "numeric": false
 },
 {
  "name": "t2",
  "type": "varstring",
  "charsetnr": 8,
  "org_name": "t2",
  "table": "charset",
  "org_table": "charset",
  "schema": "test",
  "decimals": 0,
  "length": 10,
  "max_length": 0,
  "maybe_null": true,
  "primary_key": false,
  "unique_key": false,
  "multiple_keys": false,
  "unsigned": false,
  "zerofill": false,
  "binary": false,
  "auto_increment": false,
  "enum": false,
  "set": false,
  "numeric": false
 }

(please don't mind the JSON representation)

The length field should give us the "display length". I am assuming this is equal to the character length (at least the numbers 5 and 10 are compatible with the declared string length in the DDL.

However, the charsetnr is 8 in both cases. 8 turns out to be the collation id for latin1.

This is incorrect. My columns are utf8 and ascii - there is no collation with id 8 that corresponds to these character sets. The charsetnrs should have been 11 (ascii_general_ci) and 33 (utf8_general_ci) respectively.

In addition, the max_length is 0 in both cases. This is not correct. According to the documentation, the maximum number of bytes should be given when mysql_store_result is used. The expected numbers should have been 5 (5 ascii chars * 1byte octet length) and 30 (10 utf8 characters * 3 bytes octet length) respectively

How to repeat:
Please see attached example. 

Suggested fix:
Please make sure the fields return the right metadata.
[1 Jul 2013 15:27] Peter Laursen
So it seems it returning length or server or database charset default for string columns even though they use another charset.

Good catch!
[1 Jul 2013 15:54] Roland Bouman
Simple test C program source.

Attachment: test_mysql_c_api.c (text/x-csrc), 2.66 KiB.

[1 Jul 2013 15:55] Roland Bouman
Ok - it turns out my standalone sample programs gives different output, but the problems are still evident. Output:

Field:
------
c1
length: 15
max_length: 0
charsetnr: 33
Field:
------
c2
length: 30
max_length: 0
charsetnr: 33

Expected output:

Field:
------
c1
length: 5
max_length: 5
charsetnr: 11
Field:
------
c2
length: 10
max_length: 30
charsetnr: 33
[2 Jul 2013 5:26] Davi Arnaut
The MYSQL_FIELD structure describes a field in the result set. Hence, the charset will be the one specified in @@character_set_results if not null, otherwise the original (column) charset.

As for max_length, the documentation clearly states: "The maximum width of the field for the result set (the length in bytes of the longest field value for the rows actually in the result set)". Hence, if the result set is empty, it shall be zero.
[2 Jul 2013 7:12] Roland Bouman
Davi, thanks! 

Thanks for pointing out the issue with max_length. You're right; silly of me not to catch that. I verified that the correct lengths are returned when there is data.

Regarding the charset, I didn't quite understand your remark "the charset will be the one specified in @@character_set_results if not null, otherwise the original (column) charset".

Do you mean the collation id declared in the table DDL is returned when @@character_set_results is set to NULL? If not, under what condition will the field structure return the collation id declared in the table DDL?
[2 Jul 2013 7:15] Roland Bouman
It seems to me that if the charsetnr really reflects the setting of a server or session variable, it is both useless and wasteful. Useless since it does not provide information about the field itself, and wasteful because of redundantly sending a value that could much more efficiently be obtained with a single specific request.
[2 Jul 2013 15:01] Davi Arnaut
I meant that the original column collation ID is returned if there is no conversion of the result set. If conversion is not wanted, set character_set_results to NULL. For example (using mysql --column-type-info ):

mysql> CREATE TABLE t1 (c1 VARCHAR(30) CHARACTER SET ucs2);
Query OK, 0 rows affected (0.02 sec)

mysql> SELECT * FROM t1;
Field   1:  `c1`
Catalog:    `def`
Database:   `test`
Table:      `t1`
Org_table:  `t1`
Type:       VAR_STRING
Collation:  utf8_general_ci (33)
Length:     90
Max_length: 0
Decimals:   0
Flags:      

mysql> SET @@character_set_results = NULL;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t1;
Field   1:  `c1`
Catalog:    `def`
Database:   `test`
Table:      `t1`
Org_table:  `t1`
Type:       VAR_STRING
Collation:  ucs2_general_ci (35)
Length:     60
Max_length: 0
Decimals:   0
Flags:
[9 Jul 2013 13:41] Georgi Kodinov
How nice of Davi to actually provide the explanation(s) !
All I need to do now is close the bug.
I don't really see anything that can be added to the documentation either. Feel free to re-open as a docs bug together with your motives if you disagree.
[9 Jul 2013 14:47] Roland Bouman
Georgi, 

thanks! I'm re-opening as documentation bug, since the documentation about MYSQL_FIELD.charsetnr currently does not include any of the information mentioned by Davi. 

The description reads:

"An ID number that indicates the character set/collation pair for the field."

I think that in the light of the info provided by Davi, this is not correct. Or at least, it is incomplete since it the actual value returned is a matter of both the declared collation at the column level and the current setting of character_set_results.
[9 Jul 2013 14:48] Roland Bouman
Forgot to re-open
[9 Jul 2013 14:50] Peter Laursen
And I have the concern that (maybe?) the ID for a specific collation is not always the same with different server versions.  I hope it is not the case!
[17 Jul 2013 18:23] Sveta Smirnova
Thank you, Roland!

Verified as documentation bug.
[22 Jul 2013 17:48] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

Text to be added to http://dev.mysql.com/doc/refman/5.6/en/c-api-data-structures.html
(and other versions):

Normally, character values in result sets are converted to the
character set indicated by the character_set_results system variable.
In this case, charsetnr corresponds to the character set indicated by
that variable. Character set conversion can be suppressed by setting
character_set_results to NULL. In this case, charsetnr corresponds to
the character set of the original table column or expression. See
also http://dev.mysql.com/doc/refman/5.6/en/charset-connection.html.