Bug #100401 MYSQL_FIELD length different when MySQL client and server version do not match
Submitted: 31 Jul 2020 13:59 Modified: 13 Aug 2020 9:29
Reporter: Sebastien FLAESCH Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: describe length

[31 Jul 2020 13:59] Sebastien FLAESCH
Description:
When connecting with a MySQL client 8.0 to a MySQL server 5.7 or 5.6, the length attribute returned in the MYSQL_FIELD column description structure returns a number of characters instead of a number of bytes.

With a UTF-8 database, MySQL client 8.0 + server 8.0, the length attribute for a VARCHAR(25) column is 100 (25 * 4 where 4 is the mbmaxlen)

But with MySQL client 8.0 + server 5.7.19 or 5.6.16, the length is 25 ...

As long as the client and server match, the length is ok (100) ...

I assume this is a bug.

How to repeat:
Take an existing C program in a MySQL 5.7 environment (client and server), that executs a SELECT returning values from a VARCHAR(n) column, and display the MYSQL_FIELD.length value returned by mysql_fetch_field() for that column.

Now change the client environment to 8.0, relink the client program and still connect to the 5.7 server, and check the new length...

Suggested fix:
The length attribute should return the same value (using the same unit = bytes) no matter the combination of client/server versions.
[31 Jul 2020 14:25] MySQL Verification Team
Hi Mr. Flaesch,

Thank you for your report.

However, what you describe is fully documented behaviour for our client / server protocol.

You can find it in the Reference Manual and in the Release Notes.
[31 Jul 2020 14:38] Sebastien FLAESCH
Thanks for the quick answer, but in such case, could you provide a link where I can find this information?

I am also missing the documentation page that give all details about the MYSQL_FIELD structure...

I mean, the real details not just that "length" returns contains the length ...
[31 Jul 2020 14:52] Sebastien FLAESCH
Do you confirm that the MYSQL_FIELD.length attribute uses a different unit (byte or char), when the MySQL client is 8.0 and the server is 5.7 or 5.6?

That's weird to me (without any details from a clear documentation)...

I really wonder, because when connecting a MySQL client 5.7 to a 5.7 server, I get a number of bytes (100)

Please reopen this bug until it's clarified / identified withing the doc by a link...
[3 Aug 2020 12:53] MySQL Verification Team
Hi Mr. Flaesch,

There is a possibility that this particular detail is not mentioned in our Reference Manual.

For the start , you can read the following chapters:

https://dev.mysql.com/doc/c-api/5.7/en/c-api-data-structures.html
https://dev.mysql.com/doc/c-api/5.7/en/mysql-fetch-field.html

and the same chapters in the 8.0 manual.

Also, on our Documentation page, there is "Internals manual", so you can look at it as well.

We have couple of people looking at it right now. If it is not documented, we shall definitely verify this report as a documentation bug.
[3 Aug 2020 13:19] Sebastien FLAESCH
Hello,

I knew the doc pages you provided about API structures and mysql_fetch_field() function.

I will try to find the information I am looking for in the "Internals manual"... if this is public I would have appreciated a link to these pages too ;-)

Anyway, I appreciate that your team takes this into account.

Seb
[3 Aug 2020 14:03] MySQL Verification Team
Hi Mr. Flaesch,

Internals manual is available publicly, here:

https://dev.mysql.com/doc/internals/en/

Also, please check both previous links on 8.0 as well.
[4 Aug 2020 12:48] Sebastien FLAESCH
I finally found some more details about the MYSQL_FIELD.length attribute in the 8.0 user doc, which says:

----
unsigned long length

The width of the field. This corresponds to the display length, in bytes.

The server determines the length value before it generates the result set, so this is the minimum length required for a data type capable of holding the largest possible value from the result column, without knowing in advance the actual values that will be produced by the query for the result set.
----

So the unit of this length is clearly in bytes, but there is nothing about receiving different sizes or using different units, when the client and server versions do not match.

I am still convinced that there is some bug.

Seb
[4 Aug 2020 12:55] MySQL Verification Team
Hi Mr. Flaesch,

Yes, there is definitely a bug. It is in the missing documentation.

Verified as a documentation bug.
[4 Aug 2020 13:24] Sebastien FLAESCH
If this is a documentation bug instead of a software bug, then it means that the client app has to implement code to adapt the allocated memory according to the version of the server...?

I am not sure that this is the purpose of an API: It should be as simple as possible to write client code, without exceptions...

But maybe I am missing something...

I am looking forward to see how this will be documented.

In the meantime, if you have some detailed information, I would appreciate that you post this here please.

Seb
[6 Aug 2020 16:37] Paul DuBois
The bug report identifies the locus of the problem as a mismatch between client and server version. The observed behavior is really an issue of whether the current connection charset is single-byte or multi-byte. You can in fact observe the difference in results within a single session. Run the mysql client with the --column-type-info and try this:

mysql> SET NAMES latin1; SELECT 'a';
Query OK, 0 rows affected (0.00 sec)

Field   1:  `a`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  latin1_swedish_ci (8)
Length:     1
Max_length: 1
Decimals:   31
Flags:      NOT_NULL

+---+
| a |
+---+
| a |
+---+
1 row in set (0.00 sec)

mysql> SET NAMES utf8mb4; SELECT 'a';
Query OK, 0 rows affected (0.00 sec)

Field   1:  `a`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  utf8mb4_general_ci (45)
Length:     4
Max_length: 1
Decimals:   31
Flags:      NOT_NULL

+---+
| a |
+---+
| a |
+---+
1 row in set (0.00 sec)

You'll get a similar result in 5.5, 5.6, 5.7, and 8.0. (And even earlier if you use utf8 rather than utf8mb4, though the length will be 3 rather than 4.)

Conclusion: The length value is the number of *bytes* to represent the result. As the docs say.

The bug report says:

"
When connecting with a MySQL client 8.0 to a MySQL server 5.7 or 5.6, the
length attribute returned in the MYSQL_FIELD column description structure
returns a number of characters instead of a number of bytes.
"

No, it doesn't. It returns bytes in all cases.

The bug report further says:

"
The length attribute should return the same value (using the same unit =
bytes) no matter the combination of client/server versions.
"

In fact, that's what it does.
[6 Aug 2020 17:17] Sebastien FLAESCH
Hello Paul,

Seems I have overlooked the fact that my database in older MySQL versions uses latin1 charset, while the V 8.0 database is using utf8mb4 ...

mysql> show variables like "character_set_database";
+------------------------+--------+
| Variable_name          | Value  |
+------------------------+--------+
| character_set_database | latin1 |
+------------------------+--------+
1 row in set (0.01 sec)

I will re-check all of this, sorry for the noise.

Thanks!
Seb
[7 Aug 2020 10:12] Paul DuBois
Posted by developer:
 
I modified https://dev.mysql.com/doc/c-api/8.0/en/c-api-data-structures.html
to mention that mysql --column-type-info can be used to check MYSQL_FIELD
values, and to add this to the length member description:

For string columns, the length value varies on the connection character set. For example, if the character set is latin1, a single-byte character set, the length value for a SELECT 'abc' query is 3. If the character set is utf8mb4, a multibyte character set in which characters take up to 4 bytes, the length value is 12.
[7 Aug 2020 12:44] Sebastien FLAESCH
Hi,

I think I have been confused by the client charset vs database charset...

In fact:

  show variables like "character_set_database";

Always returns "utf8mb4", no matter the 8.0/5.7 Client/Server combination.

But check this out and tell me if it's expected:

My ~/.my.cnf contains:

[client]
default-character-set="utf8mb4"

No matter if I set LC_ALL to POSIX or en_US.utf8
(I guess MySQL client does not read C locale settings)

1) Connecting with MySQL client 5.7 to my MySQL 5.7 server:

show variables like "character_set_client";
... returns "utf8mb4"

2) Connecting with MySQL client 8.0 to my MySQL 8.0 server:

show variables like "character_set_client";
... returns "utf8mb4"

All good to me...

3) Connecting with MySQL client 8.0 to my MySQL 5.7 server:

show variables like "character_set_client";
... returns "latin1"

Not expected...

Could you just try to reproduce?

Seb
[7 Aug 2020 12:46] Sebastien FLAESCH
With column info (using MySQL client 8.0):

sf@toro:/opt3/dbs/mys$ mysql test1 --column-type-info --protocol=tcp --host=localhost --port=3308 -u mysuser -p
Enter password: 
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 26
Server version: 5.7.19 MySQL Community Server (GPL)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select 'abc';
Field   1:  `abc`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  latin1_swedish_ci (8)
Length:     3
Max_length: 3
Decimals:   31
Flags:      NOT_NULL 

+-----+
| abc |
+-----+
| abc |
+-----+
1 row in set (0.00 sec)

mysql> show variables like "character_set_client";
Field   1:  `Variable_name`
Catalog:    `def`
Database:   ``
Table:      `session_variables`
Org_table:  `session_variables`
Type:       VAR_STRING
Collation:  latin1_swedish_ci (8)
Length:     64
Max_length: 20
Decimals:   0
Flags:      NOT_NULL NO_DEFAULT_VALUE 

Field   2:  `Value`
Catalog:    `def`
Database:   ``
Table:      `session_variables`
Org_table:  `session_variables`
Type:       VAR_STRING
Collation:  latin1_swedish_ci (8)
Length:     1024
Max_length: 6
Decimals:   0
Flags:      

+----------------------+--------+
| Variable_name        | Value  |
+----------------------+--------+
| character_set_client | latin1 |
+----------------------+--------+
1 row in set (0.01 sec)
[7 Aug 2020 13:00] Sebastien FLAESCH
I have checked what mysql_get_character_set_info() returns in my C program,
with MySQL Client 8.0 connecting to MySQL server 5.7:

In my MY_CHARSET_INFO structure:

- name contains "utf8mb4_0900_ai_ci"
- mbmaxlen contains 4

So far so good, but when executing:

 show variables like 'character_set_client'

from my program, it returns 'latin1' (like with previous mysql command line test).

And as reported before, mysql_fetch_field() sets the length attribute of MYSQL_FIELD to 3 (bytes), when executing SELECT 'abc' ...

Something is not consistent here... do you agree?

Can you try to repro?

Seb
[13 Aug 2020 9:18] Sebastien FLAESCH
Since this ticket has been treated as a documentation bug and is not close, and I got no answer to my last comments, I have created a new bug report:

https://bugs.mysql.com/bug.php?id=100511
[13 Aug 2020 9:29] Sebastien FLAESCH
Sorry typo in my last comment

... and is not close, ...

should read:

... and is now closed ...

Pffffff.... I wish I could edit comments.

.... and spend less time trying to convince MySQL people that something is wrong.

PLEASE CHECK 100511, TRY TO REPRODUCE, AND TELL ME IF THERE IS A BUG YES OR NO.
[13 Aug 2020 12:29] MySQL Verification Team
This report has been analysed by the experts that are working on the code and it was concluded that it is not a code bug, but documentation bug.

Hence, since documentation is updated, this bug is closed.