Bug #117608 Connector/C 9.1 reports wrong sizes for TEXT columns
Submitted: 2 Mar 17:10 Modified: 23 Mar 7:13
Reporter: Sruli Ganor
Status: Verified
Category: Connector / C    Severity: S1 (Critical)
Version: 9.1    OS: Any
Assigned to:    CPU Architecture: Any

[2 Mar 17:10] Sruli Ganor
Description:
We read MySQL data as raw bytes, without charset conversions.
That is, we set character_set_client=binary, character_set_connection=binary and character_set_results=NULL.
We do the conversion ourselves, so we must know the columns' metadata, including column and display sizes.

In version 9.1, Connector/C reports wrong sizes for these types if the database's charset is utf8mb4:
TINYTEXT:   Column Size: 63,      Display Size: 255
MEDIUMTEXT: Column Size: 4194303, Display Size: 16777215
LONGTEXT:   Column Size: 4GB,     Display Size: 4GB

These sizes are wrong and incompatible with the metadata returned by Connector 8.0 for the same database:
TINYTEXT:   Column Size: 255,      Display Size: 63
MEDIUMTEXT: Column Size: 16777215, Display Size: 4194303
LONGTEXT:   Column Size: 4GB,      Display Size: 1GB
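The 8.0 figures are consistent with a simple rule: the column size is the type's maximum length in bytes, and the display size is that length divided by 4, the maximum number of bytes per character in utf8mb4 (integer division, so 255 / 4 gives 63). This rule is inferred from the numbers above rather than taken from connector documentation, and the exact LONGTEXT limit of 4294967295 bytes is assumed behind the rounded "4GB". Read this way, the 9.1 values for TINYTEXT and MEDIUMTEXT are the same two numbers in swapped positions. A minimal C sketch of the arithmetic, using a hypothetical helper expected_sizes_80:

```c
#include <stdio.h>

/* Maximum bytes per character in the utf8mb4 character set. */
#define UTF8MB4_MAXLEN 4ULL

/* Maximum storage length in bytes of the TEXT types in this report.
   LONGTEXT is 2^32 - 1 bytes, shown as "4GB" above. */
struct text_type {
    const char *name;
    unsigned long long max_bytes;
};

const struct text_type text_types[] = {
    {"TINYTEXT", 255ULL},
    {"MEDIUMTEXT", 16777215ULL},
    {"LONGTEXT", 4294967295ULL},
};

/* Hypothetical helper: sizes as Connector 8.0 appears to report them.
   column_size  = maximum length in bytes
   display_size = maximum length in characters (bytes / mbmaxlen, rounded down) */
void expected_sizes_80(unsigned long long max_bytes, unsigned long long mbmaxlen,
                       unsigned long long *column_size,
                       unsigned long long *display_size)
{
    *column_size = max_bytes;
    *display_size = max_bytes / mbmaxlen;
}

/* Print the expected 8.0-style sizes for every TEXT type in the table. */
void print_expected_sizes(void)
{
    size_t count = sizeof text_types / sizeof text_types[0];
    for (size_t i = 0; i < count; i++) {
        unsigned long long col, disp;
        expected_sizes_80(text_types[i].max_bytes, UTF8MB4_MAXLEN, &col, &disp);
        printf("%-10s  Column Size: %llu, Display Size: %llu\n",
               text_types[i].name, col, disp);
    }
}
```

Calling print_expected_sizes() prints one line per type in the same layout as the lists above, with exact byte counts in place of "4GB" and "1GB".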

This seems to be a bug, and it is critical for us.
Thanks
 

How to repeat:
We have a test program that reproduces the issue.
I'll be glad to provide it upon request.

In summary, the program connects to a MySQL utf8mb4 database, sets binary mode, and gets the metadata of the various TEXT columns as follows:

SQLDescribeColW(hStmt, colIndex, colName,..., &dataType, &columnSize, ...); 
SQLColAttributeW(hStmt, colIndex, SQL_DESC_DISPLAY_SIZE, ..., &displaySize);

Then it prints columnSize and displaySize.

When the same program runs against the same database and only the driver version is changed, the printed sizes differ.
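The comparison described above can be made mechanical by classifying each new reading against a reference reading taken with Connector 8.0. The sketch below is hypothetical: size_reading, compare_readings and the verdict names are illustrative and not part of any connector API. With the values reported above, and assuming the two "4GB" figures are the same exact value, TINYTEXT and MEDIUMTEXT would be classified as swapped and LONGTEXT as different.

```c
/* One row of output from the test program: the two sizes reported for a column. */
typedef struct {
    const char *type_name;
    unsigned long long column_size;  /* from SQLDescribeColW */
    unsigned long long display_size; /* from SQLColAttributeW with SQL_DESC_DISPLAY_SIZE */
} size_reading;

typedef enum {
    SIZES_MATCH,     /* both sizes equal the reference */
    SIZES_SWAPPED,   /* the two sizes are the reference values in reverse order */
    SIZES_DIFFERENT  /* any other mismatch */
} size_verdict;

/* Classify a reading from the driver under test against a reference reading. */
size_verdict compare_readings(const size_reading *ref, const size_reading *got)
{
    if (got->column_size == ref->column_size &&
        got->display_size == ref->display_size)
        return SIZES_MATCH;
    if (got->column_size == ref->display_size &&
        got->display_size == ref->column_size)
        return SIZES_SWAPPED;
    return SIZES_DIFFERENT;
}
```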

Suggested fix:
Make the reported sizes compatible with the previous connector version.
[4 Mar 10:41] Sruli Ganor
The issue occurs in both Windows and Linux.
[4 Mar 10:47] Sruli Ganor
We tried Connector 9.2 and got the same results.
[6 Mar 16:59] MySQL Verification Team
Hi,

I am checking whether this is an improperly documented incompatible change or a bug.

If you can share the test source code it will speed things up, thanks.
[23 Mar 7:13] Sruli Ganor
Hi, could you please give an update on the status of this issue? It is still blocking us from using version 9.x of the Connector. On the other hand, we cannot keep using earlier versions because of their known vulnerabilities.
Thanks
[23 Mar 10:45] MySQL Verification Team
The behavior is verified, but the connector team has not yet decided how to handle it, as these changes are related to standardization and the change was intentional. The connector team will update the report when they have made a decision. Nothing about this will happen quickly.
[24 Mar 10:35] Rafal Somla
Posted by developer:
 
We agree there is a regression in column size reporting. This is most likely due to a recent change in how the driver handles translation of character data to the connection charset. We changed it so that the conversions are done by the server rather than by the driver, but apparently this has unwanted side effects. We will look into how to fix this.

Note that the connection charset should be configured using the CHARSET connection option, not by setting the character_set_xxx session variables (or by SET NAMES). Changing these session variables can produce inconsistent results.

Note also that the connection charset can be configured only for the ANSI variant of the driver. For the Unicode variant, the connection charset is always UTF8 and cannot be modified (the driver converts it to UTF-16 for wide-character strings as needed). If you want to receive raw character data without any conversions, you must use the ANSI driver with the CHARSET=binary option and bind your buffers as binary.
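A minimal sketch of the connection setup for this workaround, assuming the ANSI driver is registered under the name "MySQL ODBC 9.1 ANSI Driver" (the actual name depends on the installation) and using placeholder server, database and credential values. The helper build_binary_conn_str is hypothetical. The resulting string would be passed to SQLDriverConnect in place of setting the character_set_xxx session variables, and result columns would then presumably be bound with SQL_C_BINARY.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a Connector/ODBC connection string that selects
   the binary connection charset via the CHARSET option.
   Values containing ';' or braces would need {...} quoting, which this
   sketch does not do. Returns 0 on success, -1 if the buffer is too small. */
int build_binary_conn_str(char *out, size_t out_len,
                          const char *driver, const char *server,
                          const char *database, const char *user,
                          const char *password)
{
    int n = snprintf(out, out_len,
                     "DRIVER={%s};SERVER=%s;DATABASE=%s;UID=%s;PWD=%s;CHARSET=binary",
                     driver, server, database, user, password);
    /* snprintf returns the length it wanted to write; treat truncation as an error. */
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```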
[24 Mar 10:38] Rafal Somla
Posted by developer:
 
Note: We also believe that if you follow this advice and use the ANSI driver with CHARSET=binary, the reported column sizes should be closer to what you expect. So this could be a workaround for you.