Bug #118752 | Connector/C 9.x ANSI reports wrong data types | ||
---|---|---|---|
Submitted: | 30 Jul 11:21 | Modified: | 30 Jul 20:50 |
Reporter: | Sruli Ganor | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | Connector / C | Severity: | S1 (Critical) |
Version: | 9.4 | OS: | Windows |
Assigned to: | CPU Architecture: | Any |
[30 Jul 11:21]
Sruli Ganor
[30 Jul 12:25]
MySQL Verification Team
Hello Sruli Ganor, Thank you for the bug report. Regards, Ashwini Patil
[30 Jul 15:42]
Rafal Somla
Posted by developer: We are in the process of a more systematic review of meta-data reported by our driver. Part of this effort is answering questions like "what ODBC type should the driver report for table column of type X?". Once we have that settled consistently across all types and possible scenarios (like Unicode/ANSI) we'll be ready to fix any issues present in the current implementation. Without settling on any particular requirements yet, I am not sure if the expectation that for given table column ANSI driver should report the same meta-data as Unicode driver is the right one. If 8.0 does that, it might be a bug, not a feature. In your particular example of VARCHAR column, for Unicode driver for which we fix the encoding to UTF-8 (and user should not change that) the reported ODBC type should be SQL_WVARCHAR. But for ANSI driver user can set the connection encoding and in your example you do that, setting charset=BINARY. This has the effect of instructing the driver to send all data "as is" without interpreting it as text (a). Consequently, driver reports text columns as SQL_VARBINARY which seems to be right, because the driver is going to present raw byte blobs for these columns (b). For different settings of `charset` option the reported type might be different -- for example if something like `charset=latin1` is used the reported type might be SQL_VARCHAR (again different from SQL_WVARCHAR reported by Unicode driver). In short, I doubt that it is correct to always report the same type disregarding things like Unicode/ANSI distinction and different possible `charset` settings. However, we keep this bug open because we need to re-asses it once we have the meta-data expectations fully clarified. Note (a): With the understanding that application will do any data transformations, such as charset conversions, if needed. Note (b): Assuming that default UTF-8 charset is used on the server, these byte blobs will happen to be UTF-8 strings, but for the driver they are still just binary blobs because of charset=BINARY setting.
[30 Jul 20:50]
Sruli Ganor
Hello Rafal, Thanks for your quick response. Unfortunately, it's quite disappointing. First, we, a leading software vendor with thousands of large customers around the world, rely that we use DB drivers that are stable, consistent and backwards compatible. However, we have found that connector version 9 behaves differently than 8 in various aspects, like removing the support of charset=binary in the Unicode version and other metadata changes described in bug 117608. Plus the changes you made to ANSI data types without notice or proper documentation. Such changes may look trivial to you but have severe impacts on us. Second, I can't agree that the connector reports data types as it likes. According to the SQL specs, SQLDescribeCol should report how the column in defined in DDL, not as the driver chooses to handle it. Here is one example why it's critical for us: our system reads many MySQL tables as raw (binary) data. It then processes each column data according to its type, char set etc. Now, if the connector tells me that a column is VARBINARY although it's defined in the DDL as VARCHAR, I'll not process it as a character string and the result will be garbage. Therefore, we must know exactly what the real type of each column is. We could of course use INFORMATION_SCHEMA.COLUMNS to get the real data type. But this will require us to make major and risky changes to our system. I don't think we should do this effort just because you decided to change the connector's behavior without considering existing systems working in production. Please also note that we are obliged to work with the ANSI driver because the changes in the Unicode connector, made it unusable for us. Now we can’t use either of them. Hope you understand my point.
[31 Jul 12:38]
Rafal Somla
Posted by developer: Hi Sruli, I think I agree with most of what you are saying. Yes, it is important to keep the driver stable -- even if previous version reported meta-data that is not entirely correct we should be careful when changing the existing behavior. The observed changes in ANSI driver behavior compared to version 8.0 were mostly side effects of other changes that went under our radar. We are aware of the situation and we currently work on reviewing all the meta-data reported by our driver to ensure that it is consistent across all types and driver configurations. You are probably right that information about data source objects, such as table column types should be the same regardless of Unicode/ANSI driver type (and in principle follow the data from the information schema). I said "probably" because in the end the driver decides how it presents given data source to the user (via ODBC APIs) and there might be good reasons for doing some tweaks here and there. But the principle is as you described. However, there are subtleties that you might be missing. For example `SQLDescribeCol()` is a function that returns information about result set columns, not about table columns (for the latter the `SQLColumns()` API is used). The effects of `charset` setting I was talking about earlier apply mainly to the result set meta-data (as given by `SQLDescribeCol()`) rather than the table column meta-data (as given by `SQLColumns()`). You should not assume that result set type information is the same as for the corresponding table column because things like connection charset can influence the former. In particular if you set `charset=BINARY` you are basically saying "give me strings as raw byte blobs ignoring any encoding information" and then reporting the result set column as VARBINARY looks like an appropriate thing to do in that situation (but as I said we will be reviewing these things). It is also worth noting that even though you could set `charset=X` in version 8.0 of the Unicode driver this setting had no visible effect because the driver was still presenting all string data as wide strings (UTF-16). Setting `charset=BINARY` could have any visible effect only for ANSI driver (this is the main reason why we decided to remove support for that option for Unicode driver, to avoid that confusion). As I said we are working now on extensive review of the meta-data information returned by our driver. Definitely we should not report wrong information, like the wrong lengths observed in bug#117608 (which was the main reason why we talk about ANSI driver and `charset=BINARY` option). Eventually these things should be cleaned up -- we just need some time to do it consistently across all types and ensure we catch such problems in the future.