Bug #114470 returned data that does not match expected data length for column
Submitted: 25 Mar 2024 5:48 Modified: 18 Dec 2024 2:10
Reporter: Bruce Young Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / ODBC Severity:S2 (Serious)
Version:8.3.0 OS:Windows (Windows Server 2022 Datacenter)
Assigned to: MySQL Verification Team CPU Architecture:Any

[25 Mar 2024 5:48] Bruce Young
Description:
On the SQL Server I have 4 lines of SQL:
drop table if  exists sugar.CQU_APPLICATION_CSTM_TEMP ;
drop table if  exists sugar.CQU_APPLICATION_CSTM_OLD ;
SELECT * INTO sugar.CQU_APPLICATION_CSTM_TEMP from openquery(SUGARCRM,'select * from cqu_application_cstm limit 0') ;
INSERT INTO sugar.CQU_APPLICATION_CSTM_TEMP SELECT * from openquery(SUGARCRM,'select * from cqu_application_cstm') ;
Running these 4 lines, the first 3 succeed. The fourth line fails with:
Msg 7347, Level 16, State 1, Line 4
OLE DB provider 'MSDASQL' for linked server 'SUGARCRM' returned data that does not match expected data length for column '[MSDASQL].cqu_event_id_c'. The (maximum) expected data length is 72, while the returned data length is 68.

Completion time: 2024-03-25T15:37:46.9972243+10:00

Bogus error as 68 is smaller than 72, yet the error suggests its too big.

On the mysql side I see that that column is only 36 at maximum. Now 72 is fine as there can be differences.
mysql> select max(char_length(cqu_event_id_c)) from cqu_application_cstm;
+----------------------------------+
| max(char_length(cqu_event_id_c)) |
+----------------------------------+
| 36 |
+----------------------------------+
1 row in set (2.41 sec)

But how do I get the data across? Please note that this is the only table that fails. There are around 200 tables all correctly returning data.

Where the error appears: Windows Server - SQL Server.
Source database: Red Hat Enterprise Linux release 8.9 (Ootpa)
MySQL Version: mysql  Ver 8.0.36 for Linux on x86_64 (Source distribution) 

How to repeat:
This fails every time on this table without success.

Suggested fix:
I am at a loss as to how this has happened. I do need it to work though.
[11 Apr 2024 4:12] Bruce Young
This may be useful:
mysql> desc cqu_application_cstm;
+--------------------------+--------------+------+-----+---------+-------+
| Field                    | Type         | Null | Key | Default | Extra |
+--------------------------+--------------+------+-----+---------+-------+
| id_c                     | char(36)     | NO   | PRI | NULL    |       |
| area_of_study_c          | varchar(100) | YES  |     | NULL    |       |
| lifecycle_c              | varchar(100) | YES  |     | NULL    |       |
| product_c                | varchar(100) | YES  |     | NULL    |       |
| school_graduation_year_c | varchar(100) | YES  |     | NULL    |       |
| success_flag_c           | tinyint(1)   | YES  |     | 0       |       |
| accepted_as_of_date_c    | datetime     | YES  |     | NULL    |       |
| admitted_as_of_date_c    | datetime     | YES  |     | NULL    |       |
| applicant_as_of_date_c   | datetime     | YES  |     | NULL    |       |
| converted_as_of_date_c   | datetime     | YES  |     | NULL    |       |
| likely_intake_c          | varchar(100) | YES  |     | NULL    |       |
| offerred_as_of_date_c    | datetime     | YES  |     | NULL    |       |
| prospect_as_of_date_c    | datetime     | YES  |     | NULL    |       |
| cancelled_as_of_date_c   | datetime     | YES  |     | NULL    |       |
| cqu_event_id_c           | char(36)     | YES  |     | NULL    |       |
| inactive_as_of_date_c    | datetime     | YES  |     | NULL    |       |
| lead_phase_c             | varchar(100) | YES  |     | NULL    |       |
| lead_step_c              | varchar(100) | YES  |     | NULL    |       |
| source_c                 | varchar(100) | YES  |     | NULL    |       |
| prospect_status_c        | varchar(100) | YES  |     | NULL    |       |
| status_reason_c          | varchar(100) | YES  |     | NULL    |       |
| prospect_status_date_c   | datetime     | YES  |     | NULL    |       |
| marketo_check_status_c   | varchar(100) | YES  |     | NULL    |       |
| marketo_id_c             | varchar(255) | YES  |     | NULL    |       |
| ln_testing_date_c        | datetime     | YES  |     | NULL    |       |
| ln_result_1_c            | varchar(255) | YES  |     | NULL    |       |
| ln_result_2_c            | varchar(255) | YES  |     | NULL    |       |
| send_to_bksb_c           | tinyint(1)   | YES  |     | 0       |       |
| bksb_complete_c          | tinyint(1)   | YES  |     | 0       |       |
| steps_testing_sent_c     | date         | YES  |     | NULL    |       |
| steps_testing_return_c   | date         | YES  |     | NULL    |       |
| steps_interview_c        | date         | YES  |     | NULL    |       |
| steps_study_plan_c       | tinyint(1)   | YES  |     | 0       |       |
| steps_fast_track_c       | varchar(100) | YES  |     | NULL    |       |
| steps_computer_type_c    | varchar(100) | YES  |     | NULL    |       |
| send_to_bksb_date_c      | datetime     | YES  |     | NULL    |       |
| utm_source_c             | varchar(255) | YES  |     |         |       |
| utm_medium_c             | varchar(255) | YES  |     |         |       |
| utm_campaign_c           | varchar(255) | YES  |     |         |       |
| utm_content_c            | varchar(255) | YES  |     |         |       |
+--------------------------+--------------+------+-----+---------+-------+
40 rows in set (0.01 sec)
[22 Apr 2024 12:20] MySQL Verification Team
Hello Bruce Young,

Thank you for the bug report.
Discussed your issue with ODBC developer and concluded that length of the character is probably 2 bytes, thus the max length is 72 = 36*2
The column in question is this:
cqu_event_id_c           | char(36)
It cannot be a variable length because char is not varchar. Hence, the char(N) needs to be padded with the spaces.
The data is evidently shorter, so it does not match the exact expected length of 72.

There is an option to do the padding for situations like this:
1. Via driver GUI dialog the following option has to be checked: Details >> Cursors/Results >> [x] Pad CHAR to full length with space
2. Add PAD_SPACE=1 directly to the ODBC connection string

Let us know if it resolved the problem.

Regards,
Ashwini Patil
[29 Apr 2024 6:15] Bruce Young
Thanks for the update, but the variability issue of this data has been present for years, but the failure to extract only started this year.

Further to this I altered the table on the SQL server side to be varchar(255) and the error occurs unchanged.

Details of the variable data in the table:
mysql> select distinct char_length(cqu_event_id_c), count(char_length(cqu_event_id_c)) from cqu_application_cstm group by char_length(cqu_event_id_c);
+-----------------------------+------------------------------------+
| char_length(cqu_event_id_c) | count(char_length(cqu_event_id_c)) |
+-----------------------------+------------------------------------+
|                        NULL |                                  0 |
|                          36 |                             475014 |
|                           0 |                              10088 |
|                          23 |                                 59 |
|                          13 |                                108 |
|                          35 |                                  8 |
|                           5 |                                446 |
|                          27 |                                 67 |
|                           1 |                                  9 |
|                           3 |                                 98 |
|                          15 |                                 35 |
|                          33 |                                 15 |
|                          20 |                                133 |
|                          12 |                                 17 |
|                          19 |                                 29 |
|                           9 |                                  1 |
+-----------------------------+------------------------------------+
16 rows in set (0.76 sec)

I don't doubt there is a data issue here, but don't know how to find it with the provided error.
[23 May 2024 6:52] MySQL Verification Team
Hello Bruce Young,

Thank you for the details. However this is not enough to reproduce the issue at our end.
Can you please let us know if simply running this query causes the error (without INSERT INTO sugar.CQU_APPLICATION_CSTM_TEMP ...)?

SELECT * from openquery(SUGARCRM,'select * from cqu_application_cstm');

Also, please provide CREATE TABLE statement for `cqu_application_cstm`.
Does it fails with the empty table or a particular record should be present?

Regards,
Ashwini Patil
[24 Jun 2024 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[25 Jul 2024 4:27] Bruce Young
Apologies for the delay.

For the select statement without the insert it starts to work and displays rows. After about 55 seconds it fails with this error:
Msg 7347, Level 16, State 1, Line 1
OLE DB provider 'MSDASQL' for linked server 'SUGARCRM' returned data that does not match expected data length for column '[MSDASQL].cqu_event_id_c'. The (maximum) expected data length is 72, while the returned data length is 68.

Completion time: 2024-07-25T14:23:27.9333404+10:00
----------------------------------

It creates the table with correct structure without error. The table creation runs with zero rows returned without issue.

This still does not help me find the row or rows where the data is triggering this.
[25 Jul 2024 4:31] Bruce Young
The select returns over 70000 rows before the error occurs.

The source table in the Sugar MySQL database has 1422768 rows at this time.
[26 Jul 2024 12:38] MySQL Verification Team
Hello Bruce Young,

Thank you for the details.
Could you please provide create table statement? In order to continue we need the DDL for the table from MySQL Server (not for the linked table in SQL Server)

Regards,
Ashwini Patil
[30 Jul 2024 1:05] Bruce Young
from show create table cqu_application_cstm;

 CREATE TABLE `cqu_application_cstm` (
  `id_c` char(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `area_of_study_c` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `lifecycle_c` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `product_c` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `school_graduation_year_c` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `success_flag_c` tinyint(1) DEFAULT '0',
  `accepted_as_of_date_c` datetime DEFAULT NULL,
  `admitted_as_of_date_c` datetime DEFAULT NULL,
  `applicant_as_of_date_c` datetime DEFAULT NULL,
  `converted_as_of_date_c` datetime DEFAULT NULL,
  `likely_intake_c` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `offerred_as_of_date_c` datetime DEFAULT NULL,
  `prospect_as_of_date_c` datetime DEFAULT NULL,
  `cancelled_as_of_date_c` datetime DEFAULT NULL,
  `cqu_event_id_c` char(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `inactive_as_of_date_c` datetime DEFAULT NULL,
  `lead_phase_c` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `lead_step_c` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `source_c` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `prospect_status_c` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `status_reason_c` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `prospect_status_date_c` datetime DEFAULT NULL,
  `marketo_check_status_c` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `marketo_id_c` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `ln_testing_date_c` datetime DEFAULT NULL,
  `ln_result_1_c` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `ln_result_2_c` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `send_to_bksb_c` tinyint(1) DEFAULT '0',
  `bksb_complete_c` tinyint(1) DEFAULT '0',
  `steps_testing_sent_c` date DEFAULT NULL,
  `steps_testing_return_c` date DEFAULT NULL,
  `steps_interview_c` date DEFAULT NULL,
  `steps_study_plan_c` tinyint(1) DEFAULT '0',
  `steps_fast_track_c` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `steps_computer_type_c` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `send_to_bksb_date_c` datetime DEFAULT NULL,
  `utm_source_c` varchar(255) DEFAULT '',
  `utm_medium_c` varchar(255) DEFAULT '',
  `utm_campaign_c` varchar(255) DEFAULT '',
  `utm_content_c` varchar(255) DEFAULT '',
  PRIMARY KEY (`id_c`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=COMPRESSED
[5 Sep 2024 0:46] Bruce Young
Is there anything else I can provide to be of assistance here?
[10 Oct 2024 1:24] Bruce Young
I provided the additional required data at the end of July.
It is now October. Almost 2.5 months without any update.
If I can be of more assistance, happy to do that.
I just need a way forward to extract this data.
[12 Nov 2024 0:26] Bruce Young
I provided the last update requested at the end of July. It is now 3 and a half months since then. No updates, no requests for further details from me.
Status is still "Analyzing".

I need a solution. I am happy to provide any additional support to solving this issue.

What can I do to move this forward?
[12 Nov 2024 2:10] Bogdan Degtyariov
This is not a trivial issue to repeat.

We tried generating random 100,000 rows for the table cqu_application_cstm, but the queries worked:

(0 rows affected)
(104001 rows affected)
Completion time: 2024-11-02T13:04:35.2484591+11:00

Does the issue happen on the same row?
Can this row be identified and isolated?
If we know the exact data, which causes the issue it will help to reproduce and fix the problem.
[19 Nov 2024 1:28] Bruce Young
Does the issue happen on the same row?
I do not know. Probably but can't guarantee.

Can this row be identified and isolated?
Probably, but not a trivial task. Is there any available "RowID" type selecting that could help me find it?

If we know the exact data, which causes the issue it will help to reproduce and fix the problem.
Totally agree with this one, but finding it is the hard bit. Any assistance you could provide would be appreciated.
[19 Nov 2024 4:34] Bruce Young
ok, found a row that is causing the issue.

Have created a table with just that row and it also fails to select across the link.

I've exported that table and will attach the file next. Please try with that.
[19 Nov 2024 10:59] Bogdan Degtyariov
Hi Bruce,

Thank you for providing the row data.
I managed to repeat the problem with the Unicode version of the driver.

With ANSI it worked, which could be a temporary workaround for you:

drop table if  exists testing..CQU_APPLICATION_CSTM_TEMP ;
drop table if  exists testing..CQU_APPLICATION_CSTM_OLD ;
SELECT * INTO testing..CQU_APPLICATION_CSTM_TEMP from openquery(MYSQL_ODBC_9_1_0_ANSI,'select * from bruce_testing_two limit 0') ;
INSERT INTO testing..CQU_APPLICATION_CSTM_TEMP SELECT * from openquery(MYSQL_ODBC_9_1_0_ANSI,'select * from bruce_testing_two') ;

(0 rows affected)

(1 row affected)

Completion time: 2024-11-19T21:50:56.0358517+11:00

This bug is verified.
[27 Nov 2024 23:49] Bruce Young
Tested again from a new source SQL server.
Installed latest mySQL ODBC drivers (9.1).
In this case it does not work with either Unicode or ANSI driver versions.

I then tested with 8.3.

Neither Unicode or ANSI worked.

In my case neither 8.3 drivers and neither 9.1 drivers work.

In your case I accept that your testing showed the ANSI driver as an option, but my validation was not successful.

The only good news was that I tested the latest 9.1 releases of the drivers and found they were also failing in this specific case.
[5 Dec 2024 11:03] Bogdan Degtyariov
Posted by developer:
 
The problem occurs in fixed-length columns such as CHAR(N) where the string data
is padded with spaces if its length is less than N.

Also, in case of MS SQL the issue occurs when text data contains characters
that are coded with more than one byte in UTF8.

For instance, the string in the column `cqu_event_id_c` contains such
character, which looks like a minus ('–' vs '-') but it is a long dash.
Replacing it by a minus makes the script work.

What happens is this: the connector gets data with the length of N bytes.
All characters except one are encoded in UTF8 as single byte.
The multi-byte character is coded in UTF8 as two bytes.
The driver converts single byte UTF8 characters to wide (2 or 4-byte)
characters. The multi-byte character is also converted to the same size
wide character. Hence from N bytes in the data there are N-1 wide characters.
The driver reports the number of written bytes as 
(N-1) * SIZEOF(UTF16_CHARACTER) instead of N * SIZEOF(UTF16_CHARACTER).

This reported number is inconsistent with the result given by SQLDescribeCol().
Hence, MS SQL Server reports the error.
[10 Dec 2024 0:13] Bruce Young
I have confirmed that replacing the "En dash" with a normal dash/hyphen has resolved the issue. I note that there are both "En dash" and "Em dash" which would likely cause an issue in this context. I note that Microsoft Word does auto extend a normal hyphen into a longer dash and when copied into Sugar CRM this is what has caused this issue.

I'll be enabling a check on this table going forward as this issue may present again into the future.

Thanks for your assistance in finding this issue and determining a way forward.
[13 Dec 2024 3:54] Bogdan Degtyariov
Posted by developer:
 
The bug is fixed by setting the session SQL_MODE on the server to PAD_CHAR_TO_FULL_LENGTH and enable the server to pad CHAR columns data to their full length.
Driver is setting is automatically at the connection stage when PAD_SPACE connection option is enabled for the driver.
For this solution to work the client must not clear SQL_MODE of PAD_CHAR_TO_FULL_LENGTH value when changing it manually.

NOTE: When changing SQL_MODE it is best to use the 
      approach when new SQL_MODE setting is appended to 
      the existing one:
      
      SET sql_mode=concat(@@sql_mode, '..New Mode..')
      
      or if one of other modes needs to be cleared the client
      has to ensure that the new SQL_MODE value still contains
      PAD_CHAR_TO_FULL_LENGTH
      
ODBC Specification does not have SET and ENUM types that exist in MySQL Server. For this reason the MySQL ODBC driver reports these types as CHAR by returning SQL_CHAR (SQL_WCHAR for Unicode) type IDs. This means that the padding with spaces can also be done for SET/ENUM. However, because the server does not pad these types the ODBC driver does the padding instead.

NOTE: Padding of SET/ENUM will still have issues with multibyte characters. This is expected and will not be fixed in this patch.

git revision hash: 46ae3379e858a968742a531592e8fc39207543a1
[18 Dec 2024 2:10] Philip Olson
Posted by developer:
 
Fixed as of the upcoming MySQL Connector/ODBC 9.2.0 release, and here's the proposed changelog entry from the documentation team:

For fixed column types, such as CHAR, having these fields contain unexpected
multi-byte characters could cause a "returned data that does not match
expected data length for column" error.

The connector now enables the PAD_CHAR_TO_FULL_LENGTH sql_mode.

Thank you for the bug report.