Bug #57709 | c-api-data-structures information now incomplete | ||
---|---|---|---|
Submitted: | 25 Oct 2010 10:59 | Modified: | 9 Aug 2013 13:53 |
Reporter: | Peter Laursen (Basic Quality Contributor) | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server: C API (client library) | Severity: | S3 (Non-critical) |
Version: | 4.1+ | OS: | Any |
Assigned to: | Georgi Kodinov | CPU Architecture: | Any |
Tags: | qc |
[25 Oct 2010 10:59]
Peter Laursen
[25 Oct 2010 10:59]
Peter Laursen
Fixed typo in synopsis.
[25 Oct 2010 11:04]
Peter Laursen
BTW, please clarify ASAP: how is "the largest possible value from the result column" evaluated? Using the charset used for storage, character_set_connection, or character_set_client (or a combination of those)?
[5 Nov 2010 19:30]
MySQL Verification Team
This reminded me of the "not a bug" report 42488 that I filed for prepared statements...
[5 Nov 2010 20:32]
Peter Laursen
I am mystified! I got a notification "Updated by: Shane Bester" but I do not see any new content added. What happened? If Shane added some internal consideration that I cannot see, I should not have received a notification (though I think it would be more correct to let the original reporter see all communication). This is only a hypothesis, of course!
[6 Nov 2010 13:33]
Peter Laursen
I think one 'piece of information' is simply missing for every (var)char column in the result set, namely 'number of bytes used to represent a character'. It is non-trivial (and may be slow) to send additional queries to the server when the result set is populated from many tables. (E.g. "SET NAMES latin1;" and read data stored as utf8. Will a char(10) report 10 or 30 as the length of the column (answer is '30')? Next, read data stored as latin1. Same question (answer is '10'). The result set and the client have no information about the "max. possible byte-width of storage"; only the server has. The client will have to assume the (var)char(30) utf8 case, which is wrong.)
I suggest you simply add this 'missing piece of information' to the data structure. It will then always be '1' for non-(var)char columns and for (var)char columns in any single-byte charset as well.
But more important right now: could somebody please explain the logic that the server uses now? Does it simply set byte_length = 3*character_length for data stored as utf8? It looks like it, but character_set_connection and/or character_set_client may be 'narrower', so the client cannot handle it properly without information from the server about 'number of bytes used to represent a character' (at least I cannot see any way to do so).
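To make the proposal above concrete, here is a minimal C sketch. It assumes a hypothetical metadata structure, `column_meta_sketch`, with an extra `bytes_per_char` member; neither the structure nor the member exists in the real client library, and the real MYSQL_FIELD only carries the byte `length`. The sketch only shows how a client could recover the character count if such a value were sent.

```c
#include <stdint.h>

/* Hypothetical column metadata. `length` mirrors the byte length the client
   receives today; `bytes_per_char` is the proposed extra member and is NOT
   part of the real MYSQL_FIELD structure. */
struct column_meta_sketch {
    uint32_t length;         /* maximum column width in bytes */
    uint32_t bytes_per_char; /* proposed: 1 for non-(var)char and single-byte charsets */
};

/* Recover the maximum number of characters from the two values.
   Returns 0 if bytes_per_char is 0, to avoid dividing by zero. */
static uint32_t max_chars(const struct column_meta_sketch *col)
{
    return col->bytes_per_char == 0 ? 0 : col->length / col->bytes_per_char;
}
```

With `length = 30, bytes_per_char = 3` and with `length = 10, bytes_per_char = 1`, `max_chars` returns 10 in both cases, which is the distinction the client cannot make from `length` alone.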
[12 Nov 2010 19:25]
Sveta Smirnova
Thank you for the report. "clarify how is this "the largest possible value from the result column" evaluated? Using charset used for storage, character_set_connection or character_set_client (or a combination of those)? In my understanding it should be character_set_client." verified as documentation issue.
[12 Nov 2010 19:29]
Peter Laursen
This is more than a documentation issue. The problem is that *c-api-data-structures information now incomplete* (from 4.1). Documentation cannot change that!
[16 Nov 2010 8:21]
Stefan Hinz
As the bug reporter says, this is not a documentation bug. Please fix and/or indicate what exactly we should document (as a clarification/limitation/workaround).
[8 Sep 2012 9:24]
Peter Laursen
Any chance that this could be fixed soon? This report is now 2 years old. :-(
[1 Jul 2013 13:40]
Peter Laursen
There is a related discussion here: http://bugs.mysql.com/bug.php?id=69627
[1 Jul 2013 15:57]
Roland Bouman
Ok - I filed a separate bug for my findings: http://bugs.mysql.com/bug.php?id=69630&thanks=2&notify=67 I'm not entirely sure I have full understanding of Peter's issue, but the metadata seem messed up to me.
[9 Jul 2013 13:53]
Georgi Kodinov
Peter,
Given the discussion in Roland's bug (69630), do you see anything that isn't clear about how the field is filled?
In general, when it calculates the possible number of bytes it assumes the worst case: all characters taking the longest byte sequences as defined in the collation, e.g. for UTF8 that would be 3 bytes for each character (http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-utf8.html).
The precise formula used to calculate the max length is in Protocol::send_result_set_metadata:
      /*
        For TEXT/BLOB columns, field_length describes the maximum data
        length in bytes. There is no limit to the number of characters
        that a TEXT column can store, as long as the data fits into
        the designated space.
        For the rest of textual columns, field_length is evaluated as
        char_count * mbmaxlen, where character count is taken from the
        definition of the column. In other words, the maximum number
        of characters here is limited by the column definition.
        When one has a LONG TEXT column with a single-byte
        character set, and the connection character set is multi-byte, the
        client may get fields longer than UINT_MAX32, due to
        <character set column> -> <character set connection> conversion.
        In that case column max length does not fit into the 4 bytes
        reserved for it in the protocol.
      */
      max_length= (field.type >= MYSQL_TYPE_TINY_BLOB &&
                   field.type <= MYSQL_TYPE_BLOB) ?
        field.length / item->collation.collation->mbminlen :
        field.length / item->collation.collation->mbmaxlen;
      field_length= char_to_byte_length_safe(max_length, thd_charset->mbmaxlen);
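The formula quoted above can be modelled outside the server as a small standalone C sketch. Everything here is illustrative: `charset_model`, `reported_length` and `char_to_byte_length_capped` are hypothetical names, the latter is only assumed to behave like the server's char_to_byte_length_safe (multiply, then cap at the 4-byte maximum), and the mbminlen/mbmaxlen values for latin1 and utf8 are the usual 1/1 and 1/3.

```c
#include <stdint.h>

/* Hypothetical stand-in for the server's character set descriptor. */
struct charset_model {
    const char *name;
    uint32_t mbminlen; /* shortest byte sequence for one character */
    uint32_t mbmaxlen; /* longest byte sequence for one character */
};

static const struct charset_model latin1_model = {"latin1", 1, 1};
static const struct charset_model utf8_model = {"utf8", 1, 3};

/* Assumed behaviour of char_to_byte_length_safe: multiply in 64 bits and
   cap the result at UINT32_MAX so it fits the 4-byte protocol field. */
static uint32_t char_to_byte_length_capped(uint32_t chars, uint32_t mbmaxlen)
{
    uint64_t bytes = (uint64_t)chars * mbmaxlen;
    return bytes > UINT32_MAX ? UINT32_MAX : (uint32_t)bytes;
}

/* Model of the quoted formula: convert the column's byte length to a
   character count using the column charset, then back to bytes using the
   charset the result is sent in. */
static uint32_t reported_length(uint32_t column_bytes, int is_text_or_blob,
                                const struct charset_model *column_cs,
                                const struct charset_model *result_cs)
{
    uint32_t max_chars = is_text_or_blob
        ? column_bytes / column_cs->mbminlen
        : column_bytes / column_cs->mbmaxlen;
    return char_to_byte_length_capped(max_chars, result_cs->mbmaxlen);
}
```

Under these assumptions, a CHAR(10) utf8 column (30 bytes) sent in latin1 gives `reported_length(30, 0, &utf8_model, &latin1_model) == 10`, a CHAR(10) latin1 column sent in utf8 gives 30, and a 4294967295-byte latin1 LONGTEXT sent in utf8 is capped at 4294967295. This models only the code as quoted; other server versions may compute the value differently.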
[10 Aug 2013 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".