Bug #28875 | A conversion between ASCII and LATIN1 charsets does not function | ||
---|---|---|---|
Submitted: | 4 Jun 2007 14:13 | Modified: | 25 Aug 2007 13:54 |
Reporter: | SINISA MILIVOJEVIC | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S2 (Serious) |
Version: | 5.0BK, 5.1BK | OS: | Any |
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
Tags: | bfsm_2007_06_21, bfsm_2007_06_28, regression |
[4 Jun 2007 14:13]
SINISA MILIVOJEVIC
[5 Jun 2007 2:46]
Chris Calender
It would also be nice if this would work for UTF8 as well as ASCII. At least when you use some simple collation, like swedish, then utf8_swedish_ci is just a superset of latin1_swedish_ci. I realize there are other UTF8 combinations where conversions are not possible. All you would need to do is add 'set names utf8;' to the beginning of the test case above.
[13 Jun 2007 12:19]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/28650 ChangeSet@1.2518, 2007-06-13 17:18:21+05:00, bar@mysql.com +14 -0 Bug#28875 Conversion between ASCII and LATIN1 charsets does not function (Regression, caused by a patch for the bug 22646). Problem: when result type of date_format() was changed from binary string to character string, mixing date_format() with a ascii column in CONCAT() stopped to work. Fix: - adding "repertoire" flag into DTCollation class, to mark items which can return only pure ASCII strings. - allow character set conversion from pure ASCII to other character sets.
[18 Jun 2007 11:07]
Alexander Barkov
Thanks to Peter for indicating a small problem with the fix: Swedish 7bit character set "swe7" is not a super set for ASCII. Small additional piece of code is required to handle "swe7" correctly. All other character sets are super-sets for ASCII.
[22 Jun 2007 5:52]
Alexander Barkov
See also: http://bugs.mysql.com/bug.php?id=27534
[1 Aug 2007 11:28]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/31951 ChangeSet@1.2518, 2007-08-01 16:25:51+05:00, bar@mysql.com +19 -0 Bug#28875 Conversion between ASCII and LATIN1 charsets does not function (Regression, caused by a patch for the bug 22646). Problem: when result type of date_format() was changed from binary string to character string, mixing date_format() with a ascii column in CONCAT() stopped to work. Fix: - adding "repertoire" flag into DTCollation class, to mark items which can return only pure ASCII strings. - allow character set conversion from pure ASCII to other character sets.
[1 Aug 2007 11:32]
Alexander Barkov
Unmarking "review done" checkboxes. New version committed.
[3 Aug 2007 10:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/32057 ChangeSet@1.2518, 2007-08-03 15:25:23+05:00, bar@mysql.com +20 -0 Bug#28875 Conversion between ASCII and LATIN1 charsets does not function (Regression, caused by a patch for the bug 22646). Problem: when result type of date_format() was changed from binary string to character string, mixing date_format() with a ascii column in CONCAT() stopped to work. Fix: - adding "repertoire" flag into DTCollation class, to mark items which can return only pure ASCII strings. - allow character set conversion from pure ASCII to other character sets.
[3 Aug 2007 12:51]
Alexander Barkov
Additional review made by Serg. Pushed into 5.0.44 Pushed into 5.1.21
[4 Aug 2007 1:18]
Bugs System
Pushed into 5.1.21-beta
[4 Aug 2007 1:49]
Bugs System
Pushed into 5.0.48
[6 Aug 2007 17:15]
Paul DuBois
Noted in 5.0.48, 5.1.21 changelogs. Using the result from DATE_FORMAT() and an ASCII value in CONCAT() could produce incorrect results.
[7 Aug 2007 10:48]
Alexander Barkov
Paul, the ChangeLog item is too small. Please consider mentioning the following changes somewhere in the manual: All string expressions now have a new feature: repertoire. Repertoire can currently have two values: - ASCII - means that the expression can return characters only in Unicode range U+0000..U+007F - UNICODE - means that the expression can return any characters in the range U+0000..U+FFFF What benefits does the new "repertoire" feature provide? It can allow character set conversion in many cases, where older versions of MySQL returned "illegal mix of collations". Examples of expressions, their repertoires, and behavior changes: 1. String constants: set names utf8; select 'string'; select _utf8'string'; select N'string'; Note, although character set is utf8 in all the above cases, the strings do not really have any characters outside ASCII range! So their repertoire is ASCII. 2. Functions with one string argument inherit repertoire from their arguments: UPPER(_utf8'string') - has ASCII repertoire, because its argument has ASCII repertoire. 3. Columns having character set "ascii" have repertoire ASCII, just because of their character set. CREATE TABLE t1 (c1 char(1) character set ascii) c1 has ASCII repertoire. Behavior change example: drop table if exists t1; create table t1 (c1 char(1) character set latin1, c2 char(1) character set ascii); insert into t1 values ('a','b'); select concat(c1,c2) from t1; drop table t1; Old version: ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (ascii_general_ci,IMPLICIT) for operation 'concat' New version: Allows subset to superset conversion (ascii to latin1), and returns a result: +---------------+ | concat(c1,c2) | +---------------+ | ab | +---------------+ 1 row in set (0.00 sec) 4. Functions that don't have string input and use character_set_connection as the result character set: format(numeric_column, 4); If character_set_connection is "ascii", then repertoire is ASCII, otherwise it is UNICODE. Behavior change example: set names ascii; drop table if exists t1; create table t1 (a int, b varchar(10) character set latin1); insert into t1 values (1,'b'); select concat(format(a, 4), b) from t1; drop table t1; Old version: ERROR 1267 (HY000): Illegal mix of collations (ascii_general_ci,COERCIBLE) and (latin1_swedish_ci,IMPLICIT) for operation 'concat' New version: +-------------------------+ | concat(format(a, 4), b) | +-------------------------+ | 1.0000b | +-------------------------+ 1 row in set (0.00 sec) 5. Functions with two or more string arguments use the widest repertoire from its arguments: CONCAT(_ucs2 0x0041, _ucs2 0x0042) - repertoire is ASCII, because both arguments are inside the ASCII range. CONCAT(_ucs2 0x0041, _ucs2 0x00C2) - repertoire is UNICODE, because the right argument is outside the ASCII range. 6. Functions with additional arguments mix repertoires of only those string arguments, which affect the result: if(column1 < column2, 'smaller', 'greater') Its repertoire is ASCII, because the two string arguments (the second argument and the third argument) both have ASCII repertoire. The first argument does not matter for the result repertoire. 7. Tricky functions: DATE_FORMAT(now(), '%W, %d %M') has ASCII repertoire, but only when lc_time_names is en_US, because: the format string is pure ASCII, and month names in English can consist of only basic Latin letters, which are all inside the ASCII range. DATE_FORMAT(now(), '%W, %d %M') has UNICODE repertoire if lc_time_names is something else than en_US, because month names in non-English languages can have extended letters, outside the ASCII range. DATE_FORMAT(now(), '%Y-%m-%d') Special case - the format string does not have week or month names. So it could have ASCII repertoire, independently from lc_time_names settings. But DATE_FORMAT() implementation is not smart enough yet to detect this special case, so: - it depends on repertoire of the format string - it depends on repertoire of the locale (lc_time_names setting) - it currently does not depend on format itself
[7 Aug 2007 10:51]
Alexander Barkov
Paul, I found that my patch fixed one regression, but introduced another regression, which should be fixed before the next 5.x release. Marking the bug as a show-stopper.
[7 Aug 2007 10:53]
Alexander Barkov
The new problem can be demonstrated by this script: drop table if exists t1; create table t1 (a char(1) character set latin1); insert into t1 values ('a'); set names ascii; select * from t1 where a='a'; It worked fine and returned results, until the fix for bug#28875 has broken it: ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (ascii_general_ci,COERCIBLE) for operation '='
[7 Aug 2007 11:31]
Alexander Barkov
Paul, the build team asked to mention this flaw in 5.1.21 release notes.
[7 Aug 2007 12:57]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/32197 ChangeSet@1.2519, 2007-08-07 17:56:33+05:00, bar@mysql.com +3 -0 Reversing additional change suggested by Serg under terms of bug#28875 for better performance. The change appeared to require more changes in item_cmpfunc.cc, which is dangerous in 5.0.
[7 Aug 2007 14:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/32201 ChangeSet@1.2502, 2007-08-07 19:25:45+05:00, bar@mysql.com +3 -0 Reversing additional change suggested by Serg under terms of bug#28875 for better performance. The change appeared to require more changes in item_cmpfunc.cc, which is dangerous in 5.0. Conversion between a latin1 column and an ascii string constant stopped to work.
[9 Aug 2007 17:30]
Paul DuBois
Updated changelog entry (might need further revision depending on further work done for this bug). Coercion of ASCII values to character sets that are a superset of ASCII sometimes was not done, resulting in illegal mix of collations errors. These cases now are resolved using repetoire, a new string expression attribute. This patch results in an outstanding problem that remains to be fixed, illustrated by the following example that results in an illegal mix of collations error: CREATE TABLE t1 (a CHAR(1) CHARACTER SET latin1); INSERT INTO t1 VALUES ('a'); SET NAMES ascii; SELECT * FROM t1 WHERE a='a'; (Leaving bug report in Patch queued status.)
[18 Aug 2007 0:05]
Bugs System
Pushed into 5.0.48
[18 Aug 2007 0:15]
Bugs System
Pushed into 5.1.22-beta
[25 Aug 2007 13:54]
Paul DuBois
The "outstanding problem" mentioned previously applied only for the initial patch in 5.0.48. That has now been resolved, so the changelog entry for 5.0.48 no longer mentions any outstanding problem. For 5.1, everything was fixed in 5.1.21; the push into 5.1.22 presumably was a null-merge that adds no functional changes.
[24 Nov 2008 13:36]
Alexander Barkov
Bug#25454 was marked as duplicate for this bug.