Bug #28875 A conversion between ASCII and LATIN1 charsets does not function
Submitted: 4 Jun 2007 14:13 Modified: 25 Aug 2007 13:54
Reporter: SINISA MILIVOJEVIC Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:5.0BK, 5.1BK OS:Any
Assigned to: Alexander Barkov CPU Architecture:Any
Tags: bfsm_2007_06_21, bfsm_2007_06_28, regression

[4 Jun 2007 14:13] SINISA MILIVOJEVIC
Description:
When two fields with different coercibility levels have to be resolved, of which one is ASCII and the otehr is LATIN1, the error is returned instead of doing a fast and very simple conversion.

There is yet another problem, which is that function MIN() returns a wrong charset. It is not returning a charset of it's operand.

How to repeat:
1) CREATE TABLE `incidents` ( `ref_no` varchar(15) character set ascii NOT NULL, UNIQUE KEY `incidents$ref_no` (`ref_no`) ) ENGINE=InnoDB;

2) insert into incidents values ('070514-000000');

3) select CONCAT(incidents.ref_no,IFNULL(MIN( date_format(now(), '%Y-%m-%d')), ULL')) from incidents limit 1;

This error is returned:

ERROR 1267 (HY000): Illegal mix of collations (ascii_general_ci,IMPLICIT) and (latin1_swedish_ci,COERCIBLE) for operation 'concat'

If MIN() function is removed, it returns without error.

Suggested fix:
Make a very simple ascii -> latin1 and opposite conversion and fix charset returned by MIN() function.
[5 Jun 2007 2:46] Chris Calender
It would also be nice if this would work for UTF8 as well as ASCII.  At least when you use some simple collation, like swedish, then utf8_swedish_ci is just a superset of latin1_swedish_ci.

I realize there are other UTF8 combinations where conversions are not possible.

All you would need to do is add 'set names utf8;' to the beginning of the test case above.
[13 Jun 2007 12:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28650

ChangeSet@1.2518, 2007-06-13 17:18:21+05:00, bar@mysql.com +14 -0
  Bug#28875 Conversion between ASCII and LATIN1 charsets does not function
  (Regression, caused by a patch for the bug 22646).
  Problem: when result type of date_format() was changed from
  binary string to character string, mixing date_format()
  with a ascii column in CONCAT() stopped to work.
  Fix:
  - adding "repertoire" flag into DTCollation class,
  to mark items which can return only pure ASCII strings.
  - allow character set conversion from pure ASCII to other character sets.
[18 Jun 2007 11:07] Alexander Barkov
Thanks to Peter for indicating a small problem with the fix:

Swedish 7bit character set "swe7" is not a super set for ASCII.
Small additional piece of code is required to handle "swe7" correctly.

All other character sets are super-sets for ASCII.
[22 Jun 2007 5:52] Alexander Barkov
See also:

http://bugs.mysql.com/bug.php?id=27534
[1 Aug 2007 11:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/31951

ChangeSet@1.2518, 2007-08-01 16:25:51+05:00, bar@mysql.com +19 -0
  Bug#28875 Conversion between ASCII and LATIN1 charsets does not function
  (Regression, caused by a patch for the bug 22646).
  Problem: when result type of date_format() was changed from
  binary string to character string, mixing date_format()
  with a ascii column in CONCAT() stopped to work.
  Fix:
  - adding "repertoire" flag into DTCollation class,
  to mark items which can return only pure ASCII strings.
  - allow character set conversion from pure ASCII to other character sets.
[1 Aug 2007 11:32] Alexander Barkov
Unmarking "review done" checkboxes.
New version committed.
[3 Aug 2007 10:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/32057

ChangeSet@1.2518, 2007-08-03 15:25:23+05:00, bar@mysql.com +20 -0
  Bug#28875 Conversion between ASCII and LATIN1 charsets does not function
  (Regression, caused by a patch for the bug 22646).
  Problem: when result type of date_format() was changed from
  binary string to character string, mixing date_format()
  with a ascii column in CONCAT() stopped to work.
  Fix:
  - adding "repertoire" flag into DTCollation class,
  to mark items which can return only pure ASCII strings.
  - allow character set conversion from pure ASCII to other character sets.
[3 Aug 2007 12:51] Alexander Barkov
Additional review made by Serg.

Pushed into 5.0.44
Pushed into 5.1.21
[4 Aug 2007 1:18] Bugs System
Pushed into 5.1.21-beta
[4 Aug 2007 1:49] Bugs System
Pushed into 5.0.48
[6 Aug 2007 17:15] Paul DuBois
Noted in 5.0.48, 5.1.21 changelogs.

Using the result from DATE_FORMAT() and an ASCII value in CONCAT()
could produce incorrect results.
[7 Aug 2007 10:48] Alexander Barkov
Paul, the ChangeLog item is too small.
Please consider mentioning the following changes somewhere in the manual:

All string expressions now have a new feature: repertoire.

Repertoire can currently have two values:
- ASCII  - means that the expression can return
           characters only in Unicode range U+0000..U+007F

- UNICODE - means that the expression can return
            any characters in the range U+0000..U+FFFF

What benefits does the new "repertoire" feature provide?

It can allow character set conversion in many cases, where
older versions of MySQL returned "illegal mix of collations".

Examples of expressions, their repertoires,
and behavior changes:

1. String constants:

set names utf8; select 'string';
select _utf8'string';
select N'string';

   Note, although character set is utf8 in all the above cases,
   the strings do not really have any characters outside ASCII range!
   So their repertoire is ASCII.

2. Functions with one string argument inherit repertoire from their arguments:
   
   UPPER(_utf8'string') - has ASCII repertoire,
   because its argument has ASCII repertoire.

3. Columns having character set "ascii"
   have repertoire ASCII, just because of their character set.

CREATE TABLE t1 (c1 char(1) character set ascii)
c1 has ASCII repertoire.

Behavior change example:

drop table if exists t1;
create table t1 (c1 char(1) character set latin1, c2 char(1) character set ascii);
insert into t1 values ('a','b');
select concat(c1,c2) from t1;
drop table t1;

Old version:

ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT)
and (ascii_general_ci,IMPLICIT) for operation 'concat'

New version:

Allows subset to superset conversion (ascii to latin1),
and returns a result:
+---------------+
| concat(c1,c2) |
+---------------+
| ab            |
+---------------+
1 row in set (0.00 sec)

4. Functions that don't have string input and use
character_set_connection as the result character set:

   format(numeric_column, 4);

   If character_set_connection is "ascii", then
   repertoire is ASCII, otherwise it is UNICODE.

Behavior change example:

set names ascii;
drop table if exists t1;
create table t1 (a int, b varchar(10) character set latin1);
insert into t1 values (1,'b');
select concat(format(a, 4), b) from t1;
drop table t1;

Old version:

ERROR 1267 (HY000):
Illegal mix of collations (ascii_general_ci,COERCIBLE) and
(latin1_swedish_ci,IMPLICIT) for operation 'concat'

New version:

+-------------------------+
| concat(format(a, 4), b) |
+-------------------------+
| 1.0000b                 |
+-------------------------+
1 row in set (0.00 sec)

5. Functions with two or more string arguments
   use the widest repertoire from its arguments:

   CONCAT(_ucs2 0x0041, _ucs2 0x0042) - repertoire is ASCII, because
   both arguments are inside the ASCII range.

   CONCAT(_ucs2 0x0041, _ucs2 0x00C2) - repertoire is UNICODE, because
   the right argument is outside the ASCII range.

6. Functions with additional arguments mix repertoires
   of only those string arguments, which affect the result:

   if(column1 < column2, 'smaller', 'greater')

   Its repertoire is ASCII, because the two string arguments
   (the second argument and the third argument) both have ASCII 
   repertoire. The first argument does not matter for the result
   repertoire.

7. Tricky functions:

   DATE_FORMAT(now(), '%W, %d %M')
   has ASCII repertoire, but only when lc_time_names is en_US,
   because: the format string is pure ASCII, and month names
   in English can consist of only basic Latin letters, which
   are all inside the ASCII range.

   
   DATE_FORMAT(now(), '%W, %d %M')
   has UNICODE repertoire if lc_time_names is something else than
   en_US, because month names in non-English languages can have
   extended letters, outside the ASCII range.

   DATE_FORMAT(now(), '%Y-%m-%d')
   Special case - the format string does not have week or month names.
   So it could have ASCII repertoire, independently from lc_time_names
   settings. But DATE_FORMAT() implementation is not smart enough yet
   to detect this special case, so:
   - it depends on repertoire of the format string
   - it depends on repertoire of the locale (lc_time_names setting)
   - it currently does not depend on format itself
[7 Aug 2007 10:51] Alexander Barkov
Paul, I found that my patch fixed one regression,
but introduced another regression, which
should be fixed before the next 5.x release.

Marking the bug as a show-stopper.
[7 Aug 2007 10:53] Alexander Barkov
The new problem can be demonstrated by this script:

drop table if exists t1;
create table t1 (a char(1) character set latin1);
insert into t1 values ('a');
set names ascii;
select * from t1 where a='a';

It worked fine and returned results, until the fix for bug#28875 has broken it:

ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (ascii_general_ci,COERCIBLE) for operation '='
[7 Aug 2007 11:31] Alexander Barkov
Paul, the build team asked to mention this flaw in 5.1.21 release notes.
[7 Aug 2007 12:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/32197

ChangeSet@1.2519, 2007-08-07 17:56:33+05:00, bar@mysql.com +3 -0
  Reversing additional change suggested by Serg
  under terms of bug#28875 for better performance.
  The change appeared to require more changes in item_cmpfunc.cc,
  which is dangerous in 5.0.
[7 Aug 2007 14:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/32201

ChangeSet@1.2502, 2007-08-07 19:25:45+05:00, bar@mysql.com +3 -0
  Reversing additional change suggested by Serg
  under terms of bug#28875 for better performance.
  The change appeared to require more changes in item_cmpfunc.cc,
  which is dangerous in 5.0.
      
  Conversion between a latin1 column and an ascii string constant
  stopped to work.
[9 Aug 2007 17:30] Paul DuBois
Updated changelog entry (might need further revision depending on further work done for this bug).

Coercion of ASCII values to character sets that are a superset of
ASCII sometimes was not done, resulting in illegal mix of collations
errors. These cases now are resolved using repetoire, a new string
expression attribute.

This patch results in an outstanding problem that remains to be
fixed, illustrated by the following example that results in an
illegal mix of collations error: 

CREATE TABLE t1 (a CHAR(1) CHARACTER SET latin1);
INSERT INTO t1 VALUES ('a');
SET NAMES ascii;
SELECT * FROM t1 WHERE a='a';

(Leaving bug report in Patch queued status.)
[18 Aug 2007 0:05] Bugs System
Pushed into 5.0.48
[18 Aug 2007 0:15] Bugs System
Pushed into 5.1.22-beta
[25 Aug 2007 13:54] Paul DuBois
The "outstanding problem" mentioned previously applied only for the initial patch in 5.0.48. That has now been resolved, so the changelog entry for 5.0.48 no longer mentions any outstanding problem. For 5.1, everything was fixed in 5.1.21; the push into 5.1.22 presumably was a null-merge that adds no functional changes.
[24 Nov 2008 13:36] Alexander Barkov
Bug#25454 was marked as duplicate for this bug.