Bug #56639 Character Euro (0x88) not converted from cp1251 to utf8
Submitted: 8 Sep 2010 10:32
Modified: 22 Dec 2010 19:20
Reporter: Anton Reznikov
Status: Closed
Category: MySQL Server: Charsets
Severity: S2 (Serious)
Version: 5.1.45 and later
OS: Any
Assigned to: Alexander Barkov
CPU Architecture: Any
Tags: cp1251, Euro
Triage: Triaged: D2 (Serious)

[8 Sep 2010 10:32] Anton Reznikov
Description:
If a cp1251-encoded string contains the 0x88 character (Euro sign), converting that string to utf8 does not produce the U+20AC character. Instead, you get 0x88, which is represented as '?'.

How to repeat:
Look at position 136 (0x88) of any 'to_uni_cp1251*' array in the strings/ctype-extra.c file: it contains '0x0000' instead of '0x20AC' (see http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT).
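To double-check the index arithmetic, here is a minimal Python sketch (not MySQL code) that builds a 256-entry byte-to-Unicode table from Python's built-in cp1251 codec, which we assume follows the Microsoft CP1251.TXT mapping linked above. Bytes the codec leaves undefined are stored as 0x0000, a convention borrowed from this report; the function name reference_to_uni_cp1251 is hypothetical.

```python
# Build a 256-entry "to_uni" table from Python's built-in cp1251 codec.
# Assumption: bytes the codec does not define are stored as 0x0000,
# mirroring the placeholder described in this report.
def reference_to_uni_cp1251() -> list[int]:
    table = []
    for byte in range(256):
        try:
            table.append(ord(bytes([byte]).decode("cp1251")))
        except UnicodeDecodeError:
            table.append(0x0000)
    return table


to_uni = reference_to_uni_cp1251()
print(136 == 0x88)        # True: decimal position 136 is byte 0x88
print(hex(to_uni[136]))   # 0x20ac: EURO SIGN in the reference mapping
print(hex(to_uni[0xC0]))  # 0x410: CYRILLIC CAPITAL LETTER A
```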

Suggested fix:
Please replace element 136 of every 'to_uni_cp1251*' array in the strings/ctype-extra.c file, changing '0x0000' to '0x20AC'.
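The effect of the suggested one-entry change can be modelled with a toy lookup table. This is a hypothetical Python sketch, not MySQL's conversion code: it assumes that a 0x0000 entry means "no mapping" and that such bytes come out as '?', as this report describes. The names convert, buggy and fixed are illustrative only.

```python
# Hypothetical model of a byte -> Unicode lookup in which 0x0000 means
# "no mapping"; unmapped bytes are emitted as '?', as described above.
UNMAPPED = 0x0000


def convert(data: bytes, to_uni: list[int]) -> str:
    out = []
    for b in data:
        code = to_uni[b]
        # Byte 0x00 legitimately maps to U+0000; any other 0x0000 entry is unmapped.
        if code == UNMAPPED and b != 0x00:
            out.append("?")
        else:
            out.append(chr(code))
    return "".join(out)


# Toy table: identity for ASCII, everything else unmapped (illustration only).
buggy = [i if i < 0x80 else UNMAPPED for i in range(256)]
fixed = list(buggy)
fixed[136] = 0x20AC  # the suggested fix: element 136 (0x88) -> U+20AC

sample = b"10 \x88"
print(convert(sample, buggy))  # 10 ?
print(convert(sample, fixed))  # 10 €
```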
[10 Sep 2010 21:58] Sveta Smirnova
Thank you for the report.

However, there is no Euro sign in the cp1251 chart at http://www.collation-charts.org/mysql60/mysql604.cp1251_general_ci.html. This means it is an illegal character in the version of cp1251 that MySQL supports.
[13 Sep 2010 9:48] Anton Reznikov
Sveta, you are right if MySQL uses its own variant of the cp1251 encoding instead of the conventional one.

See:
http://msdn.microsoft.com/ru-ru/goglobal/cc305144%28en-us%29.aspx
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
[29 Sep 2010 15:51] Sveta Smirnova
Thank you for the feedback.

We discussed this internally and decided this is a bug.
[12 Nov 2010 16:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/123745

3505 Alexander Barkov	2010-11-12
      Bug#56639 Character Euro (0x88) not converted from cp1251 to utf8
      Problem: MySQL cp1251 did not support 'U+20AC EURO SIGN'
      which was assigned a few years ago to 0x88.
      
      Fix: adding mapping: 0x88 <-> U+20AC 
      
        @ mysql-test/include/ctype_8bit.inc
        New shared file to test 8bit character sets.
      
        @ mysql-test/r/ctype_cp1251.result
        @ mysql-test/t/ctype_cp1251.test
        Adding tests
      
        @ sql/share/charsets/cp1251.xml
        Adding mapping
      
        @ strings/ctype-extra.c
        Regenerating ctype-extra.c using strings/conf_to_src
        according to new cp1251.xml
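The fix is described as a two-way mapping (0x88 <-> U+20AC). A minimal Python sketch of why one table entry can cover both directions, assuming the reverse (Unicode-to-byte) table is derived by inverting the forward one; the toy dictionaries below are hypothetical and are not the contents of cp1251.xml. The last two lines cross-check against Python's built-in cp1251 codec.

```python
# Sketch: derive the reverse table by inverting a byte -> code point table,
# so adding 0x88 -> U+20AC once fixes both conversion directions.
# The toy table is hypothetical; only the 0x88 entry matters here.
to_uni = {0x41: 0x0041, 0xC0: 0x0410, 0x88: 0x20AC}
from_uni = {cp: byte for byte, cp in to_uni.items()}

print(hex(from_uni[0x20AC]))  # 0x88

# Cross-check against Python's built-in cp1251 codec.
print(b"\x88".decode("cp1251") == "\u20ac")  # True
print("\u20ac".encode("cp1251") == b"\x88")  # True
```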
[26 Nov 2010 14:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/125159

3522 Alexander Barkov	2010-11-26
      Bug#56639 Character Euro (0x88) not converted from cp1251 to utf8
      
      Problem: MySQL cp1251 did not support 'U+20AC EURO SIGN'
      which was assigned a few years ago to 0x88.
      
      Fix: adding mapping: 0x88 <-> U+20AC 
      
        @ mysql-test/include/ctype_8bit.inc
        New shared file to test 8bit character sets.
      
        @ mysql-test/r/ctype_cp1251.result
        @ mysql-test/t/ctype_cp1251.test
        Adding tests
      
        @ sql/share/charsets/cp1251.xml
        Adding mapping
      
        @ strings/ctype-extra.c
        Regenerating ctype-extra.c using strings/conf_to_src
        according to new cp1251.xml
[26 Nov 2010 14:30] Alexander Barkov
Pushed into mysql-5.1-bugteam [5.1.54]
Pushed into mysql-5.5-bugteam [5.5.8]
[26 Nov 2010 14:44] Alexander Barkov
Pushed into mysql-trunk-bugfixing [5.6.1-m5]
[5 Dec 2010 12:36] Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (version source revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (merge vers: 5.6.1) (pib:23)
[11 Dec 2010 17:24] Paul Dubois
Bug does not appear in any released 5.6.x version.

Setting report to Need Merge pending push to 5.1.x, 5.5.x.
[17 Dec 2010 12:47] Bugs System
Pushed into mysql-5.1 5.1.55 (revid:georgi.kodinov@oracle.com-20101217124435-9imm43geck5u55qw) (version source revid:mats.kindahl@oracle.com-20101201193331-1c07sjno2g7m46ix) (merge vers: 5.1.55) (pib:24)
[17 Dec 2010 12:54] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:georgi.kodinov@oracle.com-20101217124733-p1ivu6higouawv8l) (version source revid:georgi.kodinov@oracle.com-20101126153433-4dbn9nhn2fzehejo) (merge vers: 5.5.8) (pib:24)
[22 Dec 2010 19:20] Paul Dubois
Noted in 5.1.55, 5.5.9 changelogs.

The cp1251 character set did not properly support the Euro sign
(0x88). For example, converting a string containing this character to
utf8 resulted in '?' rather than the utf8 Euro sign.
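For reference, the expected result of the conversion mentioned in the changelog can be reproduced with Python's built-in codecs (an illustration outside MySQL; the sample text is arbitrary). The cp1251 byte 0x88 should become the three-byte UTF-8 sequence E2 82 AC.

```python
# Round-trip a Cyrillic string with a Euro sign through cp1251 and UTF-8
# using Python's built-in codecs (illustration only, not MySQL code).
text = "Цена: 100 €"
cp1251_bytes = text.encode("cp1251")
print(hex(cp1251_bytes[-1]))  # 0x88: the Euro sign in cp1251

utf8 = cp1251_bytes.decode("cp1251").encode("utf-8")
print(utf8[-3:].hex())        # e282ac: the Euro sign in UTF-8
```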