Bug #24037 Lossy Hebrew to Unicode conversion
Submitted: 7 Nov 2006 9:36
Modified: 2 Feb 2007 2:54
Reporter: Domas Mituzas
Status: Closed
Category: MySQL Server: Charsets
Severity: S2 (Serious)
Version: 5.1-bk & friends
Assigned to: Alexey Kopytov
CPU Architecture: Any
Tags: bfsm_2006_12_07, character set, hebrew, Unicode

[7 Nov 2006 9:36] Domas Mituzas
Description:
FD is the left-to-right mark (U+200E) and FE is the right-to-left mark (U+200F), as specified in a newer amendment, ISO/IEC 8859-8:1999.

Our uint16 to_uni_hebrew_bin[] table defines both of them as 0x0000,0x0000.

This makes the hebrew character set unusable in Unicode contexts, including the default mysqldump output.

How to repeat:
Because the conversion is lossy, the test case will be attached as a file.

Suggested fix:
Use 'binary' as default mysqldump character set?
Implement ISO/IEC 8859-8:1999 changes in 'hebrew' character set.
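
To make the description and the second suggestion concrete, here is a minimal sketch in C. It is not the MySQL source: the two-entry arrays tail_before and tail_after and the helper tail_to_unicode are hypothetical names that stand in for the 0xFD and 0xFE slots of a byte-to-Unicode table such as to_uni_hebrew_bin[]. It assumes that a table value of 0 means "no mapping".

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical excerpts covering only bytes 0xFD and 0xFE of a
   byte-to-Unicode table; illustrative, not copied from MySQL. */
static const uint16_t tail_before[2] = { 0x0000, 0x0000 }; /* as reported */
static const uint16_t tail_after[2]  = { 0x200E, 0x200F }; /* LRM, RLM    */

/* Look up 0xFD or 0xFE in the given excerpt.
   Returns 0 ("no mapping") for any other byte. */
static uint16_t tail_to_unicode(const uint16_t *tail, uint8_t byte)
{
    if (byte == 0xFD || byte == 0xFE)
        return tail[byte - 0xFD];
    return 0;
}

/* Print the mapping of both direction marks under each excerpt. */
static void print_direction_marks(void)
{
    unsigned b;
    for (b = 0xFD; b <= 0xFE; b++)
        printf("0x%02X: before U+%04X, after U+%04X\n", b,
               (unsigned) tail_to_unicode(tail_before, (uint8_t) b),
               (unsigned) tail_to_unicode(tail_after, (uint8_t) b));
}
```

With the zero entries, both direction marks have no Unicode value to convert to, which is the loss described above. With the second excerpt they map to U+200E and U+200F.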
[7 Nov 2006 9:37] Domas Mituzas
Test case revealing the lossy conversion of direction characters

Attachment: hebrew.sql (application/octet-stream, text), 211 bytes.

[21 Dec 2006 15:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/17265

ChangeSet@1.2558, 2006-12-21 18:16:46+03:00, kaa@polly.local +5 -0
  Fix for the bug #24037 "Lossy Hebrew to Unicode conversion".
  
  Added definitions for the following Hebrew characters as specified by the ISO/IEC 8859-8:1999:
  
  LEFT-TO-RIGHT EMBEDDING (LRE)
  RIGHT-TO-LEFT EMBEDDING (RLE)
  LEFT-TO-RIGHT MARK (LRM)
  RIGHT-TO-LEFT MARK (RLM)
[22 Dec 2006 12:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/17324

ChangeSet@1.2558, 2006-12-22 15:30:37+03:00, kaa@polly.local +5 -0
  Fix for the bug #24037 "Lossy Hebrew to Unicode conversion".
  
  Added definitions for the following Hebrew characters as specified by the ISO/IEC 8859-8:1999:
  
  LEFT-TO-RIGHT MARK (LRM)
  RIGHT-TO-LEFT MARK (RLM)
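
A dump and reload through a Unicode character set only preserves these two bytes if the conversion works in both directions. The sketch below is one way to check that in C. The function names are hypothetical, and the reverse mapping is an assumption made for illustration; it is not taken from the patch.

```c
#include <stdint.h>

/* Forward mapping for the two marks named in the patch (sketch only). */
static uint16_t mark_to_unicode(uint8_t byte)
{
    switch (byte) {
    case 0xFD: return 0x200E; /* LEFT-TO-RIGHT MARK */
    case 0xFE: return 0x200F; /* RIGHT-TO-LEFT MARK */
    default:   return 0;      /* not handled in this sketch */
    }
}

/* Assumed reverse mapping; 0 means "no hebrew byte for this code point". */
static uint8_t mark_from_unicode(uint16_t wc)
{
    switch (wc) {
    case 0x200E: return 0xFD;
    case 0x200F: return 0xFE;
    default:     return 0;
    }
}

/* Returns 1 if the byte survives hebrew -> Unicode -> hebrew unchanged. */
static int mark_round_trips(uint8_t byte)
{
    uint16_t wc = mark_to_unicode(byte);
    return wc != 0 && mark_from_unicode(wc) == byte;
}
```

Under these assumptions, mark_round_trips returns 1 for 0xFD and 0xFE, and 0 for any byte the sketch does not cover.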
[22 Dec 2006 12:49] Alexander Barkov
The patch http://lists.mysql.com/commits/17324 is ok to push.
[31 Jan 2007 19:11] Chad MILLER
Available in 4.1.23, 5.0.36, 5.1.15-beta.
[2 Feb 2007 2:54] Paul DuBois
Noted in 4.1.23, 5.0.36, 5.1.15 changelogs.