| Bug #24037 | Lossy Hebrew to Unicode conversion | ||
|---|---|---|---|
| Submitted: | 7 Nov 2006 9:36 | Modified: | 2 Feb 2007 2:54 | 
| Reporter: | Domas Mituzas | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S2 (Serious) | 
| Version: | 5.1-bk & friends | OS: | |
| Assigned to: | Alexey Kopytov | CPU Architecture: | Any | 
| Tags: | bfsm_2006_12_07, character set, hebrew, Unicode | ||
   [7 Nov 2006 9:37]
   Domas Mituzas        
  Test case revealing lossy conversion of direction characters
Attachment: hebrew.sql (application/octet-stream, text), 211 bytes.
   [21 Dec 2006 15:17]
   Bugs System        
  A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/17265
  ChangeSet@1.2558, 2006-12-21 18:16:46+03:00, kaa@polly.local +5 -0
  Fix for the bug #24037 "Lossy Hebrew to Unicode conversion". Added definitions for the following Hebrew characters as specified by the ISO/IEC 8859-8:1999:
  LEFT-TO-RIGHT EMBEDDING (LRE)
  RIGHT-TO-LEFT EMBEDDING (RLE)
  LEFT-TO-RIGHT MARK (LRM)
  RIGHT-TO-LEFT MARK (RLM)
   [22 Dec 2006 12:30]
   Bugs System        
  A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/17324
  ChangeSet@1.2558, 2006-12-22 15:30:37+03:00, kaa@polly.local +5 -0
  Fix for the bug #24037 "Lossy Hebrew to Unicode conversion". Added definitions for the following Hebrew characters as specified by the ISO/IEC 8859-8:1999:
  LEFT-TO-RIGHT MARK (LRM)
  RIGHT-TO-LEFT MARK (RLM)
   [22 Dec 2006 12:49]
   Alexander Barkov        
  The patch http://lists.mysql.com/commits/17324 is ok to push.
   [31 Jan 2007 19:11]
   Chad MILLER        
  Available in 4.1.23, 5.0.36, 5.1.15-beta.
   [2 Feb 2007 2:54]
   Paul DuBois        
  Noted in 4.1.23, 5.0.36, 5.1.15 changelogs.


Description:
0xFD is LEFT-TO-RIGHT MARK (U+200E) and 0xFE is RIGHT-TO-LEFT MARK (U+200F), as specified in the newer ISO/IEC 8859-8:1999. Our uint16 to_uni_hebrew_bin[] table defines both as 0x0000, 0x0000. This makes the hebrew character set unusable in Unicode contexts, as well as in default mysqldump output.

How to repeat:
Due to lossy character set issues, the test case is attached as a file.

Suggested fix:
Use 'binary' as the default mysqldump character set? Implement the ISO/IEC 8859-8:1999 changes in the 'hebrew' character set.
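
The effect described above can be sketched with a small model of a byte-to-Unicode lookup table. This is not MySQL source code: the names make_table and to_unicode are hypothetical, the table covers only three byte values, and replacing an unmapped byte with '?' is an assumption about how the loss shows up in output.

```python
from typing import Dict

LRM = 0x200E  # LEFT-TO-RIGHT MARK
RLM = 0x200F  # RIGHT-TO-LEFT MARK


def make_table(fixed: bool) -> Dict[int, int]:
    """Build a partial byte -> code point map (hypothetical, three entries only)."""
    table = {0xE0: 0x05D0}  # 0xE0 is HEBREW LETTER ALEF in ISO 8859-8
    # Before the fix both slots hold 0x0000, i.e. "no mapping".
    table[0xFD] = LRM if fixed else 0x0000
    table[0xFE] = RLM if fixed else 0x0000
    return table


def to_unicode(data: bytes, table: Dict[int, int]) -> str:
    """Convert bytes via the table; unmapped bytes become '?' (assumed behaviour)."""
    out = []
    for byte in data:
        code_point = table.get(byte, 0x0000)
        out.append(chr(code_point) if code_point else "?")
    return "".join(out)


sample = b"\xe0\xfd\xfe"  # ALEF followed by the two direction marks
before = to_unicode(sample, make_table(fixed=False))
after = to_unicode(sample, make_table(fixed=True))
print(ascii(before))
print(ascii(after))
```

With the unfixed table the two direction marks cannot be told apart after conversion, so converting back to the original bytes is impossible. With the 0xFD and 0xFE entries filled in, each byte maps to a distinct code point and the conversion is reversible.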