| Bug #24037 | Lossy Hebrew to Unicode conversion | ||
|---|---|---|---|
| Submitted: | 7 Nov 2006 9:36 | Modified: | 2 Feb 2007 2:54 |
| Reporter: | Domas Mituzas | | |
| Status: | Closed | | |
| Category: | MySQL Server: Charsets | Severity: | S2 (Serious) |
| Version: | 5.1-bk & friends | OS: | |
| Assigned to: | Alexey Kopytov | CPU Architecture: | Any |
| Tags: | bfsm_2006_12_07, character set, hebrew, Unicode | ||
[7 Nov 2006 9:37]
Domas Mituzas
Test case revealing lossy conversion of direction characters
Attachment: hebrew.sql (application/octet-stream, text), 211 bytes.
[21 Dec 2006 15:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/17265

ChangeSet@1.2558, 2006-12-21 18:16:46+03:00, kaa@polly.local +5 -0

Fix for bug #24037 "Lossy Hebrew to Unicode conversion". Added definitions for the following Hebrew characters as specified by ISO/IEC 8859-8:1999:

- LEFT-TO-RIGHT EMBEDDING (LRE)
- RIGHT-TO-LEFT EMBEDDING (RLE)
- LEFT-TO-RIGHT MARK (LRM)
- RIGHT-TO-LEFT MARK (RLM)
[22 Dec 2006 12:30]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/17324

ChangeSet@1.2558, 2006-12-22 15:30:37+03:00, kaa@polly.local +5 -0

Fix for bug #24037 "Lossy Hebrew to Unicode conversion". Added definitions for the following Hebrew characters as specified by ISO/IEC 8859-8:1999:

- LEFT-TO-RIGHT MARK (LRM)
- RIGHT-TO-LEFT MARK (RLM)
[22 Dec 2006 12:49]
Alexander Barkov
The patch http://lists.mysql.com/commits/17324 is ok to push.
[31 Jan 2007 19:11]
Chad MILLER
Available in 4.1.23, 5.0.36, 5.1.15-beta.
[2 Feb 2007 2:54]
Paul DuBois
Noted in 4.1.23, 5.0.36, 5.1.15 changelogs.

Description:
FD is the LEFT-TO-RIGHT MARK (U+200E) and FE is the RIGHT-TO-LEFT MARK (U+200F), as specified in the newer ISO/IEC 8859-8:1999 revision. Our uint16 to_uni_hebrew_bin[] table, however, defines both as 0x0000, 0x0000. As a result, the hebrew character set is unusable in Unicode contexts, including the default mysqldump output.

How to repeat:
Because the character set conversion is lossy, the test case is attached as a file.

Suggested fix:
Use 'binary' as the default mysqldump character set? Implement the ISO/IEC 8859-8:1999 changes in the 'hebrew' character set.