MySQL Bugs: #24037: Lossy Hebrew to Unicode conversion

Bug #24037	Lossy Hebrew to Unicode conversion
Submitted:	7 Nov 2006 9:36	Modified:	2 Feb 2007 2:54
Reporter:	Domas Mituzas
Status:	Closed
Category:	MySQL Server: Charsets	Severity:	S2 (Serious)
Version:	5.1-bk & friends	OS:
Assigned to:	Alexey Kopytov	CPU Architecture:	Any
Tags:	bfsm_2006_12_07, character set, hebrew, Unicode

Description:
Byte 0xFD is LEFT-TO-RIGHT MARK (U+200E) and byte 0xFE is RIGHT-TO-LEFT MARK (U+200F), as specified in the newer edition of the standard, ISO/IEC 8859-8:1999.

Our conversion table, uint16 to_uni_hebrew_bin[], defines both of those bytes as 0x0000.

This makes the hebrew character set unusable in Unicode contexts, including the default mysqldump output.
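
To make the failure mode concrete, here is a minimal Python sketch of a byte-to-Unicode lookup table like the one described above. It is a model, not MySQL's code: the helper names `build_table` and `to_unicode` are hypothetical, only ASCII and the Hebrew letters 0xE0-0xFA are filled in, and it assumes that a non-zero byte mapped to 0x0000 is replaced with '?' during conversion.

```python
# Hypothetical model of a single-byte charset -> Unicode table.
# Not MySQL's implementation; names and the '?' substitution are assumptions.

LRM = 0x200E  # LEFT-TO-RIGHT MARK
RLM = 0x200F  # RIGHT-TO-LEFT MARK


def build_table(with_marks):
    """Return a 256-entry table; 0x0000 means 'no Unicode mapping'."""
    table = [0x0000] * 256
    for b in range(0x80):          # ASCII maps to itself
        table[b] = b
    for i in range(27):            # 0xE0..0xFA -> U+05D0..U+05EA (alef..tav)
        table[0xE0 + i] = 0x05D0 + i
    if with_marks:                 # the ISO/IEC 8859-8:1999 additions
        table[0xFD] = LRM
        table[0xFE] = RLM
    return table


def to_unicode(data, table):
    """Convert bytes to str, substituting '?' for unmapped non-zero bytes."""
    out = []
    for b in data:
        cp = table[b]
        out.append("?" if cp == 0 and b != 0 else chr(cp))
    return "".join(out)


sample = bytes([0xE0, 0xFD, 0x41, 0xFE])   # alef, LRM, 'A', RLM

lossy = to_unicode(sample, build_table(with_marks=False))
fixed = to_unicode(sample, build_table(with_marks=True))

print([f"U+{ord(ch):04X}" for ch in lossy])
print([f"U+{ord(ch):04X}" for ch in fixed])
```

With the two entries left at 0x0000, both marks collapse into the same substitute character, so the original bytes cannot be recovered from the converted text. With the entries filled in, each mark keeps its own code point.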

How to repeat:
Because of the lossy character set conversion, the test case is attached as a file.

Suggested fix:
Use 'binary' as the default mysqldump character set?
Implement the ISO/IEC 8859-8:1999 changes in the 'hebrew' character set.
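
As an independent cross-check of the target mapping, Python's standard `iso8859_8` codec already follows the 1999 edition for these two bytes. The snippet below is only a reference point for what a corrected 'hebrew' table should produce; it says nothing about how MySQL implements the conversion.

```python
# Reference check using Python's built-in ISO 8859-8 codec.
raw = bytes([0xFD, 0xFE])

text = raw.decode("iso8859_8")          # bytes -> Unicode
roundtrip = text.encode("iso8859_8")    # Unicode -> bytes

print([f"U+{ord(ch):04X}" for ch in text])
print(roundtrip == raw)
```

A conversion is lossless for these characters only if the round trip returns the original bytes, which is the property the test case for this bug is meant to exercise.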

Test case revealing the lossy conversion of direction characters

Attachment: hebrew.sql (application/octet-stream, text), 211 bytes.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/17265

ChangeSet@1.2558, 2006-12-21 18:16:46+03:00, kaa@polly.local +5 -0
  Fix for the bug #24037 "Lossy Hebrew to Unicode conversion".
  
  Added definitions for the following Hebrew characters as specified by the ISO/IEC 8859-8:1999:
  
  LEFT-TO-RIGHT EMBEDDING (LRE)
  RIGHT-TO-LEFT EMBEDDING (RLE)
  LEFT-TO-RIGHT MARK (LRM)
  RIGHT-TO-LEFT MARK (RLM)

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/17324

ChangeSet@1.2558, 2006-12-22 15:30:37+03:00, kaa@polly.local +5 -0
  Fix for the bug #24037 "Lossy Hebrew to Unicode conversion".
  
  Added definitions for the following Hebrew characters as specified by the ISO/IEC 8859-8:1999:
  
  LEFT-TO-RIGHT MARK (LRM)
  RIGHT-TO-LEFT MARK (RLM)

The patch http://lists.mysql.com/commits/17324 is ok to push.

Available in 4.1.23, 5.0.36, 5.1.15-beta.

Noted in 4.1.23, 5.0.36, 5.1.15 changelogs.