Bug #80099 Wrong Collation cp1256_general_ci for CP1256 Charset
Submitted: 21 Jan 2016 13:18 Modified: 3 Jun 2018 11:19
Reporter: Ehsan Darrudi
Status: No Feedback
Category: MySQL Server: DML Severity: S2 (Serious)
Version: 5.5.47 OS: CentOS
Assigned to: CPU Architecture: Any

[21 Jan 2016 13:18] Ehsan Darrudi
Description:
The provided cp1256_general_ci collation for the cp1256 charset (Arabic) has a wrong sort order. More specifically, the character 'ت' is misplaced. It must appear next to 'ب' but is shown near 'ط'.

Reference for the correct order:

https://en.wikipedia.org/wiki/Arabic_alphabet

The same applies to the Persian alphabet.

How to repeat:
1- Create a table with the cp1256 charset and a varchar column named 'name'.
2- Add some rows to the table, each with a different Arabic character in the name column:
- ا 
- ب
- ت
- ث
- ج
- ح
- خ
- د 
- ذ
- س
- ش
- ط
- ظ

3- Then query the table, sorting by name. The row for 'ت' appears in the wrong position.

Suggested fix:
In the source code:
The problem is rooted in a wrong sort order coded in the 'ctype-extra.c' file in the 'strings' directory. The array sort_order_cp1256_general_ci contains two '0x9F' entries. The first one must be '0x8F'. Apparently this is a typo. :)
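
A minimal sketch of how the duplicate can be spotted mechanically. It assumes the buggy values for cp1256 bytes 0xC0..0xDF are the same as in the corrected table in this report, except for the 0x9F at byte 0xCA ('ت'); the slice is reconstructed from the description here, not copied from the MySQL source. The names buggy_c0_df and first_duplicate are made up for illustration.

```c
#include <stddef.h>

/* Weights for cp1256 bytes 0xC0..0xDF. Assumption: identical to the corrected
   table in this report, except index 10 (byte 0xCA, 'ت') holds the 0x9F typo. */
static const unsigned char buggy_c0_df[32] = {
  0xFA,0x85,0x86,0x87,0x88,0x89,0x8A,0x8B,0x8C,0x8D,0x9F,0x90,0x91,0x93,0x94,0x95,
  0x96,0x97,0x98,0x9A,0x9B,0x9C,0x9D,0xFB,0x9E,0x9F,0xA0,0xA1,0xAD,0xA2,0xA3,0xA4
};

/* Return the index of the first entry whose weight appears again later in
   the slice, or -1 if all weights are distinct. */
static int first_duplicate(const unsigned char *w, size_t n)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      if (w[i] == w[j])
        return (int)i;
  return -1;
}
```

Calling first_duplicate(buggy_c0_df, 32) points at index 10, i.e. byte 0xC0 + 10 = 0xCA, which is 'ت' in cp1256. Its weight collides with the one at byte 0xD9 ('ظ'), which is why the letter sorts near 'ط'. Note that a check like this only makes sense on the Arabic letter range: elsewhere the table repeats weights on purpose (for example 'A' and 'a' both map to 0x41).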

The correct code:

static const uchar sort_order_cp1256_general_ci[] = {
0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0A,0x0B,0x0C,0x0D,0x0E,0x0F,
0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,0x18,0x19,0x1A,0x1B,0x1C,0x1D,0x1E,0x1F,
0x20,0x21,0x22,0x23,0x24,0x25,0x26,0x27,0x28,0x29,0x2A,0x2B,0x2C,0x2D,0x2E,0x2F,
0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x3A,0x3B,0x3C,0x3D,0x3E,0x3F,
0x40,0x41,0x45,0x47,0x4A,0x4C,0x52,0x55,0x57,0x59,0x5D,0x5F,0x61,0x63,0x65,0x67,
0x6C,0x6E,0x70,0x72,0x74,0x76,0x7B,0x7D,0x7F,0x81,0x83,0xB9,0xBA,0xBB,0xBC,0xBD,
0xBE,0x41,0x45,0x47,0x4A,0x4C,0x52,0x55,0x57,0x59,0x5D,0x5F,0x61,0x63,0x65,0x67,
0x6C,0x6E,0x70,0x72,0x74,0x76,0x7B,0x7D,0x7F,0x81,0x83,0xBF,0xC0,0xC1,0xC2,0xC3,
0xC4,0x8E,0xC5,0x54,0xC6,0xC7,0xC8,0xC9,0xCA,0xCB,0xCC,0xCD,0x6A,0x92,0x99,0xCE,
0xA5,0xCF,0xD0,0xD1,0xD2,0xD3,0xD4,0xD5,0xD6,0xD7,0xD8,0xD9,0x6A,0xDA,0xDB,0xDC,
0xDD,0xB6,0xDE,0xDF,0xE0,0xE1,0xE2,0xE3,0xE4,0xE5,0xE6,0xE7,0xE8,0xE9,0xEA,0xEB,
0xEC,0xED,0xEE,0xEF,0xF0,0xF1,0xF2,0xF3,0xF4,0xF5,0xB7,0xF6,0xF7,0xF8,0xF9,0xB8,
0xFA,0x85,0x86,0x87,0x88,0x89,0x8A,0x8B,0x8C,0x8D,0x8F,0x90,0x91,0x93,0x94,0x95,
0x96,0x97,0x98,0x9A,0x9B,0x9C,0x9D,0xFB,0x9E,0x9F,0xA0,0xA1,0xAD,0xA2,0xA3,0xA4,
0x43,0xA6,0x44,0xA7,0xA8,0xA9,0xAA,0x49,0x4E,0x4F,0x50,0x51,0xAB,0xAC,0x5B,0x5C,
0xAE,0xAF,0xB0,0xB1,0x69,0xB2,0xB3,0xFC,0xB4,0x78,0xB5,0x79,0x7A,0xFD,0xFE,0xFF
};
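
To illustrate the effect of the fix, here is a rough sketch that sorts single cp1256 bytes by weight, using only the 0xC0..0xDF rows of the corrected table above. This is a simplification and not how the server compares strings: bytes outside that range are given their own value as weight (an assumption made to keep the example short), and the names fixed_c0_df, weight, cmp_bytes and sort_letters are hypothetical.

```c
#include <stdlib.h>

/* Rows 0xC0..0xDF of the corrected table from this report. */
static const unsigned char fixed_c0_df[32] = {
  0xFA,0x85,0x86,0x87,0x88,0x89,0x8A,0x8B,0x8C,0x8D,0x8F,0x90,0x91,0x93,0x94,0x95,
  0x96,0x97,0x98,0x9A,0x9B,0x9C,0x9D,0xFB,0x9E,0x9F,0xA0,0xA1,0xAD,0xA2,0xA3,0xA4
};

/* Look up the sort weight of one cp1256 byte. Bytes outside 0xC0..0xDF fall
   back to their own value (assumption for this sketch only). */
static unsigned char weight(unsigned char b)
{
  return (b >= 0xC0 && b <= 0xDF) ? fixed_c0_df[b - 0xC0] : b;
}

/* qsort comparator: order bytes by weight rather than by byte value. */
static int cmp_bytes(const void *a, const void *b)
{
  unsigned char wa = weight(*(const unsigned char *)a);
  unsigned char wb = weight(*(const unsigned char *)b);
  return (wa > wb) - (wa < wb);
}

/* Sort an array of single-byte cp1256 letters in collation order. */
static void sort_letters(unsigned char *letters, size_t n)
{
  qsort(letters, n, sizeof letters[0], cmp_bytes);
}
```

With the corrected weights, the bytes 0xD8 ('ط'), 0xCA ('ت'), 0xC8 ('ب'), 0xCB ('ث') sort as 'ب', 'ت', 'ث', 'ط', because 'ت' now has weight 0x8F, between 0x8C ('ب') and 0x90 ('ث').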
[21 Jan 2016 13:25] Ehsan Darrudi
Snapshot showing the wrong sort order of cp1256_general_ci

Attachment: bad.png (image/png, text), 3.86 KiB.

[21 Jan 2016 13:25] Ehsan Darrudi
Snapshot after fixing the code and recompiling MySQL

Attachment: good.png (image/png, text), 5.89 KiB.

[3 May 2018 11:19] MySQL Verification Team
Sorry for the delay. Please check with the current released version. Thanks.
[4 Jun 2018 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".