Bug #32183 Wrong Thai Sorting in MySQL 5.0.45 charset=utf8
Submitted: 8 Nov 2007 11:53 Modified: 11 Nov 2007 12:13
Reporter: Prateep Kulapalanont Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Charsets Severity:S4 (Feature request)
Version:5.0.45 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: Mysql 5, Thai Sorting, utf8

[8 Nov 2007 11:53] Prateep Kulapalanont
Description:
Wrong Thai Sorting in MySQL 5.0.45 charset=utf8

as compare with proper Thai Sorting in charset=tis620 

I have test with MySQL 4.xx found OK

For example:

Create tables with 3 fields:   `proper-order`,`ThaiWord`,`Pronounce`

Then put number as suppose to have right order in proper-order

Put ThaiWord with English Pronounce to show the problem
If proper sorting we will see the right sorting in Pronounce as it is English.

We found the results of  sorting between tis620 and utf8 in MySQL 5 in the same data are different. as

In default charset=tis620  >>>>>>>>>>>>>>> Correct
  	         proper-order ThaiWord  Ascending    Pronounce
	 	 	1 	ดาเนียน 	   Danian
	 	 	2 	เดวิด 	          David
	 	 	3 	มาตี้ 	           Marty
	 	 	4 	แมรี 	          Mary

In default charset=utf8    >>>>>>>>>>>>>>>  Wrong 

Full Texts  	  proper-order 	ThaiWord Ascending 	Pronounce
	        	1 	ดาเนียน 	            Danian
	         	3 	มาตี้ 	                    Marty
	        	2 	เดวิด 	                    David
	        	4 	แมรี 	                   Mary

How to repeat:
-- phpMyAdmin SQL Dump
-- version 2.10.2
-- http://www.phpmyadmin.net
-- 
-- Host: localhost
-- Generation Time: Nov 08, 2007 at 06:51 PM
-- Server version: 5.0.45
-- PHP Version: 5.2.4

SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";

-- 
-- Database: `testthai`
-- 

-- --------------------------------------------------------

-- 
-- Table structure for table `thaitis620`
-- 

CREATE TABLE `thaitis620` (
  `proper-order` int(2) NOT NULL,
  `ThaiWord` varchar(20) NOT NULL,
  `Pronounce` varchar(20) NOT NULL,
  KEY `proper-order` (`proper-order`)
) ENGINE=MyISAM DEFAULT CHARSET=tis620;

-- 
-- Dumping data for table `thaitis620`
-- 

INSERT INTO `thaitis620` VALUES (1, 'ดาเนียน', 'Danian');
INSERT INTO `thaitis620` VALUES (2, 'เดวิด', 'David');
INSERT INTO `thaitis620` VALUES (3, 'มาตี้', 'Marty');
INSERT INTO `thaitis620` VALUES (4, 'แมรี', 'Mary');

-- --------------------------------------------------------

-- 
-- Table structure for table `thaiutf8`
-- 

CREATE TABLE `thaiutf8` (
  `proper-order` int(2) NOT NULL,
  `ThaiWord` varchar(20) NOT NULL,
  `Pronounce` varchar(20) NOT NULL,
  KEY `proper-order` (`proper-order`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

-- 
-- Dumping data for table `thaiutf8`
-- 

INSERT INTO `thaiutf8` VALUES (1, 'ดาเนียน', 'Danian');
INSERT INTO `thaiutf8` VALUES (2, 'เดวิด', 'David');
INSERT INTO `thaiutf8` VALUES (3, 'มาตี้', 'Marty');
INSERT INTO `thaiutf8` VALUES (4, 'แมรี', 'Mary');

Suggested fix:
To fix:

Just go to MySQL4 and see how you do it.
Since MySQL4 is ok in both mode!!!
[10 Nov 2007 15:00] Peter Laursen
Looks like a THAI collation should be added to utf8 and ucs2!
[11 Nov 2007 12:13] Valeriy Kravchuk
I think, this:

"Looks like a THAI collation should be added to utf8 and ucs2!"

is a reasonable feature request, as "generic" utf8 collations does not owrk properly for Thai.
[11 Nov 2007 22:10] Peter Laursen
or maybe it should rather be considered a bug with (all ?) existing unicode collations?