Bug #1770 crash on select distinct query with character-set TIS620
Submitted: 6 Nov 2003 9:56 Modified: 24 Aug 2004 12:03
Reporter: sathit jittanupat Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:4.0.15,4.0.16 OS:Windows (windows 2000/xp)
Assigned to: Assigned Account CPU Architecture:Any

[6 Nov 2003 9:56] sathit jittanupat
Description:
2003-10-29

problem: crash on select distinct query with character-set TIS620
platform: WinXP/Win2000, Apache2, Php 4.3.3, MySQL 4.0.16-max-nt 
testing tool: phpMyAdmin 2.5.2-pl1

note: only crash with default-character-set=TIS620 with latin1 not crash ????

1. create table+data in test database with "test-data.sql"

2. use "test-nocrash-query.sql" to see result 28 rows

SELECT DISTINCT 
CONCAT(IF( LOCATE( 'year=\"', dyninfo ) = 1, SUBSTRING( dyninfo, 6+1, LOCATE('\"\r',dyninfo) - 6 -1),IF( LOCATE( '\nyear=\"', dyninfo ), SUBSTRING( dyninfo, LOCATE( '\nyear=\"', dyninfo ) + 7, LOCATE( '\"\r', SUBSTRING( dyninfo, LOCATE( '\nyear=\"', dyninfo ) +7 )) - 1), '' ))) 
AS year 
FROM nuke_nap_entity WHERE class="carframe" HAVING year != '' ORDER BY year
	
3. use "test-crash-query.sql" now it crashed, mysql service stops then connection lost

SELECT DISTINCT 
(IF( LOCATE( 'year=\"', dyninfo ) = 1, SUBSTRING( dyninfo, 6+1, LOCATE('\"\r',dyninfo) - 6 -1), IF( LOCATE( '\nyear=\"', dyninfo ), SUBSTRING( dyninfo, LOCATE( '\nyear=\"', dyninfo ) + 7, LOCATE( '\"\r', SUBSTRING( dyninfo, LOCATE( '\nyear=\"', dyninfo ) +7 )) - 1), '' ))) 
AS year 
FROM nuke_nap_entity WHERE class="carframe" HAVING year != '' ORDER BY year

4. my decriptions
*different between crash and nocrash is
	crash: 	(IF (....)) AS year
	nocrash: CONCAT(IF(....)," ") AS year
 but I don't known the reason.
 
*test with MySQL 3.x not crash all.
*when change to extract other dyninfo's subfield it's not crash
example dyninfo  'type':
	SELECT DISTINCT 
	(IF( LOCATE( 'type=\"', dyninfo ) = 1, SUBSTRING( dyninfo, 6+1, LOCATE('\"\r',dyninfo) - 6 -1), IF( LOCATE( '\ntype=\"', dyninfo ), SUBSTRING( dyninfo, LOCATE( '\ntype=\"', dyninfo ) + 7, LOCATE( '\"\r', SUBSTRING( dyninfo, LOCATE( '\ntype=\"', dyninfo ) +7 )) - 1), '' ))) 
	AS type 
	FROM nuke_nap_entity WHERE class="carframe" HAVING type != '' ORDER BY type

example dyninfo 'model':
	SELECT DISTINCT 
	(IF( LOCATE( 'model=\"', dyninfo ) = 1, SUBSTRING( dyninfo, 7+1, LOCATE('\"\r',dyninfo) - 7 -1), IF( LOCATE( '\nmodel=\"', dyninfo ), SUBSTRING( dyninfo, LOCATE( '\nmodel=\"', dyninfo ) + 8, LOCATE( '\"\r', SUBSTRING( dyninfo, LOCATE( '\nmodel=\"', dyninfo ) +8 )) - 1), '' ))) 
	AS model 
	FROM nuke_nap_entity WHERE class="carframe" HAVING model != '' ORDER BY model

*when change AS year to AS dyn_year, it also crash. Wording 'year' is not the problem.
	SELECT DISTINCT 
	(IF( LOCATE( 'year=\"', dyninfo ) = 1, SUBSTRING( dyninfo, 6+1, LOCATE('\"\r',dyninfo) - 6 -1), IF( LOCATE( '\nyear=\"', dyninfo ), SUBSTRING( dyninfo, LOCATE( '\nyear=\"', dyninfo ) + 7, LOCATE( '\"\r', SUBSTRING( dyninfo, LOCATE( '\nyear=\"', dyninfo ) +7 )) - 1), '' ))) 
	AS dyn_year 
	FROM nuke_nap_entity WHERE class="carframe" HAVING dyn_year != '' ORDER BY dyn_year

*when not use DISTINCT, it's not crash

How to repeat:
I'll attach sql file for create testing data and sql with crash query file 

Suggested fix:
I've just modify the ctype-tis620.c, including in attach file.
build with vc7 and test mysqld-max-nt.exe in windows xp
[6 Nov 2003 9:58] sathit jittanupat
crash with tis620, with ctype-tis620.c patch

Attachment: test-mysql-crash.zip (application/x-zip-compressed, text), 26.38 KiB.

[8 Nov 2003 9:16] sathit jittanupat
revise of ctype-tis620

Attachment: ctype-tis620.zip (application/x-zip-compressed, text), 6.31 KiB.

[14 Nov 2003 12:54] Dean Ellis
Crash verified on 4.0.16/Windows and 4.0.17/Linux.

Thank you.
[25 Nov 2003 2:44] Alexander Barkov
Can you please send me the data and you fix?
Thank you.
[25 Nov 2003 11:06] sathit jittanupat
Sorry, I am Thai with poor english, but I'll try to describe.

First,the data file (*.sql) & my fix code (ctype-tis620.c) already upload in files section.

You must create database with default-character-set=tis620, other character set may not crash.

I think the original code of ctype-tis620.c can cause memory leak in function thai2sortable() 
- when fail with malloc it return the original string
- but the caller function try to free without checking whether it is allocated buffer or original string. 

With my new source I commented out whole function and almost write new function and change the caller function almost of it.

I notice that function thai2sortable() also produces strange sorting order, it don't detect any different between number of space or sign character as !#$%-+ it has no sorting value.

With my working database today I temporary use default-character-set=hp8. It can sort a raw full 8 bit ascii (yes! feel like my old DOS-Database sorting without any locale). I think the diferrence between 8 bit ascii and TIS620 (thai language) only in some characters with type called leading-vowel (only 5 characters: ascii value 0xE0 - 0xE4) that need to swap with it's follow-char. This means when found leading-vowel you must sort with the follow-char instead.

note: I try to rebuild with vc++ version 7 but it can not complete in some modules (luckily mysqld*.exe success, now I've just test it in my computer). I don't understand the errors and warnings, also not familiar with vc++ ide environment. So I'm not sure how stable it is.

thank you
sathit
[13 Dec 2003 12:22] Michael Widenius
I have looked at your proposed ctype-tis620.c and it looks quite nice and clear.

The problem is that I don't know how thai sorting is supposed to work and what is the main difference (in usage) between the original one and yours.

Alexander Barkov will spend some time doing this ASAP;  Anyone who has interested in thai sorting, please contact him at bar@mysql.com and send him your comments.
[23 Dec 2003 8:55] sathit jittanupat
new ctype-tis620.c and data for testing

Attachment: mysql-newtis620.zip (application/x-zip-compressed, text), 40.20 KiB.

[23 Dec 2003 8:58] sathit jittanupat
sceen shot show differences in sorting

Attachment: mysql-sort-bug01.jpg (image/pjpeg, text), 106.88 KiB.

[23 Dec 2003 9:01] sathit jittanupat
screen shot

Attachment: mysql-sort-bug02.jpg (image/pjpeg, text), 168.07 KiB.

[23 Dec 2003 9:50] sathit jittanupat
I've just finished prepare testing data for those who can't read Thai.
It's done by
1. Exported some thainame data then load to MS-Excel Thai
2. Use MS-Excel Thai to sort the "name" column
3. Within Excel, add another column (named "excelorder"), set order no. from 1 to n for each row, then save as CSV
4. restart MYSQL 4.0.16 with default-character-set=tis620
5. use phpMyAdmin import CSV to MYSQL, run ALTER TABLE thainame ORDER BY name, then export to CSV
6. Use MS-Excel open CSV again, add another column (named "mysqlorder"), set order no. from 1 to n for each row, then save as CSV
7. restart MYSQL 4.0.16 with default-character-set=latin1
8. use phpMyAdmin import CSV,run ALTER TABLE thainame ORDER BY name, then export to CSV
9. Use MS-Excel open CSV again, add another column (named "latin1order"), set order no. from 1 to n for each row, then save as CSV
10. remove MYSQL service (win32), then install my new modified tis620 database instead, restart MYSQL 4.0.16 with default-character-set=tis620
11. use phpMyAdmin import CSV,run ALTER TABLE thainame ORDER BY name, then export to CSV
12. Use MS-Excel open CSV again, add another column (named "neworder"), set order no. from 1 to n for each row, then save as CSV
13. use phpMyAdmin load CSV, export result data to thainame.sql.gz 

My testing table "thainame" and new source code ctype-tis620.c already upload in section Files.
I also upload 2 screen shot files to show differences between 4 sort results (excel thai,mysql-tis620,mysql-latin1,mysql-newtis620)
I can't state that which one is right and which one is wrong.

Excel Thai (use win32 thai sort api) may be used as refference, but I also found strange behavior with character minus "-" (see the screen shot). This happened when sorting with english string not only thai string.

My new ctype-tis620.c not only solve crashing bug, but also correct the sorting algorithm. You can see my screen shot for some notices.

Thank you
Sathit Jittanupat
[30 Jan 2004 0:32] Alexander Barkov
Sathit, I'm going to apply your patch today or tomorrow.
ctype-tis620.c states that anyone can use it for any purposes.

The test supplied with the patch doesn't have any notes about usage.
So I'm asking your permission, is it OK to insert your test into our test suit?

Thank you.
[1 Feb 2004 10:06] sathit jittanupat
Dear Mr.Barkov
 You can keep the test data for future testing and use it freely. I have no restriction about it.

sathit
[3 Feb 2004 2:01] Alexander Barkov
Sathit, can you please reduce the size of this test,
which demonstrates that Thai sorting is done well.
Is it possible to remove regular, not interesting lines,
keeping only important points?

Thank you!
[4 Feb 2004 14:38] sathit jittanupat
Mr.Barkov,
  I am preparing a new data for sorting test (thai + english character) with more clearly description for each important points. It may take a time.

Thank you.
[4 Feb 2004 21:06] Alexander Barkov
Deer Jittanupat,
please note it should not be big, please try to stay under 30K.
Thank you very much.
[24 Aug 2004 12:03] Sergei Golubchik
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.