Bug #1770 | crash on select distinct query with character-set TIS620 | ||
---|---|---|---|
Submitted: | 6 Nov 2003 9:56 | Modified: | 24 Aug 2004 12:03 |
Reporter: | sathit jittanupat | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server | Severity: | S1 (Critical) |
Version: | 4.0.15,4.0.16 | OS: | Windows (windows 2000/xp) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[6 Nov 2003 9:56]
sathit jittanupat
[6 Nov 2003 9:58]
sathit jittanupat
crash with tis620, with ctype-tis620.c patch
Attachment: test-mysql-crash.zip (application/x-zip-compressed, text), 26.38 KiB.
[8 Nov 2003 9:16]
sathit jittanupat
revise of ctype-tis620
Attachment: ctype-tis620.zip (application/x-zip-compressed, text), 6.31 KiB.
[14 Nov 2003 12:54]
Dean Ellis
Crash verified on 4.0.16/Windows and 4.0.17/Linux. Thank you.
[25 Nov 2003 2:44]
Alexander Barkov
Can you please send me the data and you fix? Thank you.
[25 Nov 2003 11:06]
sathit jittanupat
Sorry, I am Thai with poor english, but I'll try to describe. First,the data file (*.sql) & my fix code (ctype-tis620.c) already upload in files section. You must create database with default-character-set=tis620, other character set may not crash. I think the original code of ctype-tis620.c can cause memory leak in function thai2sortable() - when fail with malloc it return the original string - but the caller function try to free without checking whether it is allocated buffer or original string. With my new source I commented out whole function and almost write new function and change the caller function almost of it. I notice that function thai2sortable() also produces strange sorting order, it don't detect any different between number of space or sign character as !#$%-+ it has no sorting value. With my working database today I temporary use default-character-set=hp8. It can sort a raw full 8 bit ascii (yes! feel like my old DOS-Database sorting without any locale). I think the diferrence between 8 bit ascii and TIS620 (thai language) only in some characters with type called leading-vowel (only 5 characters: ascii value 0xE0 - 0xE4) that need to swap with it's follow-char. This means when found leading-vowel you must sort with the follow-char instead. note: I try to rebuild with vc++ version 7 but it can not complete in some modules (luckily mysqld*.exe success, now I've just test it in my computer). I don't understand the errors and warnings, also not familiar with vc++ ide environment. So I'm not sure how stable it is. thank you sathit
[13 Dec 2003 12:22]
Michael Widenius
I have looked at your proposed ctype-tis620.c and it looks quite nice and clear. The problem is that I don't know how thai sorting is supposed to work and what is the main difference (in usage) between the original one and yours. Alexander Barkov will spend some time doing this ASAP; Anyone who has interested in thai sorting, please contact him at bar@mysql.com and send him your comments.
[23 Dec 2003 8:55]
sathit jittanupat
new ctype-tis620.c and data for testing
Attachment: mysql-newtis620.zip (application/x-zip-compressed, text), 40.20 KiB.
[23 Dec 2003 8:58]
sathit jittanupat
sceen shot show differences in sorting
Attachment: mysql-sort-bug01.jpg (image/pjpeg, text), 106.88 KiB.
[23 Dec 2003 9:01]
sathit jittanupat
screen shot
Attachment: mysql-sort-bug02.jpg (image/pjpeg, text), 168.07 KiB.
[23 Dec 2003 9:50]
sathit jittanupat
I've just finished prepare testing data for those who can't read Thai. It's done by 1. Exported some thainame data then load to MS-Excel Thai 2. Use MS-Excel Thai to sort the "name" column 3. Within Excel, add another column (named "excelorder"), set order no. from 1 to n for each row, then save as CSV 4. restart MYSQL 4.0.16 with default-character-set=tis620 5. use phpMyAdmin import CSV to MYSQL, run ALTER TABLE thainame ORDER BY name, then export to CSV 6. Use MS-Excel open CSV again, add another column (named "mysqlorder"), set order no. from 1 to n for each row, then save as CSV 7. restart MYSQL 4.0.16 with default-character-set=latin1 8. use phpMyAdmin import CSV,run ALTER TABLE thainame ORDER BY name, then export to CSV 9. Use MS-Excel open CSV again, add another column (named "latin1order"), set order no. from 1 to n for each row, then save as CSV 10. remove MYSQL service (win32), then install my new modified tis620 database instead, restart MYSQL 4.0.16 with default-character-set=tis620 11. use phpMyAdmin import CSV,run ALTER TABLE thainame ORDER BY name, then export to CSV 12. Use MS-Excel open CSV again, add another column (named "neworder"), set order no. from 1 to n for each row, then save as CSV 13. use phpMyAdmin load CSV, export result data to thainame.sql.gz My testing table "thainame" and new source code ctype-tis620.c already upload in section Files. I also upload 2 screen shot files to show differences between 4 sort results (excel thai,mysql-tis620,mysql-latin1,mysql-newtis620) I can't state that which one is right and which one is wrong. Excel Thai (use win32 thai sort api) may be used as refference, but I also found strange behavior with character minus "-" (see the screen shot). This happened when sorting with english string not only thai string. My new ctype-tis620.c not only solve crashing bug, but also correct the sorting algorithm. You can see my screen shot for some notices. Thank you Sathit Jittanupat
[30 Jan 2004 0:32]
Alexander Barkov
Sathit, I'm going to apply your patch today or tomorrow. ctype-tis620.c states that anyone can use it for any purposes. The test supplied with the patch doesn't have any notes about usage. So I'm asking your permission, is it OK to insert your test into our test suit? Thank you.
[1 Feb 2004 10:06]
sathit jittanupat
Dear Mr.Barkov You can keep the test data for future testing and use it freely. I have no restriction about it. sathit
[3 Feb 2004 2:01]
Alexander Barkov
Sathit, can you please reduce the size of this test, which demonstrates that Thai sorting is done well. Is it possible to remove regular, not interesting lines, keeping only important points? Thank you!
[4 Feb 2004 14:38]
sathit jittanupat
Mr.Barkov, I am preparing a new data for sorting test (thai + english character) with more clearly description for each important points. It may take a time. Thank you.
[4 Feb 2004 21:06]
Alexander Barkov
Deer Jittanupat, please note it should not be big, please try to stay under 30K. Thank you very much.
[24 Aug 2004 12:03]
Sergei Golubchik
No feedback was provided. The bug is being suspended because we assume that you are no longer experiencing the problem. If this is not the case and you are able to provide the information that was requested earlier, please do so and change the status of the bug back to "Open". Thank you.