Bug #1417 LIKE operator and tonic accent of vowel letters in Greek language
Submitted: 28 Sep 2003 4:50
Modified: 18 Dec 2003 11:21
Reporter: Demosthenes Koptsis
Status: Not a Bug
Category: MySQL Server
Severity: S3 (Non-critical)
Version: 3.23.55-log
OS: Linux (Linux)
Assigned to:
CPU Architecture: Any

[28 Sep 2003 4:50] Demosthenes Koptsis
Description:
Let's say that we have a table 'Words' filled with words in the Latin and Greek alphabets.
The following SELECT statement with the LIKE operator works as expected with the Latin character set.

SELECT * FROM Words WHERE word LIKE 'fa%'

OUTPUT: father, family....

In the Greek language, though, there are vowel letters with a tonic accent,
such as Greek small letter omicron with tonic accent (Unicode nr: 03CC) or Greek small letter alpha with tonic accent (Unicode nr: 03AC).

There are also vowel letters without a tonic accent, such as Greek small letter omicron (Unicode nr: 03BF).
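
To make the distinction between accented and unaccented vowels concrete, here is a small Python sketch (Python is used only for illustration and is not part of the original report). It looks up a few Greek code points in the Unicode database, where the tonic accent is called "tonos", and shows that an accented vowel decomposes into the plain vowel plus a combining accent. The helper name `strip_tonos` is hypothetical.

```python
import unicodedata


def strip_tonos(text):
    """Remove combining marks (such as the tonos) after canonical decomposition.

    Hypothetical helper for illustration only.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Print the official Unicode names of two accented and two plain vowels.
for cp in (0x03AC, 0x03CC, 0x03B1, 0x03BF):
    ch = chr(cp)
    print(f"U+{cp:04X} {ch} {unicodedata.name(ch)}")

# An accented vowel is canonically equivalent to base vowel + combining accent.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", "\u03ac")])
```

A comparison that ignores accents would treat the accented and plain forms as equal; a comparison that respects them would not.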

Now, for some words containing vowel letters with a tonic accent, the previous SELECT statement does not work as it should.

SELECT * FROM Words WHERE word LIKE 'φά%';

(I use the letters 03D5 and 03AC from the Unicode standard for the Greek language.)

OUTPUT: words whose second letter is a vowel, but not necessarily 03AC from the Unicode standard.

How to repeat:
1) Make a table Words and fill it with Greek words and English words.
I used the Greek translations of the English words 'tax' and 'lighthouse'.

tax = φόρος
lighthouse = φάρος

Both words are translated into Greek with words whose first letter is
Greek small letter phi (Unicode nr: 03D5).

The second letter is Greek small letter omicron with tonic accent (Unicode nr: 03CC) and Greek small letter alpha with tonic accent (Unicode nr: 03AC), respectively.

2) Run the SELECT statement 

SELECT * FROM Words WHERE word LIKE 'XY%';

XY are the first two letters of a word.
Replace X with the Greek small letter phi (Unicode nr: 03D5).
Set Y to a vowel letter of the Greek alphabet with a tonic accent, for example the Greek small letter alpha with tonic accent (Unicode nr: 03AC).

3) You should get as output not only the translated word 'lighthouse' but also the
translated word 'tax'.
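
For comparison, the steps above can be sketched in Python with the standard sqlite3 module. This is only an illustration of the result the reporter expected: SQLite stands in for MySQL here, so it does not reproduce the reported behaviour, and the helper name `words_like` is hypothetical.

```python
import sqlite3

# In-memory database; SQLite is a stand-in, not the MySQL 3.23 server from the report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Words (word TEXT)")
conn.executemany(
    "INSERT INTO Words (word) VALUES (?)",
    [("father",), ("family",), ("φόρος",), ("φάρος",)],
)


def words_like(pattern):
    """Return all words matching a LIKE pattern, sorted (hypothetical helper)."""
    rows = conn.execute(
        "SELECT word FROM Words WHERE word LIKE ? ORDER BY word", (pattern,)
    )
    return [word for (word,) in rows]


print(words_like("fa%"))  # the Latin prefix from the description
print(words_like("φά%"))  # the Greek prefix with an accented alpha
```

Here the second query returns only the word for 'lighthouse', because SQLite compares the non-ASCII characters exactly.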

[30 Sep 2003 2:37] Alexander Keremidarski
The LIKE operator is character set dependent, so it matters which character set mysqld is started with.

Full Unicode support is available in 4.1; for 3.23 and 4.0 you should not expect it to work well.
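
One way the reported result could arise is sketched below in Python, under assumptions this thread does not confirm: that the words were stored as ISO-8859-7 bytes and that the server compared them with a case-insensitive latin1 character set. In ISO-8859-7 the accented alpha is byte 0xDC and the accented omicron is byte 0xFC; read as latin1, those bytes are Ü and ü, which differ only in case. The function name is hypothetical, and Python's `str.lower()` is only a stand-in for the server's actual comparison rules.

```python
LIGHTHOUSE = "\u03c6\u03ac\u03c1\u03bf\u03c2"  # φάρος
TAX = "\u03c6\u03cc\u03c1\u03bf\u03c2"         # φόρος
CARRY = "\u03c6\u03ad\u03c1\u03c9"             # φέρω, an extra word used as a contrast
PREFIX = "\u03c6\u03ac"                        # φά


def latin1_ci_prefix_match(word, prefix, stored_as="iso8859_7"):
    """Emulate LIKE 'prefix%' when Greek bytes are compared as latin1 text.

    ASSUMPTION: data stored as ISO-8859-7, compared case-insensitively as latin1.
    """
    def as_latin1(text):
        # Encode as stored, reinterpret the bytes as latin1, then fold case.
        return text.encode(stored_as).decode("latin-1").lower()

    return as_latin1(word).startswith(as_latin1(prefix))


for word in (LIGHTHOUSE, TAX, CARRY):
    print(word, word.encode("iso8859_7").hex(), latin1_ci_prefix_match(word, PREFIX))
```

Under these assumptions both 'lighthouse' and 'tax' match the prefix, while a word whose second letter is an accented epsilon does not, which is consistent with LIKE depending on the character set the server uses.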

[18 Dec 2003 11:21] Alexander Keremidarski
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Additional info:

"Description: Lets say that we have a TABLE 'Words' fill in with words of Latin
and Greek alphabet."

Prior to 4.1, MySQL supports only one character set at a time, and it is set by the mysqld startup parameter default-character-set.

In addition, only one collation applies to each character set, and it can't be changed.

As a result, it is impossible to mix "Latin and Greek alphabet" characters, or characters from any other two or more character sets, and expect any comparison or sorting operation to work properly for all of them.

The following character sets are supported in 3.23:

mysql> show variables like "character_sets";
+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name  | Value                                                                                               |
+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| character_sets | latin1 big5 czech euc_kr gb2312 gbk sjis tis620 ujis dec8 dos german1 hp8 koi8_ru latin2 swe7 usa7 cp1251 danish hebrew win1251 estonia hungarian koi8_ukr win1251ukr greek win1250 croat cp1257 latin5 |
+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+