Bug #17555 Missing some Cyrillic characters with utf8 character set and collations
Submitted: 19 Feb 2006 11:08 Modified: 13 Jun 2007 19:36
Reporter: Vladimir Vitkovsky
Status: No Feedback
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.0.18 OS:Linux (Linux)
Assigned to: CPU Architecture:Any

[19 Feb 2006 11:08] Vladimir Vitkovsky
Description:
If I use the utf8_general_ci collation, I lose some Cyrillic characters (2-3 of them) from Unicode content.
Which characters are lost depends on the configure parameters.
I tried:
./configure --prefix=/usr/local/mysql --with-charset=cp1251 --with-collation=cp1251_general_ci
OR
./configure --prefix=/usr/local/mysql --with-charset=koi8r --with-collation=koi8r_general_ci

If I use the cp1251_general_ci collation for a database with Unicode content, no characters are lost.

It seems something like this happened around version 4.1.7.

How to repeat:
./configure --prefix=/usr/local/mysql --with-charset=cp1251 --with-collation=cp1251_general_ci

CREATE DATABASE `db_for_test` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
USE db_for_test
CREATE TABLE `table_for_test` (
`text_field` TEXT NOT NULL
) ENGINE = MyISAM ;
INSERT INTO `table_for_test` ( `text_field` )
VALUES (
'й ц у к е н г ш щ з х ъ ф ы в а п р о л д ж э я ч с м и т ь б ю Й Ц У К Е Н Г Ш Щ З Х Ъ Ф Ы В А П Р О Л Д Ж Э Я Ч С М И Т Ь Б Ю '
);
If your console does not support Cyrillic, you can put the same INSERT statement into a UTF-8 encoded file and run that instead.
SELECT *
FROM `table_for_test`;

I got this result:
й ц у к е н г ѿ щ з х ъ ф ы в а п р о л д ж э я ч с м и т ь б ю 

Й Ц У К Е Н Г Ш Щ З Х Ъ Ф Ы В А П Р О Л Д Ж Э Я Ч С М п Т Ь Б Ю

As you can see, "ш" turned into "ѿ" and "И" turned into "п".
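A quick byte-level check (added for illustration, not part of the original report) compares the UTF-8 encodings of the inserted characters with the ones SELECT returned. In both cases the lead byte is unchanged and only the second (continuation) byte has been replaced by 0xBF. The snippet only shows this pattern; it does not by itself establish the cause.

```python
# Characters from the report: what was inserted -> what SELECT returned.
corrupted = {"ш": "ѿ", "И": "п"}


def utf8_bytes(ch: str) -> str:
    """Return the UTF-8 encoding of a character as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


for sent, returned in corrupted.items():
    print(f"{sent} U+{ord(sent):04X} [{utf8_bytes(sent)}] -> "
          f"{returned} U+{ord(returned):04X} [{utf8_bytes(returned)}]")

# ш U+0448 [D1 88] -> ѿ U+047F [D1 BF]
# И U+0418 [D0 98] -> п U+043F [D0 BF]
```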
[1 Mar 2006 14:52] Valeriy Kravchuk
Thank you for the problem report. Sorry, but I was not able to repeat the behaviour you described with 5.0.19-BK on SuSE 10:

openxs@linux:~> /usr/local/mysql/bin/mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 5.0.19-debug

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> CREATE DATABASE `db_for_test` DEFAULT CHARACTER SET utf8 COLLATE
    -> utf8_general_ci;
Query OK, 1 row affected (0.00 sec)

mysql> USE db_for_test
Database changed
mysql> CREATE TABLE `table_for_test` (
    -> `text_field` TEXT NOT NULL
    -> ) TYPE = MYISAM ;
Query OK, 0 rows affected, 1 warning (0.06 sec)

mysql> INSERT INTO `table_for_test` ( `text_field` )
    -> VALUES (
    -> 'й ц у к е н г ш щ з х ъ ф ы в а п р о л д ж э я ч с
    '> м и т ь б ю Й Ц У К Е Н Г Ш Щ З Х Ъ Ф Ы В А П Р О Л Д
    '> Ж Э Я Ч С М И Т Ь Б Ю '
    -> );
Query OK, 1 row affected (0.01 sec)

mysql> select * from table_for_test;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| text_field                                     |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| й ц у к е н г ш щ з х ъ ф ы в а п р о л д ж э я ч с
м и т ь б ю Й Ц У К Е Н Г Ш Щ З Х Ъ Ф Ы В А П Р О Л Д
Ж Э Я Ч С М И Т Ь Б Ю  |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.0.19-debug |
+--------------+
1 row in set (0.00 sec)

mysql> exit
Bye
openxs@linux:~> uname -a
Linux linux 2.6.13-15-default #1 Tue Sep 13 14:56:15 UTC 2005 i686 i686 i386 GNU/Linux
openxs@linux:~> echo $LANG
en_US.UTF-8
[5 Mar 2006 16:43] Vladimir Vitkovsky
I got the same result as you when I compiled with:
./configure --prefix=/usr/local/mysql --with-charset=utf8 --with-collation=utf8_general_ci
As I described, it happens when I compile with:
./configure --prefix=/usr/local/mysql --with-charset=cp1251 --with-collation=cp1251_general_ci
OR
./configure --prefix=/usr/local/mysql --with-charset=koi8r --with-collation=koi8r_general_ci

If I do the same in MySQL 4.1.18, no characters are lost.
When I upgraded from 4.1.18 to 5.0.18, I started losing some characters in utf8, as described.

I could simply recompile everything with utf8 as the default, but then the problems with my cp1251 databases begin.
After recompiling with utf8, all sites looked as if cp1251 text were being displayed as utf8. When I restored the utf8 databases from backup, they worked normally again, but the cp1251 databases displayed text incorrectly, even after restoring them from backup. If I edit text in a cp1251 database through the site, the new text is displayed correctly, but I could not get the text restored from backup to display correctly.
[6 Apr 2006 15:58] Valeriy Kravchuk
Have you tried to repeat this with 5.0.19? Please do, and let us know the results. Is it repeatable with the official MySQL binaries?
[13 Apr 2006 17:58] Vladimir Vitkovsky
I have tried to repeat it with 5.0.19 and 5.0.20, configured with:
--with-charset=cp1251
--with-collation=cp1251_general_ci

Result - the same.

I cannot try it with the official MySQL binaries because they are not compiled with cp1251 as the default.

Vladimir
[2 Jun 2006 15:22] Valeriy Kravchuk
I think you just have to build with --with-extra-charsets=all, and then execute "set names cp1251" before loading data or selecting results. Please check whether this works with the latest version, 5.0.22.
[2 Jul 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[13 May 2007 15:25] Nikolai Tsvetkov
Hi,

I noticed the same problem. I am using

mysql  Ver 5.0.27

built with:

./configure \
  --prefix=/usr \
  --with-mysqld-user=mysql \
  --with-unix-socket-path=/var/run/mysql/mysql.sock \
  --localstatedir=/var/lib/mysql \
  --enable-assembler \
  --with-raid \
  --without-debug \
  --enable-thread-safe-client \
  --without-bench \
  --with-charset=cp1251 \
  --with-extra-charsets=all \
  --with-vio \
  --with-openssl \
  --program-prefix="" \
  --program-suffix=""

The table itself:

CREATE TABLE `test` (
  `Id` bigint(22) NOT NULL auto_increment,
  `Country` varchar(100) NOT NULL,
  PRIMARY KEY  (`Id`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Again, ш and И were replaced with other characters.
[13 May 2007 19:36] Valeriy Kravchuk
Please try to repeat this with a newer version, 5.0.41, and let us know the results. 5.0.22 is already very old, and many bugs have been fixed since.
[13 May 2007 20:04] Nikolai Tsvetkov
Sorry, that was my fault.

I was inserting the strings from PHP and relied on the default connection encoding. When I set the encoding explicitly, everything worked fine.

The right method (for Google users like me):
$link = mysql_connect('host', 'user', 'password');
// Declare that this connection sends and expects UTF-8
mysql_query('set names utf8', $link);

I can confirm that with MySQL 5.0.27 there are *no* problems with the Cyrillic characters "ш" and "И".
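The fix works because the connection character set (what "set names" controls) tells the server how to interpret the bytes the client sends. The following is a simplified Python model, an assumption for illustration and not MySQL's actual conversion code: the hypothetical function `stored_text` decodes the incoming bytes in the declared character set, which yields the characters that would end up in the utf8 column. Python's cp1251 codec stands in for MySQL's cp1251 mapping; the two may differ for some bytes, so this does not reproduce the exact output from the report.

```python
def stored_text(client_bytes: bytes, declared_charset: str) -> str:
    """Hypothetical model: interpret the bytes a client sends in the character
    set declared for the connection and return the characters that would be
    stored. Bytes undefined in that charset become U+FFFD in this model."""
    return client_bytes.decode(declared_charset, errors="replace")


utf8_payload = "ш И".encode("utf-8")      # a UTF-8 client, such as the PHP script
cp1251_payload = "ш И".encode("cp1251")   # a cp1251 client

# Declared correctly ("set names utf8"): the intended characters are stored.
print(stored_text(utf8_payload, "utf-8"))      # ш И

# UTF-8 bytes declared as cp1251 (relying on a cp1251 default):
# each byte is read as a separate cp1251 character.
print(stored_text(utf8_payload, "cp1251"))     # С€ Р� (0x98 is unassigned in cp1251)

# cp1251 bytes declared as cp1251 ("set names cp1251"): also correct.
print(stored_text(cp1251_payload, "cp1251"))   # ш И
```

The point of the model is that the same bytes produce different stored text depending only on the declared charset, which is why setting it explicitly after connecting resolved the problem.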
[13 Jun 2007 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".