Bug #37551 Junk detected in data contents sometimes when utf8mb3 character set is used
Submitted: 20 Jun 2008 16:30 Modified: 30 Jul 2010 9:30
Reporter: Hema Sridharan
Status: Closed
Category: MySQL Server: Charsets Severity: S3 (Non-critical)
Version: mysql-6.0-backup OS: Linux
Assigned to: Alexander Barkov CPU Architecture: Any

[20 Jun 2008 16:30] Hema Sridharan
Description:
1) Create a database with character set utf8mb3 and collation utf8mb3_sinhala_ci.
2) Create a table in it with the same character set and collation.
3) Insert some values into the table.
4) Select the contents of the table: junk characters appear after each stored value.

set names utf8mb3;
CREATE DATABASE bup_ts character set utf8mb3 collate utf8mb3_sinhala_ci;
use bup_ts;
CREATE TABLE t3 (a varchar(3) COLLATE utf8mb3_sinhala_ci DEFAULT NULL)ENGINE=MyISAM  CHARACTER SET utf8mb3  COLLATE utf8mb3_sinhala_ci;
INSERT INTO t3 VALUES
('a'),('b'),('c'),('d'),('f'),('h'),('i'),('j'),('k'),('l'),('m'),
('n');

How to repeat:
mysql> set names utf8mb3;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE DATABASE bup_ts character set utf8mb3 collate utf8mb3_sinhala_ci;
Query OK, 1 row affected (0.01 sec)

mysql> use bup_ts;
Database changed
mysql> CREATE TABLE t3 (a varchar(3) COLLATE utf8mb3_sinhala_ci DEFAULT NULL)ENGINE=MyISAM  CHARACTER SET utf8mb3  COLLATE utf8mb3_sinhala_ci;
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO t3 VALUES
    -> ('a'),('b'),('c'),('d'),('f'),('h'),('i'),('j'),('k'),('l'),('m'),
    -> ('n');
Query OK, 12 rows affected (0.01 sec)
Records: 12  Duplicates: 0  Warnings: 0

mysql> select * from t3;
+------+
| a    |
+------+
| a¥¥¥ |
| b¥¥¥ |
| c¥¥¥ |
| d¥¥¥ |
| f¥¥¥ |
| h¥¥¥ |
| i¥¥¥ |
| j¥¥¥ |
| k¥¥¥ |
| l¥¥¥ |
| m¥¥¥ |
| n¥¥¥ |
+------+
12 rows in set (0.00 sec)

mysql> show create table t3;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                    |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| t3    | CREATE TABLE `t3` (
  `a` varchar(3) COLLATE utf8mb3_sinhala_ci DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_sinhala_ci |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
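
To narrow down whether the extra characters are stored in the table or only appear when the result is sent to the client, the raw bytes and lengths can be inspected. This is a minimal sketch that assumes the t3 table from the session above; the column aliases are illustrative only, and no expected output is shown because it depends on the server build.

```sql
-- Show each value next to its raw bytes, byte length and character length.
SELECT a,
       HEX(a)         AS raw_bytes,
       LENGTH(a)      AS byte_len,
       CHAR_LENGTH(a) AS char_len
FROM t3;

-- Show the connection collation in effect for this session.
SHOW VARIABLES LIKE 'collation_connection';

-- Show how the server describes the collation used by the column.
SHOW COLLATION LIKE 'utf8mb3_sinhala_ci';
```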
[22 Jun 2008 16:36] Sveta Smirnova
Thank you for the report.

Verified as described.

This is probably an effect of collation_connection, which is utf8mb3_general_ci by default, but the manual should mention that no conversion occurs in this case.

Workaround: run SET collation_connection='utf8mb3_sinhala_ci'; before the INSERT.
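
The workaround can be sketched as the session below, assuming the bup_ts database and t3 table from the report. SET NAMES resets collation_connection to the default collation of the character set, so the collation has to be set after it (or in the same statement using the COLLATE clause).

```sql
SET NAMES utf8mb3;
-- Match the connection collation to the column collation before inserting.
SET collation_connection = 'utf8mb3_sinhala_ci';
-- Equivalent single statement:
-- SET NAMES utf8mb3 COLLATE utf8mb3_sinhala_ci;

USE bup_ts;
TRUNCATE TABLE t3;  -- start from an empty table so only new rows are checked
INSERT INTO t3 VALUES ('a'),('b'),('c');
SELECT * FROM t3;
```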
[30 Jul 2010 9:30] Alexander Barkov
This problem was fixed earlier.

mbminlen was mistakenly set to 3 in the
my_charset_utf8_sinhala_uca_ci definition.
It is now correctly set to 1.