Bug #52520 Difference in tinytext utf column metadata
Submitted: 1 Apr 2010 10:57 Modified: 21 Jul 2010 15:27
Reporter: Nidhi Shrotriya Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.5.3-m3 OS:Any
Assigned to: Alexander Barkov CPU Architecture:Any

[1 Apr 2010 10:57] Nidhi Shrotriya
Description:
Difference is seen in length for hex value for a utf8 tinytext column (as below) when column metadata is enabled, between mysql-5.1 & mysql-5.5.3-m3. This looks like a recent change (in Celosia builds) as the results were recorded recently against build to be for Celosia. 
Length for utf8mb4 also seems to be different.

    [java]  select s1, hex(s1), length(s1), char_length(s1), octet_length(s1) from t2_upg_utf8;
    [java]  Catalog    Database    Table    Table_alias    Column    Column_alias    Type    Length    Max length    Is_null    Flags    Decimals    Charsetnr
    [java]  def    testdb_utf    t2_upg_utf8    t2_upg_utf8    s1    s1    252    255    255    Y    16    0    7
    [java] -def                    hex(s1)    253    1530    510    Y    0    0    7
    [java] +def                    hex(s1)    253    510    510    Y    0    0    7
    [java]  def                    length(s1)    8    10    3    Y    32896    0    63
    [java]  def                    char_length(s1)    8    10    3    Y    32896    0    63
    [java]  def                    octet_length(s1)    8    10    3    Y    32896    0    63
    [java] @@ -1264,7 +1264,7 @@
    [java]  select s1, hex(s1), length(s1), char_length(s1), octet_length(s1) from t2_upg_utf8mb4;
    [java]  Catalog    Database    Table    Table_alias    Column    Column_alias    Type    Length    Max length    Is_null    Flags    Decimals    Charsetnr
    [java]  def    testdb_utf    t2_upg_utf8mb4    t2_upg_utf8mb4    s1    s1    252    255    255    Y    16    0    7
    [java] -def                    hex(s1)    253    2040    510    Y    0    0    7
    [java] +def                    hex(s1)    253    504    510    Y    0    0    7
    [java]  def                    length(s1)    8    10    3    Y    32896    0    63
    [java]  def                    char_length(s1)    8    10    3    Y    32896    0    63
    [java]  def                    octet_length(s1)    8    10    3    Y    32896    0    63
    [java]

Tables were created as below-
create table t2_upg_utf8 (s1 tinytext character set utf8);
create table t2_upg_utf8mb4 (s1 tinytext character set utf8mb4);
insert into t2_upg_utf8 select repeat('a',300);
insert into t2_upg_utf8 select repeat('Ñ',300);
insert into t2_upg_utf8 select repeat('aÑ',300);
insert into t2_upg_utf8 select repeat('Ña',300);
insert into t2_upg_utf8 select repeat('ÑÑ',300);
SET NAMES koi8r;

How to repeat:
As mentioned in how to repeat.
[26 Apr 2010 12:30] Alexander Barkov
# From the manual:
http://dev.mysql.com/doc/refman/5.5/en/c-api-datatypes.html

unsigned long length
The width of the field. This corresponds to the display length, in bytes.
The server determines the length value before it generates the result set, so this is the minimum length required for a data type capable of holding the largest possible value from the result column, without knowing in advance the actual values that will be produced by the query for the result set.

So the change in utf8 from 1530 to 510 looks fine.

The value 504 is utf8mb4 is wrong. It should be 510 as well.
[28 May 2010 11:37] Alexander Barkov
How to repeat from command line:

$mysql --default-character-set=latin1 --column-type-info test

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.6.99-m4-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> drop table if exists t1; create table t1 (s1 tinytext character set utf8mb4); insert into t1 select repeat('a',300); select hex(s1) from t1 limit 0;
Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected, 1 warning (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 1

Field   1:  `hex(s1)`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  latin1_swedish_ci (8)
Length:     504
Max_length: 0
Decimals:   0
Flags:      

0 rows in set (0.00 sec)
[31 May 2010 6:00] Alexander Barkov
The same problem is repeatable with this SQL script:

mysql> drop table if exists t1, t2; create table t1 (a tinytext) character set utf8; create table t2 as select concat(a) from t1; show create table t2;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

+-------+----------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                         |
+-------+----------------------------------------------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (
  `concat(a)` varchar(85) CHARACTER SET utf8 DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

As of version 5.1 a varchar(255) column was created, which was correct.

5.5 creates varchar(85), which is is too short.
[2 Jun 2010 10:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/109920
[2 Jun 2010 11:53] Alexey Botchkov
ok to push.
[2 Jun 2010 12:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/109972
[2 Jun 2010 12:50] Alexander Barkov
Pushed into mysql-trunk-bugfixing (5.5.5-m3)
Pushed into mysql-next-mr-bar (5.6.99-m4)
[15 Jun 2010 8:13] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100615080459-smuswd9ooeywcxuc) (version source revid:alik@sun.com-20100615080459-smuswd9ooeywcxuc) (merge vers: 5.5.5-m3) (pib:16)
[15 Jun 2010 8:29] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100615080558-cw01bzdqr1bdmmec) (version source revid:marko.makela@oracle.com-20100601134335-ccthwwru23kn09qw) (pib:16)
[21 Jul 2010 15:27] Paul DuBois
Noted in 5.5.5 changelog.

The length and max_length metadata values were incorrect for columns
with the TEXT family of data types that used multibyte character sets
This bug was introduced in MySQL 5.5.3.