Bug #25562 Incorrect calculation of part size of internal tables for TEXT columns
Submitted: 11 Jan 2007 23:24 Modified: 25 Jan 2007 5:11
Reporter: Todd Farmer (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:5.0 -> OS:Linux (Linux)
Assigned to: Pekka Nousiainen CPU Architecture:Any
Tags: character set

[11 Jan 2007 23:24] Todd Farmer
Description:
When defining an NDB table with a TEXT(max_length_in_chars) column, MySQL creates allocates the space in the internal tables incorrectly when using a multiple-byte character set such as UTF.  For example, a column definition of TEXT(1) will result in a 256-byte header and 0 part size.  If the column is defined as TEXT(1) CHARACTER SET utf8, there is a 256-byte header plus a 2000-byte part size, despite the fact that this can be represented inline in the 256-byte header.

The same problem is seen when defining a column as TEXT(86) CHARACTER SET utf8 - a maximum length of 258 bytes and a value that should fit easily within a 256-byte header + one 2000-byte part.  Instead, the part size created is 4000 bytes.

Based on the above example, 2000 bytes of memory per TEXT column per row are wasted. 

How to repeat:
Create table:

CREATE TABLE text_test (
  id INT primary key,
  first TEXT(1),
  second TEXT(1) CHARACTER SET utf8,
  third TEXT(85),
  fourth TEXT(85) CHARACTER SET utf8,
  fifth TEXT(86),
  sixth TEXT(86) CHARACTER SET utf8
) ENGINE = NDB;

Then run ndb_desc against the table.

Suggested fix:
Correct part size estimation for multi-byte character sets.
[12 Jan 2007 1:30] Todd Farmer
Output from ndb_desc:

linux-waters:~ # /usr/local/mysql/bin/ndb_desc text_test --ndb-connectstring=127.0.0.1 --database=blob_parts
-- text_test --
Version: 1
Fragment type: 5
K Value: 6
Min load factor: 78
Max load factor: 80
Temporary table: no
Number of attributes: 7
Number of primary keys: 1
Length of frm data: 341
Row Checksum: 1
Row GCI: 1
TableStatus: Retrieved
-- Attributes --
id Int PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY
first Text(256,0;0;latin1_swedish_ci) NULL AT=FIXED ST=MEMORY
second Text(256,2000;16;utf8_general_ci) NULL AT=FIXED ST=MEMORY
third Text(256,0;0;latin1_swedish_ci) NULL AT=FIXED ST=MEMORY
fourth Text(256,2000;16;utf8_general_ci) NULL AT=FIXED ST=MEMORY
fifth Text(256,0;0;latin1_swedish_ci) NULL AT=FIXED ST=MEMORY
sixth Text(256,4000;8;utf8_general_ci) NULL AT=FIXED ST=MEMORY

-- Indexes --
PRIMARY KEY(id) - UniqueHashIndex
PRIMARY(id) - OrderedIndex
[23 Jan 2007 5:07] Tomas Ulin
===== sql/field.cc 1.356 vs edited =====
--- 1.356/sql/field.cc	2007-01-22 11:15:18 +08:00
+++ edited/sql/field.cc	2007-01-22 11:09:22 +08:00
@@ -9333,11 +9333,11 @@
   switch (packlength)
   {
   case 1:
-    return 255 * field_charset->mbmaxlen;
+    return 255;
   case 2:
-    return 65535 * field_charset->mbmaxlen;
+    return 65535;
   case 3:
-    return 16777215 * field_charset->mbmaxlen;
+    return 16777215;
   case 4:
     return (uint32) 4294967295U;
   default:
[23 Jan 2007 9:21] Alexander Barkov
If one create a TEXT(N) column, the N is just helps to choose
a proper type: TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT.

TEXT columns (unlike VARCHAR columns) are always limited in BYTES,
not in CHARACTERS.

The limits are always:

256 bytes for TINYTEXT
64K for TEXT
16MB for MEDIUMTEXT
4GB for LONGTEXT.

The fix for field.cc is not correct. It will make many non-ndb test crash.

This bug should be fixed in NDB code.
[23 Jan 2007 11:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/18613

ChangeSet@1.2368, 2007-01-23 12:58:10+01:00, pekka@clam.ndb.mysql.com +1 -0
  ndb - bug#25562 use byte-size max_data_length() when setting blob part size
[24 Jan 2007 2:07] Tomas Ulin
pushed to 5.1.15
[24 Jan 2007 7:55] Tomas Ulin
pushed to 5.0.36
[25 Jan 2007 5:11] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented bugfix in 5.0.36 and 5.1.15 changelogs.