Bug #2065 Inserts hang on gb2312, big5 character sets with fulltext
Submitted: 9 Dec 2003 16:47 Modified: 11 Dec 2003 0:22
Reporter: [ name withheld ]
Status: Duplicate
Category: MySQL Server: Command-line Clients    Severity: S1 (Critical)
Version: Ver 14.3 Dist 4.1.1-alpha pc-linux-i686    OS: Linux (Mandrake Linux 9.1)
Assigned to: Assigned Account    CPU Architecture: Any

[9 Dec 2003 16:47] [ name withheld ]
Description:
A very simple MyISAM table with FULLTEXT indexes defined on
gb2312 or big5 columns hangs the client on insert. Subsequent
inserts hang as well, even if they have nothing to do with the
gb2312 or big5 columns.

A SELECT * from the table then shows empty records.

But an initial (after recreating table) insert that has nothing
to do with gb2312 or big5 succeeds.

(Tested with both the MySQL client and the Python MySQL module,
which probably just uses the MySQL client C code anyway.)

(Tested substituting big5, utf8 & latin1 for gb2312 everywhere. Locks up for gb2312 and big5, doesn't lock up for utf8 & latin1)

Although it doesn't crash mysqld (as one bug report mentioned),
and I can still make queries to different tables, it makes mysqld
have problems restarting, and I have to do 'killall -9 mysqld'
(running the included script just hangs until I do the killall,
which respawns mysqld; then the script runs fine).

Commenting out the FULLTEXT makes this problem disappear.

Also, adding "COLLATE big5_chinese_ci" or
"COLLATE gb2312_chinese_ci" after each
"CHARACTER SET" seemed to work once, but I
couldn't recreate this later. Theoretically,
these are default collations and shouldn't be
needed.
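
For reference, the explicit-collation variant described above would look roughly like the sketch below. It is a reduced illustration, not the exact statements that were run: the table name collate_check is made up for this example and only a few columns are kept. The SHOW CHARACTER SET statements report each character set's default collation, which is one way to check the assumption that the COLLATE clauses should be redundant.

```sql
-- Show the default collation the server uses for each character set.
SHOW CHARACTER SET LIKE 'gb2312';
SHOW CHARACTER SET LIKE 'big5';

-- Hypothetical reduced table (name is an assumption) with the collation
-- spelled out after each CHARACTER SET clause.
CREATE TABLE collate_check
  (article_id   INT(10) NOT NULL AUTO_INCREMENT
  ,title        VARCHAR(250) CHARACTER SET gb2312 COLLATE gb2312_chinese_ci
  ,description  LONGTEXT CHARACTER SET gb2312 COLLATE gb2312_chinese_ci
  ,PRIMARY KEY (article_id)
  ,FULLTEXT (description)
  );
```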

How to repeat:

DROP DATABASE IF EXISTS somedb;
CREATE DATABASE somedb;

use somedb;

CREATE TABLE parss_articles_gb2312
  (article_id       INT(10) NOT NULL AUTO_INCREMENT
  ,feed_id          INT(10) NOT NULL
  ,articleDate      DATETIME NOT NULL
  ,discoveryDate    DATETIME NOT NULL
  ,encoding         VARCHAR(25)
  ,title            VARCHAR(250) CHARACTER SET gb2312
  ,link             VARCHAR(250)
  ,description      LONGTEXT CHARACTER SET gb2312
  ,htmlDescription  LONGTEXT CHARACTER SET gb2312
  ,body             LONGTEXT CHARACTER SET gb2312
  ,guid             VARCHAR(250)
  ,fingerPrint      VARCHAR(250)
  ,firstFeedFlag    CHAR(1) DEFAULT 'N'
  ,PRIMARY KEY (article_id)
  ,INDEX (articleDate)
  ,INDEX (feed_id)
  ,INDEX (fingerPrint)
  ,FULLTEXT (description)
  ,FULLTEXT (body)
  );

insert into parss_articles_gb2312 (encoding,title,link,description) VALUES ('gb2312','junktitle','http://localhost','much ado about nothing');
[10 Dec 2003 6:21] Sergei Golubchik
Please do not submit the same bug more than once. An existing
bug report already describes this very problem. Even if you feel
that your issue is somewhat different, the resolution is likely
to be the same. Because of this, we hope you add your comments
to the original bug instead.

Thank you for your interest in MySQL.

Additional info:

already fixed in 4.1.2
[10 Dec 2003 16:42] [ name withheld ]
Are you referring to CJK bug #2033 as being identical, or to another one?

#2033 referred to UTF8 as a problem. I have no problem with UTF-8
encoding; I have a problem with GB2312 & BIG5.

#2033 crashed the server. Mine doesn't crash the server, it makes
the client stop responding, but I can start up a new client and
use the same database, only that table is problematic. (Yes, the
server is problematic, but it can keep serving up the same database,
and I can query that table without restarting the server).

#2033 was able to deal with a table of record length 1, mine is not.

#2033 had a problem creating a full-text index, while my table
had the full-text index created.

There's also bug #1977, which has an issue with full-text indexing,
but it's a more complicated example of an insert. I don't have a
problem with full-text on iso-8859-1 & utf-8, and there's no
indication in the solution of what the issue was that got fixed.

If you are referring to another bug report that I missed, please
tell me which one.

From an end user's perspective, I downloaded the 4.1.1 binary last
night (23 Megs over a 56k modem) to test out CJK, discovered
a problem, and spent some time documenting it in what I hoped was a
helpful bug report to make your job easier. If it had seemed to
me that this bug was obviously a repeat of another, I would have
simply downloaded the latest CVS and tried compiling it myself.
Only 26 minutes more to go now.
[10 Dec 2003 23:37] [ name withheld ]
Fixed by the Dec 9 4.1.2 snapshot.
[11 Dec 2003 0:22] Sergei Golubchik
Sorry for not being more detailed.
The bug was reported by a customer and not via the bug database, so it does not have a number.

There was an entry in the manual's changelog:

* Fixed a hang in full-text indexing of EUC-JP (ujis) data.

but it is misleading, as all multi-byte charsets besides utf8 were affected, not only ujis.
I changed it to:

* Fixed a hang in full-text indexing of strings in multi-byte (all besides utf8) charsets.
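
To see which character sets on a given server count as multi-byte, one option is to list them and read the Maxlen column: a value greater than 1 indicates a multi-byte charset. This is only a way to inspect the server, not part of the fix itself.

```sql
-- Lists every available character set. The Maxlen column gives the
-- maximum number of bytes per character (gb2312 and big5 report 2).
SHOW CHARACTER SET;
```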