Bug #31950 repair table hangs while processing multicolumn utf8 fulltext index
Submitted: 30 Oct 2007 14:28
Modified: 15 Nov 2007 15:31
Reporter: Shane Bester (Platinum Quality Contributor)
Status: Closed
Category: MySQL Server: FULLTEXT search
Severity: S2 (Serious)
Version: 5.0.50
OS: Any
Assigned to: Sergey Vojtovich
CPU Architecture: Any
Tags: bfsm_2007_11_01, fulltext, hang

[30 Oct 2007 14:28] Shane Bester
Description:
REPAIR TABLE <table> USE_FRM hangs in an infinite loop. Here is the stack trace of the thread while it is hung:

mysqld-debug.exe!ft_simple_get_word
mysqld-debug.exe!ft_parse
mysqld-debug.exe!_mi_ft_parse
mysqld-debug.exe!_mi_ft_parserecord
mysqld-debug.exe!sort_ft_key_read
mysqld-debug.exe!find_all_keys
mysqld-debug.exe!_create_index_by_sort
mysqld-debug.exe!mi_repair_by_sort
mysqld-debug.exe!ha_myisam::repair
mysqld-debug.exe!ha_myisam::repair
mysqld-debug.exe!handler::ha_repair
mysqld-debug.exe!mysql_admin_table
mysqld-debug.exe!mysql_repair_table
mysqld-debug.exe!mysql_execute_command
mysqld-debug.exe!mysql_parse
mysqld-debug.exe!dispatch_command
mysqld-debug.exe!do_command
mysqld-debug.exe!handle_one_connection
mysqld-debug.exe!pthread_start
mysqld-debug.exe!_callthreadstart
mysqld-debug.exe!_threadstart

byte ft_simple_get_word(CHARSET_INFO *cs, byte **start, const byte *end,
                        FT_WORD *word, my_bool skip_stopwords)
{
  byte *doc= *start;
  uint mwc, length, mbl;
  DBUG_ENTER("ft_simple_get_word");

  do
  {
    for (;; doc+= mbl)  /* <-- mbl is zero, so this loop goes forever */
    {
      if (doc >= end) DBUG_RETURN(0);
      if (true_word_char(cs, *doc)) break;
      mbl= my_mbcharlen(cs, *(uchar *)doc);
    }
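
To illustrate why the excerpt above stalls and one way such a scan can be made to terminate, here is a minimal standalone sketch. It is not the committed patch. lead_len(), is_word_char() and skip_to_word() are hypothetical stand-ins for my_mbcharlen(), true_word_char() and the inner loop, and the sketch assumes that a byte which cannot start a character yields a length of 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for my_mbcharlen(): length of a UTF-8 sequence
   judged from its lead byte (1 to 3 bytes only), or 0 when the byte
   cannot start a sequence. */
static unsigned lead_len(unsigned char c)
{
  if (c < 0x80) return 1;
  if (c >= 0xC2 && c <= 0xDF) return 2;
  if (c >= 0xE0 && c <= 0xEF) return 3;
  return 0;                       /* continuation or unsupported lead byte */
}

/* Hypothetical stand-in for true_word_char(): ASCII letters, digits, '_'. */
static int is_word_char(unsigned char c)
{
  return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
         (c >= 'a' && c <= 'z') || c == '_';
}

/* Skip non-word bytes. Returns a pointer to the first word character,
   or end if there is none. */
static const unsigned char *skip_to_word(const unsigned char *doc,
                                         const unsigned char *end)
{
  unsigned mbl;

  for (; doc < end; doc += mbl)
  {
    if (is_word_char(*doc)) break;
    mbl = lead_len(*doc);
    if (mbl == 0)
      mbl = 1;                    /* guard: always advance at least one byte */
    if (mbl > (size_t)(end - doc))
      mbl = (unsigned)(end - doc); /* do not step past a truncated sequence */
  }
  assert(doc <= end);
  return doc;
}
```

Without the `mbl == 0` guard, the first stray continuation byte (for example 0x80) would leave doc unchanged on every iteration, which matches the hang described in this report.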

Table structure is as follows:

mysql> show create table vb_post1\G
*************************** 1. row ***************************
       Table: vb_post1
Create Table: CREATE TABLE `vb_post1` (
  `postid` int(10) unsigned NOT NULL auto_increment,
  `threadid` int(10) unsigned NOT NULL default '0',
  `parentid` int(10) unsigned NOT NULL default '0',
  `username` varchar(100) collate utf8_unicode_ci NOT NULL default '',
  `userid` int(10) unsigned NOT NULL default '0',
  `title` varchar(250) collate utf8_unicode_ci NOT NULL default '',
  `dateline` int(10) unsigned NOT NULL default '0',
  `pagetext` mediumtext collate utf8_unicode_ci NOT NULL,
  `allowsmilie` smallint(6) NOT NULL default '0',
  `showsignature` smallint(6) NOT NULL default '0',
  `ipaddress` varchar(15) collate utf8_unicode_ci NOT NULL default '',
  `iconid` smallint(5) unsigned NOT NULL default '0',
  `visible` smallint(6) NOT NULL default '0',
  `attach` smallint(5) unsigned NOT NULL default '0',
  `infraction` smallint(5) unsigned NOT NULL default '0',
  `reportthreadid` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (`postid`),
  KEY `userid` (`userid`),
  KEY `threadid` (`threadid`,`userid`),
  KEY `idx_dateline` (`dateline`),
  FULLTEXT KEY `title` (`title`,`pagetext`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.01 sec)

How to repeat:
No simple test case yet.

Suggested fix:
This seems very similar to the hang in bug #29464, except that the data is not Chinese and the loop is slightly different.
[30 Oct 2007 14:35] MySQL Verification Team
Some debug info.

Attachment: bug31950_debug_info.txt (text/plain), 4.56 KiB.

[1 Nov 2007 13:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/36868

ChangeSet@1.2547, 2007-11-01 16:27:01+04:00, svoj@mysql.com +1 -0
  BUG#31950 - repair table hangs while processing multicolumn utf8
              fulltext index
  
  Having a table with broken multibyte characters may cause fulltext
  parser dead-loop.
  
  Since normally it is not possible to insert broken multibyte sequence
  into a table, this problem may arise only if table is damaged.
  
  Affected statements are:
  - CHECK/REPAIR against damaged table with fulltext index;
  - boolean mode phrase search against damaged table with or
    without fulltext index;
  - boolean mode searches without index;
  - nlq searches.
  
  No test case for this fix. Affects 5.0 only.
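
To make "broken multibyte sequence" concrete, the following sketch locates the first malformed sequence in a buffer. It is not part of the patch. first_bad_utf8() is a hypothetical helper, it assumes 1- to 3-byte UTF-8 sequences only, and it deliberately skips the finer checks (overlong forms, surrogates).

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first malformed UTF-8
   sequence in s[0..len), or len if none is found.
   Simplified: accepts only 1- to 3-byte sequences and does not reject
   overlong encodings or surrogate code points. */
static size_t first_bad_utf8(const unsigned char *s, size_t len)
{
  size_t i = 0;

  while (i < len)
  {
    unsigned char c = s[i];
    size_t n, k;

    if (c < 0x80) n = 1;
    else if (c >= 0xC2 && c <= 0xDF) n = 2;
    else if (c >= 0xE0 && c <= 0xEF) n = 3;
    else return i;                /* byte cannot start a sequence */

    if (n > len - i) return i;    /* sequence truncated by end of buffer */

    for (k = 1; k < n; k++)
      if ((s[i + k] & 0xC0) != 0x80)
        return i;                 /* missing continuation byte */

    i += n;
  }
  return len;
}
```

A check of this kind could be run over the text columns of a suspect row to see whether it contains the sort of damaged data the commit message describes.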
[14 Nov 2007 9:41] Bugs System
Pushed into 6.0.4-alpha
[14 Nov 2007 9:45] Bugs System
Pushed into 5.1.23-rc
[14 Nov 2007 9:50] Bugs System
Pushed into 5.0.52
[15 Nov 2007 15:31] Paul DuBois
Noted in 5.0.52 changelog.

A column with malformed multi-byte characters could cause the full-text parser to go into an infinite loop.