Bug #1721 False duplicate key warnings on FULLTEXT columns
Submitted: 31 Oct 2003 0:17 Modified: 23 Feb 2004 10:45
Reporter: Tony Butcher Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: MyISAM storage engine Severity:S2 (Serious)
Version:4.0.16 OS:Linux (Linux SuSE 8.2)
Assigned to: Assigned Account CPU Architecture:Any

[31 Oct 2003 0:17] Tony Butcher
Description:
We have a webmail system in which emails are stored in a MyISAM table. 
The table has FULLTEXT indexes to assist with people searching their 
mail archive. 

However, emails containing spam are also placed in the 'emails' table 
with a label of 'trash'. It appears that where there is spam or a virus in the 
body of an email, it can cause an error along the lines of:

Warning: duplicate entry for key 6.

'Key 6' would be the 'body' column, which has a FULLTEXT index.

We have also been encountering table corruptions with this system 
(certainly on 4.0.14) and have just yesterday upgraded to 4.0.16. No more 
table corruptions yet (CHECK TABLE gives OK), but these INSERT errors 
persist.

How to repeat:
emails CREATE TABLE `emails` (
  `email_id` int(12) NOT NULL auto_increment,
  `user_id` int(12) NOT NULL default '0',
  `mail_from` varchar(255) NOT NULL default '',
  `mail_to` varchar(255) NOT NULL default '',
  `mail_cc` varchar(255) NOT NULL default '',
  `mail_bcc` varchar(255) NOT NULL default '',
  `priority` enum('2 (High)','3 (Normal)','4 (Low)') default NULL,
  `subject` varchar(150) NOT NULL default '',
  `header` text NOT NULL,
  `mime_type` varchar(50) NOT NULL default '',
  `body` text,
  `date` datetime NOT NULL default '0000-00-00 00:00:00',
  `delete_date` datetime default NULL,
  `email_size` float NOT NULL default '0',
  `outgoing` enum('0','1') NOT NULL default '0',
  `spam_rating` float NOT NULL default '0',
  `mode` enum('received','sent','saved_sent','trash','saved_trash') NOT 
NULL default 'received',
  PRIMARY KEY  (`email_id`),
  KEY `user_id` (`user_id`),
  FULLTEXT KEY `mail_from` (`mail_from`),
  FULLTEXT KEY `mail_to` (`mail_to`),
  FULLTEXT KEY `subject` (`subject`),
  FULLTEXT KEY `body` (`body`)
) TYPE=MyISAM

Then do an 'INSERT INTO emails SET...', where 'body' contains 
something binary (such as a virus or spam - which may be because the 
email attachment boundary was not detected).

Suggested fix:
Currently we are getting around this by dropping all FULLTEXT indexes, 
and this allows the INSERT to occur without a warning. 

(We're also considering coding around this, so we make sure we don't 
store binary data in FULLTEXT columns.)
[31 Oct 2003 0:20] Tony Butcher
Corrected version to 4.0.16
[31 Oct 2003 11:15] Dean Ellis
I am unable to duplicate this behaviour using 4.0.16 or 4.0.17.  I am able to duplicate as many binary values for `body` as I like, as one would expect with FULLTEXT indexes not enforcing uniqueness.

If you can duplicate this with a new table and a specific 'binary' created for this test, please submit a test case with complete CREATE TABLE and INSERT statements for further testing.

Thank you
[12 Nov 2003 1:13] Sergei Golubchik
Looking at the code I can see that it is could be possible to get "duplicate entry" error for fulltext key. My guess is that it is indeed related to binary data and character set issues - which could result in some weird comparison problems if done incorrectly. In any case, I need a repeatable test case to fix the bug. Next time you'll get this error, please, try to save the row data that caused the error - I think that even inserting this row into an *empty* table (with identical structure) can cause "duplicate entry" error.
[12 Nov 2003 2:35] Tony Butcher
This error has not recurred (although binaries are still being stored), and 
I've not had the opportunity to find the offending spam/virus.

However, I think it's important to note that I am still getting table 
corruptions (about 1 per day) in my 'emails' and 'mail_cache' table, which 
have single FULLTEXT columns. 

(We even switched machines and this still happens).

I will try to find a specific offender if I can, although when the corrupting 
INSERT happens it succeeds (with a corruption), while the duplicate key 
error actually fails (so is easier to trap).
[23 Feb 2004 10:45] Sergei Golubchik
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.