Bug #67257 FULLTEXT index not being properly maintained on innoDB table
Submitted: 16 Oct 2012 8:43 Modified: 13 Dec 2012 16:01
Reporter: Jan Rusch Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S3 (Non-critical)
Version:5.6 OS:Any (MS Windows 7 64bit, Linux)
Assigned to: CPU Architecture:Any
Tags: fts, innodb
Triage: Needs Triage: D2 (Serious)

[16 Oct 2012 8:43] Jan Rusch
Description:
We have a similar problem as described in bug #65512.

FTS searches return different results when using a MyISAM FTS as using an InnoDB FTS. Token length is the same for both engines.

This is non determinisitic. Sometimes the FTS returns the correct result, sometime not, even on the same SELECT on the same server.

Versions affected 5.6.5, 5.6.6, 5.6.7

How to repeat:
As this in non-deterministic, a repetition of the bugs is very difficult:

We use the following table structure for our FTS search. The table is dropped & recreated every night. The FTS index is always created after filling the table and is not changed after creation. The ENGINE is set to MyISAM on one 5.6.7 server and to InnoDB on the other 5.6.7 server.

innodb_ft_min_token_size=2
ft_min_word_len=2

DROP TABLE IF EXISTS `tecdoc`.`search_fulltext`;
CREATE TABLE  `tecdoc`.`search_fulltext` (
  `FTS_DOC_ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `ArtNr` varchar(12) NOT NULL,
  `SearchBez` text,
  `SearchSyn` text,
  `SearchKritWert` text,
  `SearchKritWertBez` text,
  `SearchHersteller` text,
  `SearchTextbaustein` text,
  `SearchFahrzeug` text,
  `SearchKBA` text,
  `SearchMotor` text,
  `SearchBaugruppe` text,
  `SearchVerwendungszweck` text,
  `SearchMotortyp` text,
  `SearchRefNr` text,
  PRIMARY KEY (`ArtNr`),
  UNIQUE KEY `FTS_DOC_ID_INDEX` (`FTS_DOC_ID`),
  FULLTEXT KEY `idx_ft_search_fulltext` (`ArtNr`,`SearchBez`,`SearchSyn`,`SearchKritWert`,`SearchKritWertBez`,`SearchHersteller`,`SearchTextbaustein`,`SearchFahrzeug`,`SearchKBA`,`SearchMotor`,`SearchBaugruppe`,`SearchVerwendungszweck`,`SearchMotortyp`,`SearchRefNr`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

As a query we use e.g.:

SELECT ArtNr
FROM tecdoc.search_fulltext
WHERE MATCH(ArtNr,SearchBez,SearchSyn,SearchKritWert,SearchKritWertBez,SearchHersteller,SearchTextbaustein,SearchFahrzeug,SearchKBA,SearchMotor,SearchBaugruppe,SearchVerwendungszweck,SearchMotortyp,SearchRefNr)
AGAINST ("+Opel* +Anhänge*" IN BOOLEAN MODE)

On the MyISAM FTS the query returns 121 rows (which is correct).

On the InnoDB FTS the query returns 119 rows (which is wrong). If this error occurs, the InnoDB FTS query always return too few rows, never too many.

One of the missing hits is:

SELECT *
FROM search_fulltext
WHERE ArtNr = "614162150651";

on the InnoDB FTS (some parts of result edited to clarify, as texts are much longer):

*************************** 1. row ***************************
FTS_DOC_ID: 22898

ArtNr: 614162150651

TOKEN FOUND HERE --> SearchBez: Elektrosatz Anhängevorrichtung # Electric Kit towbar #...

SearchSyn: Cable Kit # jeux de cables # Kabelsatz

SearchKritWert: Steckdose f_ AHK 7_polig # Socket for trailer coupling 7poles... # Montagezeit <1Std_ # Labour <1Hrs_ # Temps de mont
age <1Heures # Tempo montaggio <1h # Tiempo de montaje <1Horas # Montagetijd <1uren # Monteringstid <1timer # Monterings
tid <1timmar ...

SearchKritWertBez: Fahrzeugausstattung für Fahrzeuge mit Checkkontrollsystem # Vehicle Equipment for vehicles with
check control system # Équipement véhicule pour véhicules avec système de check-control #...

TOKEN FOUND HERE --> SearchHersteller: Thule # OPEL # THULE

SearchTextbaustein: NULL

TOKEN AGAIN HERE --> SearchFahrzeug: MERIVA A # OPEL # MERIVA # 1_7_DTI # 1_7 DTI # 1_8 # 1_6 # 1_6_16V # 1_6 16V # 1_7_CDTI # 1_7 CD
TI # 1_4_16V_Twinport # 1_4 16V Twinport # 1_3_CDTI # 1_3 CDTI # 1_6_Turbo # 1_6 Turbo

SearchKBA: 0035538 # 0035AAU # 0035481 # 0035482 # 0035AAV # 0035516 # 0035AAS # 0035483 # 0035AAW # 0035484 # 0035AAR # 0035AAA # 0035AGX # 0035543 # 0035AAT # 0035AAB # 0035AAC

SearchMotor: Y_17_DTL # Z_14_XEP # Z_16_XEP # Z_13_DTJ # Z_16_LET # Z_17_DTR # A_17_DTR # Z_18_XE # Y_17_DT #
 Z_16_SE # Z_16_XE # Z_17_DTH # Z_17_DT # Z_13_DT # Y_13_DT

SearchBaugruppe: Gancio traino # Trekhaak # Enganche de remolque # Anhængeranordning # ...

SearchVerwendungszweck: NULL

SearchMotortyp: Otto # Petrol Engine # Essence ...

SearchRefNr: 614162150651 # 6141621506 # 61416215 # 735743

1 row in set (0.00 sec)

As one can see the tokens "Anhänge" and "Opel" are both included in this row.
[16 Oct 2012 8:44] Jan Rusch
By checking the query result fn the InnoDB FTS query, I also found the following result:

+--------------+
| ArtNr        |
+--------------+
| 316162340111 |
...
| 614161410651 |
| 614161420651 |
| 614161430651 |
| 614161440651 |
| 614161450651 |
| 614161480651 |
| 614161490651 |
| 614161500651 | <- Double hit
| 614161500651 |
| 614161510642 |
| 614161510642 |
| 614161520651 |
| 614161520651 |
| 614161530651 |
| 614161540651 |
| 614161540651 |
...

This should not be the case on this query with 'ArtNr' beeing a primary key:

Query:

SELECT *
FROM search_fulltext
WHERE ArtNr = "614161500651"

returns

*************************** 1. row ***************************
FTS_DOC_ID: 22849
ArtNr: 614161500651
SearchBez: Schutzkappe AHK # Protection Cap Tow Bar # Capuchon Dispositif d'Attelage # Cappuccio Gancio Tra
ino # Capuchón Protección para Gancho de Remolque # Afdekkap Trekhaak # Kugleskyd # Dragkroksskydd # Védösapka Létére Vo
ntatóhorog # krytka TZ # Nakladka Haka Holowniczego # προστατευτικό κάλυμμα για ρυμούλκα
SearchSyn: NULL
SearchKritWert: NULL
SearchKritWertBez: Farbe schwarz # Colour Black # Couleur noir # Colore nero # Pintura negro # Kleur zwart # Farve
sort # Färg svart # Farge svart # Väri Musta # Szín fekete # Cor preto # Цвет черный # Farba cierny # Barva černá # Kolo
r czarny # Χρώμα μαύρο # Culoare negru # Renk siyah # Boja crna boja # цвят черен # Krāsa melns # spalva juoda # Värv mu
st # Barva crna barva # Color Black
SearchHersteller:  # OPEL # VAUXHALL # GENERAL MOTORS
SearchTextbaustein: NULL
SearchFahrzeug:  #
SearchKBA: NULL
SearchMotor: NULL
SearchBaugruppe: теглич # priekabos prikabintuvas # Piekabes aprīkojums # Vetokoukku  laite # vonóhorog # Priprav
a za vleko # Trailer Hitch # Järelhaagise seadmed # Cupla remorca trailer # Uredjaj za vucu # Römork baglanti tertibati
# żażné zariadenie # Прицепное оборудование # Dispositivo de reboque # Zestaw zaczepu przyczepy # Κοτσαδόρος # Tazne zar
izeni # Släpvagnskoppling # Tilhengerinnretning # Anhængeranordning # Anhängevorrichtung # Dispositif d'attelage # Trekh
aak # Gancio traino # Enganche de remolque
SearchVerwendungszweck: NULL
SearchMotortyp: NULL
SearchRefNr: 6736294 # 93160575 # 614161500651 # 6141615006 # 61416150
1 row in set (0.00 sec)

So why do I get two hits for this row in the query result.

Even when we use DISTINCT:

SELECT DISTINCT ArtNr
FROM tecdoc.search_fulltext
WHERE MATCH(ArtNr,SearchBez,SearchSyn,SearchKritWert,SearchKritWertBez,SearchHersteller,SearchTextbaustein,SearchFahrzeug,SearchKBA,SearchMotor,SearchBaugruppe,SearchVerwendungszweck,SearchMotortyp,SearchRefNr)
AGAINST ("+Opel* +Anhänge*" IN BOOLEAN MODE)
ORDER BY ArtNr

the result is definitly not correct:

+--------------+
| ArtNr        |
+--------------+
| 316162340111 |
| 614160100651 |
| 614160110641 |
| 614160120641 |
| 614160150641 |
| 614160160641 |
| 614160200641 |
| 614160220641 |
| 614160240641 |
| 614160270641 |
| 614160280641 |
| 614160300641 |
| 614160330641 |
| 614160350651 |
| 614160420641 |
| 614160430641 |
| 614160440641 |
| 614160470641 |
| 614160500641 |
| 614160510641 |
| 614160520641 |
| 614160530641 |
| 614160540641 |
| 614160550641 |
| 614160560641 |
| 614160570641 |
| 614160580641 |
| 614160590641 |
| 614160600641 |
| 614160610641 |
| 614160620641 |
| 614160630641 |
| 614160640641 |
| 614161010641 |
| 614161200111 |
| 614161210111 |
| 614161220111 |
| 614161230111 |
| 614161240111 |
| 614161250111 |
| 614161300651 |
| 614161310651 |
| 614161320651 |
| 614161330651 |
| 614161340651 |
| 614161350651 |
| 614161360651 |
| 614161370651 |
| 614161380651 |
| 614161390651 |
| 614161400651 |
| 614161410651 |
| 614161420651 |
| 614161430651 |
| 614161440651 |
| 614161450651 |
| 614161480651 |
| 614161490651 |
| 614161500651 | <-- Double
| 614161500651 |
| 614161510642 | <-- Double
| 614161510642 |
| 614161520651 | <-- Double
| 614161520651 |
| 614161530651 |
| 614161540651 | <-- Double
| 614161540651 |
| 614161550651 | <-- Double
| 614161550651 |
| 614161560651 | <-- Double
| 614161560651 |
| 614161580651 | <-- Double
| 614161580651 |
| 614161590651 |
| 614161600651 |
| 614161610651 |
| 614161630651 |
| 614161640651 |
| 614161650651 |
| 614161660651 |
| 614161680651 |
| 614161710651 |
| 614161730651 |
| 614161730651 |
| 614161740651 |
| 614161750651 |
| 614161760651 |
| 614161770651 |
| 614161790651 |
| 614161800651 |
| 614161820651 |
| 614161840651 |
| 614161850651 |
| 614161860651 |
| 614161870651 |
| 614161900651 |
| 614161910651 |
| 614162000651 |
| 614162060651 |
| 614162070651 |
| 614162080651 |
| 614162090651 |
| 614162100651 |
| 614162110651 |
| 614162120651 |
| 614162130651 |
| 614162140651 |
| 614162160651 |
| 614162170651 |
| 614162180651 |
| 614162190651 |
| 614162200651 |
| 614162210651 |
| 614162220651 |
| 614162230651 |
| 614162240651 |
| 614162250651 |
| 614162260651 |
| 614162270651 |
+--------------+
119 rows in set (0.00 sec)

Same query on MyISAM table:

+--------------+
| ArtNr        |
+--------------+
| 316162340111 |
| 614002000651 |
| 614160100651 |
| 614160110641 |
| 614160120641 |
| 614160150641 |
| 614160160641 |
| 614160200641 |
| 614160220641 |
| 614160240641 |
| 614160270641 |
| 614160280641 |
| 614160300641 |
| 614160330641 |
| 614160350651 |
| 614160420641 |
| 614160430641 |
| 614160440641 |
| 614160470641 |
| 614160500641 |
| 614160510641 |
| 614160520641 |
| 614160530641 |
| 614160540641 |
| 614160550641 |
| 614160560641 |
| 614160570641 |
| 614160580641 |
| 614160590641 |
| 614160600641 |
| 614160610641 |
| 614160620641 |
| 614160630641 |
| 614160640641 |
| 614161010641 |
| 614161200111 |
| 614161210111 |
| 614161220111 |
| 614161230111 |
| 614161240111 |
| 614161250111 |
| 614161300651 |
| 614161310651 |
| 614161320651 |
| 614161330651 |
| 614161340651 |
| 614161350651 |
| 614161360651 |
| 614161370651 |
| 614161380651 |
| 614161390651 |
| 614161400651 |
| 614161410651 |
| 614161420651 |
| 614161430651 |
| 614161440651 |
| 614161450651 |
| 614161480651 |
| 614161490651 |
| 614161500651 |
| 614161510642 |
| 614161520651 |
| 614161530651 |
| 614161540651 |
| 614161550651 |
| 614161560651 |
| 614161580651 |
| 614161590651 |
| 614161600651 |
| 614161610651 |
| 614161630651 |
| 614161640651 |
| 614161650651 |
| 614161660651 |
| 614161680651 |
| 614161710651 |
| 614161730651 |
| 614161740651 |
| 614161750651 |
| 614161760651 |
| 614161770651 |
| 614161790651 |
| 614161800651 |
| 614161820651 |
| 614161840651 |
| 614161850651 |
| 614161860651 |
| 614161870651 |
| 614161900651 |
| 614161910651 |
| 614161920651 |
| 614161930651 |
| 614161940651 |
| 614161950651 |
| 614162000651 |
| 614162020651 |
| 614162030651 |
| 614162040651 |
| 614162050651 |
| 614162060651 |
| 614162070651 |
| 614162080651 |
| 614162090651 |
| 614162100651 |
| 614162110651 |
| 614162120651 |
| 614162130651 |
| 614162140651 |
| 614162150651 |
| 614162160651 |
| 614162170651 |
| 614162180651 |
| 614162190651 |
| 614162200651 |
| 614162210651 |
| 614162220651 |
| 614162230651 |
| 614162240651 |
| 614162250651 |
| 614162260651 |
| 614162270651 |
+--------------+
121 rows in set (0.77 sec)

No double hits...

Any ideas?
[29 Oct 2012 2:23] Jimmy Yang
Jan, will you be able to provide a database base dump that has the problem? Or in some way, a simplified repro case?
[29 Oct 2012 15:49] Todd Farmer
Development staff is unable to reproduce the described behavior.  Anybody able to provide a reproducible test case is encouraged to do so in order to maximize possibility that this can be fixed by GA.
[9 Nov 2012 9:28] Jan Rusch
Still not working in 5.6.8
[30 Nov 2012 19:28] Sveta Smirnova
Jan,

could you please re-upload file to our FTP server: Jimmy could not find it for some reason and now it should be expired, because 21-day lifetime.
[10 Dec 2012 11:38] Jan Rusch
Hi,

I just read the comment and reuploaded the file bug-data-67257.zip

Jan
[11 Dec 2012 18:01] Sveta Smirnova
Jan,

thank you very much! We will work with this file.
[11 Dec 2012 19:32] Sveta Smirnova
Bug is verified.

But in my case (server started from our MTR: ./mtr --start innodb) select from InnoDB table returned single row.

And there is one more anomaly: count(ArtNt) from InnoDB table returns 122 (proper value):

count from InnoDB table
122
select from InnoDB table
316162340111
count from MyISAM table
122

Select from MyISAM table returns 122 rows.
[12 Dec 2012 9:17] Jimmy Yang
Jan, thank you for the report. Issue found and fix will checked in ASAP.
[13 Dec 2012 16:01] Paul Dubois
Noted in 5.6.10, 5.7.1 changelogs.

Full-text searches in InnoDB tables could return incorrect results.