Bug #484 Strange join behaviour with fulltext indexes
Submitted: 23 May 2003 9:48 Modified: 16 Jun 2003 13:21
Reporter: Stephen Brandon Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.0.12-standard OS:MacOS (Mac OSX (Darwin))
Assigned to: Sergei Golubchik CPU Architecture:Any

[23 May 2003 9:48] Stephen Brandon
Description:
I left join 2 tables. Some rows in first table do not have a corresponding row in second table, while some do. If I qualify on fulltext index of column in second table, some rows show up that ARE NOT JOINED AT ALL, *if* there is another similar record - note: I don't know exactly how similar they have to be, but I had a similar case to that below, where changing one of the dates in the example causes the bad behaviour to disappear.

If I simply change the condition to a standard "like", the bad behaviour disappears.

eg GOOD:
select * from concert2 left join entity2 on venue_id = entity_id where name like '%aberdeen%';
+----------+------------+---------------------+--------------------+-----------+
| venue_id | venue_text | dt                  | name               | entity_id |
+----------+------------+---------------------+--------------------+-----------+
|        1 | A1         | 2003-05-23 19:30:00 | Aberdeen Town Hall |         1 |
+----------+------------+---------------------+--------------------+-----------+
1 row in set (0.00 sec)

BAD:

select * from concert2 left join entity2 on venue_id = entity_id where match(name) against('aberdeen' IN BOOLEAN MODE);
+----------+------------+---------------------+--------------------+-----------+
| venue_id | venue_text | dt                  | name               | entity_id |
+----------+------------+---------------------+--------------------+-----------+
|        1 | A1         | 2003-05-23 19:30:00 | Aberdeen Town Hall |         1 |
|     NULL | A2         | 2003-05-23 19:30:00 | NULL               |      NULL |
+----------+------------+---------------------+--------------------+-----------+
2 rows in set (0.00 sec)

How to repeat:
use test;

CREATE TEMPORARY TABLE concert2 (
  venue_id int(11) default NULL,
  venue_text varchar(255) default NULL,
  dt datetime default NULL
) TYPE=MyISAM;

INSERT INTO concert2 (venue_id, venue_text, dt) VALUES (1, 'A1', '2003-05-23 19:30:00');
INSERT INTO concert2 (venue_id, venue_text, dt) VALUES (NULL, 'A2', '2003-05-23 19:30:00');

CREATE TEMPORARY TABLE entity2 (
  name varchar(255) NOT NULL default '',
  entity_id int(11) NOT NULL auto_increment,
  PRIMARY KEY  (entity_id),
  FULLTEXT KEY name (name)
) TYPE=MyISAM;

INSERT INTO entity2 (name, entity_id) VALUES ('Aberdeen Town Hall', 1);
INSERT INTO entity2 (name, entity_id) VALUES ('Glasgow Royal Concert Hall', 2);
INSERT INTO entity2 (name, entity_id) VALUES ('Queen\'s Hall, Edinburgh', 3);

select 'this is what is supposed to happen' as message;

select * from concert2 left join entity2 on venue_id = entity_id where name like '%aberdeen%' and dt = '2003-05-23 19:30:00';

select 'this shows fulltext searching causing joins to kind-of succeed just because there is another similar row in the db' as message;

select * from concert2 left join entity2 on venue_id = entity_id where match(name) against('aberdeen'IN BOOLEAN MODE) and dt = '2003-05-23 19:30:00';

update concert2 set dt = '2003-05-23 19:30:01' where venue_text = 'A2';
select * from concert2 left join entity2 on venue_id = entity_id where match(name) against('aberdeen' IN BOOLEAN MODE) and dt = '2003-05-23 19:30:01';

Suggested fix:
??
[27 May 2003 6:23] Stephen Brandon
A simpler test case showed that (as far as I can see right now) a fulltext search always "succeeds" as far as the join is concerned. So if I join tablea, tableb 
where match (b.ft) against ('string' in boolean mode) then the join ALWAYS succeeds, whereas substituting the fulltext for an equivalent LIKE search will not 
join the tables if the condition does not match.
[29 May 2003 2:22] Stephen Brandon
Just to clarify my last comment: if I do the following:

select * from tablea left join tableb on tablea.id = tableb.id WHERE match(tableb.name) against("text" in boolean mode)

I get a HUGE result that is not restricted in the way I would expect, because the "match" always appears to succeed. If I change to a LIKE then it works properly.
[16 Jun 2003 13:21] Sergei Golubchik
fixed in 4.0.14