Bug #22343 boolean fulltext search becomes case-sensitive
Submitted: 14 Sep 2006 12:55 Modified: 25 Sep 2006 13:28
Reporter: [ name withheld ] Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S3 (Non-critical)
Version:5.0.26-BK, 5.0.22 OS:Linux (Linux (Fedora Core 5))
Assigned to: Sergey Vojtovich CPU Architecture:Any
Tags: boolean, case-sensitive, fulltext

[14 Sep 2006 12:55] [ name withheld ]
Description:
I have problems doing a full text search with multiple tables. 

In some cases searching with linked tables becomes case-sensitive.

And btw if there is no entry for the left join the match does not find anything. 

How to repeat:
CREATE TABLE `spots` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `product` text,  `title` text,  `postproduction` int(11) default '0',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `spots` VALUES ("id","Tobies Test","Testtitle",1);

CREATE TABLE `postproduction` (
  `id` int(4) unsigned NOT NULL auto_increment,
  `name` text,  `adress` text,  `city` text,  `homepage` text,  `email` text,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `postproduction` VALUES (1,'blank postproduction',NULL,NULL,NULL,NULL);

I get no result for:

SELECT spots.* FROM  spots  
LEFT JOIN postproduction ON postproduction.id = spots.postproduction 
WHERE   MATCH (spots.product,spots.title,spots.postproduction,postproduction.name ) against ( "tobies"  IN BOOLEAN MODE) ;

but I get a result for the same query with big "T":

SELECT spots.* FROM  spots  
LEFT JOIN postproduction ON postproduction.id = spots.postproduction 
WHERE   MATCH (spots.product,spots.title,spots.postproduction,postproduction.name ) against ( "Tobies"  IN BOOLEAN MODE) ;

or by removing the left join

SELECT spots.* FROM  spots  WHERE   MATCH (spots.product,spots.title ) against ( "tobies"  IN BOOLEAN MODE);
[14 Sep 2006 15:11] Valeriy Kravchuk
Thank you for a bug report. Verified just as described, also with 5.0.26-BK. Only one table is needed to demonstrate a problem:

mysql> SELECT spots.* FROM  spots WHERE   MATCH (spots.product,spots.title) aga
inst ( 'tobies'  IN BOOLEAN MODE);
+----+-------------+-----------+----------------+
| id | product     | title     | postproduction |
+----+-------------+-----------+----------------+
|  1 | Tobies Test | Testtitle |              1 |
+----+-------------+-----------+----------------+
1 row in set (0.01 sec)

mysql> SELECT spots.* FROM  spots WHERE   MATCH (spots.product,spots.title, spo
ts.postproduction) against ( 'tobies'  IN BOOLEAN MODE);
Empty set (0.00 sec)

mysql> SELECT spots.* FROM  spots WHERE   MATCH (spots.product,spots.title, spo
ts.postproduction) against ( 'Tobies'  IN BOOLEAN MODE);
+----+-------------+-----------+----------------+
| id | product     | title     | postproduction |
+----+-------------+-----------+----------------+
|  1 | Tobies Test | Testtitle |              1 |
+----+-------------+-----------+----------------+
1 row in set (0.00 sec)

mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.0.26-debug |
+--------------+
1 row in set (0.00 sec)

So, looks like adding INT column to MATCH lists affects case-sensitivity for searches. This looks like a bug for me.
[25 Sep 2006 13:28] Sergei Golubchik
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/

When you mix arguments in different character sets in one function, they are all converted to a single common character set. Integer converted to a sting gets a binary character set. As a result the argument of MATCH is converted to a binary character set:

mysql> select charset(concat("a","b"));
+--------------------------+
| charset(concat("a","b")) |
+--------------------------+
| latin1                   |
+--------------------------+
1 row in set (0.00 sec)

mysql> select charset(concat("a","b",4));
+----------------------------+
| charset(concat("a","b",4)) |
+----------------------------+
| binary                     |
+----------------------------+
1 row in set (0.00 sec)

In a binary character set a string is a sequence of bytes, the concept of "letter case" is not applicable:

mysql> select upper(concat("a","b"));
+------------------------+
| upper(concat("a","b")) |
+------------------------+
| AB                     |
+------------------------+
1 row in set (0.00 sec)

mysql> select upper(concat("a","b",4));
+--------------------------+
| upper(concat("a","b",4)) |
+--------------------------+
| ab4                      |
+--------------------------+
1 row in set (0.00 sec)