Bug #2647 Fulltext Search don't get all Records
Submitted: 5 Feb 2004 1:24 Modified: 6 Feb 2004 1:12
Reporter: Wolfram Keil Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.0.16 OS:Linux (Linux)
Assigned to: Sergei Golubchik CPU Architecture:Any

[5 Feb 2004 1:24] Wolfram Keil
Description:
We have a product-searchengine with estimated 2.000.000 Records in one table.

Over this table we perform a Fulltext-Search in Boolean Mode like this:

SELECT * FROM PR_PRODUCTS WHERE MATCH(PROD_NAME, PROD_BESCHR) AGAINST('Stichsäge' IN BOOLEAN MODE) ... (Sorting and so on)
2 Records

The strange thing is, that some records with the word "stichsäge" (jigsaw) won't be found.

Some other confusing examples:

SELECT * FROM PR_PRODUCTS WHERE MATCH(PROD_NAME, PROD_BESCHR) AGAINST('+Stichsäge* +makita' IN BOOLEAN MODE) ... (Sorting and so on)
14 Records

SELECT * FROM PR_PRODUCTS WHERE MATCH(PROD_NAME, PROD_BESCHR) AGAINST('+Stichsäge*' IN BOOLEAN MODE) ... (Sorting and so on)
2 Records ?!?

SELECT PROD_NAME, PROD_BESCHR FROM PR_PRODUCTS WHERE MATCH(PROD_NAME, PROD_BESCHR) AGAINST('+Stichsäge +makita' IN BOOLEAN MODE)
0 Records

Here the live-test:

http://www.preisroboter.de/search.php?search=stichs%E4ge&join=2&x=17&y=12

http://www.preisroboter.de/search.php?search=%2Bstichs%E4ge*&join=2&x=48&y=13

http://www.preisroboter.de/search.php?search=%2Bstichs%E4ge*+%2Bmakita&join=2&x=34&y=12

Other searches with "ä" like "Bandsäge" are ok.

Server Configuration:

back_log=50
basedir=/usr/local/mysql-standard-4.0.16-pc-linux-i686/
binlog_cache_size=32768
bulk_insert_buffer_size=8388608
character_set=latin1_de
character_sets=latin1 big5 czech euc_kr gb2312 gbk latin1_de sjis tis620 ujis dec8 dos german1 hp8 koi8_ru latin2 swe7 usa7 cp1251 danish hebrew win1251 estonia hungarian koi8_ukr win1251ukr greek win1250 croat cp1257 latin5
concurrent_insert=ON
connect_timeout=5
convert_character_set=
datadir=/RAID/mysqldata/
default_week_format=0
delay_key_write=ON
delayed_insert_limit=100
delayed_insert_timeout=300
delayed_queue_size=1000
flush=OFF
flush_time=0
ft_boolean_syntax=+ -><()~*:""&|
ft_min_word_len=1
ft_max_word_len=254
ft_max_word_len_for_sort=20
ft_stopword_file=
have_bdb=NO
have_crypt=YES
have_innodb=YES
have_isam=YES
have_raid=NO
have_symlink=YES
have_openssl=NO
have_query_cache=YES
init_file=
innodb_additional_mem_pool_size=1048576
innodb_buffer_pool_size=8388608
innodb_data_file_path=ibdata1:10M:autoextend
innodb_data_home_dir=
innodb_file_io_threads=4
innodb_force_recovery=0
innodb_thread_concurrency=8
innodb_flush_log_at_trx_commit=1
innodb_fast_shutdown=ON
innodb_flush_method=
innodb_lock_wait_timeout=50
innodb_log_arch_dir=./
innodb_log_archive=OFF
innodb_log_buffer_size=1048576
innodb_log_file_size=5242880
innodb_log_files_in_group=2
innodb_log_group_home_dir=./
innodb_mirrored_log_groups=1
innodb_max_dirty_pages_pct=90
interactive_timeout=28800
join_buffer_size=131072
key_buffer_size=502652928
language=/usr/local/mysql-standard-4.0.16-pc-linux-i686/share/mysql/german/
large_files_support=ON
local_infile=ON
locked_in_memory=OFF
log=OFF
log_update=OFF
log_bin=OFF
log_slave_updates=OFF
log_slow_queries=OFF
log_warnings=OFF
long_query_time=10
low_priority_updates=OFF
lower_case_table_names=OFF
max_allowed_packet=1047552
max_binlog_cache_size=4294967295
max_binlog_size=1073741824
max_connections=500
max_connect_errors=10
max_delayed_threads=20
max_heap_table_size=16777216
max_join_size=4294967295
max_relay_log_size=0
max_seeks_for_key=4294967295
max_sort_length=1024
max_user_connections=0
max_tmp_tables=32
max_write_lock_count=4294967295
myisam_max_extra_sort_file_size=268435456
myisam_max_sort_file_size=2147483647
myisam_repair_threads=1
myisam_recover_options=OFF
myisam_sort_buffer_size=67108864
net_buffer_length=16384
net_read_timeout=30
net_retry_count=10
net_write_timeout=60
new=OFF
open_files_limit=1134
pid_file=/usr/local/mysql/data/prmysql1.pid
log_error=
port=3306
protocol_version=10
query_alloc_block_size=8192
query_cache_limit=1048576
query_cache_size=33554432
query_cache_type=ON
query_prealloc_size=8192
range_alloc_block_size=2048
read_buffer_size=2093056
read_only=OFF
read_rnd_buffer_size=262144
rpl_recovery_rank=0
server_id=0
slave_net_timeout=3600
skip_external_locking=ON
skip_networking=OFF
skip_show_database=OFF
slow_launch_time=2
socket=/tmp/mysql.sock
sort_buffer_size=2097144
sql_mode=0
table_cache=64
table_type=MYISAM
thread_cache_size=8
thread_stack=126976
tx_isolation=REPEATABLE-READ
timezone=CET
tmp_table_size=33554432
tmpdir=/tmp/
transaction_alloc_block_size=8192
transaction_prealloc_size=4096
version=4.0.16-standard
wait_timeout=28800

How to repeat:
http://www.preisroboter.de/search.php?search=stichs%E4ge&join=2&x=17&y=12

http://www.preisroboter.de/search.php?search=%2Bstichs%E4ge*&join=2&x=48&y=13

http://www.preisroboter.de/search.php?search=%2Bstichs%E4ge*+%2Bmakita&join=2&x=34&y=12
[5 Feb 2004 2:58] Sergei Golubchik
Sorry, but urls are not enough.
I need an sql script that, being loaded like "mysql <your_script",
exhibits the bug.
I cannot debug mysqld mysqld being given an url :)
[5 Feb 2004 3:44] Wolfram Keil
OK. Here is the table definition:

CREATE TABLE `PR_PRODUCTS` (
  `ID` int(11) NOT NULL auto_increment,
  `PROD_NAME` varchar(150) NOT NULL default '',
  `PROD_BESCHR` varchar(150) NOT NULL default '',
  `PROD_PREIS` decimal(9,2) NOT NULL default '0.00',
  `DEEPLINK` varchar(250) default NULL,
  `PROD_IMG` varchar(150) default NULL,
  `TIMEST` timestamp(14) NOT NULL,
  `URL` varchar(250) NOT NULL default '',
  `SHOP_ID` int(10) NOT NULL default '0',
  `DELIVERY_TIME` char(2) NOT NULL default '99',
  `PROD_ID` varchar(20) NOT NULL default '',
  `VERSANDKOSTENFREI` int(11) NOT NULL default '0',
  PRIMARY KEY  (`ID`),
  KEY `prodforshop` (`SHOP_ID`),
  FULLTEXT KEY `ft_name_dessc` (`PROD_NAME`,`PROD_BESCHR`)
) TYPE=MyISAM DELAY_KEY_WRITE=1; 

Data:
INSERT INTO PR_PRODUCTS VALUES
(41085965,'Stichsäge','Hitachi CJ120VA',224.00,'http://www.24tools.de/shop/pd574757009.htm','http://www.24tools.de/shop/assets/thumb/cj120...),
(41085964,'Stichsäge','Hitachi CJ120V',224.00,'http://www.24tools.de/shop/pd-1343197077.htm','http://www.24tools.de/shop/assets/thumb/cj1...),
(41085770,'Tacker','Makita T221DW',317.00,'http://www.24tools.de/shop/pd-2065692261.htm','http://www.24tools.de/shop/assets/thumb/t22...),
(41085769,'Astschere','Makita 4604DZ',640.00,'http://www.24tools.de/shop/pd-1268444441.htm','http://www.24tools.de/shop/assets/thumb/460...),
(41085768,'Astschere','Makita 4604DW',1252.00,'http://www.24tools.de/shop/pd1227722833.htm','http://www.24tools.de/shop/assets/thumb/4604...),
(41085767,'Säbelsäge','Makita 4390DW',217.00,'http://www.24tools.de/shop/pd-526118357.htm','http://www.24tools.de/shop/assets/thumb/4390...),
(41085766,'Stichsäge','Makita 4334DWF',455.00,'http://www.24tools.de/shop/pd1798065783.htm','http://www.24tools.de/shop/assets/thumb/4334...),
(41085765,'Stichsäge','Makita 4333DWD',366.00,'http://www.24tools.de/shop/pd768705889.htm','http://www.24tools.de/shop/assets/thumb/4333d...),
(41085764,'Stichsäge','Makita 4331DWD',361.00,'http://www.24tools.de/shop/pd-115702597.htm','http://www.24tools.de/shop/assets/thumb/4331...),
(41085763,'Stichsäge','Makita 4300DW',206.00,'http://www.24tools.de/shop/pd143098473.htm','http://www.24tools.de/shop/assets/thumb/4300d...),
(41085757,'Stichsäge','Makita 4324',107.00,'http://www.24tools.de/shop/pd-1199312123.htm','http://www.24tools.de/shop/assets/thumb/432...),
(41085756,'Stichsäge','Makita 4323',101.00,'http://www.24tools.de/shop/pd-1714572609.htm','http://www.24tools.de/shop/assets/thumb/432...),
(41085754,'Säbelsäge','Makita JR140DWD',422.00,'http://www.24tools.de/shop/pd-1497955395.htm','http://www.24tools.de/shop/assets/thumb/jr1...),
(41085753,'Säbelsäge','Makita BJR240SH',667.00,'http://www.24tools.de/shop/pd1872677163.htm','http://www.24tools.de/shop/assets/thumb/bjr2...),
(41085749,'Säbelsäge','Makita JR3000VT',242.00,'http://www.24tools.de/shop/pd863130991.htm','http://www.24tools.de/shop/assets/thumb/jr300...),
(41085750,'Säbelsäge','Makita JR3020',354.00,'http://www.24tools.de/shop/pd1398674741.htm','http://www.24tools.de/shop/assets/thumb/jr30...),
(41085751,'Säbelsäge','Makita JR3030',260.00,'http://www.24tools.de/shop/pd1672753771.htm','http://www.24tools.de/shop/assets/thumb/jr30...),
(41085752,'Säbelsäge','Makita JR3030T',277.00,'http://www.24tools.de/shop/pd1014213265.htm','http://www.24tools.de/shop/assets/thumb/jr30...),
(41085748,'Säbelsäge','Makita JR3000V',213.00,'http://www.24tools.de/shop/pd-1734028903.htm','http://www.24tools.de/shop/assets/thumb/jr3...),
(41085747,'Kettensäge','Makita UC120DWAE',328.00,'http://www.24tools.de/shop/pd-736147243.htm','http://www.24tools.de/shop/assets/thumb/uc12...),
(41085746,'Kettensäge','Makita UC4010A',150.00,'http://www.24tools.de/shop/pd-52134871.htm','http://www.24tools.de/shop/assets/thumb/uc401...)

Query 1:
SELECT PR_PRODUCTS.ID AS ID, PR_PRODUCTS.VERSANDKOSTENFREI AS VERSANDKOSTENFREI, PR_PRODUCTS.PROD_NAME AS PROD_NAME,PR_PRODUCTS.PROD_PREIS AS PROD_PREIS, PR_PRODUCTS.PROD_BESCHR AS PROD_BESCHR, PR_PRODUCTS.DELIVERY_TIME AS DELIVERY_TIME, PR_PRODUCTS.URL AS PURL, PR_PRODUCTS.PROD_IMG AS PROD_IMG, PR_PRODUCTS.SHOP_ID AS SHOP_ID, PR_PRODUCTS.PROD_ID AS PROD_ID FROM PR_PRODUCTS WHERE (PR_PRODUCTS.DELIVERY_TIME <= '99') AND (PR_PRODUCTS.PROD_PREIS >= 0) AND (PR_PRODUCTS.PROD_PREIS < 999999999) AND ( MATCH(PR_PRODUCTS.PROD_NAME, PR_PRODUCTS.PROD_BESCHR) AGAINST ('+stichsäge*' IN BOOLEAN MODE)) ORDER BY PROD_PREIS DESC LIMIT 0,10

Query 2:
SELECT PR_PRODUCTS.ID AS ID, PR_PRODUCTS.VERSANDKOSTENFREI AS VERSANDKOSTENFREI, PR_PRODUCTS.PROD_NAME AS PROD_NAME,PR_PRODUCTS.PROD_PREIS AS PROD_PREIS, PR_PRODUCTS.PROD_BESCHR AS PROD_BESCHR, PR_PRODUCTS.DELIVERY_TIME AS DELIVERY_TIME, PR_PRODUCTS.URL AS PURL, PR_PRODUCTS.PROD_IMG AS PROD_IMG, PR_PRODUCTS.SHOP_ID AS SHOP_ID, PR_PRODUCTS.PROD_ID AS PROD_ID FROM PR_PRODUCTS WHERE (PR_PRODUCTS.DELIVERY_TIME <= '99') AND (PR_PRODUCTS.PROD_PREIS >= 0) AND (PR_PRODUCTS.PROD_PREIS < 999999999) AND ( MATCH(PR_PRODUCTS.PROD_NAME, PR_PRODUCTS.PROD_BESCHR) AGAINST ('+stichsäge* +makita' IN BOOLEAN MODE)) ORDER BY PROD_PREIS DESC LIMIT 0,10

Query 2 ist more restrictive than Query 1. 

Query 2 brings more Results..
[6 Feb 2004 1:12] Sergei Golubchik
I cannot repeat it on my 4.0.18 - looks like the bug was already fixed.

Try to upgrade to 4.0.18 (it should be out VERY soon).