Bug #2101 | fulltext index bug - ignores words like "microsoft" (honest!) | ||
---|---|---|---|
Submitted: | 11 Dec 2003 19:59 | Modified: | 12 Dec 2003 14:39 |
Reporter: | Martin Virtel | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 4.0.16 | OS: | Linux (linux) |
Assigned to: | Dean Ellis | CPU Architecture: | Any |
[11 Dec 2003 19:59]
Martin Virtel
[12 Dec 2003 7:19]
Dean Ellis
I cannot repeat this behavior with 4.0.16 or the latest development sources. Thank you.
[12 Dec 2003 12:58]
Martin Virtel
I was not kidding. I re-tried and repeated the bug. Please run the following MySQL script and see if your results differ, that is if you get a count of 1 for both "match" and "like" searches. MV Script-------------------------------------------------------------------- use test; drop table if exists fttest ; create table fttest ( title char(80) ) ; create fulltext index title_text on fttest(title); insert into fttest values (" microsoft cisco doesn't find microsoft "); select "match microsoft", count(title) from fttest where match title against ("microsoft" in boolean mode); select "find microsoft", count(title) from fttest where title like " %microsoft%"; select "match cisco", count(title) from fttest where match title against ("cisco" in boolean mode); select "find cisco ", count(title) from fttest where title like " %cisco%"; Results------------------------------------------------------------------------ match microsoft count(title) match microsoft 0 find microsoft count(title) find microsoft 1 match cisco count(title) match cisco 0 find cisco count(title) find cisco 1
[12 Dec 2003 13:09]
Dean Ellis
All report a count of 1, unless I place "microsoft" in a stopwords file... As you are using an empty stopword file, and as our internal stopword list does not contain "microsoft", this remains unrepeatable.
[12 Dec 2003 14:07]
Martin Virtel
Thanks for the persistence. My last idea is to check if differences in the mysql settings can explain what is happening, especially since I use german1 as a character set so I can search for äü etc. Below is my output of "show variables". M Variable_name Value back_log 50 basedir /usr/local/mysql-standard-4.0.16-pc-linux-i686/ binlog_cache_size 32768 bulk_insert_buffer_size 8388608 character_set german1 character_sets latin1 big5 czech euc_kr gb2312 gbk latin1_de sjis tis620 ujis dec8 dos german1 hp8 koi8_ru latin2 swe7 usa7 cp1251 danish hebrew win1251 estonia hungarian koi8_ukr win1251ukr greek win1250 croat cp1257 latin5 concurrent_insert ON connect_timeout 5 convert_character_set datadir /var/lib/mysql.4.0.16/ default_week_format 0 delay_key_write ON delayed_insert_limit 100 delayed_insert_timeout 300 delayed_queue_size 1000 flush OFF flush_time 0 ft_boolean_syntax + -><()~*:""&| ft_min_word_len 2 ft_max_word_len 80 ft_max_word_len_for_sort 20 ft_stopword_file /var/lib/mysql/stopwords have_bdb NO have_crypt YES have_innodb YES have_isam YES have_raid NO have_symlink YES have_openssl NO have_query_cache YES init_file innodb_additional_mem_pool_size 1048576 innodb_buffer_pool_size 8388608 innodb_data_file_path ibdata1:10M:autoextend innodb_data_home_dir innodb_file_io_threads 4 innodb_force_recovery 0 innodb_thread_concurrency 8 innodb_flush_log_at_trx_commit 1 innodb_fast_shutdown ON innodb_flush_method innodb_lock_wait_timeout 50 innodb_log_arch_dir ./ innodb_log_archive OFF innodb_log_buffer_size 1048576 innodb_log_file_size 5242880 innodb_log_files_in_group 2 innodb_log_group_home_dir ./ innodb_mirrored_log_groups 1 innodb_max_dirty_pages_pct 90 interactive_timeout 28800 join_buffer_size 131072 key_buffer_size 8388600 language /usr/local/mysql-standard-4.0.16-pc-linux-i686/share/mysql/english/ large_files_support ON local_infile ON locked_in_memory OFF log OFF log_update OFF log_bin OFF log_slave_updates OFF log_slow_queries ON log_warnings OFF long_query_time 10 low_priority_updates OFF lower_case_table_names OFF max_allowed_packet 1048576 max_binlog_cache_size 4294967295 max_binlog_size 1073741824 max_connections 1000 max_connect_errors 10 max_delayed_threads 20 max_heap_table_size 16777216 max_join_size 4294967295 max_relay_log_size 0 max_seeks_for_key 4294967295 max_sort_length 1024 max_user_connections 0 max_tmp_tables 32 max_write_lock_count 4294967295 myisam_max_extra_sort_file_size 268435456 myisam_max_sort_file_size 2147483647 myisam_repair_threads 1 myisam_recover_options OFF myisam_sort_buffer_size 8388608 net_buffer_length 16384 net_read_timeout 30 net_retry_count 10 net_write_timeout 60 new OFF open_files_limit 5010 pid_file /var/lib/mysql.4.0.16//xbox.newsradar.org.pid log_error port 3306 protocol_version 10 query_alloc_block_size 8192 query_cache_limit 1048576 query_cache_size 0 query_cache_type ON query_prealloc_size 8192 range_alloc_block_size 2048 read_buffer_size 131072 read_only OFF read_rnd_buffer_size 262144 rpl_recovery_rank 0 server_id 0 slave_net_timeout 3600 skip_external_locking ON skip_networking OFF skip_show_database OFF slow_launch_time 2 socket /var/lib/mysql/mysql.sock sort_buffer_size 2097144 sql_mode 0 table_cache 64 table_type MYISAM thread_cache_size 0 thread_stack 126976 tx_isolation REPEATABLE-READ timezone UTC tmp_table_size 33554432 tmpdir /tmp/ transaction_alloc_block_size 8192 transaction_prealloc_size 4096 version 4.0.16-standard-log wait_timeout 28800
[12 Dec 2003 14:39]
Dean Ellis
No luck. Cannot repeat unless I explicitly place microsoft into the stopwords, even with german1 character set.
[13 Dec 2003 6:52]
Martin Virtel
Thanks, Dean. In fact you were right and somebody meddled with the customized stopwords file at our site, that was the source of the bug. Sorry for not checking that earlier - MV