Bug #5141 | creation error of full text search on UTF-8 encoding | ||
---|---|---|---|
Submitted: | 22 Aug 2004 14:53 | Modified: | 30 Aug 2004 11:41 |
Reporter: | Woojong Koh | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 4.1.3,5.0.0,5.0.1 | OS: | Linux (fedora core 2) |
Assigned to: | Sergei Golubchik | CPU Architecture: | Any |
[22 Aug 2004 14:53]
Woojong Koh
[22 Aug 2004 15:22]
Woojong Koh
cf. error message of the repair table query:
mysql> repair table pages;
+----------------------+--------+----------+-------------------------------------------------+
| Table                | Op     | Msg_type | Msg_text                                        |
+----------------------+--------+----------+-------------------------------------------------+
| search-backup3.pages | repair | error    | 22 when fixing table                            |
| search-backup3.pages | repair | error    | Can't copy datafile-header to tempfile, error 9 |
| search-backup3.pages | repair | status   | Operation failed                                |
+----------------------+--------+----------+-------------------------------------------------+
3 rows in set (20 min 37.54 sec)
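The numeric codes in the repair output above (22 and 9) look like operating-system errno values. Assuming that reading is right (the report itself does not say so), a minimal Python sketch for decoding them on the local platform follows. The helper name `describe` is made up for illustration.

```python
import errno
import os

# Codes copied from the REPAIR TABLE output above. Treating them as
# OS errno values is an assumption, not something the report states.
REPAIR_ERROR_CODES = [22, 9]


def describe(code):
    """Return 'NAME: message' for an errno value on the current platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


for code in REPAIR_ERROR_CODES:
    print(code, describe(code))
```

The `perror` utility shipped with MySQL gives the same kind of description from the command line, for example `perror 22`.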
[23 Aug 2004 17:59]
Hartmut Holzgraefe
Can you please add the output of "SHOW VARIABLES", "SHOW STATUS" and an "ls -l" of the database data directory?
[23 Aug 2004 22:15]
Woojong Koh
mysql> show variables;
+---------------------------------+----------------------------------------+
| Variable_name | Value |
+---------------------------------+----------------------------------------+
| back_log | 50 |
| basedir | /usr/local/mysql/ |
| binlog_cache_size | 32768 |
| bulk_insert_buffer_size | 8388608 |
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/mysql/charsets/ |
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
| concurrent_insert | ON |
| connect_timeout | 5 |
| datadir | /home/data/ |
| date_format | %Y-%m-%d |
| datetime_format | %Y-%m-%d %H:%i:%s |
| default_week_format | 0 |
| delay_key_write | ON |
| delayed_insert_limit | 100 |
| delayed_insert_timeout | 300 |
| delayed_queue_size | 1000 |
| expire_logs_days | 0 |
| flush | OFF |
| flush_time | 0 |
| ft_boolean_syntax | + -><()~*:""&| |
| ft_max_word_len | 84 |
| ft_min_word_len | 1 |
| ft_query_expansion_limit | 20 |
| ft_stopword_file | /home/data/stopword |
| group_concat_max_len | 1024 |
| have_archive | NO |
| have_bdb | NO |
| have_compress | YES |
| have_crypt | YES |
| have_innodb | YES |
| have_isam | NO |
| have_geometry | YES |
| have_ndbcluster | NO |
| have_openssl | NO |
| have_query_cache | YES |
| have_raid | NO |
| have_rtree_keys | YES |
| have_symlink | YES |
| init_connect | |
| init_file | |
| init_slave | |
| innodb_additional_mem_pool_size | 1048576 |
| innodb_buffer_pool_awe_mem_mb | 0 |
| innodb_buffer_pool_size | 8388608 |
| innodb_data_file_path | ibdata1:10M:autoextend |
| innodb_data_home_dir | |
| innodb_fast_shutdown | ON |
| innodb_file_io_threads | 4 |
| innodb_file_per_table | OFF |
| innodb_flush_log_at_trx_commit | 1 |
| innodb_flush_method | |
| innodb_force_recovery | 0 |
| innodb_lock_wait_timeout | 50 |
| innodb_log_arch_dir | |
| innodb_log_archive | OFF |
| innodb_log_buffer_size | 1048576 |
| innodb_log_file_size | 5242880 |
| innodb_log_files_in_group | 2 |
| innodb_log_group_home_dir | ./ |
| innodb_max_dirty_pages_pct | 90 |
| innodb_mirrored_log_groups | 1 |
| innodb_open_files | 300 |
| innodb_thread_concurrency | 8 |
| interactive_timeout | 28800 |
| join_buffer_size | 131072 |
| key_buffer_size | 134217728 |
| key_cache_age_threshold | 300 |
| key_cache_block_size | 1024 |
| key_cache_division_limit | 100 |
| language | /usr/local/mysql/share/mysql/english/ |
| large_files_support | ON |
| license | GPL |
| local_infile | ON |
| locked_in_memory | OFF |
| log | OFF |
| log_bin | ON |
| log_error | |
| log_slave_updates | OFF |
| log_slow_queries | OFF |
| log_update | OFF |
| log_warnings | 1 |
| long_query_time | 10 |
| low_priority_updates | OFF |
| lower_case_file_system | OFF |
| lower_case_table_names | 0 |
| max_allowed_packet | 3144704 |
| max_binlog_cache_size | 4294967295 |
| max_binlog_size | 1073741824 |
| max_connect_errors | 99999999 |
| max_connections | 1000 |
| max_delayed_threads | 20 |
| max_error_count | 64 |
| max_heap_table_size | 16777216 |
| max_insert_delayed_threads | 20 |
| max_join_size | 4294967295 |
| max_length_for_sort_data | 1024 |
| max_relay_log_size | 0 |
| max_seeks_for_key | 4294967295 |
| max_sort_length | 1024 |
| max_tmp_tables | 32 |
| max_user_connections | 0 |
| max_write_lock_count | 4294967295 |
| myisam_data_pointer_size | 4 |
| myisam_max_extra_sort_file_size | 10737418240 |
| myisam_max_sort_file_size | 10737418240 |
| myisam_recover_options | OFF |
| myisam_repair_threads | 1 |
| myisam_sort_buffer_size | 268435456 |
| net_buffer_length | 8192 |
| net_read_timeout | 30 |
| net_retry_count | 10 |
| net_write_timeout | 60 |
| new | OFF |
| old_passwords | OFF |
| open_files_limit | 5010 |
| optimizer_prune_level | 1 |
| optimizer_search_depth | 62 |
| pid_file | /home/data/oorr.net.pid |
| port | 3306 |
| preload_buffer_size | 32768 |
| protocol_version | 10 |
| query_alloc_block_size | 8192 |
| query_cache_limit | 1048576 |
| query_cache_min_res_unit | 4096 |
| query_cache_size | 0 |
| query_cache_type | ON |
| query_prealloc_size | 8192 |
| range_alloc_block_size | 2048 |
| read_buffer_size | 258048 |
| read_only | OFF |
| read_rnd_buffer_size | 520192 |
| relay_log_purge | ON |
| rpl_recovery_rank | 0 |
| secure_auth | OFF |
| server_id | 1 |
| skip_external_locking | ON |
| skip_networking | OFF |
| skip_show_database | OFF |
| slave_net_timeout | 3600 |
| slow_launch_time | 2 |
| socket | /tmp/mysql.sock |
| sort_buffer_size | 524280 |
| sql_mode | |
| sql_updatable_view_key | YES |
| storage_engine | MyISAM |
| sync_binlog | 0 |
| sync_frm | ON |
| system_time_zone | KST |
| table_cache | 64 |
| table_type | MyISAM |
| thread_cache_size | 0 |
| thread_stack | 196608 |
| time_format | %H:%i:%s |
| time_zone | SYSTEM |
| tmp_table_size | 33554432 |
| tmpdir | /tmp:/home2 |
| transaction_alloc_block_size | 8192 |
| transaction_prealloc_size | 4096 |
| tx_isolation | REPEATABLE-READ |
| version | 5.0.1-alpha-log |
| version_comment | Source distribution |
| version_compile_machine | i686 |
| version_compile_os | pc-linux |
| wait_timeout | 28800 |
+---------------------------------+----------------------------------------+
172 rows in set (0.00 sec)

mysql> show status;
+--------------------------+-----------+
| Variable_name | Value |
+--------------------------+-----------+
| Aborted_clients | 52 |
| Aborted_connects | 0 |
| Binlog_cache_disk_use | 0 |
| Binlog_cache_use | 0 |
| Bytes_received | 661126807 |
| Bytes_sent | 904602283 |
| Com_admin_commands | 0 |
| Com_alter_db | 0 |
| Com_alter_table | 4 |
| Com_analyze | 0 |
| Com_backup_table | 0 |
| Com_begin | 0 |
| Com_change_db | 235 |
| Com_change_master | 0 |
| Com_check | 0 |
| Com_checksum | 0 |
| Com_commit | 0 |
| Com_create_db | 1 |
| Com_create_function | 0 |
| Com_create_index | 0 |
| Com_create_table | 2 |
| Com_delete | 0 |
| Com_delete_multi | 0 |
| Com_do | 0 |
| Com_drop_db | 1 |
| Com_drop_function | 0 |
| Com_drop_index | 0 |
| Com_drop_table | 0 |
| Com_drop_user | 0 |
| Com_flush | 0 |
| Com_grant | 0 |
| Com_ha_close | 0 |
| Com_ha_open | 0 |
| Com_ha_read | 0 |
| Com_help | 0 |
| Com_insert | 48597009 |
| Com_insert_select | 0 |
| Com_kill | 0 |
| Com_load | 0 |
| Com_load_master_data | 0 |
| Com_load_master_table | 0 |
| Com_lock_tables | 0 |
| Com_optimize | 0 |
| Com_preload_keys | 0 |
| Com_purge | 0 |
| Com_purge_before_date | 0 |
| Com_rename_table | 0 |
| Com_repair | 1 |
| Com_replace | 0 |
| Com_replace_select | 0 |
| Com_reset | 0 |
| Com_restore_table | 0 |
| Com_revoke | 0 |
| Com_revoke_all | 0 |
| Com_rollback | 0 |
| Com_savepoint | 0 |
| Com_select | 808751 |
| Com_set_option | 602 |
| Com_show_binlog_events | 0 |
| Com_show_binlogs | 0 |
| Com_show_charsets | 0 |
| Com_show_collations | 198 |
| Com_show_column_types | 0 |
| Com_show_create_db | 0 |
| Com_show_create_table | 0 |
| Com_show_databases | 0 |
| Com_show_errors | 0 |
| Com_show_fields | 0 |
| Com_show_grants | 0 |
| Com_show_innodb_status | 0 |
| Com_show_keys | 0 |
| Com_show_logs | 0 |
| Com_show_master_status | 0 |
| Com_show_new_master | 0 |
| Com_show_open_tables | 0 |
| Com_show_privileges | 0 |
| Com_show_processlist | 3 |
| Com_show_slave_hosts | 0 |
| Com_show_slave_status | 0 |
| Com_show_status | 5 |
| Com_show_storage_engines | 0 |
| Com_show_tables | 0 |
| Com_show_variables | 199 |
| Com_show_warnings | 0 |
| Com_slave_start | 0 |
| Com_slave_stop | 0 |
| Com_truncate | 0 |
| Com_unlock_tables | 0 |
| Com_update | 0 |
| Com_update_multi | 0 |
| Com_prepare_sql | 0 |
| Com_execute_sql | 0 |
| Com_dealloc_sql | 0 |
| Connections | 247 |
| Created_tmp_disk_tables | 0 |
| Created_tmp_files | 2 |
| Created_tmp_tables | 0 |
| Delayed_errors | 0 |
| Delayed_insert_threads | 0 |
| Delayed_writes | 0 |
| Flush_commands | 2 |
| Handler_commit | 0 |
| Handler_delete | 0 |
| Handler_read_first | 18 |
| Handler_read_key | 810414 |
| Handler_read_next | 46199648 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 543 |
| Handler_read_rnd_next | 621772 |
| Handler_rollback | 1 |
| Handler_update | 0 |
| Handler_write | 48626948 |
| Handler_discover | 0 |
| Key_blocks_not_flushed | 0 |
| Key_blocks_used | 115980 |
| Key_blocks_unused | 0 |
| Key_read_requests | 258563552 |
| Key_reads | 740290 |
| Key_write_requests | 50696105 |
| Key_writes | 29728862 |
| Last_query_cost | 1.199000 |
| Max_used_connections | 104 |
| Not_flushed_delayed_rows | 0 |
| Open_files | 69 |
| Open_streams | 0 |
| Open_tables | 64 |
| Opened_tables | 14064 |
| Qcache_free_blocks | 0 |
| Qcache_free_memory | 0 |
| Qcache_hits | 0 |
| Qcache_inserts | 0 |
| Qcache_lowmem_prunes | 0 |
| Qcache_not_cached | 0 |
| Qcache_queries_in_cache | 0 |
| Qcache_total_blocks | 0 |
| Questions | 49442447 |
| Rpl_status | NULL |
| Select_full_join | 0 |
| Select_full_range_join | 0 |
| Select_range | 0 |
| Select_range_check | 0 |
| Select_scan | 5 |
| Slave_open_temp_tables | 0 |
| Slave_running | OFF |
| Slow_launch_threads | 1 |
| Slow_queries | 128 |
| Sort_merge_passes | 0 |
| Sort_range | 24 |
| Sort_rows | 543 |
| Sort_scan | 0 |
| Table_locks_immediate | 29043355 |
| Table_locks_waited | 20395028 |
| Threads_cached | 0 |
| Threads_connected | 1 |
| Threads_created | 246 |
| Threads_running | 1 |
| Uptime | 216558 |
+--------------------------+-----------+
157 rows in set (0.00 sec)

[root@oorr search-backup3]# ls -l
total 6042568
-rw-r----- 1 mysql mysql 61 Aug 21 01:54 db.opt
-rw-r----- 1 mysql mysql 3910298096 Aug 21 16:51 links.MYD
-rw-r----- 1 mysql mysql 192686080 Aug 21 17:03 links.MYI
-rw-r----- 1 mysql mysql 8664 Aug 21 01:54 links.frm
-rw-rw---- 1 mysql mysql 1263708292 Aug 21 17:05 pages.MYD
-rw-rw---- 1 mysql mysql 814797824 Aug 22 22:19 pages.MYI
-rw-rw---- 1 mysql mysql 8882 Aug 21 17:04 pages.frm
[root@oorr search-backup3]#
[23 Aug 2004 22:40]
Woojong Koh
http://147.46.127.204/pages.tar.gz This is a sample table on which the error occurred. Most records are in Korean, so you may not be able to understand the contents.
[24 Aug 2004 13:50]
MySQL Verification Team
Making full-text search work with non-Western charsets / collations is a new feature that will be implemented in the future.
[24 Aug 2004 14:36]
Woojong Koh
"As of MySQL 4.1.1, full-text searches can be used with most multi-byte character sets. The exception is that for Unicode, the utf8 character set can be used, but not the ucs2 character set." (from http://dev.mysql.com/doc/mysql/en/Fulltext_Restrictions.html)
[24 Aug 2004 18:42]
MySQL Verification Team
Yes, that is true. Fulltext search works with multi-byte charsets, including utf8, in 4.1. But it works ONLY with Western collations and Western charsets, including multi-byte ones. This is because word stoppers, such as blanks, are defined for Western charsets only. A feature to make it work for Eastern charsets is already in our WorkLog, but it will not arrive soon.
[24 Aug 2004 21:07]
Woojong Koh
It can't be a problem with word stoppers and the like, because we use UTF-8 encoding and we also separate words with blanks. Full-text search works correctly whenever index creation succeeds. I think this is not a missing feature but a bug in index creation. Can you trace the function stack when the bug occurs? I think it is a trivial mistake in a special case, because index creation works with a small table. MATCH () AGAINST () queries also work correctly with UTF-8 encoding. Please reconsider.
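The encoding property behind this argument can be checked directly. In UTF-8, every byte of a multi-byte character is 0x80 or higher, so the ASCII blank (0x20) never appears inside one. The Python sketch below demonstrates only that property. It is not MySQL's full-text tokenizer, and the sample string and function names are hypothetical.

```python
# Hypothetical sample text: three Korean words separated by ASCII blanks.
SAMPLE = "전문 검색 색인"


def non_ascii_bytes_are_high(text):
    """Return True if every byte of every non-ASCII character is >= 0x80."""
    return all(
        byte >= 0x80
        for char in text
        if ord(char) > 0x7F
        for byte in char.encode("utf-8")
    )


def split_on_blank(data):
    """Split raw UTF-8 bytes on the ASCII blank (0x20), dropping empty pieces."""
    return [piece for piece in data.split(b" ") if piece]


encoded = SAMPLE.encode("utf-8")
# Each piece decodes cleanly because no split falls inside a character.
words = [piece.decode("utf-8") for piece in split_on_blank(encoded)]
print(words)
print(non_ascii_bytes_are_high(SAMPLE))
```

Splitting at the byte level and then decoding each piece succeeds only because no split point can fall inside a multi-byte sequence.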
[24 Aug 2004 21:10]
Woojong Koh
I also found that this error can occur with Western charsets and collations. Please check the link below. http://lists.mysql.com/mysql/157526
[30 Aug 2004 11:41]
Sergei Golubchik
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bugfix, yourself. More information about accessing the source trees is available at http://www.mysql.com/doc/en/Installing_source_tree.html Additional info: fixed in 4.1.5