Bug #5141 creation error of full text search on UTF-8 encoding
Submitted: 22 Aug 2004 14:53
Modified: 30 Aug 2004 11:41
Reporter: Woojong Koh
Status: Closed
Category: MySQL Server
Severity: S2 (Serious)
Version: 4.1.3, 5.0.0, 5.0.1
OS: Linux (Fedora Core 2)
Assigned to: Sergei Golubchik
CPU Architecture: Any

[22 Aug 2004 14:53] Woojong Koh
Description:
I'm using full text search for a small search engine.

When I create a full-text index on a small number of records, it works, but on a large table an error occurs.

I'm using UTF-8 encoding for CJK text, and I tested this in 4.1.3, 5.0.0, and 5.0.1.

I created the full-text index with "create fulltext index idx_1 on pages (title,body);" and the statement completed without error. But when I then use the table, this error occurs:

ERROR 1016 (HY000): Can't open file: 'pages.MYI' (errno: 144)

I tried to recover the table with myisamchk (options -r, -r -q, etc.), but it fails with this error:

myisamchk: error: 22 when fixing table

pages.MYD is 1.2 GB and pages.MYI is 778 MB. I saw the same problem reported on the MySQL mailing list, but nobody replied with a solution.

How to repeat:
The charset must be UTF-8.
Insert many records, over 1 GB in total (TEXT column),
and create a full-text index. In 5.0.0, the "144 when fixing table" error occurs. In the other versions the statement succeeds, but when you use the table the "Can't open file" error occurs, and recovery with the 'repair table' query or myisamchk is impossible.

If you can't repeat this bug, I can provide the example table files (they are fairly big).
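
The repeat steps above could be approximated with a small data generator. This is only a sketch under stated assumptions: the report gives the table name (pages) and the indexed columns (title, body) but not the full schema, so the column types, the word counts, and the use of random Hangul syllables are all hypothetical. Random syllables do not mimic the word distribution of real Korean text, so such data may or may not trigger the error.

```python
import random

# Range of precomposed Hangul syllables; each encodes to 3 bytes in UTF-8.
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3


def random_word(rng, min_len=1, max_len=6):
    """Return a word made of random Hangul syllables."""
    length = rng.randint(min_len, max_len)
    return "".join(chr(rng.randint(HANGUL_FIRST, HANGUL_LAST)) for _ in range(length))


def random_text(rng, words):
    """Return blank-separated random words."""
    return " ".join(random_word(rng) for _ in range(words))


def insert_statement(rng, title_words=5, body_words=200):
    """Build one INSERT for the pages table (word counts are assumptions)."""
    # The generated text holds only Hangul and spaces, so no escaping is needed.
    return "INSERT INTO pages (title, body) VALUES ('%s', '%s');" % (
        random_text(rng, title_words),
        random_text(rng, body_words),
    )


def statements(target_bytes, seed=0):
    """Yield INSERT statements until their UTF-8 size reaches target_bytes."""
    rng = random.Random(seed)
    written = 0
    while written < target_bytes:
        stmt = insert_statement(rng)
        written += len(stmt.encode("utf-8")) + 1  # +1 for the newline
        yield stmt


def write_dump(path, target_bytes, seed=0):
    """Write the statements to a file that the mysql client can load."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("SET NAMES utf8;\n")
        for stmt in statements(target_bytes, seed):
            out.write(stmt + "\n")
```

For example, write_dump("pages.sql", 1200 * 1024 * 1024) would produce a file somewhat larger than 1 GB, which could be loaded with the mysql client before running the create fulltext index statement from the description.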
[22 Aug 2004 15:22] Woojong Koh
For reference, the error message from the repair table query:

mysql> repair table pages;
+----------------------+--------+----------+-------------------------------------------------+
| Table                | Op     | Msg_type | Msg_text                                        |
+----------------------+--------+----------+-------------------------------------------------+
| search-backup3.pages | repair | error    | 22 when fixing table                            |
| search-backup3.pages | repair | error    | Can't copy datafile-header to tempfile, error 9 |
| search-backup3.pages | repair | status   | Operation failed                                |
+----------------------+--------+----------+-------------------------------------------------+
3 rows in set (20 min 37.54 sec)
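
The numbers in these messages come from two different spaces, which the sketch below tries to separate. It assumes that 144 is the MyISAM handler code (the description is paraphrased from what MySQL's perror utility prints) and that 22 and 9 are operating-system errno values; the second assumption is a guess based on their range, not something the report confirms. The helper name describe is hypothetical.

```python
import errno
import os

# MyISAM handler error code from this report; the description paraphrases
# what MySQL's perror utility prints for it.
MYISAM_ERRORS = {
    144: "Table is crashed and last repair failed",
}


def describe(code):
    """Describe a number from a MyISAM message as a handler or OS error."""
    if code in MYISAM_ERRORS:
        return "MyISAM handler error %d: %s" % (code, MYISAM_ERRORS[code])
    if code in errno.errorcode:
        # Assumption: small codes are plain OS errno values.
        return "OS error %d (%s): %s" % (code, errno.errorcode[code], os.strerror(code))
    return "unknown error %d" % code


for code in (144, 22, 9):
    print(describe(code))
```

Read this way, 22 maps to EINVAL and 9 to EBADF, which would point at a failing system call during the repair rather than at the table contents themselves.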
[23 Aug 2004 17:59] Hartmut Holzgraefe
Can you please add the output of "SHOW VARIABLES", "SHOW STATUS" 
and an "ls -l" of the database data directory?
[23 Aug 2004 22:15] Woojong Koh
mysql> show variables;
+---------------------------------+----------------------------------------+
| Variable_name                   | Value                                  |
+---------------------------------+----------------------------------------+
| back_log                        | 50                                     |
| basedir                         | /usr/local/mysql/                      |
| binlog_cache_size               | 32768                                  |
| bulk_insert_buffer_size         | 8388608                                |
| character_set_client            | utf8                                   |
| character_set_connection        | utf8                                   |
| character_set_database          | utf8                                   |
| character_set_results           | utf8                                   |
| character_set_server            | utf8                                   |
| character_set_system            | utf8                                   |
| character_sets_dir              | /usr/local/mysql/share/mysql/charsets/ |
| collation_connection            | utf8_general_ci                        |
| collation_database              | utf8_general_ci                        |
| collation_server                | utf8_general_ci                        |
| concurrent_insert               | ON                                     |
| connect_timeout                 | 5                                      |
| datadir                         | /home/data/                            |
| date_format                     | %Y-%m-%d                               |
| datetime_format                 | %Y-%m-%d %H:%i:%s                      |
| default_week_format             | 0                                      |
| delay_key_write                 | ON                                     |
| delayed_insert_limit            | 100                                    |
| delayed_insert_timeout          | 300                                    |
| delayed_queue_size              | 1000                                   |
| expire_logs_days                | 0                                      |
| flush                           | OFF                                    |
| flush_time                      | 0                                      |
| ft_boolean_syntax               | + -><()~*:""&|                         |
| ft_max_word_len                 | 84                                     |
| ft_min_word_len                 | 1                                      |
| ft_query_expansion_limit        | 20                                     |
| ft_stopword_file                | /home/data/stopword                    |
| group_concat_max_len            | 1024                                   |
| have_archive                    | NO                                     |
| have_bdb                        | NO                                     |
| have_compress                   | YES                                    |
| have_crypt                      | YES                                    |
| have_innodb                     | YES                                    |
| have_isam                       | NO                                     |
| have_geometry                   | YES                                    |
| have_ndbcluster                 | NO                                     |
| have_openssl                    | NO                                     |
| have_query_cache                | YES                                    |
| have_raid                       | NO                                     |
| have_rtree_keys                 | YES                                    |
| have_symlink                    | YES                                    |
| init_connect                    |                                        |
| init_file                       |                                        |
| init_slave                      |                                        |
| innodb_additional_mem_pool_size | 1048576                                |
| innodb_buffer_pool_awe_mem_mb   | 0                                      |
| innodb_buffer_pool_size         | 8388608                                |
| innodb_data_file_path           | ibdata1:10M:autoextend                 |
| innodb_data_home_dir            |                                        |
| innodb_fast_shutdown            | ON                                     |
| innodb_file_io_threads          | 4                                      |
| innodb_file_per_table           | OFF                                    |
| innodb_flush_log_at_trx_commit  | 1                                      |
| innodb_flush_method             |                                        |
| innodb_force_recovery           | 0                                      |
| innodb_lock_wait_timeout        | 50                                     |
| innodb_log_arch_dir             |                                        |
| innodb_log_archive              | OFF                                    |
| innodb_log_buffer_size          | 1048576                                |
| innodb_log_file_size            | 5242880                                |
| innodb_log_files_in_group       | 2                                      |
| innodb_log_group_home_dir       | ./                                     |
| innodb_max_dirty_pages_pct      | 90                                     |
| innodb_mirrored_log_groups      | 1                                      |
| innodb_open_files               | 300                                    |
| innodb_thread_concurrency       | 8                                      |
| interactive_timeout             | 28800                                  |
| join_buffer_size                | 131072                                 |
| key_buffer_size                 | 134217728                              |
| key_cache_age_threshold         | 300                                    |
| key_cache_block_size            | 1024                                   |
| key_cache_division_limit        | 100                                    |
| language                        | /usr/local/mysql/share/mysql/english/  |
| large_files_support             | ON                                     |
| license                         | GPL                                    |
| local_infile                    | ON                                     |
| locked_in_memory                | OFF                                    |
| log                             | OFF                                    |
| log_bin                         | ON                                     |
| log_error                       |                                        |
| log_slave_updates               | OFF                                    |
| log_slow_queries                | OFF                                    |
| log_update                      | OFF                                    |
| log_warnings                    | 1                                      |
| long_query_time                 | 10                                     |
| low_priority_updates            | OFF                                    |
| lower_case_file_system          | OFF                                    |
| lower_case_table_names          | 0                                      |
| max_allowed_packet              | 3144704                                |
| max_binlog_cache_size           | 4294967295                             |
| max_binlog_size                 | 1073741824                             |
| max_connect_errors              | 99999999                               |
| max_connections                 | 1000                                   |
| max_delayed_threads             | 20                                     |
| max_error_count                 | 64                                     |
| max_heap_table_size             | 16777216                               |
| max_insert_delayed_threads      | 20                                     |
| max_join_size                   | 4294967295                             |
| max_length_for_sort_data        | 1024                                   |
| max_relay_log_size              | 0                                      |
| max_seeks_for_key               | 4294967295                             |
| max_sort_length                 | 1024                                   |
| max_tmp_tables                  | 32                                     |
| max_user_connections            | 0                                      |
| max_write_lock_count            | 4294967295                             |
| myisam_data_pointer_size        | 4                                      |
| myisam_max_extra_sort_file_size | 10737418240                            |
| myisam_max_sort_file_size       | 10737418240                            |
| myisam_recover_options          | OFF                                    |
| myisam_repair_threads           | 1                                      |
| myisam_sort_buffer_size         | 268435456                              |
| net_buffer_length               | 8192                                   |
| net_read_timeout                | 30                                     |
| net_retry_count                 | 10                                     |
| net_write_timeout               | 60                                     |
| new                             | OFF                                    |
| old_passwords                   | OFF                                    |
| open_files_limit                | 5010                                   |
| optimizer_prune_level           | 1                                      |
| optimizer_search_depth          | 62                                     |
| pid_file                        | /home/data/oorr.net.pid                |
| port                            | 3306                                   |
| preload_buffer_size             | 32768                                  |
| protocol_version                | 10                                     |
| query_alloc_block_size          | 8192                                   |
| query_cache_limit               | 1048576                                |
| query_cache_min_res_unit        | 4096                                   |
| query_cache_size                | 0                                      |
| query_cache_type                | ON                                     |
| query_prealloc_size             | 8192                                   |
| range_alloc_block_size          | 2048                                   |
| read_buffer_size                | 258048                                 |
| read_only                       | OFF                                    |
| read_rnd_buffer_size            | 520192                                 |
| relay_log_purge                 | ON                                     |
| rpl_recovery_rank               | 0                                      |
| secure_auth                     | OFF                                    |
| server_id                       | 1                                      |
| skip_external_locking           | ON                                     |
| skip_networking                 | OFF                                    |
| skip_show_database              | OFF                                    |
| slave_net_timeout               | 3600                                   |
| slow_launch_time                | 2                                      |
| socket                          | /tmp/mysql.sock                        |
| sort_buffer_size                | 524280                                 |
| sql_mode                        |                                        |
| sql_updatable_view_key          | YES                                    |
| storage_engine                  | MyISAM                                 |
| sync_binlog                     | 0                                      |
| sync_frm                        | ON                                     |
| system_time_zone                | KST                                    |
| table_cache                     | 64                                     |
| table_type                      | MyISAM                                 |
| thread_cache_size               | 0                                      |
| thread_stack                    | 196608                                 |
| time_format                     | %H:%i:%s                               |
| time_zone                       | SYSTEM                                 |
| tmp_table_size                  | 33554432                               |
| tmpdir                          | /tmp:/home2                            |
| transaction_alloc_block_size    | 8192                                   |
| transaction_prealloc_size       | 4096                                   |
| tx_isolation                    | REPEATABLE-READ                        |
| version                         | 5.0.1-alpha-log                        |
| version_comment                 | Source distribution                    |
| version_compile_machine         | i686                                   |
| version_compile_os              | pc-linux                               |
| wait_timeout                    | 28800                                  |
+---------------------------------+----------------------------------------+
172 rows in set (0.00 sec)

mysql> show status;
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| Aborted_clients          | 52        |
| Aborted_connects         | 0         |
| Binlog_cache_disk_use    | 0         |
| Binlog_cache_use         | 0         |
| Bytes_received           | 661126807 |
| Bytes_sent               | 904602283 |
| Com_admin_commands       | 0         |
| Com_alter_db             | 0         |
| Com_alter_table          | 4         |
| Com_analyze              | 0         |
| Com_backup_table         | 0         |
| Com_begin                | 0         |
| Com_change_db            | 235       |
| Com_change_master        | 0         |
| Com_check                | 0         |
| Com_checksum             | 0         |
| Com_commit               | 0         |
| Com_create_db            | 1         |
| Com_create_function      | 0         |
| Com_create_index         | 0         |
| Com_create_table         | 2         |
| Com_delete               | 0         |
| Com_delete_multi         | 0         |
| Com_do                   | 0         |
| Com_drop_db              | 1         |
| Com_drop_function        | 0         |
| Com_drop_index           | 0         |
| Com_drop_table           | 0         |
| Com_drop_user            | 0         |
| Com_flush                | 0         |
| Com_grant                | 0         |
| Com_ha_close             | 0         |
| Com_ha_open              | 0         |
| Com_ha_read              | 0         |
| Com_help                 | 0         |
| Com_insert               | 48597009  |
| Com_insert_select        | 0         |
| Com_kill                 | 0         |
| Com_load                 | 0         |
| Com_load_master_data     | 0         |
| Com_load_master_table    | 0         |
| Com_lock_tables          | 0         |
| Com_optimize             | 0         |
| Com_preload_keys         | 0         |
| Com_purge                | 0         |
| Com_purge_before_date    | 0         |
| Com_rename_table         | 0         |
| Com_repair               | 1         |
| Com_replace              | 0         |
| Com_replace_select       | 0         |
| Com_reset                | 0         |
| Com_restore_table        | 0         |
| Com_revoke               | 0         |
| Com_revoke_all           | 0         |
| Com_rollback             | 0         |
| Com_savepoint            | 0         |
| Com_select               | 808751    |
| Com_set_option           | 602       |
| Com_show_binlog_events   | 0         |
| Com_show_binlogs         | 0         |
| Com_show_charsets        | 0         |
| Com_show_collations      | 198       |
| Com_show_column_types    | 0         |
| Com_show_create_db       | 0         |
| Com_show_create_table    | 0         |
| Com_show_databases       | 0         |
| Com_show_errors          | 0         |
| Com_show_fields          | 0         |
| Com_show_grants          | 0         |
| Com_show_innodb_status   | 0         |
| Com_show_keys            | 0         |
| Com_show_logs            | 0         |
| Com_show_master_status   | 0         |
| Com_show_new_master      | 0         |
| Com_show_open_tables     | 0         |
| Com_show_privileges      | 0         |
| Com_show_processlist     | 3         |
| Com_show_slave_hosts     | 0         |
| Com_show_slave_status    | 0         |
| Com_show_status          | 5         |
| Com_show_storage_engines | 0         |
| Com_show_tables          | 0         |
| Com_show_variables       | 199       |
| Com_show_warnings        | 0         |
| Com_slave_start          | 0         |
| Com_slave_stop           | 0         |
| Com_truncate             | 0         |
| Com_unlock_tables        | 0         |
| Com_update               | 0         |
| Com_update_multi         | 0         |
| Com_prepare_sql          | 0         |
| Com_execute_sql          | 0         |
| Com_dealloc_sql          | 0         |
| Connections              | 247       |
| Created_tmp_disk_tables  | 0         |
| Created_tmp_files        | 2         |
| Created_tmp_tables       | 0         |
| Delayed_errors           | 0         |
| Delayed_insert_threads   | 0         |
| Delayed_writes           | 0         |
| Flush_commands           | 2         |
| Handler_commit           | 0         |
| Handler_delete           | 0         |
| Handler_read_first       | 18        |
| Handler_read_key         | 810414    |
| Handler_read_next        | 46199648  |
| Handler_read_prev        | 0         |
| Handler_read_rnd         | 543       |
| Handler_read_rnd_next    | 621772    |
| Handler_rollback         | 1         |
| Handler_update           | 0         |
| Handler_write            | 48626948  |
| Handler_discover         | 0         |
| Key_blocks_not_flushed   | 0         |
| Key_blocks_used          | 115980    |
| Key_blocks_unused        | 0         |
| Key_read_requests        | 258563552 |
| Key_reads                | 740290    |
| Key_write_requests       | 50696105  |
| Key_writes               | 29728862  |
| Last_query_cost          | 1.199000  |
| Max_used_connections     | 104       |
| Not_flushed_delayed_rows | 0         |
| Open_files               | 69        |
| Open_streams             | 0         |
| Open_tables              | 64        |
| Opened_tables            | 14064     |
| Qcache_free_blocks       | 0         |
| Qcache_free_memory       | 0         |
| Qcache_hits              | 0         |
| Qcache_inserts           | 0         |
| Qcache_lowmem_prunes     | 0         |
| Qcache_not_cached        | 0         |
| Qcache_queries_in_cache  | 0         |
| Qcache_total_blocks      | 0         |
| Questions                | 49442447  |
| Rpl_status               | NULL      |
| Select_full_join         | 0         |
| Select_full_range_join   | 0         |
| Select_range             | 0         |
| Select_range_check       | 0         |
| Select_scan              | 5         |
| Slave_open_temp_tables   | 0         |
| Slave_running            | OFF       |
| Slow_launch_threads      | 1         |
| Slow_queries             | 128       |
| Sort_merge_passes        | 0         |
| Sort_range               | 24        |
| Sort_rows                | 543       |
| Sort_scan                | 0         |
| Table_locks_immediate    | 29043355  |
| Table_locks_waited       | 20395028  |
| Threads_cached           | 0         |
| Threads_connected        | 1         |
| Threads_created          | 246       |
| Threads_running          | 1         |
| Uptime                   | 216558    |
+--------------------------+-----------+
157 rows in set (0.00 sec)

[root@oorr search-backup3]# ls -l
total 6042568
-rw-r-----  1 mysql mysql         61 Aug 21 01:54 db.opt
-rw-r-----  1 mysql mysql 3910298096 Aug 21 16:51 links.MYD
-rw-r-----  1 mysql mysql  192686080 Aug 21 17:03 links.MYI
-rw-r-----  1 mysql mysql       8664 Aug 21 01:54 links.frm
-rw-rw----  1 mysql mysql 1263708292 Aug 21 17:05 pages.MYD
-rw-rw----  1 mysql mysql  814797824 Aug 22 22:19 pages.MYI
-rw-rw----  1 mysql mysql       8882 Aug 21 17:04 pages.frm
[root@oorr search-backup3]#
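
A few derived figures can make the output above easier to read. The sketch below only does arithmetic on values copied from the SHOW STATUS and ls -l output; the function names are hypothetical, and none of these ratios by itself identifies the cause of the index creation failure.

```python
# Values copied from the SHOW STATUS and ls -l output above.
status = {
    "Key_read_requests": 258563552,
    "Key_reads": 740290,
    "Table_locks_immediate": 29043355,
    "Table_locks_waited": 20395028,
}
file_sizes = {"pages.MYD": 1263708292, "pages.MYI": 814797824}


def key_cache_miss_ratio(s):
    """Fraction of key read requests that had to go to disk."""
    return s["Key_reads"] / s["Key_read_requests"]


def lock_wait_fraction(s):
    """Fraction of table lock requests that had to wait."""
    total = s["Table_locks_immediate"] + s["Table_locks_waited"]
    return s["Table_locks_waited"] / total


print("key cache miss ratio: %.4f" % key_cache_miss_ratio(status))
print("lock requests that waited: %.1f%%" % (100 * lock_wait_fraction(status)))
for name, size in file_sizes.items():
    print("%s: %.2f GiB" % (name, size / 2**30))
```

The file sizes come out at roughly 1.18 GiB for pages.MYD and 0.76 GiB for pages.MYI, consistent with the 1.2Gb and 778Mb quoted in the description.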
[23 Aug 2004 22:40] Woojong Koh
http://147.46.127.204/pages.tar.gz

This is a sample table in which the error occurred. Most records are in Korean, so you may not be able to understand the contents.
[24 Aug 2004 13:50] MySQL Verification Team
Making full-text search work with non-Western charsets / collations is a new feature that will be implemented in the future.
[24 Aug 2004 14:36] Woojong Koh
As of MySQL 4.1.1, full-text searches can be used with most multi-byte character sets. The exception is that for Unicode, the utf8 character set can be used, but not the ucs2 character set. 

from http://dev.mysql.com/doc/mysql/en/Fulltext_Restrictions.html
[24 Aug 2004 18:42] MySQL Verification Team
Yes, that is true.

Fulltext search works with multi-byte charsets, including utf8 in 4.1.

But it works ONLY with western collations and western charsets, including multi-byte ones. 

This is because word stoppers, such as blank and others, are defined for western charsets only. 

A feature to make it work for eastern charsets is already in our WorkLog, but it will
not come soon.
[24 Aug 2004 21:07] Woojong Koh
It can't be a problem with word stoppers, because we use UTF-8 encoding and we also separate words with blanks. Full-text search works correctly whenever index creation succeeds, so I think this is not a missing feature but a bug in index creation. Can you trace the function stack when the bug occurs? I think it is a trivial mistake in a special case, because it works successfully with a small table, and match () against () queries also work successfully with UTF-8 encoding. Please reconsider.
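
The point about blank-separated words can be illustrated with a minimal sketch. This is not MySQL's full-text parser; it only shows that Korean text encoded as UTF-8 still splits into words on blanks, and that each Hangul syllable occupies 3 bytes. The sample string is hypothetical and not taken from the reporter's table.

```python
def split_words(text):
    """Split text on whitespace, as a blank-delimited tokenizer would."""
    return text.split()


# Hypothetical sample text, not from the reporter's table.
sample = "한국어 전문 검색 엔진"

for word in split_words(sample):
    print(word, len(word), "chars,", len(word.encode("utf-8")), "bytes")
```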
[24 Aug 2004 21:10] Woojong Koh
Also, I found that this error can occur with Western charsets and collations too.

Please check the link below.

http://lists.mysql.com/mysql/157526
[30 Aug 2004 11:41] Sergei Golubchik
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

fixed in 4.1.5