Bug #7876 Unable to full-text index utf8 column
Submitted: 13 Jan 2005 14:48
Modified: 15 Jan 2005 12:14
Reporter: digesh kapadia
Status: Duplicate
Category: MySQL Server: MyISAM storage engine
Severity: S3 (Non-critical)
Version: 4.1.7 RPM
OS: Linux (Linux 2.6.5-1.327smp / x86_64)
Assigned to:
CPU Architecture: Any

[13 Jan 2005 14:48] digesh kapadia
Description:
Server type: dedicated MySQL full-text server with dual 64-bit Opteron processors
and 16 GB of memory.

I am trying to full-text index a utf8 column containing 1 record. It has been
16 hours and the index still has not been created. The column content is the
HTML code of a web page.

Some of the variable settings are as follows:

+----------------------------+----------------------------+
| Variable_name              | Value                      |
+----------------------------+----------------------------+
| character_set_client       | latin1                     |
| character_set_connection   | latin1                     |
| character_set_database     | latin1                     |
| character_set_results      | latin1                     |
| character_set_server       | latin1                     |
| character_set_system       | utf8                       |
| character_sets_dir         | /usr/share/mysql/charsets/ |
| collation_connection       | latin1_swedish_ci          |
| collation_database         | latin1_swedish_ci          |
| collation_server           | latin1_swedish_ci          |
| ft_boolean_syntax          | + -><()~*:""&|             |
| ft_max_word_len            | 84                         |
| ft_min_word_len            | 3                          |
| ft_query_expansion_limit   | 20                         |
| ft_stopword_file           | (built-in)                 |
| key_buffer_size            | 4294963200                 |
| key_cache_age_threshold    | 300                        |
| key_cache_block_size       | 1024                       |
| key_cache_division_limit   | 100                        |
| max_allowed_packet         | 33553408                   |
| max_binlog_cache_size      | 18446744073709551615       |
| max_binlog_size            | 1073741824                 |
| max_connect_errors         | 10                         |
| max_connections            | 1024                       |
| max_delayed_threads        | 20                         |
| max_error_count            | 64                         |
| max_heap_table_size        | 16777216                   |
| max_insert_delayed_threads | 20                         |
| max_join_size              | 18446744073709551615       |
| max_length_for_sort_data   | 1024                       |
| max_relay_log_size         | 0                          |
| max_seeks_for_key          | 18446744073709551615       |
| max_sort_length            | 4194304                    |
| max_tmp_tables             | 32                         |
| max_user_connections       | 0                          |
| max_write_lock_count       | 18446744073709551615       |
+----------------------------+----------------------------+
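
For anyone comparing their own server against these settings, the listing looks like the combined output of several SHOW VARIABLES queries. The exact statements are not given in the report, so the patterns below are an assumption inferred from the variable-name prefixes:

```sql
-- Assumed queries; the LIKE patterns are guessed from the prefixes
-- of the variable names listed in the report.
SHOW VARIABLES LIKE 'character_set%';  -- also matches character_sets_dir
SHOW VARIABLES LIKE 'collation%';
SHOW VARIABLES LIKE 'ft%';             -- full-text settings
SHOW VARIABLES LIKE 'key%';            -- MyISAM key cache settings
SHOW VARIABLES LIKE 'max%';
```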

How to repeat:
In order to test the scenario I created 2 tables as follows:

CREATE TABLE href_test_non_utf8
(
id int(10) unsigned NOT NULL auto_increment,
content text,
PRIMARY KEY  (id),
FULLTEXT KEY ft_content(content)
)ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=1935228928 AVG_ROW_LENGTH=40000;

CREATE TABLE href_test_utf8
(
id int(10) unsigned NOT NULL auto_increment,
content text CHARACTER SET UTF8,
PRIMARY KEY  (id),
FULLTEXT KEY ft_content(content)
)ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=1935228928 AVG_ROW_LENGTH=40000;

The CREATE statements work fine, but the insert into href_test_utf8, which
contains the utf8 column with the full-text key, stays in the "Repair by sorting" state.

The full-text index on the href_test_non_utf8 table was created within seconds, but the one on href_test_utf8 was not.
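
The remaining steps might be sketched as follows. The report does not include the actual row, so the HTML literal here is a hypothetical stand-in, and it is not known whether such a small row is enough to trigger the hang:

```sql
-- Hypothetical sample data; the reporter's real HTML content is not in the report.
INSERT INTO href_test_non_utf8 (content)
VALUES ('<html><body><p>sample page text</p></body></html>');

INSERT INTO href_test_utf8 (content)
VALUES ('<html><body><p>sample page text</p></body></html>');

-- From a second connection, while a statement on href_test_utf8 is still running,
-- the State column shows what the thread is doing (the report saw "Repair by sorting"):
SHOW PROCESSLIST;

-- Confirm that the content column really uses the utf8 character set:
SHOW CREATE TABLE href_test_utf8;
```

Running the same statement against both tables keeps the column character set as the only difference between the two cases.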
[15 Jan 2005 12:14] Aleksey Kishkin
Duplicate of bug #7707.